[CSEE Talk] PhD defense: Varish Mulwad -- Inferring the Semantics of Tables

Tim Finin finin at cs.umbc.edu
Mon Dec 29 16:56:23 EST 2014


                         Dissertation Defense

              TABEL -- A Domain Independent and Extensible
             Framework for Inferring the Semantics of Tables

                       Varish Vyankatesh Mulwad

             8:00am Thursday, 8 January 2015, ITE325b, UMBC

Tables are an integral part of documents, reports and Web pages in
many scientific and technical domains, compactly encoding important
information that can be difficult to express in text. Table-like
structures outside documents, such as spreadsheets, CSV files, log
files and databases, are widely used to represent and share
information. However, tables remain beyond the scope of regular text
processing systems, which often treat them like free text.

This dissertation presents TABEL -- a domain independent and
extensible framework to infer the semantics of tables and represent
them as RDF Linked Data. TABEL captures the intended meaning of a
table by mapping header cells to classes, data cell values to existing
entities and pairs of columns to relations from a given ontology and
knowledge base. The core of the framework consists of a module that
represents a table as a graphical model to jointly infer the semantics
of headers, data cells and relations between headers. We also introduce
a novel Semantic Message Passing scheme, which incorporates semantics
into message passing, to perform joint inference over the
probabilistic graphical model. We further explore a
"human-in-the-loop" paradigm, presenting plausible models of user
interaction with our framework and their impact on the quality of the
inferred semantics.
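
As a rough illustration of the mapping described above (header cells to
classes, data cells to entities, column pairs to relations), the sketch
below turns a hand-written interpretation of a tiny table into
RDF-style triples. It is not TABEL's implementation: the interpretation
is supplied by hand rather than inferred, and the table, the "ex:"
identifiers and the function name table_to_triples are all hypothetical.

```python
# Toy illustration only: the interpretation below is hand-written, not
# inferred, and every "ex:" identifier is a made-up placeholder.
table = {
    "headers": ["City", "Country"],
    "rows": [["Baltimore", "United States"], ["Paris", "France"]],
}

# Hypothetical interpretation of the kind the abstract describes.
header_classes = {"City": "ex:City", "Country": "ex:Country"}
cell_entities = {
    "Baltimore": "ex:Baltimore",
    "United States": "ex:United_States",
    "Paris": "ex:Paris",
    "France": "ex:France",
}
column_relations = {("City", "Country"): "ex:locatedIn"}


def table_to_triples(table, header_classes, cell_entities, column_relations):
    """Return (subject, predicate, object) triples for an interpreted table."""
    triples = []
    index = {header: i for i, header in enumerate(table["headers"])}
    for row in table["rows"]:
        # Type each linked cell with the class assigned to its column header.
        for header, col in index.items():
            entity = cell_entities.get(row[col])
            if entity is not None and header in header_classes:
                triples.append((entity, "rdf:type", header_classes[header]))
        # Relate the entities in each column pair that has a relation.
        for (src, dst), relation in column_relations.items():
            subj = cell_entities.get(row[index[src]])
            obj = cell_entities.get(row[index[dst]])
            if subj is not None and obj is not None:
                triples.append((subj, relation, obj))
    return triples


triples = table_to_triples(table, header_classes, cell_entities, column_relations)
for triple in triples:
    print(" ".join(triple))
```

Cells without a linked entity are simply skipped here; a real system
would have to decide how to represent literals and unlinked values.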

We present techniques that are both extensible and domain
agnostic. Our framework supports the addition of preprocessing modules
without affecting existing ones, making TABEL extensible. It also
allows background knowledge bases to be adapted and changed based on
the domains of the tables, thus making it domain independent. We
demonstrate the extensibility and domain independence of our
techniques by developing an application of TABEL in the healthcare
domain. We develop a proof of concept for an application to generate
meta-analysis reports automatically, which is built on top of the
semantics inferred from tables found in medical literature.

A thorough evaluation with experiments over datasets of tables from the
Web and medical research reports shows promising results.

Committee: Drs. Tim Finin (chair), Tim Oates, Anupam Joshi, Yun Peng,
Indrajit Bhattacharya (IBM Research) and L. V. Subramaniam (IBM
Research)

