[CSEE Talk] talk: Topic Modeling with Structured Priors for Text-Driven Science, 12p Mon 3/2 ITE325

Tim Finin finin at cs.umbc.edu
Sun Mar 1 10:28:50 EST 2015


              Computer Science and Electrical Engineering
                University of Maryland, Baltimore County

     Topic Modeling with Structured Priors for Text-Driven Science

                           Michael Paul, JHU

             12:00pm-1:00pm, Monday 2 March 2015, ITE 325

Many scientific disciplines are being revolutionized by the explosion
of public data on the web and social media, particularly in health and
social sciences. For instance, by analyzing social media messages, we
can instantly measure public opinion, understand population behaviors,
and monitor events such as disease outbreaks and natural disasters.
Taking advantage of these data sources requires tools that can make
sense of massive amounts of unstructured and unlabeled text. Topic
models, statistical models that describe low-dimensional
representations of data, can uncover interesting latent structure in
large text datasets and are popular tools for automatically
identifying prominent themes in text. However, to be useful in
scientific analyses, topic models must learn interpretable patterns
that accurately correspond to real-world concepts of interest.

In this talk, I will introduce Sprite, a family of topic models that
can encode additional structures such as hierarchies, factorizations,
and correlations, and can incorporate supervision and domain
knowledge. Sprite extends standard topic models by formulating the
Bayesian priors over parameters as functions of underlying components,
which can be constrained in various ways to induce different
structures. This creates a unifying representation that generalizes
several existing topic models while providing a powerful framework for
building new models. I will describe a few specific instantiations of
Sprite and show how these models can be used in various scientific
applications, including extracting self-reported information about
drugs from web forums, analyzing healthcare quality in online reviews,
and summarizing public opinion in social media on issues such as gun
control.

Michael Paul (http://bit.ly/mjpjhu) is a PhD candidate in Computer
Science at Johns Hopkins University. He earned an M.S.E. in CS from
Johns Hopkins University in 2012 and a B.S. in CS from the University
of Illinois at Urbana-Champaign in 2009. He has received PhD
fellowships from Microsoft Research, the National Science Foundation,
and the Johns Hopkins University Whiting School of Engineering. His
research focuses on exploratory machine learning and natural language
processing for the web and social media, with applications to
computational epidemiology and public health informatics.

Host: Tim Finin

     -- more information and directions: http://bit.ly/UMBCtalks --

