[CSEE Talk] talk: Topic Modeling for Analyzing Document Collection

Tim Finin finin at cs.umbc.edu
Thu May 12 16:50:29 EDT 2016


                           CHMPR Lecture Series

           Topic Modeling for Analyzing Document Collection

                           Mitsunori Ogihara
          Department of Computer Science, University of Miami

              11:00am Monday, 16 May 2016, ITE 325b, UMBC

Topic modeling (in particular, Latent Dirichlet Allocation) is a
technique for analyzing a large collection of documents.  In topic
modeling we view each document as a frequency vector over a vocabulary
and each topic as a static distribution over the vocabulary.  Given a
desired number, K, of document classes, a topic modeling algorithm
attempts to estimate concurrently K static distributions and, for each
document, how much each of the K classes contributes.  Mathematically,
this is the problem of approximating the matrix formed by stacking the
frequency vectors as the product of two non-negative matrices, where
both the column dimension of the first matrix and the row dimension of
the second matrix are equal to K.  Topic modeling has recently been
gaining popularity for analyzing large collections of documents.
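
The factorization view above can be illustrated with a minimal sketch.
It assumes plain non-negative matrix factorization with multiplicative
updates for squared error, which is a stand-in technique and not Latent
Dirichlet Allocation itself, and not necessarily the method used in the
talk.  The function name factorize, the toy count matrix V, and the
parameters iters, seed, and eps are all hypothetical and chosen only
for illustration.

```python
import numpy as np

def factorize(V, K, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (documents x vocabulary)
    as W @ H, where W is documents x K and H is K x vocabulary."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = V.shape
    W = rng.random((n_docs, K)) + eps    # document-by-class weights
    H = rng.random((K, n_words)) + eps   # class-by-word weights
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy corpus: 4 documents over a 6-word vocabulary.
# Each row is one document's frequency vector.
V = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 1, 2, 3],
], dtype=float)

W, H = factorize(V, K=2)

# Normalize each row of H into a distribution over the vocabulary,
# and rescale W so that shares @ topics equals W @ H.
row_sums = H.sum(axis=1)
topics = H / row_sums[:, None]
shares = W * row_sums
```

Here each row of topics plays the role of one static distribution over
the vocabulary, and each row of shares indicates how much each of the
K classes contributes to the corresponding document.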

In this talk I will present some examples of applying topic modeling:
(1) a small sentiment analysis of a small collection of short patient
surveys, (2) exploratory content analysis of a large collection of
letters, (3) document classification based upon topics and other
linguistic features, and (4) exploratory analysis of a large
collection of literary works.  I will discuss not only the topic
modeling steps themselves but also all the preprocessing steps for
preparing the documents for topic modeling.


Mitsunori Ogihara is a Professor of Computer Science at the University
of Miami, Coral Gables, Florida.  There he directs the Data Mining
Group in the Center for Computational Science, a university-wide
organization for providing resources and consultation for large-scale
computation.  He has published three books and approximately 190
papers in conferences and journals.  He is on the editorial boards of
Theory of Computing Systems and the International Journal of Foundations
of Computer Science.  Ogihara received a Ph.D. in Information Sciences
from Tokyo Institute of Technology in 1993 and was a
tenure-track/tenured faculty member in the Department of Computer
Science at the University of Rochester from 1994 to 2007.

