[CSEE-colloq] talk: Barsky on Suffix trees for very large strings, 1pm Fri 3/30 ITE325 UMBC

Tim Finin finin at cs.umbc.edu
Fri Mar 16 12:53:25 EDT 2012


                  Suffix trees for very large strings

                           Dr. Marina Barsky
               University of Illinois at Urbana-Champaign

              1:00pm Friday, 30 March 2012, ITE 325b, UMBC


The seminar is dedicated to the construction of suffix trees in
external memory. A suffix tree is a compact index of all substrings of
a given text. While being asymptotically linear in the size of the
input, in practice, suffix trees can easily be 50 times larger than
the input. As such, suffix trees often exceed typical main memory
sizes, even when the input does not. As most existing algorithms are
designed for RAM, their performance severely degrades when the tree
and/or input do not fit in main memory. So far, this has prevented the
wide application of suffix trees for the analysis of massive string
collections.

We will look at new advanced methods of suffix tree construction which
circumvent memory concerns and allow us to construct suffix trees for
inputs of any size using secondary storage (magnetic disks). We will
also discuss how this disk-based index can be used for facilitating
the pattern discovery in sequential data.


Dr. Marina Barsky is currently a Post-Doctoral fellow in the
Department of Computer Science at the University of Illinois at
Urbana-Champaign, IL. She received her PhD in Computer Science from
the University of Victoria, British Columbia, Canada in 2010. A large
part of her post-graduate research was dedicated to pattern discovery
in string data, primarily in massive DNA databases. She is currently
expanding her expertise in database systems to new areas such as index
management and data mining. Her research interests include data mining
of sequential data, information networks, and teaching of computer
science through interactive interfaces.


Host: Richard Chang
See http://csee.umbc.edu/talks for more information


More information about the CSEE-colloquium-out mailing list