[CSEE-colloq] talk: Correlation Aware Optimizations for Analytic Databases, 1pm Fri 3/9, UMBC

Tim Finin finin at cs.umbc.edu
Thu Mar 8 10:14:33 EST 2012


         Correlation Aware Optimizations for Analytic Databases

                    Hideaki Kimura, Brown University

               1:00pm Friday 9 March 2012, ITE 325b, UMBC

Recent years have seen that the analysis of large data-sets is
crucially important in a wide range of business, governmental, and
scientific applications. For example, research projects in astronomy
need to analyze petabytes of image data taken from
telescopes. Providing a fast and scalable analytical data management
system for such users has become increasingly important.

The major bottlenecks for analytics on such big data are disk- and
network-I/O. Because the data is too large to fit in RAM, each query
causes substantial disk I/O. Traditional database systems provide
indexes to speed up disk reads, but many analytic queries do not
benefit from indexes because data is scattered over a large number of
disk blocks and disk seeks are prohibitively expensive. Furthermore,
such huge data sets need to be partitioned and distributed over
hundreds or many thousands of nodes. When a query requires more than
one data at once, such as a query involving a JOIN operation, the data
management system must transmit a large amount of data over the
network. For example, the Shuffle phase in Map-Reduce systems copies
file blocks over the network and causes a significant bottleneck in
many cases.

Our approach to tackling these challenges in big data analytics is to
exploit correlations. I will describe our correlation-aware indexing,
replication, and data placement which make big data analytics faster
and more scalable.  Finally, if time allows, I will also introduce
another on-going project to develop a scalable transactional
processing system on modern hardware in collaboration with
Hewlett-Packard Laboratories.


Hideaki Kimura (http://bit.ly/hkimura) is a doctoral candidate in the
Computer Science Department at Brown University.  His main research
interests are in data management systems. His dissertation research
with Prof. Stan Zdonik is on correlation-based optimizations for large
analytic databases. He also worked on transaction processing systems
exploiting modern hardware at HP Labs.


Host: Tim Finin
See http://csee.umbc.edu/talks for more information



More information about the CSEE-colloquium-out mailing list