Interdisciplinary Seminar Series

Title: Faster buffering in large databases

Speaker: Mihai Patrascu, AT&T Labs

Date: Monday, April 4, 2011 12:00 - 1:00 pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Imagine a large (disk-resident) database that needs to support a fast stream of updates. The updates may be natural, or, more likely, self-inflicted (if we want to maintain many indexes, one logical update translates into many).

If we are worried about keeping up with the updates, the simplest thing to do is to just record the updates in an unsorted log, in which case we can support an update rate near disk speed. Of course, queries will not be particularly efficient.

Buffer trees are one of the classic external memory data structures. They offer a way to record updates at a rate logarithmically slower than raw disk speed, and then support lookups for a particular key in reasonable (logarithmic) time.

I will present a new, improved data structure that can support updates with just a log-log slowdown compared to disk speed, and still implement lookups in logarithmic time. This is the first data structure that aggressively uses hashing in the external memory model; in particular, it is the first to break outside the classic "indivisibility model".

DIMACS/CCICADA Interdisciplinary Series, Spring 2011