A MULTIGRAPH MODEL FOR MASSIVE DATA

James Abello

                       Communication Information Systems 
                             AT&T Labs -- Research 
                             Florham Park, NJ 07932
                             abello@research.att.com
  

Today, there is a tremendous proliferation of applications that need to process large corpora of data. It is becoming the rule rather than the exception that the associated data does not fit in main memory. Since CPU speeds are increasing at a rate substantially higher than disk transfer rates, the I/O between main and external memory is becoming an increasingly significant bottleneck.

Extraction of semantics from these massive amounts of data offers new challenges at all levels of processing, i.e., storage, access, manipulation, control, presentation, and usage. Limitations on the expressive power of the relational algebra suggest the need to adopt higher-order concepts as the atomic notions of information processing. It is here that External Dynamic Weighted Multigraphs can play an important role.
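Transitive connectivity (reachability) is the classic example of a query that lies outside first-order relational algebra yet becomes a primitive operation once the graph itself is the atomic object. The following sketch, with hypothetical names not drawn from the author's system, illustrates reachability over a weighted multigraph given as (source, target, weight) triples:

```python
from collections import defaultdict, deque

def reachable(edges, source):
    """Return the set of vertices reachable from `source` in a
    multigraph given as (u, v, weight) triples.  Parallel edges
    (repeated (u, v) pairs) are permitted and handled naturally."""
    adj = defaultdict(list)
    for u, v, _w in edges:
        adj[u].append(v)
    seen = {source}
    queue = deque([source])
    while queue:                      # breadth-first traversal
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen
```

A fixed relational-algebra expression can only follow a bounded number of join steps, whereas the traversal above follows paths of any length.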

We are currently developing a collection of new techniques that take the I/O bottleneck into account and lead to the construction of I/O-efficient dynamic multigraph algorithms. The techniques blend graph-theoretical notions with statistical methods in a multiresolution framework. This, in turn, is directly tied to a set of multi-linked views managed by a high-powered visualizer. We will discuss some of the main algorithmic and computational challenges that need to be overcome before the suggested framework becomes mainstream. The presentation will be guided by our current findings on the exploration of data sets containing on the order of 70 billion records.