Making Graph Sketching Practical

November 03, 2021, 11:00 AM - 12:00 PM

Location:

Online Event

David Tench, Rutgers University

Graphs are ubiquitous in science and engineering, and these graphs are often too big to store in RAM, may be distributed across many machines, and may be dynamic (meaning they change over time). The dynamic graph streaming model addresses these challenges: algorithms in this model use asymptotically small space and require one sequential pass (or a small number of passes) over the graph data. All known dynamic streaming algorithms use linear sketches, and space-efficient graph sketching algorithms are known for many problems including connectivity, sparsification, triangle counting, and matching. However, most of these asympotically impressive results have not seen use in graph processing systems. How do you make a graph sketching algorithm practical? In this talk I will answer this question for the problem of computing the connected components of a graph defined by a stream of edge insertions and deletions. While an asymptotically space-optimal algorithm for this problem is known, I will show how any real implementation of said algorithm is too big and too slow to be useful on modern hardware. I will then present a linear sketching algorithm which allows for much faster stream processing, and an I/O-optimal external memory algorithm which allows these linear sketches to be stored on disk instead of RAM without sacrificing performance. Finally I'll present GraphZeppelin, a graph streaming system which implements this improved algorithm and outperforms existing graph processing tools on dense graph streams.

 

Special Note: The Theory of Computing Seminar is being held online. Contact the organizers for the link to the seminar. 

See: https://theory.cs.rutgers.edu/theory_seminar