Compact summaries over large datasets, May 2015. Invited tutorial in PODS 2015 and BICOD 2015.

A fundamental challenge in processing the massive quantities of information generated by modern applications is in extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is in the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include the count distinct algorithms, which compactly approximate item set cardinalities, and sketches which allow vector norms and products to be estimated. These are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. This tutorial introduces the concepts and examples of compact summaries.

bib | slides | .pdf ] Back

This file was generated by bibtex2html 1.92.