DIMACS TR: 2003-20

Improved Data Stream Summaries: The Count-Min Sketch and its Applications



Authors: Graham Cormode and S. Muthukrishnan

ABSTRACT

We introduce a new small space data structure -- the Count-Min Sketch -- for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point, range and inner product queries to be approximately answered quickly and efficiently; in addition it can eb applied to solve several important problems in data stream such as finding quantiles, frequent items, etc. In both cases, the time and space bounds we show using the CM sketch significantly improves those known for previous sketch constructions -- typically from 1/epsilon^2 to 1/epsilon in factor.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2003/2003-20.ps.gz
DIMACS Home Page