DIMACS TR: 2004-52

Approximate Histogram and Wavelet Summaries of Streaming Data

Authors: S. Muthukrishan and Martin J. Strauss

We consider several algorithms for building histograms that are optimal or nearly optimal, from streaming data. Specifically, we consider a vector A given either as a stream of aggregate values A[0],A[1],... or a stream of updates of the form "add x to A[i]." We then build a histogram H; that is, a piecewise-constant vector with a given number B of pieces. We also consider the related representation formats of piecewise-linear representations and Haar wavelet representations.

We consider first point queries to the data. The point query i should ideally be answered by A[i]; instead, it will be answered by H[i], leading to square error (A[i]-H[i])^2. We want to minimize or nearly minimize thus sum square error. Next, we consider range queries, of the form (ell,r), which should ideally be answered by the sum of A[i] over i in the range ell to r. Finally, we consider some of these questions with regard to two-dimensional signals A[i,j] indexed by two (or more) integers instead of just a single integer.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2004/2004-52.ps.gz

DIMACS Home Page