## DIMACS TR: 2004-52

##
Approximate Histogram and Wavelet Summaries of Streaming Data

### Authors: S. Muthukrishan and Martin J. Strauss

**
ABSTRACT
**
We consider several algorithms for building histograms that are
optimal or nearly optimal, from streaming data. Specifically, we
consider a vector A given either as a stream of aggregate values
A[0],A[1],... or a stream of updates of the form "add x to A[i]." We
then build a histogram H; that is, a piecewise-constant vector with a
given number B of pieces. We also consider the related representation
formats of piecewise-linear representations and Haar wavelet
representations.
We consider first point queries to the data. The point query i should
ideally be answered by A[i]; instead, it will be answered by H[i],
leading to square error (A[i]-H[i])^2. We want to minimize or nearly
minimize thus sum square error. Next, we consider range queries, of
the form (ell,r), which should ideally be answered by the sum of A[i]
over i in the range ell to r. Finally, we consider some of these
questions with regard to two-dimensional signals A[i,j] indexed by two
(or more) integers instead of just a single integer.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2004/2004-52.ps.gz

DIMACS Home Page