## R. Berinde, G. Cormode, P. Indyk, and M. Strauss.
Space-optimal heavy hitters with strong error bounds.
In *ACM Principles of Database Systems (PODS)*, 2009.

The problem of finding heavy-hitters and approximating the frequencies
of items is at the heart of many problems in data stream analysis. It
has been observed that several proposed solutions to this problem can
outperform their worst-case guarantees on real data. This leads to the
question of whether some stronger bounds can be guaranteed. We answer
this in the positive by showing that a class of “counter-based
algorithms” (including the popular and very space-efficient Frequent
and Spacesaving algorithms) provide much stronger approximation
guarantees than previously known. Specifically, we show that errors in
the approximation of individual elements do not depend on the
frequencies of the most frequent elements, but only on the frequency
of the remaining “tail”.
This shows that counter-based methods are the
most space-efficient (in fact, space-optimal) algorithms having this
strong error bound.
This tail guarantee allows these algorithms to solve the “sparse
recovery” problem. Here, the goal is to recover a faithful
representation of the vector of frequencies, *f*. We prove that using
space *O*(*k*), the algorithms construct an approximation *f*^{*} to the
frequency vector *f* so that the L1 error ||*f*-*f*^{*}||_{1} is close to the best
possible error min_{f'} ||*f*' - *f*||_{1}, where *f*' ranges over all vectors
with at most *k* non-zero entries. This improves the previously best
known space bound of about *O*(*k* log*n*) for streams without element
deletions. Other consequences of the tail guarantees are results for skewed
(Zipfian) data, and guarantees for accuracy of merging multiple
summarized streams.

[ bib |
slides |
.pdf ]
Back

*This file was generated by
bibtex2html 1.92.*