DIMACS TR: 2005-33

Estimating Entropy and Entropy Norm on Data Streams



Authors: Amit Chakrabarti, Khanh Do Ba and S. Muthukrishnan

ABSTRACT

We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the ``entropy norm'' of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second result is a sublinear space one-pass algorithm for estimating the empirical entropy of an input stream. For a stream of $m$ items and a given real parameter $\alpha$, our algorithm uses space $\widetilde{O}(m^{2\alpha})$ and provides an approximation of $O(1/\alpha)$ in the worst case and $(1+\eps)$ in ``most'' cases. All our algorithms are quite simple.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-33.pdf
DIMACS Home Page