S. Ganguly and G. Cormode. On estimating frequency moments of data streams. In Proceedings of RANDOM, 2007.

Space-economical estimation of the pth frequency moments, defined as Fp = Σi=1n |fi|p, for p > 0, are of interest in estimating all-pairs distances in a large data matrix, machine learning, and in data stream computation. Random sketches formed by the inner product of the frequency vector f1, ..., fn with a suitably chosen random vector were pioneered by Alon, Matias and Szegedy, and have since played a central role in estimating Fp and for data stream computations in general. The concept of p-stable sketches formed by the inner product of the frequency vector with a random vector whose components are drawn from a p-stable distribution, was proposed by Indyk for estimating Fp, for 0 < p < 2, and has been further studied by Li.

In this paper, we consider the problem of estimating Fp, for 0 < p < 2. A disadvantage of the stable sketches technique and its variants is that they require O((1)/(ε2)) inner-products of the frequency vector with dense vectors of stable (or nearly stable) random variables to be maintained. This means that each stream update can be quite time-consuming. We present algorithms for estimating Fp, for 0 < p < 2, that does not require the use of stable sketches or its approximations. Our technique is elementary in nature, in that, it uses simple randomization in conjunction with well-known summary structures for data streams, such as the CM sketch sketch and the Count sketch structure. Our algorithms require space O ((1)/(ε2+p)) to estimate Fp to within 1 +/- ε factors and requires expected time O(logF1 log (1)/(δ)) to process each update. Thus, our technique trades an O((1)/(εp)) factor in space for much more efficient processing of stream updates. We also present a stand-alone iterative estimator for F1.

bib | .pdf ] Back

This file was generated by bibtex2html 1.92.