Conference Publications : GangulyCormode07

S. Ganguly and G. Cormode. On estimating frequency moments of data streams. In Proceedings of RANDOM, 2007.

Space-economical estimation of the pth frequency moment, defined as F_p = Σ_{i=1}^{n} |f_i|^p for p > 0, is of interest in estimating all-pairs distances in a large data matrix, in machine learning, and in data stream computation. Random sketches formed by the inner product of the frequency vector (f_1, ..., f_n) with a suitably chosen random vector were pioneered by Alon, Matias and Szegedy, and have since played a central role in estimating F_p and in data stream computation in general. The concept of p-stable sketches, formed by the inner product of the frequency vector with a random vector whose components are drawn from a p-stable distribution, was proposed by Indyk for estimating F_p, for 0 < p < 2, and has been further studied by Li.
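To make the p-stable sketch idea concrete, the following is a minimal Python sketch for the special case p = 1, where the 1-stable distribution is the standard Cauchy distribution. It is an illustration under stated assumptions, not code from the paper: the class name `CauchySketch`, its parameters, and the explicitly stored projection matrix are all hypothetical simplifications (a streaming implementation would generate the variates pseudo-randomly instead of storing them). The estimate relies on two standard facts: a weighted sum of independent Cauchy variables with weights f_i is distributed as F_1 times a Cauchy variable, and the median of the absolute value of a standard Cauchy variable is 1.

```python
import math
import random
import statistics


class CauchySketch:
    """Toy 1-stable (Cauchy) sketch for F_1 over the universe {0, ..., n-1}.

    Assumption: the k x n projection matrix is stored explicitly, which is
    only sensible for a small illustrative universe.
    """

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Standard Cauchy variates via inverse-CDF sampling: tan(pi * (u - 1/2)).
        self.proj = [
            [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(k)
        ]
        self.counters = [0.0] * k

    def update(self, item, delta):
        # Each stream update (item, delta) touches all k counters.
        for j, row in enumerate(self.proj):
            self.counters[j] += delta * row[item]

    def estimate_f1(self):
        # Each counter is distributed as F_1 times a standard Cauchy variable,
        # and median(|Cauchy|) = 1, so the median of |counter| estimates F_1.
        return statistics.median(abs(c) for c in self.counters)


# Small usage example with positive and negative updates.
sketch = CauchySketch(n=10, k=500, seed=42)
for item, delta in [(3, 5), (7, 2), (3, -1), (1, 4)]:
    sketch.update(item, delta)
print("exact F_1:", 4 + 2 + 4, "estimate:", sketch.estimate_f1())
```

The number of counters k plays the role of the number of inner products maintained, and the loop in `update` shows that every stream update modifies each of them.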
In this paper, we consider the problem of estimating F_p, for 0 < p < 2. A disadvantage of the stable sketches technique and its variants is that they require O(1/ε²) inner products of the frequency vector with dense vectors of stable (or nearly stable) random variables to be maintained. This means that each stream update can be quite time-consuming. We present algorithms for estimating F_p, for 0 < p < 2, that do not require the use of stable sketches or their approximations. Our technique is elementary in nature, in that it uses simple randomization in conjunction with well-known summary structures for data streams, such as the Count-Min sketch and the Count sketch structure. Our algorithms require space O(1/ε^{2+p}) to estimate F_p to within 1 ± ε factors and require expected time O(log F_1 · log(1/δ)) to process each update. Thus, our technique trades an O(1/ε^p) factor in space for much more efficient processing of stream updates. We also present a stand-alone iterative estimator for F_1.
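For readers unfamiliar with the summary structures named above, here is a minimal Python sketch of a Count-Min sketch. It is not the paper's F_p estimator, only an illustration of the underlying building block: the class name `CountMinSketch`, the parameter names, and the choice of hash family (a linear hash modulo a Mersenne prime) are assumptions made for this example. The point it shows is that an update touches one counter per row, rather than every counter in the structure.

```python
import random


class CountMinSketch:
    """Toy Count-Min sketch over non-negative integer item identifiers."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the illustrative hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One (a, b) pair per row defines h(x) = ((a*x + b) mod PRIME) mod width.
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self.PRIME) % self.width

    def update(self, item, delta=1):
        # An update touches exactly one counter in each of the depth rows.
        for row in range(len(self.table)):
            self.table[row][self._bucket(row, item)] += delta

    def query(self, item):
        # With non-negative updates, every row overestimates the true count,
        # so the minimum over rows is the tightest available estimate.
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(len(self.table))
        )


# Small usage example.
cms = CountMinSketch(width=64, depth=4, seed=7)
for item in [5, 5, 9, 5, 12]:
    cms.update(item)
print("estimated count of item 5:", cms.query(5))
```

In the usual analysis of this structure, the width controls the additive error as a fraction of F_1 and the depth controls the failure probability δ.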

