Many applications generate massive data streams. Summarizing such massive data requires fast, small space algorithms to support post-hoc queries and mining. An important observation is that such streams are rarely uniform, and real data sources typically exhibit significant skewness. These are well modeled by Zipf distributions, which are characterized by a parameter,z, that captures the amount of skew.We present a data stream summary that can answer point queries with ε accuracy and show that the space needed is only

O(ε^{-min{1,1/z}}). This is the firsto(1/ε) space algorithm for this problem, and we show it is essentially tight for skewed distributions. We show that the same data structure can also estimate theL_{2}norm of the stream ino(1/ε^{2}) space forz>1/2, another improvement over the existing Ω(1/ε^{2}) methods.We support our theoretical results with an experimental study over a large variety of real and synthetic data. We show that significant skew is present in both textual and telecommunication data. Our methods give strong accuracy, significantly better than other methods, and behave exactly in line with their analytic bounds.

*This file was generated by
bibtex2html 1.92.*