G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Finding hierarchical heavy hitters in data streams. In International Conference on Very Large Data Bases (VLDB), pages 464-475, 2003.

Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support, and network management (e.g., IP clustering, denial-of-service attack monitoring). Despite the amount of recent study that has been dedicated to online aggregation on sets (e.g., quantiles, hot items), surprisingly little attention has been paid to summarizing hierarchical structure in stream data.

The problem we study in this paper is that of finding Hierarchical Heavy Hitters (HHH): given a hierarchy and a fraction phi, we want to find all HHH nodes that have a total number of descendants in the data stream larger than phi of the total number of elements in the data stream, after discounting the descendant nodes that are HHH nodes. The resulting summary gives a topological 'cartogram' of the hierarchical data. We present deterministic and randomized algorithms for finding HHHs, which builds upon existing techniques by incorporating the hierarchy into the algorithms. Our experiments demonstrate several factors of improvement in accuracy over the straightforward approach, which is due to making algorithms hierarchy-aware

bib | Alternate Version | .pdf ] Back

This file was generated by bibtex2html 1.92.