DIMACS TR: 2002-35

Estimating Dominance Norms of Multiple Data Streams



Authors: Graham Cormode and S. Muthukrishnan

ABSTRACT

Data streams often consist of multiple signals. Consider a stream of multiple signals (i,a_i,j) where i's correspond to the domain, j's index the different signals and a_i,j >= 0 to the value of the j'th signal at point i. We study the problem of determining the dominance norms over the multiple signals, in particular the max-dominance norm, defined as sum_i max_j {a_i,j}. It is used in applications to estimate the ``worst case influence'' of multiple processes, for example in IP traffic analysis, electrical grid monitoring and financial domain. Besides finding many applications, it is a natural measure: it generalizes the notion of union of data streams and may be alternately thought of as estimating the L1 norm of the upper envelope of multiple signals.

We present the first known data stream algorithm for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. The algorithm is simple and implementable; its analysis relies on using properties of stable random distributions with small parameter alpha, which may be a technique of independent interest. In contrast we show that other dominance norms -- min-dominance sum_i min_j {a_i,j}, count-dominance (|{i | a_i > b_i}|) or relative-dominance (sum_i a_i/max{1,b_i}), are all impossible to estimate accurately with sublinear space.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2002/2002-35.ps.gz


DIMACS Home Page