DIMACS TR: 2002-35
Estimating Dominance Norms of Multiple Data Streams
Authors: Graham Cormode and S. Muthukrishnan
ABSTRACT
Data streams often consist of multiple signals. Consider a stream of
multiple signals (i,a_i,j) where i's correspond to the domain, j's index
the different signals and a_i,j >= 0 to the value of the j'th signal at
point i. We study the problem of determining the dominance norms over the
multiple signals, in particular the max-dominance norm, defined as sum_i
max_j {a_i,j}. It is used in applications to estimate the ``worst case
influence'' of multiple processes, for example in IP traffic analysis,
electrical grid monitoring and financial domain. Besides finding many
applications, it is a natural measure: it generalizes the notion of union
of data streams and may be alternately thought of as estimating the L1
norm of the upper envelope of multiple signals.
We present the first known data stream algorithm for estimating
max-dominance of multiple signals. In particular, we use workspace and
time-per-item that are both sublinear (in fact, poly-logarithmic) in the
input size. The algorithm is simple and implementable; its analysis
relies on using properties of stable random distributions with small
parameter alpha, which may be a technique of independent interest. In
contrast we show that other dominance norms -- min-dominance sum_i min_j
{a_i,j}, count-dominance (|{i | a_i > b_i}|) or relative-dominance (sum_i
a_i/max{1,b_i}), are all impossible to estimate accurately with sublinear
space.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2002/2002-35.ps.gz
DIMACS Home Page