DIMACS TR: 2007-10
Time-Decaying Aggregates in Out-of-order Streams
Authors: Graham Cormode, Flip Korn and Srikanta Tirthapura
ABSTRACT
Processing large data streams is now a major topic in data management.
The data involved can be truly massive, and the required analyses
complex. In a stream of sequential events such as stock feeds, sensor
readings, or IP traffic measurements, tuples pertaining to recent
events are typically more important than older ones. This can be
formalized via time decay functions, which assign weights based on
age. Decay functions such as sliding windows and exponential decay
have been well studied under the assumption of well-ordered arrivals,
i.e., data arrives in the order of increasing time stamps. However,
data quality issues are prevalent in massive streams (due to network
asynchrony and delays or possibly due to features inherent to the
measurement process), and correct order of arrival cannot be
guaranteed.
We focus on the computation of decayed aggregates such as range
queries, quantiles, and heavy hitters on out-of-order streams, where
elements do not necessarily arrive in increasing order of timestamps.
We give the first deterministic algorithms for approximating these
aggregates under the sliding window, exponential and polynomial decay
functions. Our techniques are based on extending existing data stream
summaries, such as the q-digest. We show that the overhead for
allowing out-of-order arrivals is small when compared to the case of
well-ordered arrivals. Our experimental study confirms that these
methods can be applied in practice and investigates how the specific
decay function impacts performance.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2007/2007-10.pdf
DIMACS Home Page