We present first-known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in data size) and are able to quickly find deviants at any instant, as the data stream evolves over time. For all versions of this problem---uni- vs multivariate time series, optimal vs near-optimal vs heuristic solutions, offline vs streaming---our algorithms have the same framework of maintaining a hierarchical set of candidate deviants that are updated as the time series data gets progressively revealed.
We show experimentally using real network traffic data (SNMP aggregate time series) as well as synthetic data that our algorithm is not only remarkably accurate in determining the deviants, but also that evolution of deviants over time reveals interesting artifacts in data streams.
Paper available at ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2003/2003-43.ps.gz