## G. Cormode and M. Garofalakis.
Histograms and wavelets on probabilistic data.
Technical Report arXiv:0806.1071, arXiv, 2008.

There is a growing realization that uncertain information is a first-class citizen in modern
database management. As such, we need techniques to correctly and efficiently process
uncertain data in database systems. In particular, data reduction techniques that can produce
concise, accurate synopses of large probabilistic relations are crucial. Similar to their
deterministic relation counterparts, such compact probabilistic data synopses can form the
foundation for human understanding and interactive data exploration, probabilistic query
planning and optimization, and fast approximate query processing in probabilistic database
systems.
In this paper, we introduce definitions and algorithms for building histogram- and
wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram
bucket boundaries or wavelet coefficients to optimize the accuracy of the approximate
representation of a collection of probabilistic tuples under a given error metric. For a
variety of different error metrics, we devise efficient algorithms that construct optimal or
near optimal B-term histogram and wavelet synopses. This requires careful analysis of the
structure of the probability distributions, and novel extensions of known
dynamic-programming-based techniques for the deterministic domain. Our experiments show that
this approach clearly outperforms simple ideas, such as building summaries for samples drawn
from the data distribution, while taking equal or less time.

[ bib |
http ]
Back

*This file was generated by
bibtex2html 1.92.*