The volume of data generated in modern applications can be massive, overwhelming our ability to conveniently transmit, store, and index it. For many scenarios, it is desirable to instead build a compact summary of a dataset that is vastly smaller. In exchange for some approximation, we obtain flexible and efficient tools that can answer a range of different types of query over the data. This book provides a comprehensive introduction to the topic of data summarization, showcasing the algorithms, their behavior, and the mathematical underpinnings of their operation. The coverage starts with simple sums and approximate counts, building to more advanced probabilistic structures such as the Bloom Filter, distinct value summaries, sketches, and quantile summaries. Summaries are described for specific types of data, such as geometric data, graphs, and vectors and matrices. Throughout, examples, pseudocode, and applications are given to enhance understanding.