## G. Cormode and A. McGregor.
Approximation algorithms for clustering uncertain data.
In *ACM Principles of Database Systems (PODS)*, 2008.

There is an increasing quantity of data with uncertainty arising from
applications such as sensor network measurements, record linkage, and
as output of mining algorithms.
This uncertainty is typically formalized as probability density
functions over tuple values.
Beyond storing and processing such data in a DBMS, it is necessary to
perform other data analysis tasks such as data mining.
We study the core mining problem of *clustering* on uncertain
data, and define appropriate natural generalizations of standard
clustering optimization criteria.
Two variations arise, depending on whether a point is automatically
associated with its optimal center, or whether it must be assigned to
a fixed cluster no matter where it is actually located.
For uncertain versions of *k*-means and *k*-median, we show reductions
to their corresponding weighted versions on data with no
uncertainties.
These are simple in the unassigned case, but require some care for the
assigned version.
Our most interesting results are for uncertain *k*-center,
which generalizes both traditional *k*-center and *k*-median objectives.
We show a variety of bicriteria approximation algorithms.
One picks *poly*(log*n*) more centers and achieves a
(1+ε) approximation to the best uncertain *k*-centers.
Another picks 2*k* centers and achieves a constant factor
approximation.
Collectively, these results are the first known guaranteed
approximation algorithms for the problems of clustering uncertain
data.

[ bib |
slides |
.pdf ]
Back

*This file was generated by
bibtex2html 1.92.*