Incremental Clustering and Dynamic Information Retrieval

Moses Charikar (Stanford University)



Motivated by applications such as document and image classification in
information retrieval, we consider the problem of clustering dynamic
point sets in a metric space. The goal is to efficiently maintain k
clusters of small diameter as new points are inserted. We propose a
model called incremental clustering based on a careful analysis of the
requirements of the information retrieval application, and which
should also be useful in other applications.  We analyze several
natural greedy algorithms and demonstrate that they perform poorly.
We propose new deterministic and randomized incremental clustering
algorithms which have a provably good performance, and which we
believe should also perform well in practice.  We complement our
positive results with lower bounds on the performance of incremental
algorithms. Finally, we consider the dual clustering problem where the
clusters are of fixed diameter, and the goal is to minimize the number
of clusters.