DIMACS Theoretical Computer Science Seminar


Title: A fast k-means implementation using coresets

Speaker: Christian Sohler, University of Paderborn, Germany

Date: April 4, 2006 2:00-3:00pm

Location: CoRE A, Room 301, Rutgers University, Busch Campus, Piscataway, NJ


Abstract:

In this talk we introduce a simple coreset construction for clustering problems in the Euclidean space like k-median and k-means clustering. In k-median clustering we are given a set of point P in the Euclidean space R^d and the goal is to find k centers such that the sum of distances over all input points to the nearest center is minimized. A coreset is a small weighted set of points such that for every set C of k centers, the (weighted) cost for the coreset points is approximately the same as the cost for P.

Using this construction we develop a polylog-space semi-dynamic (insertion-only) data structure that maintains a (1+epsilon)-approximation to the k-median and k-means clustering. Using the same construction we can also obtain a polylog-space dynamic data structure that maintains a (1+epsilon)-approximation for these problems.

Finally, we show that one can use the coreset construction to obtain a fast implementation of a k-means clustering algorithm for low dimensional input instances.