### DIMACS Theoretical Computer Science Seminar

Title: A fast k-means implementation using coresets

Speaker: **Christian Sohler**, University of Paderborn, Germany

Date: April 4, 2006 2:00-3:00pm

Location: CoRE A, Room 301, Rutgers University, Busch Campus, Piscataway, NJ

Abstract:

In this talk we introduce a simple coreset construction for clustering
problems in the Euclidean space like k-median and k-means
clustering. In k-median clustering we are given a set of point P in the
Euclidean space R^d and the goal is to find k centers such that
the sum of distances over all input points to the nearest center is
minimized. A coreset is a small weighted set of points such that for
every set C of k centers, the (weighted) cost for the coreset points is
approximately the same as the cost for P.

Using this construction we develop a polylog-space semi-dynamic
(insertion-only) data structure that maintains a (1+epsilon)-approximation
to the k-median and k-means clustering. Using the same construction we can
also obtain a polylog-space dynamic data structure that
maintains a (1+epsilon)-approximation for these problems.

Finally, we show that one can use the coreset construction to obtain a
fast implementation of a k-means clustering algorithm for low
dimensional input instances.