Fully Scalable MPC Algorithms for Clustering in High Dimension

May 09, 2024, 10:45 AM - 11:15 AM

Location:

DIMACS Center

Rutgers University

CoRE Building

96 Frelinghuysen Road

Piscataway, NJ 08854

Click here for map.

Robert Krauthgamer, Weizmann Institute of Science

We design new algorithms for k-clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine is an arbitrarily small poly(n) for input size n, which importantly may be substantially smaller than k. Our algorithms are fast, i.e., take O(1) rounds, and achieve O(1)-bicriteria approximation for k-Median and for k-Means, while previous work achieves only polylog(n)-approximation or handles special cases.

Our results rely on a fast MPC algorithm for O(1)-approximation of facility location. A primary technical tool that we develop, and may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing certain statistics on an approximate neighborhood of every data point, which includes range counting and nearest-neighbor search.

Joint work with Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, and Pavel Vesely.

[Video]