2006-09: Probabilistic Distance Clustering

DIMACS TR: 2006-09

Probabilistic Distance Clustering

Authors: Adi Ben-Israel and Cem Iyigun

ABSTRACT
We present a new iterative method for probabilistic clustering of data. Given clusters, their centers, and the distances of data points from these centers, the probability of cluster membership at any point is assumed to depend on its distances from all centers. We state this assumption as a principle that probability is inversely proportional to distance.

The method is based on the above principle, and on the joint distance function, a measure of distance from all cluster centers, that evolves during the iterations, and captures the data in its low contours.

At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as stationary points of the joint distance function. The initial centers are arbitrary and computations stop when the centers stop moving.

The method is simple, fast (requiring a small number of cheap iterations) and gives a high percentage of correct classifications. It converges to the true cluster centers for all initial solutions, and is not sensitive to outliers.

Keywords: Distance clustering, probabilistic clustering, Euclidean distance, Mahalanobis distance

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2006/2006-09.ps.gz

DIMACS Home Page