DIMACS TR: 2009-13

Global Ordering For Multi-Dimensional Data: Comparison with K-means Clustering

Authors: Baiyang Liu, Casimir Kulikowski and Ilya Muchnik


This paper describes a novel approach to estimating the quality of a clustering. It is based on finding a linear ordering of multi-dimensional data under which the clusters of the data fall into intervals on the ordering scale. This makes it possible to assess the results of local clustering methods such as K-means and to filter out the inhomogeneous or outlier clusters they can produce. Preliminary results reported here indicate that, in two dimensions, the method is valuable for determining the number of visually perceived clusters generated by a mixture-of-Gaussians model. When the means are far apart, this number corresponds to the number of actual generating distributions; when the means are chosen to be close, it corresponds to the reduced number of clusters arising from the perceived admixture of overlapping distributions.
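
The interval criterion described above can be illustrated with a minimal sketch. The abstract does not specify how the ordering is constructed, so the code below assumes a stand-in: points are ordered by their projection onto the first principal axis. The function names (`principal_axis_order`, `count_label_runs`, `clusters_form_intervals`) and the toy data are hypothetical and are not taken from the report. A clustering passes the check when each cluster occupies one contiguous interval of the ordering.

```python
import numpy as np

def principal_axis_order(points):
    """Order points by projection onto the first principal axis.

    Assumption: this is a stand-in ordering for illustration only,
    not the ordering method of the report.
    """
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.argsort(centered @ vt[0], kind="stable")

def count_label_runs(labels, order):
    """Count maximal runs of identical labels along the ordering."""
    ordered = np.asarray(labels)[order]
    return 1 + int(np.count_nonzero(ordered[1:] != ordered[:-1]))

def clusters_form_intervals(labels, order):
    """True when every cluster occupies one contiguous interval."""
    return count_label_runs(labels, order) == len(np.unique(labels))

# Toy data (hypothetical): two well-separated groups in two dimensions.
points = np.array([[0.0, 0.0], [0.3, 0.1], [0.6, 0.4],
                   [5.0, 5.0], [5.4, 5.1], [5.8, 5.6]])
order = principal_axis_order(points)

good_labels = [0, 0, 0, 1, 1, 1]   # clusters match the two groups
mixed_labels = [0, 1, 0, 1, 0, 1]  # clusters cut across the groups

good_ok = clusters_form_intervals(good_labels, order)
mixed_ok = clusters_form_intervals(mixed_labels, order)
```

Under this assumed ordering, a labeling that matches the two groups occupies two intervals, while a labeling that cuts across them is fragmented into many runs. The run count gives a rough signal of the kind of inhomogeneous cluster the abstract proposes to filter.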

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2009/2009-13.pdf