DIMACS Series in
Discrete Mathematics and Theoretical Computer Science

TITLE: Partitioning Data Sets
EDITORS: Ingemar Cox, Pierre Hansen, Bela Julesz
Published by the American Mathematical Society

Ordering Information

This volume may be obtained from the AMS or through bookstores in your area.

To order through AMS contact the AMS Customer Services Department, P.O. Box 6248, Providence, Rhode Island 02940-6248 USA. For Visa, Mastercard, Discover, and American Express orders call 1-800-321-4AMS.

You may also visit the AMS Bookstore and order directly from there. DIMACS does not distribute or sell these books.


Partitioning data sets into disjoint groups is a problem arising in many domains. The mathematical theory of Cluster Analysis aims at finding such groups which are both homogeneous, i.e., such that entities in the same group are similar, and well separated, i.e., such that entities in different groups are dissimilar. Work on the axiomatic foundations and the computational complexity of such problems as well as on the design and analysis of exact or heuristic algorithms to solve them is rapidly expanding. Homogeneity and separation can be made precise in a variety of ways, leading to a variety of clustering problems. Applications in psychology, computer vision, target tracking and other domains are numerous. In each of these domains, specific partitioning problems arise as well as efficient solution methods that exploit particular characteristics of the domain. This book presents twenty essays and surveys on cluster analysis theory, methods and applications and related problems in psychology, computer vision and target tracking.

A first group of papers, devoted to cluster analysis methods and algorithms, covers the main paradigms of that field. Melvin F. Janowitz and Rudolf Wille consider hierarchical clustering and classify ordinal cluster methods according to the significance assigned to the input data. Patrice Bertrand studies structural properties of a recent extension of the hierarchical clustering scheme: pyramidal clustering. A detailed survey of the median procedure for partitions, both from the axiomatic and the algorithmic points of view, is provided by Jean-Pierre Barthelemy and Bruno Leclerc. Another consensus problem is addressed by Wayne Goddard, Ewa Kubicka, Grzegorz Kubicki and F. R. McMorris, who focus on agreement subtrees of labeled binary trees. Two papers are devoted to clustering on graphs: Weiqing Cai and David W. Matula give a linear heuristic for partitioning by maximum adjacency search of graphs; Sylvie G\'elinas, Pierre Hansen and Brigitte Jaumard provide a low-order polynomial labeling algorithm for minimum sum-of-diameters bipartitioning, useful in divisive hierarchical clustering. Fionn Murtagh studies contiguity-constrained hierarchical clustering and explains when and why reversals occur, demonstrating the approach on quantized images from astronomy. Pierre Hansen, Brigitte Jaumard and Nenad Mladenovic outline a new sequential clustering paradigm, in which clearly apparent clusters are isolated and removed one at a time (a procedure reminiscent of sequential identification and removal of objects in image analysis). In a more speculative vein, Edwin Diday describes symbolic data analysis which applies to boolean or probabilistic objects well adapted to representing knowledge.

A second group of papers addresses problems of partitioning that arise in the context of multitarget tracking and surveillance. Aubrey B. Poore describes a fast, near optimal Lagrangian relaxation based algorithm to solve the NP-hard data association problem of partitioning real-time observations into tracks and false alarms. Anil Kumar, Yaakov Bar-Shalom and Eliezer Oron describe an algorithm for segmenting an image into target and background regions.

The third group of papers describes some partitioning problems in computer vision. Ingemar J. Cox, James H. Rehg, Sunita L. Hingorani and Matt L. Miller propose an algorithm for grouping edges and segmenting contours that is based on Reid's multiple hypothesis algorithm, originally developed in the context of multitarget tracking. David W. Jacobs presents a low-order polynomial algorithm that locates salient convex collections of lines in an image. Optical flow computation merges information over a local image patch to estimate the $2$D image velocity at a point, assuming that only a single motion is present. Allan Jepson and Michael J. Black relax this single-motion assumption and apply mixture models and an extension of the EM algorithm to cluster the multiple motions present within a region. Yibing Yang and Allan L. Yuille present a multilevel algorithm for matching a pair of stereo images, in terms of surface detection in a three-dimensional spatio-disparity space.

Finally, a fourth group of papers focuses on human vision. Irving Biederman discusses problems of visual shape recognition for which clustering mathematics might yield some potential benefits. Daniel Kersten and Sutheg Madarasmi consider similarities and differences between human perception of surfaces, their properties and relationships and traditional studies of data clustering, focussing on three problems: occlusion, transparency and lightness. Two papers are devoted to problems of dot clusters: Jacob Feldman examines how human observers draw perceptually coherent clusters out of fields of dots, a process analogous to but different from statistical clustering. Special attention is given to molecular clusters and their classification. Steven W. Zucker proposes that visual perception of dots underlies clustering operations when more domain specific information is not available, and studies visual computations in this perspective. Bela Julesz reviews work on illusory contour perception, from early work on cyclopean contours using random-dot stereograms to recent results that show that illusory contours depend on global parameters such as closure of stacked lines as well as local interactions.

The multiplicity of approaches, methods, problems and algorithms made for very lively discussions at the DIMACS workshop. To a large extent, the language barrier, due to the multidisciplinary character of the workshop, was alleviated by the participants' goodwill and through many questions and answers. Any remaining difficulties were compensated for by new ideas and perspectives for theory and applications. As organizers, we thank the DIMACS center, NEC Research and the Office of Naval Research for financial support, as well as the AMS for its willingness to publish this volume.

Ingemar Cox
Pierre Hansen
Bela Julesz


Foreword                                                             ix

Preface                                                              xi

Part 1. Cluster Analysis Methods
The Median Procedure for Partitions Jean-Pierre Barthelemy and Bruno Leclerc 3 Structural Properties of Pyramidal Clustering Patrice Bertrand 35 Partitioning by Maximum Adjacency Search of Graphs Weiqing Cai and David W. Matula 55 From Data to Knowledge: Probabilist Objects for a Symbolic Data Analysis Edwin Diday 65 A Labeling Algorithm for Minimum Sum of Diameters Partitioning of Graphs Sylvie Gelinas, Pierre Hansen and Brigitte Jaumard 89 Agreement Subtrees, Metric and Consensus for Labeled Binary Trees Wayne Goddard, Ewa Kubicka, Grzegorz Kubicki and F.R. McMorris 97 How to Choose K Entities Among N Pierre Hansen, Brigitte Jaumard and Nenad Mladenovic 105 On the Classification of Monotone-Equivariant Cluster Methods Melvin F. Janowitz and Rudolf Wille 117 Contiguity-Constrained Hierarchical Clustering Fionn Murtagh 143 Part 2. Target Tracking Image Segmentation Based on Optimal Layering for Precision Tracking Anil Kumar, Yaakov Bar-Shalom and Eliezer Oron 155 Multidimensional Assignments and Multitarget Tracking Aubrey B. Poore 169 Part 3. Computer Vision Grouping Edges: An Efficient Bayesian Multiple Hypothesis Approach Ingemar J. Cox, James H. Rehg, Sunita L. Hingorani and Matt L. Miller 199 Finding Salient Convex Groups David W. Jacobs 237 Mixture Models for Optical Flow Computation Allan Jepson and Michael J. Black 271 Multilevel Detection of Stereo Disparity Surfaces Yibing Yang and Allan L. Yuille 287 Part 4. Human Vision Some Problems of Visual Shape Recognition to Which the Application of Clustering Mathematics Might Yield Some Potential Benefits Irving Biederman 313 Perceptual Models of Small Dot Clusters Jacob Feldman 331 Subjective Contours in Early Vision and Beyond Bela Julesz 359 The Visual Perception of Surfaces, their Properties and Relationships Daniel Kersten and Sutheg Madarasmi 373 Visual Computations and Dot Cluster Steven W. Zucker 391

Index Index of Volumes
DIMACS Homepage
Contacting the Center
Document last modified on October 28, 1998.