DIMACS Workshop on Graph Theoretic Methods in Computer Vision

May 20 - 21, 1999

*Star Type Graphical Models for Efficient Object Detection*

I will describe a simple star type graphical model for detecting objects from a particular class in real gray level images. The algorithm has local and global invariances hard wired in the computation. Training is done by identifying stable local features from a fixed pool at specific locations on registered examples of the object. The local features are themselves represented through star type graphs in terms of coarse oriented edge detectors. I will show examples in face detection, detection of 2d shapes, and 3d rigid objects. A parallel implementation of the algorithm will be discussed as well.

**Kim L. Boyer**, The Ohio State University

*Graph-Theoretical Methods in Computer Vision:
Issues in Inexact Matching and the Use of Spectra*

In this talk we shall present some issues in the use of graph-theoretical methods in computer vision. Among the themes we explore are the extensions of graphical formalisms to (random) parametric structural descriptions, a form of attributed, possibly directed, hypergraph on which a great deal of information can be stored, and methods for combining the descriptive power of graph theory with the principled methods for handling inexactness offered by probabilistic approaches, including maximum likelihood estimation and Bayesian decision theory.

We outline our use of these notions in such problems as stereo correspondence, spatial data registration, the organization of large structural modelbases, and related issues in object recognition. Then, we intend to pose some questions related to these extensions. For instance, can the notion of a graph spectrum be extended to random parametric structural decriptions and, more importantly, used effectively to yield faster matching algorithms, or to guide graph clustering? To close, we present a counting problem that has puzzled the speaker (on and off) over the past 16 years. This problem arose from a stripe indexing scheme in a color-encoded structured light system developed in 1984.

*Automatic Landmark Acquisition as a Graph Decomposition Problem*

This talk addresses the problem of automatically acquiring an optimal set of visual landmarks by which a mobile robot can visually navigate its environment. During a training phase, the mobile robot visits a discrete set of positions that cover the free space of the floor, acquiring a number of images at each position. These positions become the nodes in a graph, while the visible features at a given node form a binary-valued feature visibility vector. A "good" landmark is one that is both unique and widely visible. In terms of our graph, finding the optimal set of landmarks reduces to finding a partition of the graph into connected blobs, such that for a given blob, the same set of k features is visible from every node in the blob. This set of k features, representing the visual landmark, is exploited by a view-based recognition srategy, called Linear Combination of Views (due to Basri and Ullman), that allows the robot to localize itself with respect to the landmarks. In this talk, I will present a formulation of the problem, analyze its computational complexity, and descibe some preliminary algorithms for solving the problem. This is joint work with Sven Dickinson and Ali Shokoufandeh, Rutgers University, and David Rosenberg, Yale University.

*Efficiently computing a good segmentation*

This talk is concerned with the problem of segmenting an image into regions, using a local measure of the difference between image pixels. We develop a general framework for a broad range of segmentation problems, based on pairwise comparison of regions in a segmentation. This framework provides precise definitions of when a segmentation is too coarse or too fine. Within this framework, we define a particular pairwise region comparison function for graph-based segmentation problems. Then we provide an efficient algorithm for computing a segmentation using this comparison function, and prove that it produces good segmentations -- those that are neither too coarse nor too fine by our definitions. We apply this algorithm to image segmentation. An important characteristic of this method is its ability to preserve detail in low-variability image regions while ignoring detail in high-variability regions. We illustrate the method with several examples on both real and sythetic images.

**Edwin R. Hancock**, University of York

*Embedding Relational Constraints in the EM Algorithm*

**Daniel Huttenlocher**, Cornell University

*Recognizing Flexible Objects*

This talk will present an approach to the problem of recognizing flexible objects, such as people and many animals, that can be modeled as a collection of rigid subparts. We use a two-dimensional representation of an object, in which models for the individual parts are connected together in a tree structure. Each connection in the tree encodes relationships between a pair of parts, such as their relative appearance and location. We then formulate the recognition problem as that of finding the best configuration of the parts given a particular model and image, using maximum a posteriori (MAP) estimation.

The central result of this work is that for such tree-structured object models the MAP estimate can be found efficiently. For m parts each with n possible locations, there are O(n^m) possible configurations of the parts. For tree-structured object models, a Viterbi style search yields an O(n^2m) algorithm for the optimal configuration under MAP estimation. This is still impractical because the number of locations, n, is generally quite large. We introduce a novel generalization of distance transforms that reduces this time to O(nm). We illustrate our approach using a simple generic model of a person, and show results of finding people in complex scenes under a range of lighting conditions and in varied poses. The method is quite practical, our implementation runs in just a few seconds on a desktop workstation.

This is joint work with Pedro Felzenszwalb

*Embedded graph algorithms for computer vision*

I will discuss two applications of graph algorithms to vision problems where a graph is embedded in Euclidean space and thereby endowed with a geometric interpretation that is useful to the problem.

Even though a graph is "one-dimensional," or rather, does not have any geometry, embedding it into higher dimensional space enables us to apply efficient graph algorithms such as maximum flow and minimum mean weight cycle algorithms to globally solve optimization problems on higher dimensional objects as hypersurface and region integral. Also, I will show how directed graphs are more natural than undirected graphs and in some applications essential.

This is joint work with Ian Jermyn and Davi Geiger

**Philip Klein**, Brown University

*Comparing shapes using edit-distance on shock graphs*

Shapes can be represented by {\em shock graphs}, skeletons whose nodes and edges are labeled with numerical information describing, e.g., curvature. The idea of using hierarchies of shocks to represent shape was developed by Ben Kimia. He and collaborators have studied methods for estimating the visual distance between shapes by estimating the distance between the corresponding shock graphs (using a heuristic for an NP-complete nonlinear program).

We observe that for shapes defined by closed simple curves, the corresponding graphs are in fact planar embedded trees. We have developed a set of primitive transformations on such graphs, and have defined the distance between two such graphs as the minimum cost of a sequence of transformations that transforms one graph into the other. We have also developed a polynomial-time algorithm to find the minimum-cost sequence.

In this talk, we discuss this approach and our experience with it.

Joint work with Ben Kimia, Daniel Sharvit, and Srikanta Tirtapura

**Anand Rangarajan**, Yale University

*Weighted graphs and associated distance measures for non-rigid matching*

We construct a probabilistic generative model for the non-rigid matching of point-sets that are themselves derived via feature extraction on underlying image intensities. Our formulation is explicitly Platonist. Beginning with a Platonist super point-set, we derive real-world point-sets through the application of four operations: i) spline-based warping, ii) addition of noise, iii) point removal and iv) amnesia regarding the point-to-point correspondences between the real-world point-sets and the Platonist source. Given this generative model, we are able to derive new non-quadratic distance measures w.r.t. the ``forgotten'' correspondences by first eliminating the spline parameters from the generative model and then integrating out the Platonist super point-set. The result is a new non-quadratic distance measure which has the interpretation of weighted graph matching. The graphs are related in a straightfoward manner to the spline kernel used for non-rigid warping. Experimentally, we show that the new distance measure outperforms the conventional quadratic assignment distance measure when both distance measures use the same weighted graphs derived from the spline kernel.

**Jianbo Shi**, University of California - Berkeley

*Normalized Cut in Image Segmentation: regions, contours and
cue combination.*

Grouping is a complex problem with many intertwined issues and subgoals. Several key aspects of grouping are: 1) multiple cues are used, and 2) grouping is done through a global decision process-- the Gestaltist uses the notion of ``Pragnanz'' to express the aim of grouping as extracting ``holistic impression'' of an image through global interactions within the visual system, 3) multiple level of interpretation and grouping-- grouping is not just about clustering data.

In this talk, I will address the first two issues in the setting of image segmentation. We view grouping as a process of 1) defining the similarity between the image primitives, and 2) applying a generic grouping engine that groups the image primitive together based on the feature similarity measure.

The proposed grouping engine, Normalized Cuts, formulates grouping as a hierarchical graph partition problem. The input to the algorithm is the similarity measure between the image primitives. Based on the similarity measure, a weighted graph is constructed. The grouping criterion, the normalized cut, measures both the total dissimilarity between the different groups as well as the total similarity within the groups. An efficient algorithm is developed for finding optimal partition by solving a generalized eigenvector system.

The proposed cue combination mechanism is formulated in a similar graph framework. An operational definition of the textons is proposed. A local neighborhood graph is then constructed using Delaunay triangulation of the textons. We apply statistical test for isotropy of Delaunay neighbors to obtained a local applicability measure for each grouping cues, texture and contour. The combined similarity measure is then feed to the normalized cut algorithm to produce the final output.

This is a join work with Jitendra Malik, Serge Belongie and Thomas Leung.

**Ali Shokoufandeh**, Rutgers University

*A Spectral Encoding of Object Part Structure and its Application to
Indexing and Matching*

We have been developing a theory for the generic representation of 2-D shape, in which structural descriptions are derived from the shocks (singularities) of a curve evolution process acting on a silhouette. We now apply the theory to the problems of shape matching and indexing. The shocks are organized into a directed, acyclic shock graph which, in turn, can be reduced to a unique rooted shock tree whose levels are ordered by significance. We first introduce a novel tree indexing algorithm based on an eigenvalue characterization of a shock tree's topological structure. This topological signature, in the form of a vector, is computed at each node of a query tree and used to "vote" for objects having similar local topological structure. From the resulting set of model shock tree candidates, this same topological signature is exploited in a shock tree matching (verification) algorithm that factors in the geometry of the shock tree's nodes. The algorithm computes the distance between two shock trees as well as a set of node correspondences. Using a diverse database of shapes, we demonstrate our system's performance under articulation, occlusion, and changes in viewpoint.

This talk represents joint work with Sven Dickinson (Rutgers University), Kaleem Siddiqi (McGill University), and Steven W. Zucker (Yale University).

**Kaleem Siddiqi**, McGill University

*Matching Hierarchical Structures Using Association Graphs*

It is well known that the problem of matching two relational structures can be posed as an equivalent problem of finding a maximal clique in a (derived) ``association graph.'' However, it is not clear how to apply this approach to computer vision problems where the graphs are hierarchically organized, i.e., are trees, since maximal cliques are not constrained to preserve the partial order. Here we provide a solution to the problem of matching two trees by constructing the association graph using the graph-theoretic concept of connectivity. We prove that in the new formulation there is a one-to-one correspondence between maximal cliques and maximal subtree isomorphisms. This allows us to cast the tree matching problem as an indefinite quadratic program using the Motzkin-Straus theorem, and we use ``replicator'' dynamical systems developed in theoretical biology to solve it. Such continuous solutions to discrete problems are attractive because they can motivate analog and biological implementations. The framework is also extended to the matching of attributed trees by using weighted association graphs. We illustrate the power of the approach by matching articulated and deformed shapes described by shock trees.

Joint work with Marcello Pelillo and Steven W. Zucker

**Olga Veksler**, Cornell University

*Fast Energy Minimization for Computer Vision via Graph Cuts*

Yuri Boykov, Olga Veksler and Ramin Zabih

Computer Science Department, Cornell University

Many problems in computer vision can be naturally described in terms of energy minimization. Unfortunately, minimizing the energy is highly intractable; the energy functions have multiple local minima, and the search space has many thousands of dimensions. I will present some fast algorithms for solving these problems that rely on graph cuts. These algorithms produce answers with quality guarantees: the output is a local minimum even when very large moves are allowed. Experimental stereo results from real data with ground truth suggest that our results are several times more accurate than previous methods.