DIMACS Workshop on Graph Theoretic Methods in Computer Vision
May 20 - 21, 1999
Star Type Graphical Models for Efficient Object Detection
I will describe a simple star-type graphical model for detecting objects from a particular class in real gray-level images. The algorithm has local and global invariances hard-wired into the computation. Training is done by identifying stable local features from a fixed pool at specific locations on registered examples of the object. The local features are themselves represented through star-type graphs in terms of coarse oriented edge detectors. I will show examples in face detection, detection of 2D shapes, and 3D rigid objects. A parallel implementation of the algorithm will be discussed as well.
Kim L. Boyer, The Ohio State University
Graph-Theoretical Methods in Computer Vision: Issues in Inexact Matching and the Use of Spectra
In this talk we shall present some issues in the use of graph-theoretical methods in computer vision. Among the themes we explore are the extension of graphical formalisms to (random) parametric structural descriptions, a form of attributed, possibly directed, hypergraph on which a great deal of information can be stored, and methods for combining the descriptive power of graph theory with the principled handling of inexactness offered by probabilistic approaches, including maximum likelihood estimation and Bayesian decision theory.
We outline our use of these notions in such problems as stereo correspondence, spatial data registration, the organization of large structural modelbases, and related issues in object recognition. Then, we intend to pose some questions related to these extensions. For instance, can the notion of a graph spectrum be extended to random parametric structural descriptions and, more importantly, used effectively to yield faster matching algorithms, or to guide graph clustering? To close, we present a counting problem that has puzzled the speaker (on and off) over the past 16 years. This problem arose from a stripe indexing scheme in a color-encoded structured light system developed in 1984.
Automatic Landmark Acquisition as a Graph Decomposition Problem
This talk addresses the problem of automatically acquiring an optimal set of visual landmarks by which a mobile robot can visually navigate its environment. During a training phase, the mobile robot visits a discrete set of positions that cover the free space of the floor, acquiring a number of images at each position. These positions become the nodes in a graph, while the visible features at a given node form a binary-valued feature visibility vector. A "good" landmark is one that is both unique and widely visible. In terms of our graph, finding the optimal set of landmarks reduces to finding a partition of the graph into connected blobs, such that for a given blob, the same set of k features is visible from every node in the blob. This set of k features, representing the visual landmark, is exploited by a view-based recognition strategy, called Linear Combination of Views (due to Basri and Ullman), that allows the robot to localize itself with respect to the landmarks. In this talk, I will present a formulation of the problem, analyze its computational complexity, and describe some preliminary algorithms for solving the problem. This is joint work with Sven Dickinson and Ali Shokoufandeh, Rutgers University, and David Rosenberg, Yale University.
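To make the blob constraint concrete, here is a greedy sketch that grows a connected blob from a seed node for as long as at least k features remain visible from every node added. The graph, visibility sets, and threshold are invented for illustration; this shows the constraint itself, not the optimal decomposition algorithms the talk analyzes.

```python
def grow_blob(adj, vis, seed, k):
    """Greedily grow a connected blob from `seed`, admitting a
    neighboring node only while at least k features remain commonly
    visible from every node in the blob."""
    blob = {seed}
    feats = set(vis[seed])          # features visible from the whole blob
    frontier = list(adj[seed])
    while frontier:
        u = frontier.pop()
        if u in blob:
            continue
        common = feats & set(vis[u])
        if len(common) >= k:        # u keeps a shared landmark visible
            blob.add(u)
            feats = common
            frontier.extend(adj[u])
    return blob, feats
```

On a four-node corridor where features 'a' and 'b' are visible near one end and 'c' and 'd' near the other, the blob grown from node 0 with k = 1 stops exactly where no feature remains shared.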
Efficiently computing a good segmentation
This talk is concerned with the problem of segmenting an image into regions, using a local measure of the difference between image pixels. We develop a general framework for a broad range of segmentation problems, based on pairwise comparison of regions in a segmentation. This framework provides precise definitions of when a segmentation is too coarse or too fine. Within this framework, we define a particular pairwise region comparison function for graph-based segmentation problems. Then we provide an efficient algorithm for computing a segmentation using this comparison function, and prove that it produces good segmentations -- those that are neither too coarse nor too fine by our definitions. We apply this algorithm to image segmentation. An important characteristic of this method is its ability to preserve detail in low-variability image regions while ignoring detail in high-variability regions. We illustrate the method with several examples on both real and synthetic images.
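The pairwise comparison idea lends itself to a Kruskal-style sketch. The merging rule below, which adds a size-dependent threshold k/|C| to each component's maximum internal edge weight, is a minimal illustration under assumed scalar edge weights, not the authors' full implementation:

```python
def segment(num_nodes, edges, k=1.0):
    """Union-find segmentation: edges is a list of (weight, u, v).
    Two components merge only if the joining edge's weight is small
    relative to each component's internal variability plus its
    size-dependent threshold tau(C) = k / |C|."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes    # max internal edge weight per component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for w, u, v in sorted(edges):           # increasing weight order
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if w <= min(internal[ru] + k / size[ru],
                    internal[rv] + k / size[rv]):
            parent[rv] = ru
            size[ru] += size[rv]
            internal[ru] = max(internal[ru], internal[rv], w)
    return [find(i) for i in range(num_nodes)]
```

Because edges are processed in increasing order of weight, each component's `internal` value is the largest edge weight used to build it, which is what the coarse/fine predicate compares against.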
Edwin R. Hancock, University of York
Embedding Relational Constraints in the EM Algorithm
This talk describes a new framework for object alignment. The novel
feature is to unify
the tasks of estimating transformation geometry and identifying
point-correspondence matches. Unification is realised by
constructing a mixture model over the bipartite graph representing the
correspondence match and by effecting optimisation
using the EM algorithm. According to our EM framework the probabilities of
structural correspondence gate contributions to the
expected likelihood function used to
estimate maximum likelihood transformation parameters.
These gating probabilities measure the consistency
of the matched neighbourhoods in the graphs.
The recovery of transformational
geometry and the identification of hard correspondence matches are
interleaved and realised by applying coupled update operations
to the expected log-likelihood function. In this
way the two processes bootstrap one another. This provides a means of
rejecting structural outliers. We evaluate the technique
on several real-world problems.
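As a toy instance of this interleaving, the sketch below runs EM for the simplest transformation, a 2-D translation, with Gaussian correspondence probabilities gating the parameter estimate. The point sets and sigma are invented, and the structural (graph-consistency) gating described in the talk is omitted:

```python
import math

def em_translation(X, Y, iters=20, sigma=1.0):
    """Estimate a 2-D translation aligning point set X onto Y.
    E-step: soft correspondence probabilities from a Gaussian mixture;
    M-step: re-estimate the translation from the expected matches."""
    tx, ty = 0.0, 0.0
    for _ in range(iters):
        # E-step: responsibility of each y_j for each transformed x_i
        P = []
        for xi, yi in X:
            w = [math.exp(-((xi + tx - xj) ** 2 + (yi + ty - yj) ** 2)
                          / (2 * sigma ** 2)) for xj, yj in Y]
            s = sum(w) or 1.0
            P.append([wj / s for wj in w])
        # M-step: translation estimate gated by the match probabilities
        tx = sum(P[i][j] * (Y[j][0] - X[i][0])
                 for i in range(len(X)) for j in range(len(Y))) / len(X)
        ty = sum(P[i][j] * (Y[j][1] - X[i][1])
                 for i in range(len(X)) for j in range(len(Y))) / len(X)
    return tx, ty
```

The two steps bootstrap one another exactly as in the talk's framework: better geometry sharpens the correspondence probabilities, which in turn refine the geometry.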
Daniel Huttenlocher, Cornell University
Recognizing Flexible Objects
This talk will present an approach to the problem of recognizing
flexible objects, such as people and many animals, that can be modeled
as a collection of rigid subparts. We use a two-dimensional
representation of an object, in which models for the individual parts
are connected together in a tree structure. Each connection in the
tree encodes relationships between a pair of parts, such as their
relative appearance and location. We then formulate the recognition
problem as that of finding the best configuration of the parts given a
particular model and image, using maximum a posteriori (MAP)
estimation.
The central result of this work is that for such tree-structured
object models the MAP estimate can be found efficiently. For m parts
each with n possible locations, there are O(n^m) possible
configurations of the parts. For tree-structured object models, a
Viterbi-style search yields an O(n^2 m) algorithm for the optimal
configuration under MAP estimation. This is still impractical because
the number of locations, n, is generally quite large. We introduce a
novel generalization of distance transforms that reduces this time to
O(nm). We illustrate our approach using a simple generic model of a
person, and show results of finding people in complex scenes under a
range of lighting conditions and in varied poses. The method is quite
practical; our implementation runs in just a few seconds on a desktop
workstation.
This is joint work with Pedro Felzenszwalb.
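The distance-transform speedup is easiest to see in one dimension with an L1 deformation cost, where the min-convolution D[i] = min_j (f[j] + |i - j|) falls out of two linear passes instead of an O(n^2) scan. This is only the simplest case; the generalization discussed in the talk handles other deformation costs and 2-D location grids:

```python
def dt_l1(f):
    """Linear-time distance transform D[i] = min_j (f[j] + |i - j|),
    the primitive that collapses the per-part minimization over all n
    locations from O(n^2) to O(n)."""
    n = len(f)
    d = list(f)
    for i in range(1, n):            # forward pass: propagate from left
        d[i] = min(d[i], d[i - 1] + 1)
    for i in range(n - 2, -1, -1):   # backward pass: propagate from right
        d[i] = min(d[i], d[i + 1] + 1)
    return d
```

Running it on f = [3, 7, 0, 4] gives [2, 1, 0, 1], matching the brute-force minimum at every position.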
Embedded graph algorithms for computer vision
I will discuss two applications of graph algorithms to
vision problems where a graph is embedded in Euclidean space and
thereby endowed with a geometric interpretation that is useful to the
problem.
Even though a graph is "one-dimensional," or rather, does not have any
geometry, embedding it into a higher-dimensional space enables us to apply
efficient graph algorithms, such as maximum flow and minimum mean weight
cycle algorithms, to globally solve optimization problems on
higher-dimensional objects such as hypersurfaces and region integrals.
Also, I will show how directed graphs are more natural than undirected
graphs and, in some applications, essential.
This is joint work with Ian Jermyn and Davi Geiger.
Philip Klein, Brown University
Comparing shapes using edit-distance on shock graphs
Shapes can be represented by {\em shock graphs}, skeletons whose nodes
and edges are labeled with numerical information describing, e.g.,
curvature. The idea of using hierarchies of shocks to represent shape
was developed by Ben Kimia. He and collaborators have studied methods
for estimating the visual distance between shapes by estimating the
distance between the corresponding shock graphs (using a heuristic for
an NP-complete nonlinear program).
We observe that for shapes defined by closed simple curves, the
corresponding graphs are in fact planar embedded trees. We have
developed a set of primitive transformations on such graphs, and have
defined the distance between two such graphs as the minimum cost of a
sequence of transformations that transforms one graph into the other.
We have also developed a polynomial-time algorithm to find the
minimum-cost sequence.
In this talk, we discuss this approach and our experience with it.
Joint work with Ben Kimia, Daniel Sharvit, and Srikanta Tirtapura.
Anand Rangarajan, Yale University
Weighted graphs and associated distance measures for non-rigid matching
We construct a probabilistic generative model for the non-rigid
matching of point-sets that are themselves derived via feature
extraction on underlying image intensities. Our formulation is explicitly
Platonist. Beginning with a Platonist super point-set, we derive
real-world point-sets through the application of four operations: i)
spline-based warping, ii) addition of noise, iii) point removal and
iv) amnesia regarding the point-to-point correspondences between the
real-world point-sets and the Platonist source. Given this generative
model, we are able to derive new non-quadratic distance measures
w.r.t. the ``forgotten'' correspondences by first eliminating the
spline parameters from the generative model and then integrating
out the Platonist super point-set. The result is a new non-quadratic
distance measure which has the interpretation of weighted graph
matching. The graphs are related in a straightforward manner to the
spline kernel used for non-rigid warping. Experimentally, we show that
the new distance measure outperforms the conventional quadratic
assignment distance measure when both distance measures use the same weighted
graphs derived from the spline kernel.
Jianbo Shi, University of California - Berkeley
Normalized Cut in Image Segmentation: regions, contours and
cue combination.
Grouping is a complex problem with many intertwined issues and
subgoals. Several key aspects of grouping are: 1) multiple cues are
used; 2) grouping is done through a global decision process --
the Gestaltists use the notion of ``Pragnanz'' to express the aim of
grouping as extracting a ``holistic impression'' of an image through global
interactions within the visual system; and 3) there are multiple levels of
interpretation and grouping -- grouping is not just about clustering data.
In this talk, I will address the first two issues in the setting of image
segmentation. We view grouping as a process of 1)
defining the similarity between the image primitives, and 2) applying
a generic grouping engine that groups the image primitives
together based on the feature similarity measure.
The proposed grouping engine, Normalized Cuts, formulates grouping
as a hierarchical graph partition problem. The input to the algorithm
is the similarity measure between the image primitives. Based on the
similarity measure, a weighted graph is constructed. The grouping
criterion,
the normalized cut, measures both the total dissimilarity between
the different groups as well as the total similarity within the
groups. An efficient algorithm is developed for finding an optimal
partition by solving a generalized eigenvector system.
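To make the criterion concrete, here is a brute-force evaluation of the normalized cut value on a toy weighted graph; enumeration is only feasible on tiny graphs, which is precisely why the talk's approach solves a generalized eigenvector system instead. The example graph is invented:

```python
from itertools import combinations

def ncut_value(W, A, B):
    """Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V):
    total dissimilarity between the groups, normalized by each
    group's total connection to the whole graph."""
    cut = sum(W[i][j] for i in A for j in B)
    assoc = lambda S: sum(W[i][j] for i in S for j in range(len(W)))
    return cut / assoc(A) + cut / assoc(B)

def best_ncut(W):
    """Exhaustive minimum normalized cut on a tiny graph."""
    n = len(W)
    best = (float('inf'), None)
    for size in range(1, n // 2 + 1):
        for A in combinations(range(n), size):
            B = tuple(i for i in range(n) if i not in A)
            v = ncut_value(W, set(A), set(B))
            if v < best[0]:
                best = (v, (A, B))
    return best
```

On two triangles of unit-weight edges joined by one weak (0.01) edge, the minimizer separates the two triangles, illustrating how the normalization favors balanced groups over cutting off isolated nodes.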
The proposed cue combination mechanism is formulated
in a similar graph framework. An operational definition of textons
is proposed. A local neighborhood graph is then constructed using
a Delaunay triangulation of the textons. We apply a statistical test for
isotropy of the Delaunay neighbors to obtain a local applicability measure
for each grouping cue, texture and contour. The combined similarity measure
is then fed to the normalized cut algorithm to produce the final output.
This is joint work with Jitendra Malik, Serge Belongie and Thomas Leung.
Ali Shokoufandeh, Rutgers University
A Spectral Encoding of Object Part Structure and its Application to
Indexing and Matching
We have been developing a theory for the generic representation of 2-D
shape, in which structural descriptions are derived from the shocks
(singularities) of a curve evolution process acting on a
silhouette. We now apply the theory to the problems of shape matching
and indexing. The shocks are organized into a directed, acyclic shock
graph which, in turn, can be reduced to a unique rooted shock tree
whose levels are ordered by significance. We first introduce a novel
tree indexing algorithm based on an eigenvalue characterization of a
shock tree's topological structure. This topological signature, in the
form of a vector, is computed at each node of a query tree and used to
"vote" for objects having similar local topological structure. From
the resulting set of model shock tree candidates, this same
topological signature is exploited in a shock tree matching
(verification) algorithm that factors in the geometry of the shock
tree's nodes. The algorithm computes the distance between two shock
trees as well as a set of node correspondences. Using a diverse
database of shapes, we demonstrate our system's performance under
articulation, occlusion, and changes in viewpoint.
This talk represents joint work with Sven Dickinson (Rutgers
University), Kaleem Siddiqi (McGill University), and Steven W. Zucker
(Yale University).
Kaleem Siddiqi, McGill University
Matching Hierarchical Structures Using Association Graphs
It is well known that the problem of matching two relational
structures can be posed as an equivalent problem of finding a maximal
clique in a (derived) ``association graph.'' However, it is not clear
how to apply this approach to computer vision problems where the
graphs are hierarchically organized, i.e., are trees, since maximal
cliques are not constrained to preserve the partial order. Here we
provide a solution to the problem of matching two trees by
constructing the association graph using the graph-theoretic concept
of connectivity. We prove that in the new formulation there is a
one-to-one correspondence between maximal cliques and maximal subtree
isomorphisms. This allows us to cast the tree matching problem as an
indefinite quadratic program using the Motzkin-Straus theorem, and we
use ``replicator'' dynamical systems developed in theoretical biology
to solve it. Such continuous solutions to discrete problems are
attractive because they can motivate analog and biological
implementations. The framework is also extended to the matching of
attributed trees by using weighted association graphs. We illustrate
the power of the approach by matching articulated and deformed shapes
described by shock trees.
Joint work with Marcello Pelillo and Steven W. Zucker.
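The replicator dynamics are simple to sketch. On a toy unweighted association graph, the support of the converged state on the simplex picks out a maximal clique, per the Motzkin-Straus connection; the adjacency matrix below is illustrative, not a real shock-tree association graph:

```python
def replicator_clique(A, iters=200, eps=1e-6):
    """Replicator dynamics x_i <- x_i (Ax)_i / (x^T A x) on the
    adjacency matrix A of an association graph. By the Motzkin-Straus
    theorem, stable fixed points of this quadratic program correspond
    to maximal cliques; the clique is read off the support of x."""
    n = len(A)
    x = [1.0 / n] * n                       # start at the barycenter
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * Ax[i] for i in range(n))
        if total == 0:
            break
        x = [x[i] * Ax[i] / total for i in range(n)]
    return [i for i in range(n) if x[i] > eps]
```

Mass flows toward better-connected vertices at every step, so vertices outside the dominant clique decay exponentially and drop out of the support.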
Olga Veksler, Cornell University
Fast Energy Minimization for Computer Vision via Graph Cuts
Yuri Boykov, Olga Veksler and Ramin Zabih
Many problems in computer vision can be naturally described in terms of
energy minimization. Unfortunately, minimizing the energy is highly
intractable; the energy functions have multiple local minima, and the
search space has many thousands of dimensions. I will present some fast
algorithms for solving these problems that rely on graph cuts. These
algorithms produce answers with quality guarantees: the output is a local
minimum even when very large moves are allowed. Experimental stereo results
from real data with ground truth suggest that our results are several times
more accurate than previous methods.
Computer Science Department, Cornell University
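As a minimal illustration of energy minimization by cuts in the two-label case (the talk's algorithms handle many labels via large moves), a binary labeling energy with unary costs and Potts smoothness reduces exactly to an s-t min cut. The Edmonds-Karp flow routine and the toy costs below are illustrative assumptions, not the authors' implementation:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap[u][v] holds capacities (0 allowed)."""
    flow = {u: {v: 0 for v in cap[u]} for u in cap}
    while True:
        parent = {s: None}                  # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t                     # recover path, find bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:                   # augment, tracking residuals
            flow[u][v] += b
            flow[v][u] -= b

def binary_graphcut(unary0, unary1, edges, lam):
    """Minimize sum_p D_p(label_p) + lam * (# neighbor pairs with
    differing labels) via an s-t min cut: cutting s->p labels p as 1
    (paying unary1[p]); cutting p->t labels p as 0 (paying unary0[p])."""
    n = len(unary0)
    s, t = n, n + 1
    cap = {i: {} for i in range(n + 2)}
    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)             # residual arc
    for p in range(n):
        add(s, p, unary1[p])
        add(p, t, unary0[p])
    for p, q in edges:                      # Potts n-links
        add(p, q, lam)
        add(q, p, lam)
    flow = max_flow(cap, s, t)
    reach, queue = {s}, deque([s])          # source side of the min cut
    while queue:
        u = queue.popleft()
        for v in cap[u]:
            if v not in reach and cap[u][v] - flow[u][v] > 0:
                reach.add(v)
                queue.append(v)
    return [0 if p in reach else 1 for p in range(n)]
```

On a four-pixel chain whose unary costs favor labels [0, 0, 1, 1], the min cut pays one smoothness penalty at the boundary rather than overriding the data terms, which is the global optimum for the binary case.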