DIMACS Workshop on Large Scale Discrete Optimization in Robotics and Vision

March 21 - 23, 1999
DIMACS Center, CoRE Building (Room 431), Rutgers University, Busch Campus, Piscataway, NJ


Presented under the auspices of the Special Year on Large Scale Discrete Optimization



Computational Mechanisms for Visual Grouping
Jitendra Malik, University of California - Berkeley

Early visual processing can be modeled as a two-stage process. The first
stage is a measurement stage carried out with spatiotemporal receptive
fields tuned to orientation, spatial frequency, opponent color, and
short-range motion. In mathematical terms, this stage can be modeled
quite effectively using tools of linear system analysis and wavelet
theory. The second stage is a grouping stage resulting in the
formation of regions of coherent brightness, color and texture. Call
these 'proto-surfaces'.  This has historically proved to be much
more of a challenge and ad hoc techniques dominate the field. We offer
a novel approach to the grouping problem by drawing on ideas and
techniques from the field of spectral graph theory. Consider the image
to be a weighted graph where the nodes are pixels and arc weights
denote a local measure of similarity between the two pixels.  Grouping
can then be modeled as the process of finding a partition of the image
into regions such that there is high similarity within a region and
low similarity across regions. This is made precise as the 'Normalized
cut' criterion which can be optimized by solving an eigenvalue
problem. The resulting eigenvectors provide a hierarchical partitioning
of the image into regions ordered according to salience. Brightness,
color, texture, motion similarity, proximity and good continuation can
all be encoded into this framework. We show results on complex images
of natural scenes which demonstrate the significant superiority of
this technique over approaches such as those based on edge detection,
Markov Random fields, and variational formulations.

This talk is based on joint work with Jianbo Shi, Serge Belongie and
Thomas Leung.
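The spectral mechanics behind the normalized cut can be sketched in a few lines. The similarity matrix below is a made-up 6-node toy (two tightly coupled groups with weak cross-links), not real image data; the recipe, though, is the one the abstract describes: form the normalized graph Laplacian and partition by the sign of the second-smallest eigenvector.

```python
import numpy as np

# Hypothetical similarity matrix for 6 "pixels": two tight groups joined
# by weak links (illustrative only, not derived from an image).
W = np.array([[0.0, 1.0, 1.0, 0.1, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0, 0.1, 0.0],
              [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 0.1, 0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]])

d = W.sum(axis=1)                              # node degrees
D_isqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_isqrt @ (np.diag(d) - W) @ D_isqrt   # normalized graph Laplacian
vals, vecs = np.linalg.eigh(L_sym)             # eigenvalues in ascending order
# The eigenvector for the second-smallest eigenvalue approximately
# minimizes the normalized cut; its sign pattern yields the partition.
fiedler = D_isqrt @ vecs[:, 1]
partition = fiedler > 0
```

Further eigenvectors would subdivide the regions, giving the salience-ordered hierarchy mentioned above.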


Fast Energy Minimization for Computer Vision via Graph Cuts

Ramin Zabih
Cornell University

Abstract: Many problems in computer vision can be naturally described
in terms of energy minimization.  Unfortunately, minimizing the energy
is generally intractable: the energy functions have multiple local
minima, and the search space has many thousands of dimensions.  I will
present some fast algorithms for solving these problems that rely on
graph cuts.  These algorithms produce answers with quality guarantees:
the output is a local minimum even when very large moves are allowed.
Experimental stereo results from real data with ground truth suggest
that our results are several times more accurate than previous methods.

This is joint work with Yuri Boykov and Olga Veksler.
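The core graph construction can be illustrated in a toy setting: exact minimization of a binary 1-D "denoising" energy by a single s-t min cut. The energy, the Edmonds-Karp max-flow routine, and the data below are illustrative stand-ins, not the talk's algorithms (which handle multi-label energies via large expansion moves), but the source/sink construction is the standard one.

```python
from collections import deque

def min_cut_labels(cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix; returns, per node,
    True if the node lies on the sink side of a minimum cut."""
    n = len(cap)
    res = [row[:] for row in cap]           # residual capacities
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:        # BFS for an augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                 # no augmenting path: flow is max
            reachable = [p != -1 for p in parent]
            return [not r for r in reachable]
        v, bottleneck = t, float("inf")     # trace the path, find bottleneck
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, res[u][v])
            v = u
        v = t
        while v != s:                       # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u

def denoise(obs, lam):
    """Globally minimize sum_i [x_i != obs_i] + lam * sum_i [x_i != x_{i+1}]
    over binary labelings x via one min cut (source side = label 0)."""
    n = len(obs)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, o in enumerate(obs):
        cap[s][i] = 1.0 - o                 # cost of assigning label 1
        cap[i][t] = float(o)                # cost of assigning label 0
        if i + 1 < n:
            cap[i][i + 1] = cap[i + 1][i] = lam
    sink_side = min_cut_labels(cap, s, t)
    return [1 if sink_side[i] else 0 for i in range(n)]
```

For example, `denoise([1, 1, 0, 1, 1, 0, 0, 0], 0.6)` flips the isolated 0 (paying unary cost 1 but saving 1.2 in smoothness), while the run of zeros survives.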


Sampling methods in computer vision

David Forsyth
UC Berkeley

Abstract: Numerous computer vision problems can be phrased as
probabilistic inference.  I will discuss three examples: colour
constancy --- where the aim is to determine surface colour from image
colour, whatever the illumination; structure from motion --- where the
aim is to impose some camera and shape model on a set of observations
of moving points; and finding people --- where the aim is to find
image regions whose layout is consistent with human kinematics.

In each case, Bayesian methods are attractive, but the posteriors are
hopelessly complicated.  However, there is a lot of evidence that
posteriors have few significant modes, and a set of samples that
explores these modes is an attractive representation that makes it
possible to integrate new information by resampling the samples. For
example, in colour constancy, we can now express the effect of
observations of one surface under many lights on the inferred colour
of other surfaces seen with it.  While good samplers are difficult to
build, I will show results that suggest that it is useful to phrase
these problems in terms of inference and sampling.
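As a minimal illustration of why sampling copes with multimodal posteriors, the sketch below runs a random-walk Metropolis sampler on a made-up two-mode density. The density, step size, and seed are hypothetical choices for the demonstration, not taken from the applications above.

```python
import math
import random

def log_post(x):
    """Hypothetical bimodal log-posterior: equal-weight Gaussians at -2, +2."""
    return math.log(0.5 * math.exp(-0.5 * (x + 2.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x - 2.0) ** 2))

def metropolis(n, step=2.5, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with
    probability min(1, p(x') / p(x))."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)
        delta = log_post(prop) - log_post(x)
        if rng.random() < math.exp(min(0.0, delta)):
            x = prop
        samples.append(x)
    return samples

samples = metropolis(5000)
# A sampler that explores both modes spends roughly half its time in each.
frac_left = sum(s < 0.0 for s in samples) / len(samples)
```

The resulting sample set is exactly the mode-exploring representation described above: new evidence can be folded in by reweighting and resampling it.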


Strategies for Taming the Combinatorially Explosive Search in Vision

David Mumford
Brown University

Abstract: Why is vision so hard? I believe it results from the
scale-invariance of images, the fact that images are cluttered with
objects and their parts at all scales. The result is that there is no
way to separate low and high level vision or to segment images correctly
without using high level knowledge. This drives one towards pyramid
based algorithms in which both high level and low level data structures
are manipulated, in which parse trees assembling objects are embedded.
In such a complex setting, it seems that simple optimization ideas like
dynamic programming or Markov chain Monte Carlo are hopeless.
Fortunately, a new idea, drawing the best from genetic algorithms and
giving them a principled statistical base, has emerged: "particle
filtering" or CONDENSATION. We want to sketch how this might be adapted
to a pyramid architecture, incorporating the context-sensitive
grammatical ideas of Geman and Potter.
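A bare-bones version of the particle-filtering loop is easy to state. The random-walk motion model, the noise levels, and the observation sequence below are invented for illustration; CONDENSATION applies the same predict-weight-resample cycle to visual tracking.

```python
import math
import random

def particle_filter(observations, n=500, seed=1):
    """CONDENSATION-style filter for a 1-D random-walk state observed in
    Gaussian noise: predict, weight by likelihood, resample."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for z in observations:
        # Predict: diffuse each particle through the motion model.
        particles = [p + rng.gauss(0.0, 0.5) for p in particles]
        # Weight: likelihood of the observation at each particle.
        weights = [math.exp(-0.5 * (z - p) ** 2) for p in particles]
        # Resample: factored sampling concentrates particles where the
        # posterior mass is, without ever representing it in closed form.
        particles = rng.choices(particles, weights=weights, k=n)
        estimates.append(sum(particles) / n)
    return estimates

estimates = particle_filter([0.1, 0.4, 1.0, 1.5, 2.1])
```

Because the posterior is carried as a weighted particle set rather than a parametric form, the same machinery survives the multimodal distributions that defeat simpler optimizers.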


Probabilistic Motion Planning Using Visibility Based Roadmaps

Jean-Paul Laumond

Abstract: This talk will present Move3D, a software platform dedicated
to motion planning for various kinds of holonomic and nonholonomic
mechanical systems. The platform integrates a motion planner based on
probabilistic roadmaps introduced by Kavraki and Latombe and by
Svestka and Overmars a few years ago. The originality of the roadmap
consists in capturing the connectivity of the collision-free
configuration space by randomly choosing the so-called "guard
configurations" (a guard configuration does not "see" any other
one). The merits of the approach relative to others are analyzed
at both formal and practical levels. Various examples ranging from
classical manipulators to nonholonomic mobile manipulators via
free-flying objects, rolling bridges... will support the presentation.

Joint work with C. Nissoux and T. Simeon.
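The guard idea can be sketched for a 2-D point robot. The workspace, the circular obstacle, and the sampling budget below are invented, and a real planner works in the robot's configuration space with exact collision checks; this is only a simplified reading of the visibility-roadmap construction.

```python
import math
import random

def collision_free(p, obstacles):
    """Point lies outside every circular obstacle (cx, cy, r)."""
    return all(math.hypot(p[0] - cx, p[1] - cy) > r for cx, cy, r in obstacles)

def sees(a, b, obstacles, steps=50):
    """The straight segment a-b stays collision-free (sampled check)."""
    return all(collision_free((a[0] + (b[0] - a[0]) * t / steps,
                               a[1] + (b[1] - a[1]) * t / steps), obstacles)
               for t in range(steps + 1))

def visibility_prm(obstacles, tries=300, seed=2):
    """Samples visible from no existing guard become guards; samples that
    see two or more guards become connection nodes linking their regions."""
    rng = random.Random(seed)
    guards, connectors = [], []
    for _ in range(tries):
        q = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
        if not collision_free(q, obstacles):
            continue
        visible = [g for g in guards if sees(q, g, obstacles)]
        if not visible:
            guards.append(q)
        elif len(visible) >= 2:
            connectors.append(q)
    return guards, connectors

guards, connectors = visibility_prm([(5.0, 5.0, 2.0)])
```

By construction no guard "sees" another, so the guard set stays small while still covering the visible free space.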


Towards Planning for Elastic Objects

Lydia Kavraki
Rice University

Applications such as robot programming, design for manufacturing, 
animation of digital actors, rational drug design, or surgical
planning, require computing paths in high-dimensional geometric
spaces, a provably hard problem. Realistic treatment of the above
applications also requires accounting for the physical properties
of the moving objects such as their elasticity. In this talk, I will
describe a path planner that is capable of computing paths for a
thin elastic metal plate. The underlying geometric model for the
plate is augmented by an accurate mechanical model. The latter permits
the computation of the shape of the plate with respect to a set of
grasping constraints by minimizing the elastic energy function of the
deformation. A probabilistic roadmap planner is used to compute paths
for the plate. The planner captures the connectivity of a space by
building a roadmap, a network of simple paths connecting configurations
which are selected using randomized techniques. Planning queries can
then be answered by connecting the initial and final configurations of
the robot to two nodes of the roadmap and computing a path through the
roadmap between these two nodes. Several experimental results will be
shown demonstrating paths for an elastic plate manipulated in a
constrained environment and new insights on probabilistic roadmaps
will be discussed.
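A drastically simplified 1-D stand-in for the mechanical model conveys the flavor of the energy-minimization step: treat the plate as a strip whose deflection minimizes a discrete bending energy subject to grasp constraints. The constraint values and discretization below are invented for illustration.

```python
import numpy as np

def plate_shape(n=21):
    """Deflection y_i of a discretized elastic strip: minimize the bending
    energy ||A y||^2 (A = second-difference operator) subject to grasp
    constraints pinning both ends at 0 and the midpoint at 1
    (hypothetical grasp values)."""
    A = np.zeros((n - 2, n))
    for i in range(n - 2):
        A[i, i], A[i, i + 1], A[i, i + 2] = 1.0, -2.0, 1.0
    fixed = {0: 0.0, n // 2: 1.0, n - 1: 0.0}
    free = [i for i in range(n) if i not in fixed]
    y = np.zeros(n)
    for i, v in fixed.items():
        y[i] = v
    # Least-squares solve for the free deflections given the grasped ones:
    # minimize ||A_free y_free + A_fixed y_fixed||^2.
    rhs = -A[:, list(fixed)] @ np.array(list(fixed.values()))
    y[free], *_ = np.linalg.lstsq(A[:, free], rhs, rcond=None)
    return y

y = plate_shape()
```

In the planner, shapes like this one are computed at each sampled set of grasping constraints, so the roadmap's nodes carry physically plausible configurations.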


Generic Shape Indexing and Matching for 2-D Object Recognition

Sven Dickinson
Rutgers University

We have been developing a theory for the generic representation of 2-D
shape, in which structural descriptions are derived from the shocks
(singularities) of a curve evolution process acting on a
silhouette. We now apply the theory to the problems of shape matching 
and indexing. The shocks are organized into a directed, acyclic shock
graph which, in turn, can be reduced to a unique rooted shock tree
whose levels are ordered by significance. We first introduce a novel
tree indexing algorithm based on an eigenvalue characterization of a
shock tree's topological structure. This topological signature, in the
form of a vector, is computed at each node of a query tree and used to
"vote" for objects having similar local topological structure. From
the resulting set of model shock tree candidates, this same  
topological signature is exploited in a shock tree matching
(verification) algorithm that factors in the geometry of the shock
tree's nodes.  The algorithm computes the distance between two shock
trees as well as a set of node correspondences.  Using a diverse
database of shapes, we demonstrate our system's performance under
articulation, occlusion, and changes in viewpoint. 

This talk represents joint work with Ali Shokoufandeh (Rutgers
University), Kaleem Siddiqi (McGill University), and Steven W. Zucker
(Yale University). 
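A toy version shows how a tree's eigenvalues summarize its topology. The sketch below uses the whole adjacency spectrum of a small unlabeled tree, a simplification of the per-node sum-of-eigenvalues signature described above; the trees themselves are arbitrary examples.

```python
import numpy as np

def tree_signature(children, root):
    """Sorted adjacency eigenvalues of the subtree rooted at `root` -- a
    simplified topological signature in the spirit of the talk's index."""
    nodes, stack = [], [root]
    while stack:                    # collect the subtree by DFS
        u = stack.pop()
        nodes.append(u)
        stack.extend(children.get(u, []))
    idx = {u: i for i, u in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in children.get(u, []):
            A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0
    return np.sort(np.linalg.eigvalsh(A))

# Same topology under different node names gives identical signatures,
# which can then "vote" for structurally similar models; a 5-node path
# has a different spectrum.
sig1 = tree_signature({0: [1, 2], 1: [3, 4]}, 0)
sig2 = tree_signature({"a": ["b", "c"], "b": ["d", "e"]}, "a")
sig3 = tree_signature({0: [1], 1: [2], 2: [3], 3: [4]}, 0)
```

Because the signature is a fixed-length vector per node, it supports the indexing-by-voting step before the geometry-aware matching takes over.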


Recognition under Variable Illumination: Issues and Problems

David Kriegman
University of Illinois

Abstract: The appearance of an object can vary significantly due to changes in
pose, changes in illumination (spectral distribution, source
distribution, and strength) as well as changes to the object's state
(articulations, deformations, reflectance changes).  Yet in the
problem of object recognition, the goal is to classify a subset of the
image as corresponding to some object in a model database in the
presence of this wide range of image variation.  One approach to
modelling objects for recognition is to obtain sample images over the
full range of variation. One could then build upon standard pattern
classification techniques (e.g. nearest neighbor, estimating class
conditional densities, Bayes classification, etc.) to perform
recognition.  Unfortunately, both the dimensionality of images and the
dimensionality of the variation are sufficiently large that the number
of training images required would be enormous; so one must be
skeptical about the success of a purely data-driven approach.

Alternatively, some, but not all, of the variation (e.g. that due to
pose and illumination) can be very well modeled using geometry and
physics, and representations of objects can be constructed from image
data which can then be used to predict the image variation.  The
challenge is to construct sufficiently accurate and expressive
representations from modest amounts of image data.  In this talk, we
will focus on the variation due to lighting, and summarize a series of
results obtained over the last few years in characterizing the set of
images of an object in fixed pose but under a range of lighting
conditions (color, light source location, number and distribution of
sources, etc.)  We have developed recognition algorithms based on this
characterization, and experimental results in face recognition will be
described.  A number of open problems concerned with modelling the set
of images under variable illumination will be presented, and issues
concerned with other sources of image variation will be discussed. 

This is joint work with P. Belhumeur, A. Georgiades, A. Yuille.
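The no-shadow Lambertian case underlying this characterization fits in a few lines: with three images under known, linearly independent lights, each pixel's albedo-scaled normal follows from a 3x3 solve, after which the image under any novel light is a pure prediction. The data below are synthetic, and attached shadows (the clamp at zero) are deliberately ignored.

```python
import numpy as np

def predict_image(images, lights, new_light):
    """For a convex Lambertian surface with no shadows, pixel intensity is
    b_p . s, so three images under known lights determine b_p per pixel
    and hence the image under any new light direction."""
    S = np.asarray(lights)           # 3 x 3, one light direction per row
    I = np.asarray(images)           # 3 x n_pixels
    B = np.linalg.solve(S, I)        # 3 x n_pixels albedo-scaled normals
    return new_light @ B

rng = np.random.default_rng(0)
B_true = rng.uniform(0.2, 1.0, (3, 6))            # made-up scene, 6 pixels
lights = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
images = lights @ B_true                          # rendered "training" images
new_light = np.array([0.5, 0.5, 0.7])
pred = predict_image(images, lights, new_light)
```

The set of all such predictions over nonnegative light combinations is exactly the object's illumination cone in this idealized setting; the open problems above concern what survives once shadows and interreflections enter.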


Template-Based Indexing for Vision and Robotics

Martial Hebert
Carnegie Mellon University

Abstract: Using image templates for image-based object recognition
is an attractive approach in computer vision for robotics
applications. Many challenges stand in the way of the
practical use of this class of techniques. In particular,
geometric and photometric variation makes it difficult
to represent templates. The situation is complicated by the fact
that we may want to discriminate among a large library
of objects.

In this presentation, I will describe techniques that permit both
time- and space-efficient template-based indexing. The approach starts
with a new method for the efficient computation of steerable filters,
which leads to an efficient representation of image templates.  This
representation is based on recent theoretical results in the
community, as well as a new approach to large-dimensional SVD
computation.
Given indexes built from this representation, I will then describe an
approach to indexing that permits good time performance, even with
large data sets.  Finally, I will discuss ideas on the trade-off
between modeling (memory) and recognition (speed).
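Steerability itself is easy to state for first-derivative-of-Gaussian kernels: responses at just two orientations synthesize the response at any orientation, which is where the efficiency comes from. The kernel size and sigma below are arbitrary illustrative values.

```python
import numpy as np

def deriv_gaussian_basis(size=9, sigma=1.5):
    """x- and y-derivative-of-Gaussian kernels: a steerable basis pair."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    G = np.exp(-(X ** 2 + Y ** 2) / (2.0 * sigma ** 2))
    return -X * G, -Y * G, X, Y, G

def steer(gx, gy, theta):
    """Derivative kernel at orientation theta from the two basis kernels."""
    return np.cos(theta) * gx + np.sin(theta) * gy

gx, gy, X, Y, G = deriv_gaussian_basis()
# The steered kernel equals the analytic directional derivative, so two
# convolutions per image serve every orientation; by linearity the same
# interpolation can be applied to the filtered outputs instead.
theta = 0.7
steered = steer(gx, gy, theta)
analytic = -(np.cos(theta) * X + np.sin(theta) * Y) * G
```

Higher-order steerable filters need more basis kernels, which is where efficient computation and compression of the basis set become the interesting problems.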

I will also show how the general notion of template-based indexing can
be used effectively for 3-D surfaces, i.e., comparing surfaces instead
of images.  In particular, by using the appropriate local surface
representation, it is possible to use memory-based adaptation
techniques to improve recognition performance over time. Examples from
industrial applications of robotics systems will be shown.


Kinetic Data Structures for Optimization
Leonidas J. Guibas
Stanford University

Computer systems commonly cache the values of variables to gain
efficiency. In applications where the goal is to track attributes of a
continuously moving or deforming physical system over time, caching
relations between variables works better than caching individual
values. The reason is that, as the system evolves, such relationships
are more stable than the values of individual variables.

Kinetic data structures (KDSs) are a novel formal framework for
designing and analyzing sets of assertions to cache about the
environment, so that these assertion sets are at once relatively
stable and tailored to facilitate or trivialize the computation of the
attribute of interest. Formally, a KDS is a mathematical proof
animated through time, proving the validity of a certain computation
for the attribute of interest. KDSs have rigorous associated measures
of performance and their design shares many qualities with that of
classical data structures.

The KDS framework has led to new and promising algorithms in
applications such as collision detection for moving rigid and
deformable bodies, optimal network connectivity among mobile
workstations, and visibility/occlusion maintenance. In particular, the
KDS framework can be adapted to track optimal solutions to
combinatorial optimization problems with continuously varying data.
This talk will survey the general ideas behind KDSs and illustrate
their application to some simple optimization problems.
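The flavor of a KDS shows up already for a single attribute: the maximum of points moving linearly on a line. Cache the certificates "the current leader is ahead of each rival", advance time directly to the earliest certificate failure, and repair locally. The trajectories below are made up for the demonstration.

```python
def kinetic_max(points, t_end):
    """Track argmax of x_i(t) = x0_i + v_i * t on [0, t_end) by processing
    certificate failures instead of re-evaluating at every instant.
    Returns the (time, leader) pairs at which a leader is certified."""
    t, history = 0.0, []
    while t < t_end:
        pos = {i: x0 + v * t for i, (x0, v) in enumerate(points)}
        leader = max(pos, key=pos.get)
        history.append((t, leader))
        x0l, vl = points[leader]
        # Each certificate "leader ahead of rival i" can only fail when a
        # faster rival crosses; the earliest crossing is the next event.
        t_next = t_end
        for i, (x0, v) in enumerate(points):
            if i != leader and v > vl:
                cross = (x0l - x0) / (v - vl)
                if t < cross < t_next:
                    t_next = cross
        t = t_next
    return history

# A slow point starts ahead; the fastest point overtakes it at t = 5.
events = kinetic_max([(0.0, 2.0), (10.0, 0.0), (4.0, 1.0)], 20.0)
```

The assertion set is stable (only two events over the whole interval) yet trivializes the attribute query at any time, which is the pattern the framework generalizes.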


Differential Algebra & Robot Control

Bud Mishra
New York University

Abstract: In this talk, we shall examine an interesting connection
between the classical Ritt-Kolchin differential algebra and several
control theoretic problems in robotics.

However, unlike the tremendous progress in computational commutative
algebra and algebraic geometry and their ready applications to
mechanism design, kinematics and motion planning, a similar success
story in this domain remains largely elusive. We shall attempt to
explain why and where efforts need to be focused.


Fences and Traps: Geometric Algorithms for Part Feeding

Mark Overmars
Utrecht University

Abstract: Part feeding plays a crucial role in many industrial
applications. When assembling products, parts have to be presented to
the assembly system in a particular orientation. Many devices are
being used to orient parts into a required pose. Typical devices
include robotic systems with cameras, conveyor belts with fences,
vibratory bowl feeders, etc. In this talk I will outline geometric
problems that arise when designing such devices and present geometric
algorithms to solve them.


Capture Regions for Grasping, Manipulating and Re-Orienting Parts

Jean Ponce
University of Illinois

Abstract: This talk addresses the problem of manipulating an object in
the plane using a succession of contacts with simple robotic devices
that may have discrete or continuous degrees of freedom. The problem
of planning the overall object trajectory is reduced to constructing a
sequence of atomic robot motions between equilibrium states of the
system.  The proposed approach is based on constructing the capture
region of each equilibrium, i.e., the maximal subset of the object's
configuration space such that any motion starting within this subset
is guaranteed to end up in the equilibrium configuration. In
particular, we do not assume that contact is maintained during the
execution of the manipulation task; nor do we attempt to predict the
object motion between equilibria. This approach will be applied to
three problems: (1) grasping and in-hand manipulation in the presence
of frictional effects and large position uncertainties, (2) motion
planning for mobile robots pushing an object while avoiding obstacles,
and (3) re-orienting parts by dropping them on a simple mechanism
reminiscent of Pachinko machines. I will present simulation results as
well as preliminary experiments with Nomadic Super Scout II mobile
robots and prototypes of the reconfigurable gripper and of the
Pachinko machine developed in our laboratory.
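In one dimension the construction is easy to visualize: for a part whose potential is a double well, the capture region of each stable equilibrium is the set of initial placements that settle into it under quasi-static descent. The potential, step size, and start positions below are invented for illustration.

```python
def capture_regions(grad, equilibria, starts, step=0.01, iters=4000, tol=0.05):
    """Classify initial placements by the equilibrium that quasi-static
    gradient descent on the potential drives them to."""
    regions = {e: [] for e in equilibria}
    for x0 in starts:
        x = x0
        for _ in range(iters):
            x -= step * grad(x)
        for e in equilibria:
            if abs(x - e) < tol:
                regions[e].append(x0)
    return regions

# Double-well potential h(x) = (x^2 - 1)^2, stable equilibria at -1 and +1.
grad = lambda x: 4.0 * x * (x * x - 1.0)
starts = [i / 10.0 for i in range(-15, 16) if i != 0]  # skip unstable x = 0
regions = capture_regions(grad, [-1.0, 1.0], starts)
```

A manipulation plan then chains atomic motions so that each one starts inside the capture region of the next equilibrium, with no need to track the intermediate dynamics.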


Specifying and Performing Hand-Eye Tasks with Uncertain Camera Models

Gregory D. Hager
Yale University

Abstract: Most of the work in robotic manipulation and visual servoing
has emphasized HOW to specify and perform particular tasks from image
information.  However, an important precursor is to first understand
WHAT tasks a given vision-based system can perform with predictable
accuracy.  This leads to the following question: ``When is it possible
to decide, on the basis of the information contained in the images of
an imprecisely modeled two-camera vision system, whether or not a
prescribed robot positioning task has been precisely accomplished?''
Results relating to this question are now known for several camera
model classes including purely injective cameras, weakly calibrated
projective cameras, and uncalibrated projective cameras.

The above characterizations are "non-constructive" -- that is, they do
not suggest how a feedback system that realizes a given task might be
constructed.  We have recently developed constructive
characterizations of the set of tasks that can be performed under
different types of camera model uncertainty. We argue that the
resulting structure provides a principled foundation both for a
specification language and for automatic execution monitoring in
uncalibrated environments. These two system components are
prerequisites for extending existing feature-centered approaches to
manipulation into a more powerful and intuitive object-centered
architecture for hand/eye coordination.


Reconstructing Cameras and Scene Geometry from Multiple Views

Andrew Zisserman
Oxford University

A classical problem in Computer Vision is that of structure-and-motion
recovery from images: given a stream of images determine aspects of
the camera motion, the camera calibration and the scene
geometry. Solutions to this problem have now been developed for quite
general motions and scenes.

The talk will first outline a solution to this problem and illustrate
how the recovered structure and motion may be used as a basis for
building 3D graphical models from an image sequence.  Then the
advantages that accrue from special motions will be described.
Special motions are readily available in robotic related applications.
It will be demonstrated that the recovery problem is considerably
simplified in such cases.  For example, if the camera rotates about a
single axis then the reconstruction ambiguity is reduced to a two
parameter family with rotation angles determined unambiguously, 
and accuracy is an order of magnitude better than for general motion.
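One building block of such reconstructions can be made concrete: linear (DLT) triangulation of a scene point from two views with known camera matrices, via the SVD of a small homogeneous system. The cameras and point below are synthetic checks, not data from the talk.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each image point contributes two rows of
    a homogeneous system A X = 0; the SVD null vector is the 3-D point."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                        # right singular vector of smallest value
    return X[:3] / X[3]               # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic check: two made-up cameras displaced along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])
X_rec = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With uncalibrated cameras the same machinery recovers the point only up to a projective ambiguity, which is exactly the ambiguity that special motions such as single-axis rotation cut down.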


Human Visual Space 

Jan J. Koenderink 
Utrecht University

Abstract: Human visual space is a collective term for a number of
different competences. I will discuss some recent results from
psychophysics on various aspects (pictorial relief, the space we move
in, the shape of objects) and speculate on the nature of the
mechanisms involved. One intriguing aspect of human visual space is
that it is like a quilt of mutually rather independent competences
that often entertain conflicting representations. Since we have few
complaints concerning the coherence of our actions (at least in the
short term), this poses some problems as to how such coherence might
come about.

Document last modified on March 3, 1999.