1. Computational Mechanisms for Visual Grouping
Jitendra Malik, University of California - Berkeley

Early visual processing can be modeled as a two-stage process. The first stage is a measurement stage carried out with spatiotemporal receptive fields tuned to orientation, spatial frequency, opponent color, and short-range motion. In mathematical terms, this stage can be modeled quite effectively using tools of linear system analysis and wavelet theory. The second stage is a grouping stage resulting in the formation of regions of coherent brightness, color and texture. Call these 'proto-surfaces'. This has historically proved to be much more of a challenge, and ad hoc techniques dominate the field. We offer a novel approach to the grouping problem by drawing on ideas and techniques from the field of spectral graph theory. Consider the image to be a weighted graph where the nodes are pixels and arc weights denote a local measure of similarity between the two pixels. Grouping can then be modeled as the process of finding a partition of the image into regions such that there is high similarity within a region and low similarity across regions. This is made precise as the 'Normalized cut' criterion, which can be optimized by solving an eigenvalue problem. The resulting eigenvectors provide a hierarchical partitioning of the image into regions ordered according to salience. Brightness, color, texture, motion similarity, proximity and good continuation can all be encoded into this framework. We show results on complex images of natural scenes which demonstrate the significant superiority of this technique over approaches such as those based on edge detection, Markov random fields, and variational formulations. This talk is based on joint work with Jianbo Shi, Serge Belongie and Thomas Leung.
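
A minimal sketch of the normalized-cut idea in talk 1, for a tiny grayscale image (an editorial illustration, not the speakers' implementation; the brightness-and-proximity affinity, the dense eigensolver, and all parameter values below are assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut_bipartition(img, sigma_i=0.1, sigma_x=4.0, radius=5):
    """Split a small grayscale image (values in [0, 1]) into two groups."""
    h, w = img.shape
    coords = np.array([(y, x) for y in range(h) for x in range(w)], float)
    vals = img.reshape(-1)

    # Weighted graph: nodes are pixels, arc weights encode local similarity
    # in brightness, damped by spatial proximity and cut off at a radius.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    W = (np.exp(-(vals[:, None] - vals[None, :]) ** 2 / sigma_i ** 2)
         * np.exp(-d2 / sigma_x ** 2) * (d2 <= radius ** 2))
    D = np.diag(W.sum(axis=1))

    # The normalized-cut criterion relaxes to the generalized eigenproblem
    # (D - W) y = lambda D y; the second-smallest eigenvector yields the cut.
    _, eigvecs = eigh(D - W, D)
    y = eigvecs[:, 1]
    return (y > np.median(y)).reshape(h, w)
```

Recursively splitting the resulting regions, or thresholding further eigenvectors in the same way, is how a hierarchical partitioning of the kind the abstract mentions can be obtained.
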
2. Fast Energy Minimization for Computer Vision via Graph Cuts
Ramin Zabih, Cornell University

Abstract: Many problems in computer vision can be naturally described in terms of energy minimization. Unfortunately, minimizing the energy is highly intractable; the energy functions have multiple local minima, and the search space has many thousands of dimensions. I will present some fast algorithms for solving these problems that rely on graph cuts. These algorithms produce answers with quality guarantees: the output is a local minimum even when very large moves are allowed. Experimental stereo results from real data with ground truth suggest that our results are several times more accurate than previous methods. This is joint work with Yuri Boykov and Olga Veksler.
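
A toy example of the graph-cut construction behind talk 2, restricted to two labels, where a single minimum cut solves the energy exactly (the Potts smoothness term, the 4-connected grid, and the use of networkx are assumptions of this sketch, not the speakers' multi-label expansion-move algorithms):

```python
import networkx as nx
import numpy as np

def binary_graph_cut(data_cost, smoothness=1.0):
    """data_cost[y, x, l]: cost of giving label l in {0, 1} to pixel (y, x).
    Returns an (H, W) label array minimizing data cost plus Potts smoothness."""
    H, W, _ = data_cost.shape
    G = nx.DiGraph()
    s, t = "s", "t"
    for y in range(H):
        for x in range(W):
            p = (y, x)
            # t-links: cutting s->p puts p on the sink side (label 1), paying D_p(1);
            # cutting p->t puts p on the source side (label 0), paying D_p(0).
            G.add_edge(s, p, capacity=float(data_cost[y, x, 1]))
            G.add_edge(p, t, capacity=float(data_cost[y, x, 0]))
            # n-links: a Potts penalty paid whenever 4-neighbours take different labels.
            for q in [(y + 1, x), (y, x + 1)]:
                if q[0] < H and q[1] < W:
                    G.add_edge(p, q, capacity=smoothness)
                    G.add_edge(q, p, capacity=smoothness)
    _, (source_side, _) = nx.minimum_cut(G, s, t)
    labels = np.ones((H, W), dtype=int)
    for node in source_side:
        if node != s:
            labels[node] = 0
    return labels
```

The large-move algorithms described in the talk repeatedly solve cuts of this kind, each one allowing an "expansion" move over the current multi-label solution.
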
3. Sampling methods in computer vision
David Forsyth, UC Berkeley

Abstract: Numerous computer vision problems can be phrased as probabilistic inference. I will discuss three examples: colour constancy --- where the aim is to determine surface colour from image colour, whatever the illumination; structure from motion --- where the aim is to impose some camera and shape model on a set of observations of moving points; and finding people --- where the aim is to find image regions that are laid out consistently with human kinematics. In each case, Bayesian methods are attractive, but the posteriors are hopelessly complicated. However, there is a lot of evidence that posteriors have few significant modes, and a set of samples that explores these modes is an attractive representation that makes it possible to integrate new information by resampling the samples. For example, in colour constancy, we can now express the effect of observations of one surface under many lights on the inferred colour of other surfaces seen with it. While good samplers are difficult to build, I will show results that suggest that it is useful to phrase these problems in terms of inference and sampling.

4. Strategies for Taming the Combinatorially Explosive Search in Vision
David Mumford, Brown University

Abstract: Why is vision so hard? I believe it results from the scale-invariance of images, the fact that images are cluttered with objects and their parts at all scales. The result is that there is no way to separate low- and high-level vision or to segment images correctly without using high-level knowledge. This drives one towards pyramid-based algorithms in which both high-level and low-level data structures are manipulated and in which parse trees assembling objects are embedded. In such a complex setting, it seems that simple optimization ideas like dynamic programming or Markov chain Monte Carlo are hopeless. Fortunately, a new idea, drawing the best from genetic algorithms and giving them a principled statistical base, has emerged: "particle filtering" or CONDENSATION. We want to sketch how this might be adapted to a pyramid architecture, incorporating the context-sensitive grammatical ideas of Geman and Potter.
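
The sample-based posterior representation running through talks 3 and 4 can be sketched as a predict-reweight-resample loop (a minimal 1-D illustration; the random-walk dynamics, the Gaussian likelihood, and the parameter values are placeholder assumptions):

```python
import numpy as np

def particle_filter_step(particles, observation, rng,
                         process_noise=0.5, obs_noise=1.0):
    """One predict-reweight-resample cycle for a 1-D state."""
    # Predict: push each sample through the (assumed) dynamics model.
    particles = particles + rng.normal(0.0, process_noise, size=particles.shape)
    # Reweight: score each sample against the new observation.
    weights = np.exp(-0.5 * ((observation - particles) / obs_noise) ** 2)
    weights /= weights.sum()
    # Resample: draw a new, equally weighted sample set from the weighted one,
    # concentrating the samples on the significant modes of the posterior.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 5.0, size=500)      # broad prior over the state
for z in [1.0, 1.3, 1.1, 0.9]:                  # a short observation sequence
    particles = particle_filter_step(particles, z, rng)
print(particles.mean())                          # posterior mean estimate
```
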
5. Probabilistic Motion Planning Using Visibility-Based Roadmaps
Jean-Paul Laumond, LAAS

Abstract: This talk will present Move3D, a software platform dedicated to motion planning for various kinds of holonomic and nonholonomic mechanical systems. The platform integrates a motion planner based on the probabilistic roadmaps introduced by Kavraki and Latombe and by Svestka and Overmars a few years ago. The originality of the roadmap consists in capturing the connectivity of the collision-free configuration space by randomly choosing the so-called "guard configurations" (a guard configuration does not "see" any other one). The interest of the approach with respect to other approaches is analyzed at both the formal and practical levels. Various examples ranging from classical manipulators to nonholonomic mobile manipulators via free-flying objects, rolling bridges... will support the presentation. Joint work with C. Nissoux and T. Simeon.

6. Towards Planning for Elastic Objects
Lydia Kavraki, Rice University

Applications such as robot programming, design for manufacturing, animation of digital actors, rational drug design, or surgical planning require computing paths in high-dimensional geometric spaces, a provably hard problem. Realistic treatment of the above applications also requires accounting for the physical properties of the moving objects, such as their elasticity. In this talk, I will describe a path planner that is capable of computing paths for a thin elastic metal plate. The underlying geometric model for the plate is augmented by an accurate mechanical model. The latter permits the computation of the shape of the plate with respect to a set of grasping constraints by minimizing the elastic energy function of the deformation. A probabilistic roadmap planner is used to compute paths for the plate. The planner captures the connectivity of a space by building a roadmap, a network of simple paths connecting configurations which are selected using randomized techniques. Planning queries can then be answered by connecting the initial and final configurations of the robot to two nodes of the roadmap and computing a path through the roadmap between these two nodes. Several experimental results will be shown demonstrating paths for an elastic plate manipulated in a constrained environment, and new insights on probabilistic roadmaps will be discussed.
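
A rough sketch of the visibility-based roadmap construction of talk 5 (and of the roadmap machinery talk 6 builds on): sampled configurations become guards when no existing guard sees them, and are kept as connection nodes when they see guards in more than one component. The toy collision checker, the straight-segment visibility test, and all parameters are assumptions of this sketch:

```python
import numpy as np

def is_free(q):
    """Stand-in collision checker: one disc obstacle of radius 0.2 at (0.5, 0.5)."""
    return np.linalg.norm(q - 0.5) > 0.2

def sees(a, b, steps=20):
    """Straight-segment visibility test between two configurations."""
    return all(is_free(a + t * (b - a)) for t in np.linspace(0.0, 1.0, steps))

def visibility_prm(n_samples=300, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    nodes, is_guard, component, edges, next_comp = [], [], [], [], 0
    for _ in range(n_samples):
        q = rng.random(dim)
        if not is_free(q):
            continue
        visible = [i for i in range(len(nodes)) if is_guard[i] and sees(q, nodes[i])]
        comps = {component[i] for i in visible}
        if not visible:
            # No guard sees q: q becomes a new guard in its own component.
            nodes.append(q); is_guard.append(True)
            component.append(next_comp); next_comp += 1
        elif len(comps) > 1:
            # q sees guards in several components: keep it as a connection
            # node and merge the components it links.
            nodes.append(q); is_guard.append(False); component.append(min(comps))
            edges.extend((len(nodes) - 1, i) for i in visible)
            component = [min(comps) if c in comps else c for c in component]
    return nodes, edges
```

Planning queries would then connect the start and goal configurations to the roadmap and search the resulting graph, as described in talk 6.
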
7. Generic Shape Indexing and Matching for 2-D Object Recognition
Sven Dickinson, Rutgers University

We have been developing a theory for the generic representation of 2-D shape, in which structural descriptions are derived from the shocks (singularities) of a curve evolution process acting on a silhouette. We now apply the theory to the problems of shape matching and indexing. The shocks are organized into a directed acyclic shock graph which, in turn, can be reduced to a unique rooted shock tree whose levels are ordered by significance. We first introduce a novel tree indexing algorithm based on an eigenvalue characterization of a shock tree's topological structure. This topological signature, in the form of a vector, is computed at each node of a query tree and used to "vote" for objects having similar local topological structure. From the resulting set of model shock tree candidates, this same topological signature is exploited in a shock tree matching (verification) algorithm that factors in the geometry of the shock tree's nodes. The algorithm computes the distance between two shock trees as well as a set of node correspondences. Using a diverse database of shapes, we demonstrate our system's performance under articulation, occlusion, and changes in viewpoint. This talk represents joint work with Ali Shokoufandeh (Rutgers University), Kaleem Siddiqi (McGill University), and Steven W. Zucker (Yale University).

8. Recognition under Variable Illumination: Issues and Problems
David Kriegman, University of Illinois

Abstract: The appearance of an object can vary significantly due to changes in pose, changes in illumination (spectral distribution, source distribution, and strength) as well as changes to the object's state (articulations, deformations, reflectance changes). Yet in the problem of object recognition, the goal is to classify a subset of the image as corresponding to some object in a model database in the presence of this wide range of image variation. One approach to modelling objects for recognition is to obtain sample images over the full range of variation. One could then build upon standard pattern classification techniques (e.g. nearest neighbor, estimating class-conditional densities, Bayes classification, etc.) to perform recognition. Unfortunately, both the dimensionality of images and the dimensionality of the variation are sufficiently large that the number of training images required would be enormous, so one must be skeptical about the success of a purely data-driven approach. Alternatively, some, but not all, of the variation (e.g. that due to pose and illumination) can be very well modeled using geometry and physics, and representations of objects can be constructed from image data which can then be used to predict the image variation. The challenge is to construct sufficiently accurate and expressive representations from modest amounts of image data. In this talk, we will focus on the variation due to lighting, and summarize a series of results obtained over the last few years in characterizing the set of images of an object in fixed pose but under a range of lighting conditions (color, light source location, number and distribution of sources, etc.). We have developed recognition algorithms based on this characterization, and experimental results in face recognition will be described. A number of open problems concerned with modelling the set of images under variable illumination will be presented, and issues concerned with other sources of image variation will be discussed. This is joint work with P. Belhumeur, A. Georgiades, and A. Yuille.

9. Template-Based Indexing for Vision and Robotics
Martial Hebert, Carnegie Mellon University

Abstract: Using image templates for image-based object recognition is an attractive approach in computer vision for robotics applications. Many challenges stand in the way of the practical use of this class of techniques. In particular, geometric and photometric variation makes it difficult to represent templates. The situation is complicated by the fact that we may want to discriminate among a large library of objects. In this presentation, I will describe techniques that permit both time- and space-efficient template-based indexing. The approach starts with a new method for the efficient computation of steerable filters, which leads to an efficient representation of image templates. This representation is based on recent theoretical results in the community, as well as a new approach to high-dimensional SVD problems. Given indexes built from this representation, I will then describe an approach to indexing that permits good time performance, even with large data sets. Finally, I will discuss ideas on the trade-off between modeling (memory) and recognition (speed). I will also show how the general notion of template-based indexing can be used effectively for 3-D surfaces, i.e., comparing surfaces instead of images. In particular, by using the appropriate local surface representation, it is possible to use memory-based adaptation techniques to improve recognition performance over time. Examples from industrial applications of robotics systems will be shown.

10. Kinetic Data Structures for Optimization
Leonidas J. Guibas, Stanford University

Computer systems commonly cache the values of variables to gain efficiency. In applications where the goal is to track attributes of a continuously moving or deforming physical system over time, caching relations between variables works better than caching individual values. The reason is that, as the system evolves, such relationships are more stable than the values of individual variables. Kinetic data structures (KDSs) are a novel formal framework for designing and analyzing sets of assertions to cache about the environment, so that these assertion sets are at once relatively stable and tailored to facilitate or trivialize the computation of the attribute of interest. Formally, a KDS is a mathematical proof animated through time, proving the validity of a certain computation for the attribute of interest. KDSs have rigorous associated measures of performance, and their design shares many qualities with that of classical data structures. The KDS framework has led to new and promising algorithms in applications such as collision detection for moving rigid and deformable bodies, optimal network connectivity among mobile workstations, and visibility/occlusion maintenance. In particular, the KDS framework can be adapted to track optimal solutions to combinatorial optimization problems with continuously varying data. This talk will survey the general ideas behind KDSs and illustrate their application to some simple optimization problems.
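
A toy instance of the kinetic-data-structure pattern of talk 10: cache certificates asserting that the current maximum of a set of moving values beats each rival, queue the certificate failure times, and do work only when a certificate fails (the linear trajectories and the brute-force re-certification at each event are simplifying assumptions):

```python
import heapq

def kinetic_maximum(a, b, t_end):
    """Report (time, index) each time the argmax of x_i(t) = a[i] + b[i]*t changes."""
    def failure_time(i, j, t):
        # Earliest time after t at which x_j overtakes the current winner x_i.
        if b[j] <= b[i]:
            return None
        t_cross = (a[i] - a[j]) / (b[j] - b[i])
        return t_cross if t_cross > t else None

    t = 0.0
    winner = max(range(len(a)), key=lambda i: (a[i], b[i]))
    events = []                      # event queue of certificate failure times

    def schedule(t_now):
        events.clear()
        for j in range(len(a)):
            if j != winner:
                tf = failure_time(winner, j, t_now)
                if tf is not None and tf <= t_end:
                    heapq.heappush(events, (tf, j))

    schedule(t)
    changes = [(t, winner)]
    while events:
        t, challenger = heapq.heappop(events)
        # The certificate "winner beats challenger" just failed: update, re-certify.
        winner = challenger
        schedule(t)
        changes.append((t, winner))
    return changes

print(kinetic_maximum(a=[3.0, 0.0, 1.0], b=[0.0, 1.0, 0.5], t_end=10.0))
```

Between events nothing is recomputed; all the work is concentrated at the discrete moments when a cached assertion stops being true, which is the stability-versus-relevance trade-off the abstract describes.
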
11. Differential Algebra & Robot Control
Bud Mishra, New York University

Abstract: In this talk, we shall examine an interesting connection between the classical Ritt-Kolchin differential algebra and several control-theoretic problems in robotics. However, in contrast to the tremendous progress in computational commutative algebra and algebraic geometry, and their ready applications to mechanism design, kinematics and motion planning, a similar success story in this domain remains largely elusive. We shall attempt to explain why, and where efforts need to be focused.

12. Fences and Traps: Geometric Algorithms for Part Feeding
Mark Overmars, Utrecht University

Abstract: Part feeding plays a crucial role in many industrial applications. When assembling products, parts have to be presented to the assembly system in a particular orientation. Many devices are used to orient parts into such a required pose. Typical devices include robotic systems with cameras, conveyor belts with fences, vibratory bowl feeders, etc. In this talk I will outline the geometric problems that arise when designing such devices, and I will present geometric algorithms to solve them.

13. Capture Regions for Grasping, Manipulating and Re-Orienting Parts
Jean Ponce, University of Illinois

Abstract: This talk addresses the problem of manipulating an object in the plane using a succession of contacts with simple robotic devices that may have discrete or continuous degrees of freedom. The problem of planning the overall object trajectory is reduced to constructing a sequence of atomic robot motions between equilibrium states of the system. The proposed approach is based on constructing the capture region of each equilibrium, i.e., the maximal subset of the object's configuration space such that any motion starting within this subset is guaranteed to end up in the equilibrium configuration. In particular, we do not assume that contact is maintained during the execution of the manipulation task; nor do we attempt to predict the object motion between equilibria. This approach will be applied to three problems: (1) grasping and in-hand manipulation in the presence of frictional effects and large position uncertainties, (2) motion planning for mobile robots pushing an object while avoiding obstacles, and (3) re-orienting parts by dropping them on a simple mechanism reminiscent of Pachinko machines. I will present simulation results as well as preliminary experiments with Nomadic Super Scout II mobile robots and prototypes of the reconfigurable gripper and of the Pachinko machine developed in our laboratory.

14. Specifying and Performing Hand-Eye Tasks with Uncertain Camera Models
Gregory D. Hager, Yale University

Abstract: Most of the work in robotic manipulation and visual servoing has emphasized HOW to specify and perform particular tasks from image information. However, an important precursor is to first understand WHAT tasks a given vision-based system can perform with predictable accuracy. This leads to the following question: "When is it possible to decide, on the basis of the information contained in the images of an imprecisely modeled two-camera vision system, whether or not a prescribed robot positioning task has been precisely accomplished?" Results relating to this question are now known for several camera model classes, including purely injective cameras, weakly calibrated projective cameras, and uncalibrated projective cameras. The above characterizations are "non-constructive" -- that is, they do not suggest how a feedback system that realizes a given task might be constructed. We have recently developed constructive characterizations of the set of tasks that can be performed under different types of camera model uncertainty. We argue that the resulting structure provides a principled foundation both for a specification language and for automatic execution monitoring in uncalibrated environments. These two system components are prerequisites for extending existing feature-centered approaches to manipulation into a more powerful and intuitive object-centered architecture for hand/eye coordination.

15. Reconstructing Cameras and Scene Geometry from Multiple Views
Andrew Zisserman, Oxford University

A classical problem in computer vision is that of structure-and-motion recovery from images: given a stream of images, determine aspects of the camera motion, the camera calibration and the scene geometry. Solutions to this problem have now been developed for quite general motions and scenes. The talk will first outline a solution to this problem and illustrate how the recovered structure and motion may be used as a basis for building 3D graphical models from an image sequence. Then the advantages that accrue from special motions will be described. Special motions are readily available in robotics-related applications. It will be demonstrated that the recovery problem is considerably simplified in such cases. For example, if the camera rotates about a single axis, then the reconstruction ambiguity is reduced to a two-parameter family with rotation angles determined unambiguously, and accuracy is an order of magnitude better than for general motion.

16. Human Visual Space
Jan J. Koenderink, Utrecht University

Abstract: Human visual space is a collective term for a number of different competences. I will discuss some recent results from psychophysics on various aspects (pictorial relief, the space we move in, the shape of objects) and speculate on the nature of the mechanisms involved. One intriguing aspect of human visual space is that it is like a quilt of mutually rather independent competences that often entertain conflicting representations. Since we have few complaints concerning the coherence of our actions (at least in the short term), this poses some problems as to how such coherence might come about.
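
Finally, returning to the structure-and-motion recovery of talk 15: a minimal sketch of the first step of uncalibrated two-view reconstruction, estimating the fundamental matrix from point correspondences with the standard normalized eight-point method (shown only to make the recovery machinery concrete; it is not claimed to be the speaker's pipeline, and the input correspondences are assumed given):

```python
import numpy as np

def fundamental_matrix(x1, x2):
    """x1, x2: (N, 2) arrays of corresponding image points (N >= 8).
    Returns the 3x3 fundamental matrix F with x2_h^T F x1_h = 0."""
    def normalize(pts):
        # Translate to the centroid and scale to mean distance sqrt(2).
        c = pts.mean(axis=0)
        s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
        ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
        return ph, T

    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence contributes one row of the linear system A f = 0.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2, then undo the normalization.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1
```

From such pairwise geometry, cameras and scene points can be recovered up to the ambiguities discussed in the abstract, with special motions (such as single-axis rotation) reducing the ambiguity further.
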