Joint DIMACS-CMU-Georgia Tech. Workshop on Large Scale Discrete Optimization

May 27-29, 1998
DIMACS Center, CoRE Building, Busch Campus, Rutgers University, Piscataway, NJ

R. Ravi, Carnegie Mellon University
Vijay V. Vazirani, Georgia Tech.
Presented under the auspices of the DIMACS Special Year on Large Scale Discrete Optimization.


May 27th: (Applications)

AM:     George Nemhauser, Georgia Tech. -- Optimization in Airline 
                Integrated Planning and Operations - Part I: Robust Planning  
        Ellis Johnson, Georgia Tech. -- Optimization in Airline 
                Integrated Planning and Operations - Part II: Recovery

TITLE: Optimization in Airline Integrated Planning and Operations
        Part I: Robust Planning, Part II: Recovery

SPEAKERS:  George L. Nemhauser (Part I),  Ellis L. Johnson (Part II)

AFFILIATION: School of Industrial and Systems Engineering
                   Georgia Institute of Technology

ABSTRACT: The major airlines have been pioneers in the use of
optimization for the allocation and scheduling of resources (both
planes and crews).  The formulation and solution of the crew scheduling
problem as a set-partitioning problem goes back to the 1960s.
Moreover, many new ideas for large-scale optimization were first
developed and tested on airlines problems.  But until recently
optimization was used only in planning, and under the assumption that
the printed schedule would be flown without modifications.  In reality,
because of random events such as equipment failures and bad weather,
there are almost always disruptions that cause changes in the schedule.
 Flights may be delayed, rerouted or canceled.  Crews may miss
connections or be illegal for an assigned flight because of government
time restrictions.  Reallocation and rescheduling, which is called the
recovery problem, frequently must be done in almost real time.
Research on using optimization to solve the recovery problem is just
emerging.   Essentially, one can view the process as a two-stage
stochastic programming problem  with the recovery decisions being the
recourse to the planning decisions. 

In the first part of this talk we will review the planning process,
including both fleet assignment and crew scheduling, and then discuss
some new approaches to robust planning, i.e. planning that attempts to
be less sensitive to disruptions.  In the second part, we model the
recovery problem as an optimization problem and we also discuss a
simulation that is used to evaluate new ideas and algorithms, and also
can serve as a training tool.
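The two-stage recourse view described above can be illustrated with a deliberately tiny model (all numbers and names here are hypothetical, not from the talk): the first-stage decision is how many reserve crews to keep on standby, and the second stage pays a penalty for each disruption no reserve can cover.

```python
# Toy two-stage stochastic program with recourse (illustrative numbers):
# stage 1 chooses the number of reserve crews; stage 2 pays a penalty
# per uncovered disrupted flight, averaged over weather scenarios.

CREW_COST = 3.0        # first-stage cost per reserve crew
PENALTY = 10.0         # recourse cost per uncovered disruption
# (probability, number of disrupted flights) for each scenario
SCENARIOS = [(0.6, 0), (0.3, 2), (0.1, 5)]

def expected_cost(reserves):
    """First-stage cost plus expected second-stage (recovery) cost."""
    recourse = sum(p * PENALTY * max(0, d - reserves) for p, d in SCENARIOS)
    return CREW_COST * reserves + recourse

best = min(range(6), key=expected_cost)
print(best, expected_cost(best))
```

Because the scenario set is finite and tiny, plain enumeration of the first-stage decision suffices; real airline recovery models are vastly larger and require the optimization machinery discussed in the talk.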

        Jitendra Malik, UC Berkeley -- Graph partitioning and matching
                problems in computer vision
   TITLE: Graph partitioning and matching problems in computer vision

   SPEAKER: Jitendra Malik
   AFFILIATION: Computer Science, UC Berkeley
   ABSTRACT: to be announced

PM:     Prabhakar Raghavan, IBM Almaden -- Customer segmentation and 
                collaborative filtering: a microeconomic view 

Title: Customer segmentation and collaborative filtering: a microeconomic view

Abstract: In this talk we propose a microeconomic view of data mining,
based on the premise that the value of patterns unearthed from legacy
data is determined by their impact on an enterprise's objective.  This
leads us to optimization as a means for evaluating data mining
operations.  We focus on customer segmentation, one of the few areas
where data mining has (to date) met with consistent success.  We then
build on the basic framework to address the setting of collaborative
filtering (sometimes also called "recommendation systems"): systems
that, based on the past behavior of a group of users, will recommend
to them books to read, movies to watch, and so on.  This talk will
include work from collaborations with Jon Kleinberg and Christos
Papadimitriou, and with Sridhar Rajagopalan, S. Ravi Kumar and Andrew

        Rajeev Motwani, Stanford University -- High-Dimensional
                Similarity Search in Large Databases   

          High-Dimensional Similarity Search in Large Databases  

                        Rajeev Motwani
                Department of Computer Science
                        Stanford University

The nearest neighbor problem is the following: Given a set P of n points
in some metric space X, preprocess P so as to efficiently answer queries
which require finding the point in P closest to a query point q in X. This
fundamental geometric problem is of growing importance as an abstraction
of similarity search in areas such as information retrieval and image
processing. We will focus on the particularly interesting case of the
d-dimensional Euclidean space. Despite decades of effort, the current
solutions are far from satisfactory; in fact, for large d, in theory or in
practice, they provide little improvement over the brute-force algorithm
which compares the query point to each data point. Of late, there has been
some interest in the approximate nearest neighbors problem where we are
only required to produce a point p' whose distance from the query q is
within a factor (1+e) of the distance between q and its true nearest neighbor.

We will present two algorithmic results for the approximate version that
significantly improve the known bounds: (a) preprocessing cost polynomial
in n and d, and a truly sublinear query time (for e>1); and, (b) query
time polynomial in d and log n, and only a mildly exponential preprocessing
cost O(n) x O(1/e)^d. Applying a classical geometric lemma on random projections,
we show how to obtain the first known algorithm with polynomial preprocessing
and query time polynomial in d and log n. Unfortunately, for small e, the
latter is a purely theoretical result since the exponent depends on 1/e. On
the other hand, experimental results indicate that the first algorithm offers
extremely fast running times on real data sets. Its key ingredient is the
notion of locality-sensitive hashing which may be of independent interest;
in fact, we will give applications to information retrieval, pattern
recognition, dynamic closest-pairs, and fast clustering algorithms.
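To make the notion of locality-sensitive hashing concrete, here is a minimal sketch using the classical bit-sampling family for Hamming distance (parameters and data are illustrative only, not the scheme's tuned values): each hash table keys a point by the values of a few randomly chosen coordinates, so nearby points collide in at least one table with high probability.

```python
import random

# Minimal locality-sensitive hashing sketch: bit sampling on the Hamming
# cube. DIM-bit binary points; TABLES hash tables, each keyed by BITS
# randomly sampled coordinates (illustrative parameters).
random.seed(0)

DIM, TABLES, BITS = 32, 8, 6
tables = [[random.randrange(DIM) for _ in range(BITS)] for _ in range(TABLES)]

def signature(point, coords):
    return tuple(point[i] for i in coords)

def build_index(points):
    index = [{} for _ in tables]
    for pid, p in enumerate(points):
        for t, coords in enumerate(tables):
            index[t].setdefault(signature(p, coords), []).append(pid)
    return index

def query(index, q):
    """Return ids of all points colliding with q in at least one table."""
    cands = set()
    for t, coords in enumerate(tables):
        cands.update(index[t].get(signature(q, coords), []))
    return cands
```

Points differing in few coordinates are likely to agree on all sampled bits of some table, while far-apart points rarely collide, so the candidate set returned by `query` is small and can be scanned by brute force.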

Time permitting, we will also touch upon a new abstraction of similarity
search that leads to an interesting generalization of the nearest neighbors
problem.

(Joint work with Piotr Indyk.)

        Eva Tardos, Cornell University -- Flows in Networks

TITLE: Flows in Networks

SPEAKER: Eva Tardos
AFFILIATION: Department of Computer Science
Cornell University

In this talk we will consider the disjoint paths problem. This
problem is one of the basic algorithmic problems underlying the management
of high-speed communication networks, and is also one of the oldest and most
basic algorithmic questions; it is one of Karp's original NP-complete
problems. Given a graph and a set of node pairs that we would like to
connect by disjoint paths, we can ask a number of natural questions: How
many of the disjoint paths are simultaneously realizable? How many rounds
are required to satisfy all connection requests, when all paths assigned in
a single round must be disjoint? All these questions are NP-hard, and hence we
cannot expect to have efficient algorithms for them. In this talk we will
consider approximation algorithms for these questions in different classes
of graphs. We will discuss efficient algorithms that are guaranteed to
provide solutions close to optimal.
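A simple greedy heuristic conveys the flavor of the problem (this is an illustration only, not the approximation algorithms of the talk): route each pair along a shortest path in the remaining graph, then delete its edges so later paths stay disjoint.

```python
from collections import deque

def bfs_path(adj, s, t):
    """Shortest path from s to t in the residual graph, or None."""
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return None

def greedy_disjoint_paths(edges, pairs):
    """Greedily route node pairs along edge-disjoint paths."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    routed = []
    for s, t in pairs:
        if s not in adj or t not in adj:
            continue
        path = bfs_path(adj, s, t)
        if path:
            routed.append((s, t, path))
            for a, b in zip(path, path[1:]):   # consume the used edges
                adj[a].discard(b)
                adj[b].discard(a)
    return routed
```

The greedy order matters: a long early path can block many later requests, which is precisely the difficulty the more careful approximation algorithms address.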

May 28th: (Tools)

AM:     Kurt Mehlhorn, Max Planck Institute für Informatik,
                Saarbrücken, Germany -- Secondary Memory Algorithms
                for Networks and Geometry  

TITLE: Secondary Memory Algorithms for Networks and Geometry

SPEAKER: Kurt Mehlhorn
AFFILIATION: Max Planck Institute für Informatik, Saarbrücken, Germany

ABSTRACT: In the theoretical part of the talk I will report on
secondary memory algorithms for shortest paths and some geometric
tasks (line segment intersection, convex hulls, ...). 

In the experimental part I will report on LEDA-SM, 
an extension of LEDA into the realm of secondary memory algorithms.

        Andrew Goldberg, NEC -- Maximum Flows: New Techniques and New
                Algorithms

TITLE: Maximum Flows: New Techniques and New Algorithms

SPEAKER: Andrew V. Goldberg, NEC Research Institute, Inc.


The maximum flow problem is a classical optimization problem with
many applications. Algorithms for this problem have been studied for 
over four decades. Recently, significant improvements have been made 
in theoretical performance of maximum flow algorithms. In this talk 
we put these results in perspective. We also outline major recent
developments, including the use of adaptive length functions and,
for undirected problems with unit capacities, the use of sparsification.
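For context, here is a minimal implementation of the classical shortest-augmenting-path (Edmonds-Karp) method, the kind of baseline that the newer techniques in the talk improve upon (this sketch is not from the talk itself):

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: capacity maps (u, v) -> int; returns max s-t flow."""
    res, adj = {}, {}                     # residual capacities and adjacency
    for (u, v), c in capacity.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)         # reverse arc for undo
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                   # no augmenting path remains
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[e] for e in path)   # bottleneck capacity
        for u, v in path:
            res[(u, v)] -= aug
            res[(v, u)] += aug
        flow += aug
```

Augmenting along shortest paths bounds the number of augmentations polynomially; the advances surveyed in the talk (length functions, sparsification) push well beyond this baseline.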

        John Hooker, Carnegie Mellon -- Mixed logical/linear programming

TITLE: Mixed logical/linear programming

SPEAKER: John Hooker

AFFILIATION: Carnegie Mellon U

ABSTRACT: I will cover three themes.  a) The mathematical programming
community has made an enormous investment in the use of integer
variables for solving combinatorial problems.  However, a rethinking
and downsizing of their role improves both modeling and solution.
Their main value is in supplying a continuous relaxation, but they can
add a good deal of overhead without providing much in return.  b)
Replacing integer modeling with logic-based modeling forges a link
between math programming and constraint satisfaction and in particular
provides logical analogs of cutting planes, etc.  c) The formulation
of a framework for combining these two approaches raises a fundamental
modeling question: how can a declarative framework allow and even
compel the user to exploit specialized solvers, such as linear
programming solvers?


PM:     Bob Bixby, Rice University -- Recent developments in CPLEX

Title:  Recent Developments in CPLEX


CPLEX 1.0 was released in 1988.  Following that release there were
steady performance improvements in the core simplex and, later, barrier
algorithms for linear optimization, with the last really significant
LP improvements coming in the barrier implementations in CPLEX 4.0.
In this talk I will describe some quite surprising, new performance
improvements to be available in the next released version of CPLEX.
For the barrier algorithms, these improvements apply to models of all
sizes, but are particularly marked for instances in which the basic
"factorization step" is the dominant expense.  For the simplex
implementations, both primal and dual, there will typically not be
much improvement seen on smaller models (with, say, fewer than 5,000
constraints), but for larger models, the improvements are even more
marked than for the barrier algorithms.  These simplex improvements
then translate into further significant improvements in the cost of the
crossover step from a barrier "vector" solution to a simplex basic
solution.

This work is based upon a number of new ideas as well as earlier work
of Rothberg, Hendrickson, Wunderling, and others.  For the simplex
method, we were particularly motivated by the idea of concentrating on
larger models -- it is now not unusual for real applications to
generate LPs with several hundreds of thousands of constraints -- and
by the unpublished work of Zonghao Gu.

The work described in this talk is joint with Irv Lustig,
Ed Rothberg, and Roland Wunderling, all of the CPLEX Division of ILOG.

        Martin Savelsbergh, Georgia Tech. -- PARINO, a PARallel
                INteger Optimizer

 TITLE: PARINO, a PARallel INteger Optimizer
 SPEAKER: Martin W.P. Savelsbergh
 AFFILIATION: School of Industrial and Systems Engineering
                  Georgia Institute of Technology

 ABSTRACT: We describe the design and implementation of PARINO, a
parallel integer optimizer. PARINO is portable across any
message-passing parallel computing platform. It is written in C++ and
developed using an ``entity-FSM''-based structure that greatly
facilitates the modularity and extensibility of the functionality of
the system and its portability to any message-passing interface.  The
current version of PARINO has been used on an 8-node IBM SP2
multicomputer and a cluster of 48 Intel machines, each with a quad
PentiumPro running at a clock speed of 200 MHz. It has been able
to solve most of the instances in the MIPLIB test set with linear and
superlinear speedups.  The implementation is based on a distributed
version of the traditional linear programming based branch-and-bound
approach to integer programming. In addition, several state-of-the-art
techniques are incorporated, such as preprocessing and probing,
reduced-cost fixing, global reduced-cost fixing, primal heuristics,
SOS-branching and various types of cutting planes. Cut management is
an important factor that can influence the performance of parallel
mixed integer optimization.  A key contribution of our work is the
development of a novel architecture and associated techniques for
distributed cut management, and addressing the issues arising in its
implementation.

        Sebastian Ceria, Columbia University -- Lift-and-project cuts:
                An efficient solution method for mixed integer programs

TITLE: Lift-and-project cuts: An efficient solution method for mixed
integer programs  

SPEAKER: Sebastian Ceria

A disjunctive constraint (disjunction for short) for a mixed-integer
program specifies that at least one linear constraint set among
several be satisfied. The simplest disjunction in a general
mixed-integer program states that a 0-1 constrained variable must
either take the value 0 or 1. Disjunctions provide a natural tool for
dividing a discrete problem into smaller, and hopefully simpler,
subproblems and have been used extensively in branch-and-bound
methods. In our work we use disjunctions to generate
``lift-and-project'' cutting planes for general mixed-integer
programs. These cuts are strong, in the sense that they are
facets of the polyhedron defined by the convex hull of the union of
the disjunctive sets, which is a relaxation of the original
mixed-integer program.  Lift-and-project cuts have already been used
effectively to solve many problems in the literature.

In this talk we present new computational results with
lift-and-project cuts. We focus on disjunctions that tend to be
efficient when used as branching rules in branch-and-bound: our
computational results show that these disjunctions also provide good
cuts. We will discuss extensive computational testing of the test
problems in MIPLIB, and provide ample evidence of the effectiveness of
these cuts. This work is joint with Gabor Pataki.
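As background, cuts for the simplest disjunction described above can be found via a cut-generating linear program; the sketch below follows the standard lift-and-project literature and is not taken verbatim from the talk. For a relaxation $P = \{x : Ax \ge b\}$ and the disjunction $(x_j \le 0) \vee (x_j \ge 1)$, an inequality $\alpha x \ge \beta$ is valid for both pieces, and hence for the convex hull of their union, when (by LP duality) there exist multipliers

```latex
\begin{aligned}
\alpha &= uA - u_0 e_j, & \beta &\le ub,\\
\alpha &= vA + v_0 e_j, & \beta &\le vb + v_0,
\end{aligned}
\qquad u, v \ge 0,\quad u_0, v_0 \ge 0.
```

The cut-generating LP then maximizes the violation $\beta - \alpha x^*$ at the fractional point $x^*$, over a suitably normalized version of this system, and the most violated such inequality is added as a cutting plane.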

Evening: "An hour with algorithms" by Kurt Mehlhorn over wine and cheese

May 29th: (Applications)

AM:     Laurence Wolsey, University of Utrecht, on leave from CORE, Universite
                Catholique de Louvain, Belgium --  Production Planning
                Problems: Models and Software 

TITLE: Production Planning Problems: Models and Software

SPEAKER: Laurence A. Wolsey
AFFILIATION: University of Utrecht, on leave from CORE, Universite
Catholique de Louvain

ABSTRACT: We discuss ongoing work aimed at establishing a testbed of
production planning models, and the development of a specialised
modelling and branch-and-cut system incorporating several of the
polyhedral results for lot-sizing models developed over the last ten
years. The system is based upon the Extended Modelling and
Optimisation Subroutine Library (EMOSL) of XPRESS.  Several models and
computational results will be presented.
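For readers unfamiliar with lot-sizing, the basic single-item uncapacitated model that underlies much of this polyhedral work reads as follows (a standard textbook formulation, not taken from the talk), with production $x_t$, end-of-period stock $s_t$, setup indicator $y_t$, and demand $d_t$:

```latex
\begin{aligned}
\min\ & \sum_{t=1}^{T} \left( p_t x_t + h_t s_t + f_t y_t \right)\\
\text{s.t.}\ & s_{t-1} + x_t = d_t + s_t, && t = 1, \dots, T,\\
& x_t \le \Big( \sum_{u=t}^{T} d_u \Big)\, y_t, && t = 1, \dots, T,\\
& s_0 = 0, \quad x, s \ge 0, \quad y \in \{0,1\}^T.
\end{aligned}
```

The valid inequalities developed for this and its capacitated variants over the last decade are exactly the kind of polyhedral results the branch-and-cut system incorporates.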

        Gary Miller, Carnegie Mellon -- Graph Embeddings in Scientific Computation

Graph embeddings have been used in computer science to analyze VLSI
layouts, to analyze interconnection networks, to bound graph separator
sizes, and to analyze mixing rates for Markov processes.

The simplest case of a graph embedding is when we have two graphs over
the same vertex set. In this case, we say a graph $G$ is embedded into
a graph $H$ if every edge of $G$ is mapped to a path of $H$ with the
same end points. The goal is to find an embedding that minimizes
dilation, the length of the longest path, and minimizes congestion,
the maximum number of paths using the same edge of $H$.
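The two quantities just defined are easy to compute for a concrete embedding; the following toy example (our own illustration) embeds the 4-cycle into the path on the same four vertices, where the "wrap" edge must be routed along the whole path.

```python
def dilation_and_congestion(embedding):
    """embedding maps each G-edge to its routing path in H (vertex list).

    Returns (dilation, congestion): the longest routing path length and
    the maximum number of routing paths sharing one H-edge.
    """
    load = {}
    dilation = 0
    for path in embedding.values():
        dilation = max(dilation, len(path) - 1)
        for e in zip(path, path[1:]):
            key = tuple(sorted(e))            # H is undirected
            load[key] = load.get(key, 0) + 1
    return dilation, max(load.values())

# Embed the 4-cycle 0-1-2-3-0 into the path 0-1-2-3.
emb = {
    (0, 1): [0, 1],
    (1, 2): [1, 2],
    (2, 3): [2, 3],
    (3, 0): [3, 2, 1, 0],                     # wrap edge routed end to end
}
print(dilation_and_congestion(emb))
```

Here the wrap edge forces dilation 3, and every path edge carries two routing paths, giving congestion 2; bounds on such products of congestion and dilation are what feed into the preconditioner analyses mentioned in the talk.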

In this talk I will show how graph embeddings can be used in Scientific
Computation and Numerical Analysis. In particular, I will present new
preconditioners and their analysis, and give new interpolation error
bounds for the finite element method.  Other applications include the
analysis of old preconditioners such as the Incomplete Cholesky
Factorization method.

This talk represents joint work with many people including: 
Keith Gremban, Steve Guattery, Dafna Talmor, Shang-Hua Teng, Noel

        Dana Randall, Georgia Tech. -- Markov chain methods from Physics

TITLE:  Markov chain methods from physics.

SPEAKER:  Dana Randall
AFFILIATION:  Georgia Tech, School of Mathematics and College of Computing

Fundamental combinatorial questions lie at the heart of many physical
systems studied in statistical mechanics.  While in some cases
elegant solutions have been discovered, in many cases the best tools
currently available are Markov chain Monte Carlo methods.  Monte Carlo 
algorithms allow us to sample large configurations and approximate 
important parameters of the system.  Both rigorous Monte Carlo methods 
and ingenious heuristics have proven useful tools for many problems.
We will survey some of the techniques used by computer scientists and 
physicists and highlight the differences between their approaches.
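As a small concrete instance of such a Markov chain Monte Carlo method, here is a toy Glauber-dynamics sampler for independent sets of a graph (the hard-core model at fugacity 1); the example and parameters are ours, not from the talk. Each step picks a random vertex and, if no neighbor is occupied, resets its state by a fair coin.

```python
import random

random.seed(1)

def glauber_independent_set(adj, steps):
    """Run Glauber dynamics on independent sets of the graph adj."""
    state = {v: False for v in adj}
    for _ in range(steps):
        v = random.choice(sorted(adj))
        if any(state[u] for u in adj[v]):
            state[v] = False                 # v is blocked by a neighbor
        else:
            state[v] = random.random() < 0.5  # free: occupy with prob 1/2
    return {v for v, occupied in state.items() if occupied}

# 4-cycle: its 7 independent sets are the empty set, four singletons,
# and the two opposite pairs.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
sample = glauber_independent_set(adj, 1000)
```

The chain never leaves the set of independent sets, and when it mixes rapidly the final state is an approximately uniform sample; bounding that mixing time rigorously is where much of the work surveyed in the talk lies.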

PM:     Craig Tovey, Georgia Tech. -- Self-Organizing Social Structure
                in Fish Groups  


We employ discrete mathematical models to explore how two different
kinds of structures arise in fish groups.  The first structure is the
ellipsoidal spatial arrangement of many fish schools, observed both in
the laboratory and in the wild.  Our model is the first to account for
the imperfect packing observed in actual schools, and to predict the
shape of the ellipsoid.

The second structure is the acyclic set of pairwise dominance
relationships observed in small groups (joint work with I. Chase).
Recent experiments demonstrate that this social structure is
self-organizing.  How could the individual behaviors that generate
such structure evolve?  We hypothesize a behavioral trait, reactivity,
which is consistent with laboratory experiment. For a simple model of
hierarchy formation, the presence of this trait increases the
probability of acyclicity.  Moreover, there is positive selection
pressure for reactivity, under realistic assumptions about the
relation between fitness and hierarchy rank.

        Michael Trick, Carnegie Mellon -- Adventures in Sports Scheduling

TITLE: Adventures in Sports Scheduling

SPEAKER: Michael Trick
AFFILIATION: Carnegie Mellon University

ABSTRACT: The problem of creating a reasonable, playable schedule for
a sports league seems simple, but is surprisingly difficult.  For
three years I have been working with Major League Baseball and (with
George Nemhauser and Kelly Easton at Georgia Tech) college basketball.
Along the way, we have formulated and solved many large and
interesting discrete optimization problems.  I will discuss the
problems and the difficulties that arise in this field.
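The natural starting point for such schedules is the classical "circle method" for a single round robin, sketched below (our own illustration; real league schedules layer many side constraints, travel, and television requirements on top of this).

```python
def round_robin(teams):
    """Single round robin via the circle method.

    Returns a list of rounds; each round is a list of pairings, and every
    pair of teams meets exactly once.
    """
    teams = list(teams)
    if len(teams) % 2:
        teams.append(None)                   # odd league: add a bye slot
    n = len(teams)
    rounds = []
    for _ in range(n - 1):
        pairs = [(teams[i], teams[n - 1 - i]) for i in range(n // 2)
                 if teams[i] is not None and teams[n - 1 - i] is not None]
        rounds.append(pairs)
        # Fix the first team; rotate the rest one position.
        teams = [teams[0]] + [teams[-1]] + teams[1:-1]
    return rounds
```

For four teams this produces three rounds of two games covering all six pairings; the hard part in practice is choosing among the many such schedules to satisfy venue, travel, and fairness constraints, which is where integer programming enters.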

        Milena Mihail, Bellcore -- Network survivability and network evolution

Telecommunications and cable industries, as well as the government,
are moving rapidly towards upgrading their networks to support massive
use of broadband services. The design and operation of such broadband
networks give rise to fundamental algorithmic questions such as
survivability under element failure, cost and service effective
covering of bandwidth demand with alternative and competing
technological solutions, bandwidth allocation and the economics of
network evolution.

We survey techniques from the emerging theory of approximation 
algorithms that address the above questions and  discuss implementations
of suitable adaptations of these algorithms relevant to software of
commercial toolkits for automated design of  backbone SONET/WDM networks.
We also discuss new combinatorial problems whose solution would have further 
impact in the design of backbone broadband networks.

Document last modified on April 13, 1998.