  ### Proposed projects for the 2001 DIMACS REU Program

Applications of semidefinite programming

Semidefinite programming (SDP) is a relatively new field (about ten years old) in the area of optimization theory. Although the special problems go back to over 40 years, the general problem has been studied intensively only in the last ten years. Semidefinite programming is special optimization problem that is similar to linear programming (LP). In linear programming we try to solve the optimization problem:

 minimize c'x subject to Ax=b and x non-negative
Here c and x are column vectors with n components (c' is transpose of c and thus a row vector), and A is an m by n matrix. In SDP we have a similar problem with two basic differences: first, x is not just a vector but actually a symmetric matrix. Second, instead of requiring x to have nonnegative components, we require it to be a "positive semidefinite matrix", that is s'As is nonnegative for every vector s. The feasible set of SDP is not a polyhedron in general, but it is a convex set. Presently some good algorithms exist to tackle this problem and several software packages do a reasonable job of solving them.

The project I would like to propose to look for models and applications of SDP in various areas. Already there are significant applications in combinatorial optimization and graph theory. I like to explore applications in probability theory, statistics and financial engineering. Here is a sample of specific problems:

a) Suppose you are asked to design a random walk with certain rules, and you wish to maximize the rate of convergence. What is the best random walk that you can come up with?

b) You are given some information about an unknown probability distribution, for example its first n moments, and some other restrictions on the distribution. You wish to choose a distribution that satisfies these restrictions and is best in some sense, for instance is as "symmetric as possible" or its median is as close to a particular number as possible or some other criterion.

c) You are looking to approximate a function of several variables
y=f(x_1,...,x_n). You are given a "sample" of observations:
For x_11, x_12, ..., x_1n you observed y_1
For x_21, x_22, ..., x_2n you observed y_2
...
For x_m1, x_m2, ..., x_mn you observed y_m

The traditional regression analysis (or in general approximation theory) asks you to find the best function f from a given class of functions (say polynomials, exponential, linear etc.) that fits these observations as closely as possible. We like to study this problem with added twist that the function f is convex. This under certain circumstances may yield an SDP problem.

The prospective student should be familiar with linear programming, linear algebra and introductory probability. (S)he should also be willing to do some simple programming, mainly using a very high level programming language such as Matlab, or Matlab like language and write scripts to model a problem and feed it to a SDP solver (which is already written.)

Mentor: Endre Boros, RUTCOR

Quadratic binary optimization is a very difficult problem, in general. A possibility to get good approximations is to construct a "tight" convex majorant. Such tight majorants are known for functions depending on 2 variables, but even for 3-variable functions, the best convex majorants are not known.

This project will aim at developing a method of constructing such majorants, and perhaps even to characterize those for small functions, depending on 3,4 or 5 variables.

Difference graphs

```Given  n  finite sets  S_1,...,S_n , let us define a graph
G  with  n  vertices   v_1,...,v_n  in which 2 vertices
v_i  and  v_j  are connected by an edge  IFF
two differences  S_i \ S_j  and  S_j \ S_i
have at least  k  elements each.
EVERY graph can be realized in such a way
if  k  is arbitrarily large.
But is it still true if  k  is bounded by a constant  ?
Even for  k=2  neither a proof nor a counter-example is known.
If  k=1  then only co-occurrence graphs can be realized
```

Stationary Strategies

In 1912 Zermelo proved that games with perfect information (e.g. Chess) can be solved in pure stratagies, that is without randomizing. A strategy is called STATIONARY if for every position v the move in v may only depend on v but not on the preceding moves. Does Zermelo's Theorem generalize this case ? Is it true that games with perfect information can be solved in PURE STATIONARY strategies ? For acyclic games the answer is "YES". But for games with cycles the problem is open

Graph Theoretic Algorithms For Cluster Analysis

Mentor: Dr. Mel Janowitz, DIMACS

Cluster analysis is a branch of multivariate data analysis that seeks to discover hidden patterns in a data set. The usual formulation involves a finite data set E and a dissimilarity coefficient d on E. This is a mapping of ordered pairs of E into the nonnegative reals such that d(a,b)=d(b,a) and d(a,a)=0 for all a,b in P. A cluster algorithm tries to convert this dissimilarity coefficient into a hierarchical tree or some generalization thereof.

From the viewpoint of graph theory, the input is a complete graph on which there is specified an edge-length function. Cluster algorithms may be viewed as transformations of this edge length function into a new function that somehow gives more information on the structure of the graph. The idea is to divide the graph into subgraphs in such a way that distinct subgraphs are separated, and each given subgraph is highly connected. The problem is complicated by the fact that any increase in internal connectivity often forces a decrease in separation, so this becomes an interesting optimization problem.

The idea of the project is to investigate cluster analysis from this graph theoretic vantage point, and to formulate cluster algorithms from the viewpoint of graph theoretic concepts such as connectivity, separation and bridges. A bridge of a symmetric graph is an edge whose removal causes the graph to have an extra connected component. There are a number of unpublished and largely unexplored algorithms based on bridges. The idea of the project will be to understand these algorithms, investigate their properties, and possibly start implementing them with an appropriate programming language.

The project will also examine past graph theoretic models for cluster analysis, and attempt to create a model that will place cluster analysis within the realm of graph theoretic concepts that have arisen for other reasons. Once this is done, there will hopefully be feedback between the two subjects.

No prior knowledge of cluster analysis is assumed, and a combinatorial approach will be used. The exact nature of the project will be negotiated and can take a number of directions.

Magic labeling in undirected graphs with particular application to the traveling salesman problem

Mentor: Dr. Bahman Kalantari, Computer Science

Given an undirected graph G=(V,E) with n vertices, a natural number lambda is said to be a ``magic labeling'' if the edges of G can be labeled with nonnegative integers so that for each vertex the sum of the labels of the corresponding incident edges is lambda. The ``capacitated'' version of the magic labeling problem is the case where integral upper bounds are imposed on the edge labels. If G is edge-weighted, then the ``weighted'' magic labeling problem can be defined where the ``weight'' of a magic labeling is the sum of the products of the weight and label for each edge.

The problem of testing the existence of lambda-magic labeling or computing the minimum-weight lambda-magic labeling, as well as constrained versions of these problems include some of the most celebrated problems in polyhedral combinatorics and combinatorial optimization, e.g. the perfect matching problem (1-magic labeling) and its weighted version; the prefect 2-matching problem (2-magic labeling) and its weighed or capacitated versions; as well as the Hamiltonian cycle problem and its weighted version, the traveling salesman problem (TSP) which can be viewed as the minimum-weight 2-magic labeling where the set of positively labeled edges form a connected subgraph of G.

Although the problems of computing the minimum magic labeling and the minimum-weight lambda-magic labeling are solvable in polynomial-time, most constrained cases (e.g. Hamiltonian cycle problem) are NP-hard. More generally, magic labeling problems can be defined with respect to hypergraphs in which case even the problem of testing the existence of a lambda-magic labeling is NP-complete. Indeed for some very special hypergraphs, the existence of some lambda-magic labelings remain to be the most central problem in combinatorial design theory, difficult to establish theoretically or computationally. Despite the deceptive simplicity in the description of magic labeling problems, it is not surprising to imagine that an enormous set of challenging problems can be stated. Even in the case of graphs, some constrained versions of magic labeling problems give a very challenging set of problems that at the very least can be used to assess the strengths and weaknesses of general integer programming methods, as well as codes.

The purpose of this research project is to discover new findings with regard to this fundamental problem. We wish to assess the effectiveness of using magic labeling problems in solving some hard combinatorial problems, in particular with respect to TSP or its variants. Assume that G is complete, edge-weighted with weights satisfying the triangle inequality, and n even. For i=1,2,3, let W_i denote the weight of the minimum-weight i-magic labeling where each edge label is at most two (i.e. capacitated for i=3). Let T_* denote the weight of the minimum-weight TSP tour. It is well-known and quite easy to show that 2W_1 and W_2 are lower bounds to T_*. It can be shown that (2/3)W_3 is also a lower bound to T_* (see Kalantari and Khosrovshahi, Magic Labeling in Graphs: Bounds, Complexity, and an Application to a Variant of TSP, Networks, 28, 1996, 211-219). It can also be shown that W_3 can be used to give a 2-approximate solution to a difficult variant of TSP.

The question is how good is the new lower bound and how can it be incorporated within Branch-and-Bound or Branch-and-Cut algorithms for TSP or its variants? It is hoped that these questions can be answered at least computationally with respect to some previously defined set of test problems for TSP, and using the existing codes.

DNA Sequences and Three-dimensional Structure

Mentor: Wilma Olson, Chemistry (Computational Chemistry/Biology)

The DNA base sequence, once thought to be interesting only as a carrier of the genetic blueprint, is now recognized as playing a structural role in modulating the biological activity of genes. Through subtle variations of bending and twisting at the level of neighboring base pairs, the chemical sequence not only generates a higher-order folding of the double helix itself but also produces structural motifs that facilitate the specific binding of proteins and other ligands. The consequent looping and twisting of the DNA assumes an important role in various biological processes.

To understand how the genetic code is translated into three-dimensional structures used in biological processes, our group is analyzing the known geometry of DNA bases in chemical structures and developing mathematical models to incorporate the sequence-dependent bending, twisting, and translation of known fragments in DNA molecules of varying lengths and chemical composition.

Property Testing

Mentor: Mario Szegedy, Computer Science

The notion of combinatorial property testing is introduced in a paper of Goldreich, Goldwasser and Ron. Suppose, a huge graph (like e.g. the Internet) is to be checked for a certain graph property, say for k-cycle freeness. The graph is so big, that we have to be satisfied with examining a small fraction of nodes and all connections in between them. The selection of this fraction is done randomly according to some appropriate probability distribution. Is it true that if the probability that the check reveals no k-cycle is 99%, then either the graph is very sparse or we can leave out no more than 10% of the edges such that we get rid of all the k-cycles? In the proposed project we want to examine various combinatorial, geometric and algebraic properties, and find out how they behave with respect to checking schemes as above. A little knowledge of probability theory and combinatorics is useful for the topic.

Compositional Model Checking

Mentor: Mahesh Viswanathan, DIMACS

As our reliance on computers and digital systems increases, the need for the underlying software to be dependable becomes critical. However, the problems addressed by today's software systems are quite complicated, often involving the coordination of concurrent systems while meeting real-time constraints, for which the task of developing correct and reliable software is particularly difficult. Hence, there is an increasing need for automated techniques to analyze software before they are deployed in widely used products.

One popular algorithmic technique, called Model Checking, is concerned with determining whether a formal model of a system, often presented as a graph of states and state transitions, satisfies certain correctness properties. The method works by systematically stepping through the global states of the system while checking various properties at each stage. However, such a method runs into the well-known problem of "state explosion" which severely limits the size of the systems that can be analyzed. In more detail, in contrast to the compact representation of system descriptions using variables, the explicit representation of the state space (on which model checking performs the search) includes the enumeration of all possible values for the variables which is prohibitively large.

One approach that has been found to be effective is compositional model checking. It is a divide and conquer paradigm where one verifies components of the software for local correctness and then deduces the global correctness of the system from the local properties of the components. The project will involve extending existing techniques for compositional model checking and experimenting with examples in some well known model checker. Background in logic and algebra will be useful, though not crucial. Index of REU program DIMACS Home Page
Contacting the Center