Proposal for Project on Graph Coloring for DIMACS Challenge

Robert H. Storer and S. David Wu
Department of Industrial Engineering, Mohler Lab #200
Lehigh University, Bethlehem PA 18015
Phone: (215) 758-4436  Fax: (215) 758-4886
Internet: RHS2@LEHIGH.EDU

In this informal proposal, we propose to develop, study, and test "problem space search" heuristics for graph coloring problems and variations thereof. As described below, this entails empirical investigation of a class of neighborhood structures that can be used to develop local search algorithms for graph coloring.

I. Problem Space Search

Problem space search is a means for developing local search heuristics which we have applied with success to combinatorial problems including job shop scheduling and related problems [2], and number partitioning. Problem space search can be viewed as a cross between local search and randomized heuristics. The method assumes the existence of a reasonably fast "base" heuristic H which, when presented with a problem instance P, generates a solution S = H(P). By perturbing the data of the problem instance in some fashion, the heuristic can be forced to generate alternative solutions: e.g., S(new) = H(P+d), where d is a (perhaps randomly generated) perturbation vector. This basic approach of randomizing a heuristic has been taken many times in the literature. Perhaps the earliest use dates to the early 1960s, with "randomized" and "probabilistic" dispatching heuristics for scheduling. More recently, in the paper by Johnson et al. [1] on the application of simulated annealing to graph coloring and number partitioning, randomization of a similar nature was used as a tie-breaking mechanism to generate a distribution of solutions for three graph coloring heuristics.
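The "blind" randomized version of a base heuristic described above can be sketched as follows. This is an illustrative Python sketch, not part of the proposal; the base heuristic H and objective V are placeholders supplied by the caller, and all parameter names and defaults are assumptions.

```python
import random

def randomized_base_heuristic(P, H, V, n_samples=100, sigma=1.0, seed=0):
    """Blind randomization of a base heuristic: repeatedly solve perturbed
    copies P+d of the instance and keep the best solution, where "best" is
    judged on the ORIGINAL data P.  P is a list of real-valued problem data;
    H(problem) -> solution; V(solution, problem) -> value (minimized)."""
    rng = random.Random(seed)
    best_sol = H(P)
    best_val = V(best_sol, P)
    for _ in range(n_samples):
        d = [rng.gauss(0.0, sigma) for _ in P]     # random perturbation vector
        sol = H([p + di for p, di in zip(P, d)])   # S(new) = H(P + d)
        val = V(sol, P)                            # evaluate on original P
        if val < best_val:
            best_sol, best_val = sol, val
    return best_sol, best_val
```

For example, with H a greedy number-partitioning heuristic and V the difference between the two partition sums, each sampled d reorders the greedy heuristic's decisions and thus yields an alternative partition.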
What is somewhat different about our approach is the use of the perturbation vector d, or alternatively the space of perturbed problems P+d (thus the name problem space search), as a search space to which local search can be applied. Problem space search can be formulated loosely as follows:

    Min (or Max) over d of  V[H(P+d), P]

where
    V(S,x) is the objective function value of solution S to problem x,
    H(x) is the solution returned when heuristic H is applied to problem x,
    P is the original problem data, and
    d is a perturbation vector.

S could be a coloring in graph coloring, a job sequence in scheduling, a partition in partitioning problems, etc. Perturbed data (P+d) force the heuristic to generate alternative solutions, but, of course, solutions are evaluated based on the original problem (P). Decisions as to what to use as the base heuristic H, and precisely how the problem data should be perturbed, must be made on a problem-by-problem basis.

"Problem space", in which the search is carried out, has some nice properties and some bad ones. Principal among the good properties is (i) the tendency of good solutions to be clustered near the original solution. While apparently not provable in a rigorous fashion, this property has certainly been supported by experimental studies. It seems reasonable that a solution provided by the base heuristic when applied to a slightly perturbed problem should be a reasonable solution for the original problem as well. (ii) As alluded to by the use of the term "near" above, a second good property is the ease with which distance metrics can be defined. Since problem space is N-dimensional real space (where N is the dimension of d), Euclidean and other standard distance metrics apply. These metrics can then be used to define neighborhoods for local search. (iii) It is usually a trivial matter to show that the optimal solution to the problem exists somewhere in problem space [2].
Finding the optimal solution is obviously difficult, but it is comforting to know that it is not impossible. (iv) Problem space makes direct use of problem-specific information, since the base heuristic is assumed to have been developed and customized for the problem at hand. (v) The embedded heuristic typically guarantees that each solution visited in the search is feasible. In contrast, "swapping"-based local search must often investigate many infeasible solutions when applied to combinatorial optimization problems. (vi) A final notable property is that problem space algorithms are typically very easy to understand, develop, and code.

Among the less desirable properties of problem space are (i) the typically high dimensionality of the search space, and (ii) the lack of any useful gradient information. The topology of V(d) resembles my desk top (and those of the readers, I am sure), with multiple stacks of paper of different heights: i.e., "flat" regions separated by discontinuities. A side effect of these properties is that it is nearly impossible to determine when the search is in a local minimum. (iii) Finally, we have no idea whatsoever how one might determine meaningful worst-case performance bounds, expected performance on random instances, or other such guarantees of performance. While we would welcome insight on this issue, it seems that the random and reasonably complex nature of problem space search is not conducive to performance guarantees.

The relevant question to be addressed is: are there good ways to search problem space? For example, can local search applied to problem space produce better results than randomly generating solutions (i.e., better results than "blind" randomized versions of the base heuristic)? Our experience to date indicates that the answer is resoundingly yes. We have experimented with several local search procedures and have found, to our surprise, that genetic algorithms consistently work best.
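As a concrete illustration, a genetic algorithm over perturbation vectors might look like the following minimal sketch. The operators (truncation selection, uniform crossover, single-coordinate Gaussian mutation) and all parameters are illustrative assumptions, not the exact procedure we use.

```python
import random

def ga_problem_space(P, H, V, pop_size=20, gens=40, sigma=1.0, seed=0):
    """Sketch of a genetic algorithm searching problem space.
    Chromosome = real-valued perturbation vector d; fitness of d is
    V(H(P+d), P), i.e., the perturbed problem is solved by the base
    heuristic H but evaluated on the ORIGINAL data P (minimized)."""
    rng = random.Random(seed)
    n = len(P)
    pop = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [0.0] * n                        # include the unperturbed problem

    def fitness(d):
        return V(H([p + x for p, x in zip(P, d)]), P)

    for _ in range(gens):
        elite = sorted(pop, key=fitness)[: pop_size // 2]   # keep best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover of the two parent perturbation vectors
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child[rng.randrange(n)] += rng.gauss(0.0, sigma)  # mutate one coord
            children.append(child)
        pop = elite + children
    best = min(pop, key=fitness)
    return H([p + x for p, x in zip(P, best)]), fitness(best)
```

Because the elite half of each generation survives unchanged, the best perturbation found is never lost, and the zero vector in the initial population guarantees the result is at least as good as the unperturbed base heuristic.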
Our experience with problem space search indicates that it is best suited to problems with real-valued data which affect the heuristic and which can be perturbed. Examples include perturbing processing times in scheduling problems and then applying SPT dispatching to generate alternative sequences, and perturbing the number values in number partitioning to generate alternative partitions. For general problems on graphs, problem space is likely better suited to "weighted" versions of the problems.

II. Problem Space Heuristics for Graph Coloring

There are a number of ways in which a problem space might be formulated for graph coloring. Three are presented here to serve as examples only. To begin, let the base heuristic H be the recursive largest first (RLF) algorithm of Leighton, as described in Johnson et al. [1]. This algorithm colors vertices one at a time. At each step, the vertex to color is selected in greedy fashion by choosing the vertex (in some set V') with the maximum number of edges to the set of uncolored vertices. As the RLF heuristic proceeds, it is likely to encounter myriad situations in which ties among vertices must be broken. In Johnson et al. [1], ties were broken randomly in order to generate a distribution of alternative solutions. The same effect could be accomplished in problem space by assigning a priority to each vertex to be used in tie-breaking situations. These priorities can then be perturbed to form a problem space. This approach can be extended so that priorities are used not just to break ties, but to directly select the vertex to color next at each step. A third approach would be to assign a weight to each edge (initially set to 1), and use the sum of the weights of edges incident to the set of uncolored vertices in place of the usual criterion in the RLF heuristic. Edge weights can then be perturbed to form a problem space.
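The first formulation above (vertex priorities as tie-breakers) can be sketched as follows. This is a simplified reading of the RLF selection rule as described in the text, not Leighton's exact algorithm, and the function and parameter names are illustrative assumptions.

```python
def rlf_coloring(adj, priority=None):
    """Simplified recursive-largest-first sketch.  adj maps each vertex to
    the set of its neighbors (symmetric).  Each color class is built by
    repeatedly picking, among still-eligible vertices, the one with the
    most edges to the uncolored set; `priority` (one real per vertex)
    breaks ties.  Perturbing the priorities yields a problem space."""
    if priority is None:
        priority = {v: 0.0 for v in adj}
    uncolored = set(adj)
    coloring = {}
    color = 0
    while uncolored:
        eligible = set(uncolored)           # candidates for this color class
        while eligible:
            v = max(eligible,
                    key=lambda u: (len(adj[u] & uncolored), priority[u]))
            coloring[v] = color
            uncolored.discard(v)
            eligible.discard(v)
            eligible -= adj[v]              # neighbors cannot share this color
        color += 1
    return coloring, color
```

Perturbing the priorities (e.g., drawing them from a Gaussian around their current values), rerunning, and keeping the coloring with the fewest colors then implements the search; using the priorities as the selection criterion itself, or replacing the degree count with perturbed edge weights, gives the other two formulations.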
Each of these examples has obvious shortcomings, and thus alternative problem space formulations should be developed and tested.

III. Issues for Investigation

1. How can problem space be searched effectively? While empirical results point toward the success of genetic algorithms, little progress has been made in understanding why this is true. We hope to learn more about the nature of problem space, explain why some search strategies work better than others, and exploit this knowledge to improve the search methods. We anticipate that observations from our fellow participants in the challenge will be especially useful in this regard.

2. Clearly, many problem space formulations can be envisioned. We hope to determine which are most effective, and then to quantify their performance.

3. Due to the discrete nature of the objective function in graph coloring, it seems likely that a surrogate objective will be of use in directing the search. For a graph colored with k colors, an example surrogate objective is

    V = M*k + (number of vertices colored k),

where M is a suitably large constant. Another method we have employed is to penalize distance from the original problem so that the search does not stray too far. It is of interest to investigate other means by which the search can be directed in this fashion.

4. We would like to extend our investigation to other problems related to graph coloring. As stated above, problem space by its nature seems more conducive to "weighted" problems. As an example, consider the problem in which a graph must be colored with a fixed number of colors, each edge has a weight, and the objective is determined by summing the weights of all monochromatic edges. This is essentially the university course scheduling problem.

We look forward to your comments, and to participating in the challenge.

References

[1] Johnson, D.S., Aragon, C.R., McGeoch, L.A., and Schevon, C. (1991). "Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning". Operations Research, 39(3), pp. 378-406.

[2] Storer, R.H., Wu, S.D., and Vaccari, R. (1992). "New search spaces for sequencing problems with application to job shop scheduling". Management Science, 38(10), pp. 1495-1509.