Proposal for Project on Graph Coloring for DIMACS Challenge

Robert H. Storer and S. David Wu
Department of Industrial Engineering, Mohler Lab #200
Lehigh University, Bethlehem PA 18015
Phone: (215) 758-4436  Fax: (215) 758-4886
Internet: RHS2@LEHIGH.EDU

In this informal proposal, we propose to develop, study, and test "problem space search" heuristics for graph coloring problems and variations thereof. As described below, this entails empirical investigation of a class of neighborhood structures that can be used to develop local search algorithms for graph coloring.

I. Problem Space Search

Problem space search is a means for developing local search heuristics which we have applied with success to combinatorial problems including job shop scheduling and related problems [2], and number partitioning. Problem space search can be viewed as a cross between local search and randomized heuristics. The method assumes the existence of a reasonably fast "base" heuristic H which, when presented with a problem instance P, generates a solution S = H(P). By perturbing the data of the problem instance in some fashion, the heuristic can be forced to generate alternative solutions: e.g., S(new) = H(P+d), where d is a (perhaps randomly generated) perturbation vector. This basic approach of randomizing a heuristic has been taken many times in the literature. Perhaps the earliest use dates to the early 1960s, with "randomized" and "probabilistic" dispatching heuristics for scheduling. More recently, in the paper by Johnson et al. [1] on the application of simulated annealing to graph coloring and number partitioning, randomization of a similar nature was used as a tie-breaking mechanism to generate a distribution of solutions for three graph coloring heuristics.
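The "blind" randomized version of a base heuristic described above can be sketched as follows. This is an illustrative Python sketch, not part of the proposal; the base heuristic H and objective V are placeholders supplied by the caller, and all parameter names and defaults are assumptions.

```python
import random

def randomized_base_heuristic(P, H, V, n_samples=100, sigma=1.0, seed=0):
    """Blind randomization of a base heuristic: repeatedly solve perturbed
    copies P+d of the instance and keep the best solution, where "best" is
    judged on the ORIGINAL data P.  P is a list of real-valued problem data;
    H(problem) -> solution; V(solution, problem) -> value (minimized)."""
    rng = random.Random(seed)
    best_sol = H(P)
    best_val = V(best_sol, P)
    for _ in range(n_samples):
        d = [rng.gauss(0.0, sigma) for _ in P]     # random perturbation vector
        sol = H([p + di for p, di in zip(P, d)])   # S(new) = H(P + d)
        val = V(sol, P)                            # evaluate on original P
        if val < best_val:
            best_sol, best_val = sol, val
    return best_sol, best_val
```

For example, with H a greedy number-partitioning heuristic and V the difference between the two partition sums, each sampled d reorders the greedy heuristic's decisions and thus yields an alternative partition.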
What is somewhat different about our approach is the use of the perturbation vector d, or alternatively the space of perturbed problems P+d (thus the name problem space search), as a search space to which local search can be applied. Problem space search can be formulated loosely as follows:

    Min (or Max) over d of  V[H(P+d), P]

where
    V(S,x) is the objective function value of solution S to problem x,
    H(x) is the solution returned when heuristic H is applied to problem x,
    P is the original problem data, and
    d is a perturbation vector.

S could be a coloring in graph coloring, a job sequence in scheduling, a partition in partitioning problems, etc. Perturbed data (P+d) force the heuristic to generate alternative solutions, but, of course, solutions are evaluated based on the original problem (P). Decisions as to what to use as the base heuristic H, and precisely how the problem data should be perturbed, must be made on a problem-by-problem basis.

"Problem space", in which the search is carried out, has some nice properties and some bad ones. Principal among the good properties is (i) the tendency of good solutions to be clustered near the original solution. While apparently not provable in a rigorous fashion, this property has certainly been supported by experimental studies. It seems reasonable that a solution provided by the base heuristic when applied to a slightly perturbed problem should be a reasonable solution for the original problem as well. (ii) As alluded to by the use of the term "near" above, a second good property is the ease with which distance metrics can be defined. Since problem space is N-dimensional real space (where N is the dimension of d), Euclidean and other standard distance metrics apply. These metrics can then be used to define neighborhoods for local search. (iii) It is usually a trivial matter to show that the optimal solution to the problem exists somewhere in problem space [2].
Finding the optimal solution is obviously difficult, but it is comforting to know that it is not impossible. (iv) Problem space makes direct use of problem-specific information, since the base heuristic is assumed to have been developed and customized for the problem at hand. (v) The embedded heuristic typically guarantees that each solution visited in the search is feasible. In contrast, "swapping"-based local search must often investigate many infeasible solutions when applied to combinatorial optimization problems. (vi) A final notable property is that problem space algorithms are typically very easy to understand, develop, and code.

Among the less desirable properties of problem space are (i) the typically high dimensionality of the search space, and (ii) the lack of any useful gradient information. The topology of V(d) resembles my desk top (and those of the readers, I am sure), with multiple stacks of paper of different heights: i.e., "flat" regions separated by discontinuities. A side effect of these properties is that it is nearly impossible to determine when the search is in a local minimum. (iii) Finally, we have no idea whatsoever how one might determine meaningful worst-case performance bounds, expected performance on random instances, or other such guarantees of performance. While we would welcome insight on this issue, it seems that the random and reasonably complex nature of problem space search is not conducive to performance guarantees.

The relevant question to be addressed is: are there good ways to search problem space? For example, can local search applied to problem space produce better results than randomly generating solutions (i.e., better results than "blind" randomized versions of the base heuristic)? Our experience to date indicates that the answer is resoundingly yes. We have experimented with several local search procedures and have found, to our surprise, that genetic algorithms consistently work best.
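As a concrete illustration, a genetic algorithm over perturbation vectors might look like the following minimal sketch. The operators (truncation selection, uniform crossover, single-coordinate Gaussian mutation) and all parameters are illustrative assumptions, not the exact procedure we use.

```python
import random

def ga_problem_space(P, H, V, pop_size=20, gens=40, sigma=1.0, seed=0):
    """Sketch of a genetic algorithm searching problem space.
    Chromosome = real-valued perturbation vector d; fitness of d is
    V(H(P+d), P), i.e., the perturbed problem is solved by the base
    heuristic H but evaluated on the ORIGINAL data P (minimized)."""
    rng = random.Random(seed)
    n = len(P)
    pop = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [0.0] * n                        # include the unperturbed problem

    def fitness(d):
        return V(H([p + x for p, x in zip(P, d)]), P)

    for _ in range(gens):
        elite = sorted(pop, key=fitness)[: pop_size // 2]   # keep best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover of the two parent perturbation vectors
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child[rng.randrange(n)] += rng.gauss(0.0, sigma)  # mutate one coord
            children.append(child)
        pop = elite + children
    best = min(pop, key=fitness)
    return H([p + x for p, x in zip(P, best)]), fitness(best)
```

Because the elite half of each generation survives unchanged, the best perturbation found is never lost, and the zero vector in the initial population guarantees the result is at least as good as the unperturbed base heuristic.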
Our experience with problem space search indicates that it is best suited to problems with real-valued data which affect the heuristic and which can be perturbed. Examples include perturbing processing times in scheduling problems and then applying SPT dispatching to generate alternative sequences, and perturbing the number values in number partitioning to generate alternative partitions. For general problems on graphs, problem space is likely better suited to "weighted" versions of the problems.

II. Problem Space Heuristics for Graph Coloring

There are a number of ways in which a problem space might be formulated for graph coloring. Three are presented here to serve as examples only. To begin, let the base heuristic H be the recursive largest first (RLF) algorithm of Leighton, as described in Johnson et al. [1]. This algorithm colors vertices one at a time. At each step, the vertex to color is selected in greedy fashion by choosing the vertex (in some set V') with the maximum number of edges to the set of uncolored vertices. As the RLF heuristic proceeds, it is likely to encounter myriad situations in which ties among vertices must be broken. In Johnson et al. [1], ties were broken randomly in order to generate a distribution of alternative solutions. The same effect could be accomplished in problem space by assigning a priority to each vertex to be used in tie-breaking situations. These priorities can then be perturbed to form a problem space. This approach can be extended so that priorities are used not just to break ties, but to directly select the vertex to color next at each step. A third approach would be to assign a weight to each edge (initially set to 1), and use the sum of the weights of edges incident to the set of uncolored vertices in place of the usual criterion in the RLF heuristic. Edge weights can then be perturbed to form a problem space.
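The first formulation above (vertex priorities as tie-breakers) can be sketched as follows. This is a simplified reading of the RLF selection rule as described in the text, not Leighton's exact algorithm, and the function and parameter names are illustrative assumptions.

```python
def rlf_coloring(adj, priority=None):
    """Simplified recursive-largest-first sketch.  adj maps each vertex to
    the set of its neighbors (symmetric).  Each color class is built by
    repeatedly picking, among still-eligible vertices, the one with the
    most edges to the uncolored set; `priority` (one real per vertex)
    breaks ties.  Perturbing the priorities yields a problem space."""
    if priority is None:
        priority = {v: 0.0 for v in adj}
    uncolored = set(adj)
    coloring = {}
    color = 0
    while uncolored:
        eligible = set(uncolored)           # candidates for this color class
        while eligible:
            v = max(eligible,
                    key=lambda u: (len(adj[u] & uncolored), priority[u]))
            coloring[v] = color
            uncolored.discard(v)
            eligible.discard(v)
            eligible -= adj[v]              # neighbors cannot share this color
        color += 1
    return coloring, color
```

Perturbing the priorities (e.g., drawing them from a Gaussian around their current values), rerunning, and keeping the coloring with the fewest colors then implements the search; using the priorities as the selection criterion itself, or replacing the degree count with perturbed edge weights, gives the other two formulations.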
Each of these examples has obvious shortcomings, and thus alternative problem space formulations should be developed and tested.

III. Issues for Investigation

1. How can problem space be searched effectively? While empirical results point toward the success of genetic algorithms, little progress has been made in understanding why this is true. We hope to learn more about the nature of problem space, explain why some search strategies work better than others, and exploit this knowledge to improve the search methods. We anticipate that observations from our fellow participants in the challenge will be especially useful in this regard.

2. Clearly, many problem space formulations can be envisioned. We hope to determine which are most effective, and then to quantify their performance.

3. Due to the discrete nature of the objective function in graph coloring, it seems likely that a surrogate objective will be of use in directing the search. For a graph colored with k colors, an example surrogate objective is

    V = M*k + (number of vertices colored k),

where M is a suitably large constant. Another method we have employed is to penalize distance from the original problem so that the search does not stray too far. It is of interest to investigate other means by which the search can be directed in this fashion.

4. We would like to extend our investigation to other problems related to graph coloring. As stated above, problem space by its nature seems more conducive to "weighted" problems. As an example, consider the problem in which a graph must be colored with a fixed number of colors, each edge has a weight, and the objective is determined by summing the weights of all monochromatic edges. This is essentially the university course scheduling problem.

We look forward to your comments, and to participating in the challenge.

References

[1] Johnson, D.S., Aragon, C.R., McGeoch, L.A., and Schevon, C. (1991). "Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning". Operations Research, 39(3), pp. 378-406.

[2] Storer, R.H., Wu, S.D., and Vaccari, R. (1992). "New search spaces for sequencing problems with application to job shop scheduling". Management Science, 38(10), pp. 1495-1509.