From dimacs.rutgers.edu!trick Fri Nov 13 15:24:05 EST 1992 Received: by alice; Fri Nov 13 15:24:15 EST 1992 Received: by inet.att.com; Fri Nov 13 15:24 EST 1992 Received: by shakespeare.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) id AA05506; Fri, 13 Nov 92 15:24:05 EST Date: Fri, 13 Nov 92 15:24:05 EST From: trick@dimacs.rutgers.edu (Mike Trick) Message-Id: <9211132024.AA05506@shakespeare.rutgers.edu> To: dsj@research.att.com Subject: The Overview Status: RO The Overview file \documentstyle[12pt]{article} \title{Clique and Coloring Problems\\Literature and Open Problems} \author{} \date{Last revision: \today} \begin{document} \maketitle {\narrower\it This paper is a work in process. We will be adding more references and details of the references given as we find them. If you have information you think should be included, please send a note to {\tt challenge@dimacs.rutgers.edu}.} \section{Introduction} Given an undirected graph, a {\it clique} of the graph is a set of mutually adjacent vertices. A {\it maximum clique} is, naturally, a clique whose number of vertices is at least as large as that for any other clique in the graph. If the vertices have weights then a {\it maximum weighted clique} is a clique with a larger sum of vertex weights than any other clique. A (vertex) {\it coloring} of an undirected graph is an assignment of a label to each node. It is required that the labels on the pair of nodes incident to any edge be different. A {\it minimum coloring} of a graph is a coloring that uses as few different labels as possible. Clique and coloring problems are very closely related. It is straightforward to see that the size of the maximum clique is a lower bound on the minimum number of labels needed to color a graph. Many problems of practical interest can be modeled as clique and coloring problems. The general form of these applications involves forming a graph with nodes representing items of interest. An edge connects two ``compatible'' items. The maximum clique problem is then to find as large a set of pairwise compatible items as possible. The minimum coloring problem is to assign a color to each item so that every compatible pair is assigned different colors. Both of these problems are formally NP--hard for general graphs \cite{GaJo79}. It therefore seems unlikely that it will be possible to find a fast (i.e. polynomial-time) algorithm to solve these problems. In fact, based on the results of \cite{GaJo76,???} it seems unlikely that it is even possible to find an approximate solution to these problems quickly. Despite these negative results, it is still necessary to find solutions to large clique and coloring problems. The applications do not go away just because of the formal complexity of the problem. Therefore, it is important for us to determine how difficult clique problems are to solve in practice. Do hard instances only occur rarely? Are hard--to--approximate instances almost pathological? What structures in problems make it difficult to solve these problems, and how can those structures be avoided? The purpose of this note (which is currently an evolving document) is to outline some past research on solving clique and coloring problems and to identify some possible research directions. In the next section, we discuss a few typical applications of the maximum clique and minimum coloring problems. In section~\ref{sec:alg}, we discuss a number of alternative solution approaches that have been tried in the past. In section~\ref{sec:open} we talk about some suggested research directions. \section{Sample Applications} Both coloring and clique models are useful in a variety of applications (since one bounds the other, most applications are applicable to either). In this section, we discuss a few typical applications. \subsection{Frequency Assignment} Gamst \cite{Ga86} examines a problem in assigning frequencies to mobile radios and other users of the electromagnetic spectrum. In the simplest case, two customers that are sufficiently close must be assigned different frequencies, while those that are distant can share frequencies. The problem of minimizing the number of frequencies is then a graph coloring problem. In the general setup examined by Gamst, there is a required distance (in terms of ``steps'' on the frequency list) between the customers (so this generalizes graph coloring, where the required distance is either 0 or 1). The objective is to use as few frequencies as possible. For this more general problem, a lower bound can be derived by considering all customers that have to be at least one frequency apart. The largest clique in the resulting graph is then a lower bound on the number of frequencies required. There are other bounds that also involve finding large cliques, including node weighted cliques. This application also provides a seemingly good setting for on--line algorithms depending on the ability to change frequencies midstream. \subsection{Pattern Matching} Ogawa \cite{Og86} has an interesting application involving pattern recognition. Given a ``target'' picture and an input picture (which involve only a set of points), a related compatibity graph is created based on pairs of points. A maximum clique represents a large number of mutually consistent pairs (where ``mutually consistent'' can include a number of restrictions, including angular relationships as well as the requirement that no point be matched with more than one other) and can then be used to represent the resulting fit. This model seems to correctly recognize affine transformations as well as moderately nonlinear transformations. \subsection{Time Tabling and Scheduling} Many scheduling problems involve allowing for a number of pairwise restrictions on which jobs can be done simultaneously. For instance, in attempting to schedule classes at a university, two courses taught by the same faculty member cannot be scheduled for the same time slot. Similarly, two course that are required by the same group of students also should not conflict. The problem of determing the minimum number of time slots needed subject to these restrictions is a graph coloring problem. This problems has been studied by many researchers, including Leighton \cite{Le79}, Opsu and Roberts \cite{OpRo81}, and de Werra \cite{De85}. \subsection{Register Allocation} One very active application for graph coloring is register allocation. The register allocation problem is to determine the contents of a limited number of hardware registers during program execution. Variables in registers can be accessed much quicker than those not in registers. Typically, however, there are far more variables than registers so it is necessary to assign multiple variables to registers. Variables conflict with each other if they are both used within a short period of time (for instance, within a subroutine). The goal is to assign variables that do not conflict so as to minimize the use of non--register memory. A simple approach to this is to create a graph where the nodes represent variables and an edge represents conflict between its nodes. A coloring is then a conflict--free assignment. If the number of colors used is less than the number of registers then a conflict--free register assignment is possible. Some papers that outline and expand on this method include Chaitin \cite{Ch82}, Chaitin et. al \cite{ChAuChCoHoMa81}, Chow and Hennessy \cite{ChHe84,ChHe90}, and Briggs, Cooper, Kennedy, and Torczon \cite{BrCoKeTo89}. \subsection{Designing a Printed Circuit Board Tester} A printed circuit board tester involves placing probes onto a board. A probe can determine if a portion of a board is working correctly. Since probes have a particular size, not every component can be checked in one pass. The problem of maximizing the number of components checked in one pass can be formulated as a clique problem: each node represents a component and an edge represents two nodes that are too close to be checked simultaneously. A clique in this graph is then a set of components that can be checked in one pass. This is an example of a {\it geometric} clique problem and has been studied by Yu, Goldschmidt and Chen \cite{YuGoCh92} and Yu. Kouvelis, and Luo \cite{YuKoLu92}. \subsection{Analysis of Biological and Archeological Data} In biology and archeology, a standard model for relating objects is that of a tree. Trees can represent the division of a species into two separate species or the division of features of some artifact (like pottery or pins). Species do not come with histories, however, nor are artifacts completely dated. Therefore, it is necessary to deduce the tree structure from the features of the items. One approach to this is to create a distance measure between the items. If the distance measure represents distances along a tree, then that tree is a good estimate for the underlying, ``real'' tree. Normally, the distances do not represent a tree, so it is necessary to find a tree that accurately estimates the true distances. One approach to this, suggested by Barth\'elemy and Gu\'enoche \cite{BaGu91}, creates a graph as follows: the nodes of the graph represent partitions of the items. These partitions are chosen because items within a partition are closer to each other than to those in the other side of the partition. Two nodes are adjacent if the partitions are consistent with coming from the same tree (which reduces to an inclusion condition). A clique in this graph represents a set of partitions that can be formed into a tree. Maximum cliques attempt to encapsulate as much of the partition data as possible. For more information, see Chapter 5 of \cite{BaGu91}. \section{Solution Approaches} \label{sec:alg} \subsection{Cliques} There have been a number of attempts to create algorithms that solve the clique problem. An early attempt that illustrates the advantages of publishing computer codes is that of Bron and Kerbosch \cite{BrKe73}, which is still often used to compare approaches. This paper, as well as those by Akkoyunlu \cite{Ak73}, Gerhards and Lindenberg \cite{GeLi79} and Loukakis and Tsouros \cite{LoTs82}, is an implicit enumeration routines with varying levels of testing to eliminate dominated solutions. Polyedral approaches have been suggested by Nemhauser and Trotter \cite{NeTr75} and Balas and Samuelsson \cite{BaSa77}. Tarjan and Trojanowski \cite{TaTr77} develop a recursive routine which has a worst case bound of $O(2^{n/3})$. Babel \cite{Ba91} is a branch and bound approach where the bounds are find by graph coloring heuristics. One interesting aspect of this work is that the resulting algorithm is analyzed and various classes are characterized where the branch and bound algorithm performs in polynomial time. Balas and Xue \cite{BaXu91} develop a branch and bound procedure for the maximum weight clique problem by using, at its core, an algorithm for minimum weight coloring of triangulated graphs and represents one approach for using algorithms for structured graphs to solve the problem for unstructured graphs. This is an extension from the unweighted clique--finding method of Balas and Yu \cite{BaYu86}. There is not enough time or space to summarize the vast amount of research that has been done on theoretical and practical methods for solving the clique problem for restricted classes of graphs. Since most of the classes examined are ``perfect graphs'' of one form or another, Golumbeck \cite{Go80} is a good place to begin. Some general approaches to finding cliques have been analyzed in the case of random graphs (where typically a random graph is one chosen uniformly at random from the set of all labelled graphs), where Bollob\'as and Erd\"os \cite{BoEr76} have very strong bounds on the size of the largest clique. Pittel \cite{Pi82} has analyzed a number of heuristics for such graphs, and found none that will find the largest clique in this model. Jerrum \cite{Je92} has extended this work to include heuristics based on simulated annealing. \subsection{Coloring} There have been a number of heuristics developed for graph coloring. The most common heuristic is the technique of successive augmentation. In this approach a partial coloring is found on a small number of vertices and this is extended vertex by vertex until the entire graph is colored. Examples of varients of this approach include Welsh and Powell \cite{WePo67}, Matula, Marble and Isaacson \cite{MaMaIs72}, Johnson \cite{Jo74}, Grimmet and McDiarmid \cite{GrMc75}, Br\'elaz \cite{Br79}, Leighton \cite{Le79}, Johri and Matula \cite{JoMa82}, Wigderson \cite{Wi83}, and Berger and Rompel \cite{BeRo90}. Some of these algorithms are also applicable to the {\it on--line} coloring problem. In the on--line problem, the graph is only determined one node at a time. Once a node is seen, it is necessary to color that node immediately. This color is then fixed and other nodes are shown. This is similar to most successive augmentation heuristics except that the ordering of nodes to be colored cannot be set by the heuristic. This models some natural phenomena but is also a very large restriction on the class of heuristics examined. Examples of this line of research include \cite{???}. More general heuristic techniques that have been tried include simulated annealing (Chams, Hertz, and De Werra \cite{ChHeDe87} and Johnson et. al \cite{JoArMcSc91}) and tabu search (Hertz and De Werra \cite{HeDe87}). Issues related to finding an appropriate neighborhood structure for coloring graphs are described by Morgenstern and Shapiro \cite{MoSh90}. \section{Open Problems and Suggested Research} \label{sec:open} Here are a few suggestions to stimulate thinking on computational aspects of finding cliques and coloring in graphs. \begin{enumerate} \item Implement and test algorithms and heuristic methods. This is the most direct sort of project at hand. Devise a new method or modify an existing method and see how well it solves instances. There are no limitations to the solution method: it might be polyhedrally based, it might involve clever enumeration, or might use any of a host of other approaches. \item Examine general purpose heuristic methods. This is a tremendous opportunity to use cliques as a problem area to explore new techniques in heuristic problem solving. There are a number of general techniques that might be of interest for solving clique and coloring problems. These include tabu search \cite{???}, genetic algorithms \cite{???}, and simulated annealing \cite{AaKo89,Ce85,CoEgGo88,KiGeVe83,VaAa87}. Johnson, Aragon, McGeoch, and Schevon \cite{JoArMcSc91} have used simulated annealing for the graph coloring problem. Is there anything in this problem that makes it particularly amenable to solution by one of these approaches? Is there anything that makes it particularly difficult? \item Explore the role of randomization. Most of the algorithms created are completely deterministic. Can randomization help in solving these types of problems? \item Experiment on alternative machine architectures, including parallel and vector processors. How can multiple processors be exploited in solving these problems? This could be either via distributed computing (generally with high communications costs, heterogeneous machines, and/or limited connectivity) or with such special purpose machines as Connection Machine computers. \item Examine real world problems. The clique problem is commonly claimed to have a great deal of applicability. Most computational work, however, is done with random graphs. How do real world problems differ from such random graphs? How would the conclusions drawn from past research be different if real problems were solved? \item Create and examine alternative generators. Consider the nodes of a graph as points in the Euclidean plane where an edge occurs only if the corresponding nodes are sufficiently close (or sufficiently far away). Is this an interesting problem generator? Do algorithms perform well or poorly for it? What other generators have interesting properties? \item Examine ``pathological'' examples. For each optimization algorithm developed, there is a class of examples for which the algorithm does well and a class for which it does poorly. Can you characterize why an algorithm does poorly? Alternatively, can you determine a large, interesting, class for which certain algorithms work well? \item Examine planar graphs. The four color theorem, as proved by \cite{???} is actually a constructive theorem that gives an $O(n^2)$ algorithm for 4--coloring a graph. This algorithm seems completely impractical due to the large constants involved. Morgenstern and Shapiro \cite{MoSh91} provide some heuristics for such graphs and might be seen as a starting point for this class of interesting graphs. \item Examine neighborhood structures. One aspect of these problems that makes solution difficult is the lack of a natural neighborhood structure for local search. Neighborhood structures for these problems tend either to be too narrow (leading to poor local optima) or too broad (leaving a difficult neighborhood problem). What are some alternative neighborhoods and how can they be effectively searched? \item Examine more fundamental issues in computational testing using these problems as a testbed. How should algorithms and heuristics be compared across environments? Does good computational work require outstanding ``hacking'' ability, or is there some more fundamental measure independent of implementation? Are there alternatives to worst case bounds (which can be warped by a small number of pathological instances) and average case results (which often are not particularly robust to changes in the probabilistic assumptions). \end{enumerate} \bibliographystyle{plain} \bibliography{clique} \end{document}