Working Group on Computer-Generated Conjectures from Graph Theoretic and Chemical Databases I

Working Group Meeting: November 12 -16, 2001

Public Workshop: Graph Theory Day, Saturday, November 10, 2001

Location: DIMACS Center, CoRE Building, Rutgers University

Organizers:
Patrick Fowler, University of Exeter, P.W.Fowler@exeter.ac.uk
Pierre Hansen, GERAD - Ecole des Hautes Etudes Commerciales, pierreh@crt.umontreal.ca

This material is based upon work supported by the National Science Foundation under Grant No. 0100921


The Process of Discovery in the Mathematical Sciences

The process of scientific discovery is a very complex one. Computers can aid human beings in the process and, as has been a goal of researchers in artificial intelligence, in an automatic way. In the mathematical sciences, discovery can be thought to have three components: development of conjectures, formation of new concepts, and proving of theorems or disproving of conjectures. There has been a large amount of research done on automatic theorem proving. Here, we concentrate on the use of the computer as a tool in generating conjectures, whether in an automated way or as a tool used interactively by a person.

Over the years, a variety of programs have been written to produce mathematical conjectures, either automatically or interactively. Some of these programs have led to very interesting new conjectures and new theorems. An area of particularly widespread activity and interest recently has been graph theory and related areas of chemistry. Researchers from a variety of groups around the world have been working in this area and are engaged in the tasks of using computational power to develop new conjectures, eliminate uninteresting ones, publicize the remaining conjectures, and aid in their solution by working scientists, with or without the aid of the computer. Many of these efforts start with large databases, for example of graph invariants and of relations among them, or of chemical structures. We would like to bring together a group of researchers working on computer generated conjectures from graph-theoretic and chemical databases, including developers of some of the most important programs being used, to share their ideas and systems and initiate joint research efforts. We would also like to give a larger group of reseachers an opportunity to try these systems out.

Software for Automatic Generation of Conjectures

The use of computers in scientific discovery, and in particular in the development of mathematical conjectures, is not new. The paper by Larsen [C39] surveys automatic conjecture-generating software tracing back to the work of Lenat on the AM system beginning in 1975. (See for example [C41].)

Graffiti, a program developed by Siemion Fajtlowicz, was introduced in 1986. For a given set of invariants, conjectures are automatically generated and tested on a database of examples. The system discards many of the conjectures automatically, for instance those implied by others. The most interesting conjectures are proposed to the community of researchers and counterexamples that are produced are added to the database. There are now over 900 conjectures developed through Graffiti, many in graph theory and related areas of chemistry, but others in geometry, number theory, and other fields. See the URL http://www.math.uh.edu/~clarson/#fajt. This website provides information about the current status of these conjectures -- open, refuted, or proved -- and on developments related to these conjectures, such as new conjectures or theorems. Early descriptions of the Graffiti system can be found in a series of papers [C27, C28, C29, C30, C31] Some of the world's most distinguished graph theorists have worked on the conjectures generated by Graffiti. They include Noga Alon, Bela Bollobas, Fan Chung, Paul Erdos, Jerry Griggs, Daniel Kleitman, Laszlo Lovasz, Paul Seymour, and Joel Spencer. (See [C4, C17, C25, C26, C34, C36] for examples.) Among the more exciting products of Graffiti are conjectures in chemistry on the structure of fullerenes (as represented by graphs). The papers by Patrick Fowler and colleagues [C32, C33], motivated by Graffiti, show how some very simple ideas of symmetry, geometry, and graph theory can be used to count, construct, and classify fullerenes and make predictions about the types and structures of their chemical derivatives. As an aside, we comment on the chemistry involved. The main point of much of the conjecture-generating software we are summarizing is to use theory to give models and physical pictures that can be used to understand complicated chemical systems. The all-carbon fullerenes give an example of a relatively new class of molecules for which these conjecture-generating systems have already been very useful. Among the questions of interest about these molecules are: How many fullerenes are there? Which have greatest overall stability? What are their likely patterns of reactivity? Graph theory can be used to answer some of these questions since fullerenes can be viewed as planar cubic graphs having only faces of size 5 or 6. Collaborations between chemists and graph theorists in the working group follow on collaborations stimulated by the DIMACS workshop on Discrete Mathematical Chemistry ([C35]) and through workshops on computational chemistry in the Special Year on Mathematical Support for Molecular Biology.

The AutoGraphiX (AGX) system was developed more recently by Pierre Hansen and co-workers at GERAD, Universite de Montreal. See the paper [C14] for an introduction to this system. The AGX system is designed to find (heuristically) extremal graphs for some invariant. To that effect, Caporossi and Hansen [C14] have made use of a ``variable neighborhood search'' metaheuristic within AGX, developed by viewing the conjecture generation task as a combinatorial optimization problem on an infinite family of graphs. By making use of parametrization, the system produces a set of presumably extremal graphs in an automated way. Conjectures can then be found either interactively or automatically. In the latter case, the process exploits a numerical approach based upon principle components analysis, a geometric approach using the notion of convex hull in a space of chosen invariants, or an algebraic approach based on graph recognition and known relations among families of graphs. In applications in graph theory, AGX has been used, for example, to obtain results about the largest eigenvalue for color-constrained trees (see [C22]). In applications related to chemistry, the system AGX has been used to obtain bounds for the connectivity of chemical trees and for the ``energy'' E of a graph, a problem for which much more complicated bounds had previously been obtained. See [C12, C13]. It has also found tight bounds on extremal chemical graphs for the much-studied Randic index (see [C13]).

An earlier system we should mention is called Graph (see [C19]). This system, due to Dragos Cvetkovic and colleagues, is aimed at enhancing the capacity of the graph theorist to explore graphs and families of related graphs through computation of invariants and visualization. In a review of Graph [C21], 55 papers applying it are mentioned. Brief mention should also be made of the system HR (for Hardy-Ramanujan) that was developed by Simon Colton and others to form concepts, make conjectures, and use a theorem prover and model generator to prove or disprove those conjectures. HR has so far produced conjectures in group theory and number theory. See [C10, C11, C18]. We give this system less emphasis here because it has so far not been particularly directed at graph theory or chemistry.

Other Relevant Software

We mention a number of other systems that do not generate conjectures automatically, but have been widely used as tools in scientific discovery/conjecture generation and also in education in discrete mathematics.

The LINK software system was developed at DIMACS starting in 1991. (See paper [C3] and see the URL http://zhivago.elon.edu/~berryj/LINK.html.) LINK resulted from the DIMACS workshop on Software for Discrete Mathematics, during which we brought together researchers working on software from different points of view, using different platforms, emphasizing different fields of discrete mathematics. (See [C23].) LINK is a software system designed to be a general-purpose environment for experimentation with discrete mathematics. The environment is friendly enough so that non-technical users will be able to visualize problems easily, yet powerful enough for computer scientists and mathematicians to express complicated computations. The system is designed to support educational pursuits, to be used as a research tool, and to be useful in solving real-life problems. LINK grew out of the experiences of the authors of Combinatorica, due to Steven Skiena (see [C44]), NETPAD, due to Nate Dean and others at Bellcore (see [C42]), SetPlayer, due to Mark Goldberg and his students (see [C1]), and GraphLab, due to Greg Shannon, et al. (see [C43]). Skiena, Dean, Goldberg, and Shannon came together through DIMACS to develop LINK. Also, Jonathan Berry, a DIMACS postdoc, played an important role in developing LINK and currently maintains the LINK website. LINK has been used as a tool to develop and analyze graph-theoretical conjectures, as well as a tool to explain graph-theoretical concepts and a tool for practical applications of graph theory and discrete mathematics. For example, Brenda Latka has used the system to generate conjectures about infinite antichains of tournaments (complete directed graphs) [C16, C40]). Of some interest are some of the non-mathematical uses of LINK. Berry and Dean [C2] used LINK to analyze correlations between purchases using supermarket data. An interesting offshoot of LINK has been its adoption by the Internal Revenue Service in fraud detection. This resulted from the system's ability to detect common patterns.

The system INGRID was developed by Robert Brigham, Ronald Dutton, and Felix Gomez in the late 1980's. INGRID was used to generate patterns, thus suggesting conjectures to its users, with an emphasis on parameters of graph theory. This system and results obtained from it are summarized in [C5] and [C8]. INGRID automatically bounded invariants of graphs from values for some given invariants stored in a database and by manipulation of a database of formulas between graph invariants. A summary of results obtained by INGRID can be found in [C6, C7].

Briefly, here are some other systems of interest. VEGA, of which Tomasz Pisanski is a major developer, is a system for ``doing'' discrete mathematics that is based on Mathematica and springs from the Combinatorica system developed by Steve Skiena (see [C44]) and the nauty system developed by Brendan McKay (see the URL http://cs.anu.edu.au:80/people/bdm/nauty). VEGA allows researchers, teachers, and students to quickly test ideas on small and mid-size examples. (See the URL http://vega.ijp.si/Htmldoc/vinfo.htm.)

Plantri, developed by Brendan McKay and Gunnar Brinkmann, is a program for generating planar triangulations and planar cubic graphs. It includes a fullerene generator fullgen, written by Brinkmann. Fullerenes, as we have noted, can be viewed as planar cubic graphs having only faces of size 5 or 6. See http://cs.anu.edu.au/people/bdm/plantri for general information about plantri and fullgen and [C9] for information on generation of fullerenes.

LEDA (Library of Efficient Data Types and Algorithms) was started in 1988. Stefan Naeher is a primary developer. LEDA provides a collection of data types and algorithms for ``combinatorial computing''. (See http://www2.informatik.uni-halle.de/~naeher/Manual/Preface.html.) The Graph Theorist, GT, is an older system developed by Susan Epstein (see [C24]). It is a learning system that uses algorithmic class descriptions to discover and prove relations among mathematical concepts. CABRI, developed by Jean Marie Laborde, has recently been aimed at the use of the computer to learn about and explore concepts of geometry. (See [C37, C38].) However, it has also been widely used for exploring graphs and families of related graphs, as summarized in [C15].

Some of the Issues

The primary goals of the working group are to foster an interchange of ideas among those working on different approaches to the use of computers to generate scientific conjectures, mainly in graph theory and in chemistry, and to initiate new collaborative research in this area. We anticipate exchange of views both about methods and about specific conjectures. Among the questions to be addressed are the following:


Working group home page
DIMACS Homepage
Contacting the Center
Document last modified on April 24, 2001.