Project Summary

This project supports three interdisciplinary working groups exploring specific research areas. Each group consists of researchers with expertise in the field and/or in one of the application areas. The groups are concerned with streaming data analysis and mining, multidimensional scaling, and computer-generated conjectures. Two of the groups have met twice (the third has another meeting planned), with informal presentations and ample time for discussion and interaction. There have been three public workshops and two tutorials with more formal presentations, where others in the community interested in the field were given an opportunity to learn about the working group's efforts and the working group had the opportunity to learn about the efforts of others. We have invited subgroups of researchers back for more intensive collaborations on ideas generated in the preliminary meetings. The fields of the groups’ members include theoretical and applied computer science, statistics, discrete and non-discrete mathematics, chemistry, astronomy, economics, psychology, information theory, management science, ecology, molecular biology, and others. DIMACS has a long and successful history of bringing together researchers with different backgrounds and approaches, stimulating new collaborations, setting the agenda for future research, and acting as a catalyst for major new developments at the interface among disciplines. We built on this tradition with these working groups.

S. Fajtlowicz, P. Fowler, P. Hansen, M. Janowitz and F. Roberts (eds.), Graphs and Discovery, Proceedings of the DIMACS Workshop on Computer-Generated Conjectures from Graph Theoretic and Chemical Databases, American Math. Society. In preparation.

Project Impact

There have been important developments in each of the working groups. The interconnections among people working in data analysis but from different points of view were particularly striking. In the case of multidimensional scaling (MDS), for example, experts in cluster analysis were matched with researchers in combinatorial optimization. In streaming data analysis, those working on frequency statistics were matched with those working on probabilistic 'sketches.' In the case of computer-generated conjectures, developers of different methods compared and contrasted their software, with lively discussions of the advantages and limitations of each method. Many connections were made across disciplines. In the MDS working group, participants represented areas such as chemometrics, marketing, social networks, ecology, biomolecular databases, and social and clinical psychology. Of particular note is the connection to industry, with Kraft Foods, AT&T Labs, and others participating. Several new applications of MDS have been jump-started or significantly enhanced: image visualization, structure of molecules, MRI imaging, distance geometry, and data mining. In the Streaming Data Analysis and Mining working group, application areas discussed included financial modeling, astrophysics, and homeland security. In the Computer-generated Conjectures working group, there was a strong mix of computer scientists, mathematicians, and chemists. A large number of the problems dealt with chemical structures. One of the most exciting results obtained was a proof of a computer-generated conjecture about the separator of fullerenes.

Goals, Objectives and Targeted Activities

The goals were to formulate problems, share ideas and approaches, and set an agenda for future interactions. The emphasis was on unifying promising approaches that come from many distinct communities of researchers. The topics of interest included methodologies and algorithms for data mining, including clustering, discriminant analysis, enumerative methods, and multidimensional scaling; the increasingly abstract formulations and models of data mining questions using logical methods, conceptual clustering, learning, and discovery that are critical in data mining and in particular for automatic, intelligent decision making; and the special problems that arise from applications to such important areas as fraud and intrusion detection, web mining, medical and scientific databases, marketing, and natural language data. Interdisciplinary working groups were formed and held meetings, from which fruitful collaborations developed.

Area Background

Theoretical and algorithmic approaches to data analysis have played a central role in the development of modern methods for handling data. Now, however, the massive amounts of data gathered in important modern applications ranging from the Internet to credit card fraud detection to astronomy and medicine have dramatically changed the requirements for algorithms and provide ample motivation for a great deal of new theoretical development. We need methods for data analysis and mining that scale to the huge volumes of data in such applications.

