The goal of the DIMACS Workshop on Data Mining and Digital Libraries is to bring together researchers whose work bears on the following question: "Since the content of a Digital Library is machine readable, to what extent can the computers that maintain the library also extract new and useful information from that content?"
The workshop aims to improve communication among the diverse communities that each have something important to offer on this problem. A key challenge during the meeting will be to develop a common language that enables workshop participants to bring their own knowledge to bear on the problems that interest others.
Among the application areas to be explored are:
This workshop welcomes researchers who are addressing one or more of these problems: image indexing; sound indexing; text indexing; clustering of images, sound, or text; collection selection; information retrieval; and pattern finding in text collections. We are also interested in new research on evaluating schemes for retrieval or for clustering and classification, research that can bring some much-needed rigor to this problem.
We also welcome discussion of the issue of "evaluating" systems. Two practically oriented "gold standards" have emerged. One is the TREC setting, a DARPA/NIST/ARDA-sponsored annual workshop in which dozens of systems address the same fixed set of tasks, with performance on those tasks being assessed impartially at NIST. The other is the commercial web setting, where schemes compete to attract the eyeballs of surfers, and the methods are often tailored to hold the user's attention while providing acceptable levels of organization and retrieval. The effectiveness (in, say, the TREC senses) of the commercial schemes is not well understood. Perhaps neither set of measures will ultimately prove most effective in defining the new technology of digital libraries.
The organizing committee will, in consultation with the authors, organize papers into a series of focussed working sessions. There will be no parallel sessions, so participants will be able to attend all sessions. Presenters are encouraged to present both recent results and open questions. In keeping with DIMACS tradition, we are most interested in mathematically rigorous approaches to these problems. We recognize that practice may precede theory, and we also welcome papers presenting methods that prove very effective even though their theoretical justification is unsatisfactory.