DIMACS Workshop on Data Mining and Scalable Algorithms

August 22 - 24, 2001
DIMACS Center, Rutgers University, Piscataway, NJ

Alex Smola, Australian National University
Paul Bradley, Digimine Inc.
Nello Cristianini, Royal Holloway College, University of London
Olvi Mangasarian, University of Wisconsin
Presented under the auspices of the Special Focus on Data Analysis and Mining.

With the availability of very large collections of data, the areas of machine learning, statistics, optimization, and databases face the challenge of making efficient use of this information. Data mining targets the problem of finding useful, interesting, and understandable structure or models derived from the data. While advanced techniques exist for handling nonparametric estimators efficiently when only limited data is available, algorithms for large amounts of data often resort to a rather limited class of estimates, such as linear models or the assumption that the data can be represented by a small number of clusters. This restriction is imposed mainly by implementation constraints.

Yet this situation is paradoxical, since complex models could be more easily justified from a statistical point of view, especially when data is abundant. This raises the question of whether statistical methods exist that strike a better balance between complexity and performance.

Aims and Topics

Document last modified on July 17, 2001.