WORKSHOP CANCELLED

DIMACS Workshop on Data Mining and Scalable Algorithms

August 22 - 24, 2001
DIMACS Center, Rutgers University, Piscataway, NJ

Organizers: Alex Smola, Australian National University, Alex.Smola@anu.edu.au; Paul Bradley, Digimine Inc., paulb@digimine.com; Nello Cristianini, Royal Holloway College, University of London, N.Cristianini@dcs.rhbnc.ac.uk; Olvi Mangasarian, University of Wisconsin, olvi@cs.wisc.edu

Presented under the auspices of the Special Focus on Data Analysis and Mining.

With the availability of very large collections of data, the areas of machine learning, statistics, optimization, and databases face the challenge of making efficient use of this information. Data mining targets the problem of finding useful, interesting, and understandable structure or models in the data. While advanced techniques exist for handling nonparametric estimators efficiently when only limited data is available, algorithms for large amounts of data often resort to a rather limited class of estimates, such as linear models or the assumption that the data can be represented by a small number of clusters. This restriction is imposed mainly by implementation constraints.

Yet this situation is paradoxical, since complex models could be more easily justified from a statistical point of view, especially when data is abundant. This raises the question of whether statistical methods exist that strike a better balance between complexity and performance.

Aims and Topics

Practical Limits of Nonparametric Methods: Runtime, storage, relation to nearest neighbor methods.
Practical Limits of Parametric Models: Is data really nonlinear or is a simple model good enough?
Handling Categorical Data: Kernels for categorical data, data with mixed numeric and categorical attributes.
Novelty Detection and Discovering Patterns: Fraud detection, modeling temporal/cyclic data.
Missing or Censored Data
Efficiency: Integration with database systems, efficient model building, efficient model deployment, large datasets.
Data and Feature Selection: Reduced dataset and feature methods.
Small Training Set - Large Test Set: Can we gain anything by transduction or EM?
Understandability and Visualization: Prediction explanation, data visualization/navigation.
Applications: Collaborative filtering, text classification (e.g. email classification), mining of massive document repositories (with hypertextual, multilingual, multimedia features).

Document last modified on July 17, 2001.
