DIMACS Mini Workshop:
Exploring Large Data Sets Using Classification, Consensus, and Pattern Recognition Techniques (May 29-30, 1997)
From the Organizers
Data mining of massive data sets has brought great interest to combinatorial
methods of data analysis and, in particular, to combinatorial clustering.
The initial ideas of clustering are so simple that in many fields practitioners
develop such models by themselves, without any support from computer science
experts. At the same time, the theory of these methods is now highly developed
and opens a genuinely new horizon for their application, especially when the
database is very large and not yet sufficiently studied.
The main goal of the Mini Workshop "Exploring Large Data Sets Using
Classification, Consensus, and Pattern Recognition Techniques"
(May 29-30, 1997, DIMACS Center, Rutgers University) was to bring
together for discussion methodological researchers in clustering and
practitioners who need and/or use clustering methods. Some related
models, for instance consensus, were also presented at the Mini Workshop.
This Technical Report contains extended abstracts of talks from the Workshop
(some are just the original abstracts). It is divided into two parts. The first
part contains papers focused mostly on methodological aspects of exploratory
analysis. The second part contains papers which mostly describe concrete
applications. The volume also includes the original abstracts from authors
who did not send us an extended one.
Preface
The fields of Classification Theory and Pattern Recognition have been
maturing over the past 30 years into powerful collections of theory-based data
analysis techniques. During this development, methods from discrete
mathematics and theoretical computer science have had greater and greater
impact. It has now become clear that combinatorial methods for data analysis,
especially combinatorial clustering, have the potential to significantly
affect data mining and other approaches to the analysis of massive data sets.
The main goal of the Mini Workshop "Exploring Large Data Sets Using
Classification, Consensus, and Pattern Recognition Techniques"
(May 29-30, 1997, at DIMACS Center, Rutgers University) was to bring
methodological researchers together with practitioners to investigate
problem areas where classification/consensus/pattern recognition might
be developed more specifically for exploring various types of large data sets.
This Technical Report contains abstracts of the talks presented. Some of
the abstracts were 'extended' just for this Report. It is divided into two
sections. The first section contains papers focused mostly on methodological
aspects of exploratory analysis. The second section contains papers which mostly
describe applications.
Table of Contents
- David Banks
- The Analysis of Superlarge Datasets
(abstract)
- Moses Charikar
- Incremental Clustering and Dynamic Information Retrieval
(abstract)
- Jaime Cohen and Martin Farach
- Pivot Algorithms for Clustering
(postscript file)
- Corinna Cortes and Daryl Pregibon
- Tracking STARS in the Universe
(postscript file)
- Lenore Cowen
- Approximate Distance Methods for Clustering
High-Dimensional Data
(abstract)
- Dan Daly and Anne M. Chaka
- Predicting Intake Valve Deposits: A Joint QSAR Project
Between LZ and Purdue University Employing Neural Networks
and First Principles Modeling
(abstract)
- Nate Dean and Kiran Chilakamarri
- A Measure for Analyzing Group Interaction
(abstract)
- Oya Ekin, Peter Hammer and Alexander Kogan
- Convexity in Logical Analysis of Data (LAD)
(postscript file)
- Saveli Goldberg
- Inference Engine the Systems of the Dr. Watson Type
(Microsoft Word file)
- V.G. Grishin
- Pictorial Methods with Applications to Monitoring,
Diagnostics and Control in Industrial Processes
(html file)
- Leonid Gurvits
- Traditional and not-so-Traditional Applications
of VC-dimension and its Generalizations
(abstract)
- Pierre Hansen and Nenad Mladenovic
- Large Scale Clustering by Variable Neighborhood Search
(abstract)
- Haym Hirsh
- Learning to Recommend
(abstract)
- Sorin Istrail and R. Ravi
- Multiple Alignment of Biomolecular Sequences and
Voting Paradoxes
(abstract)
- O.K. Kedrov
- Algorithm of Multichannel On-Line Detection
of Seismic Signals at Three-Component Station
(gzipped postscript file)
- Christopher Landauer
- "Thar She Blows!": Analysis of Yellowstone Geyser Eruptions
(postscript file)
- Yann LeCun, Yoshua Bengio, Leon Bottou,
Corinna Cortes and Vladimir Vapnik
- Neural Networks and Other Numerical Learning Techniques
for Pattern Recognition in Large Data Sets
(abstract)
- Vyacheslav Mazur and Alexander Genkin
- Pareto Distributions in Business Modeling
(abstract)
- Alex Meystel
- Algorithms of Unsupervised Learning for Organizing and
Interpreting Large Data Sets
(Microsoft Word file)
- Boris Mirkin
- Approximation Clustering as a Framework for Solving
Challenging Problems in Processing of Massive Data Sets
(gzipped postscript file)
- David Ozonoff
- Environmental Epidemiology: A Natural Application
for Discrete Mathematics
(Microsoft Word file)
- William Shannon
- Clustering in Large Biomedical Databases
(postscript file)
- Mark Stitson and J. Weston
- Function Approximation Using SV Machines
(abstract)
- Simon Streltsov
- Traffic Behavior Pattern Analysis
(abstract)
- Vladimir Yancher and Alexander Genkin
- Multidimensional Visualization Using Rectangles
for Business Applications
(postscript file)