DIMACS TR: 2002-50
Spanned Patterns for the Logical Analysis of Data
Authors: Gabriela Alexe and Peter L. Hammer
ABSTRACT
In a finite dataset consisting of positive and negative observations
represented as real valued n-vectors, a positive (negative) pattern is an
interval in Rn with the property that it contains sufficiently many
positive (negative) observations, and sufficiently few negative
(positive) ones. A pattern is spanned if it does not include properly any
other interval containing the same set of observations. Although large
collections of spanned patterns can provide highly accurate classification
models within the framework of the Logical Analysis of Data, no efficient
method for their generation is currently known. We propose in this paper
an incrementally polynomial time algorithm for the generation of all
spanned patterns in a dataset, which runs in linear time in the
output; the algorithm resembles closely the Blake and Quine consensus
method for finding the prime implicants of Boolean functions. The
efficiency of the proposed algorithm is tested on various publicly
available datasets. In the last part of the paper, we present the results
of a series of computational experiments which show the high degree of
robustness of spanned patterns.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2002/2002-50.ps.gz
DIMACS Home Page