2002-50: Spanned Patterns for the Logical Analysis of Data

DIMACS TR: 2002-50

Spanned Patterns for the Logical Analysis of Data

Authors: Gabriela Alexe and Peter L. Hammer

ABSTRACT
In a finite dataset consisting of positive and negative observations represented as real valued n-vectors, a positive (negative) pattern is an interval in Rn with the property that it contains sufficiently many positive (negative) observations, and sufficiently few negative (positive) ones. A pattern is spanned if it does not include properly any other interval containing the same set of observations. Although large collections of spanned patterns can provide highly accurate classification models within the framework of the Logical Analysis of Data, no efficient method for their generation is currently known. We propose in this paper an incrementally polynomial time algorithm for the generation of all spanned patterns in a dataset, which runs in linear time in the output; the algorithm resembles closely the Blake and Quine consensus method for finding the prime implicants of Boolean functions. The efficiency of the proposed algorithm is tested on various publicly available datasets. In the last part of the paper, we present the results of a series of computational experiments which show the high degree of robustness of spanned patterns.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2002/2002-50.ps.gz

DIMACS Home Page