DIMACS TR: 2009-02
Logical Analysis of Data: Classification with Justification
Authors: Endre Boros, Yves Crama, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan and Kazuhisa Makino
ABSTRACT
Learning from examples is a frequently arising challenge, with a large number
of algorithms proposed in the classification and data mining literature.
The evaluation of the quality of such
algorithms is usually carried out \textit{ex post}, on an experimental basis: their
performance is measured either by cross-validation on benchmark data sets
or by clinical trials. None of these approaches directly evaluates
the learning process \textit{ex ante},
on its own merits.
In this paper, we discuss a property of rule-based classifiers which we call
``justifiability", and which focuses on the type of information extracted from
the given training set in order to classify new observations.
We investigate some interesting
mathematical properties of justifiable classifiers.
In particular, we establish the existence of justifiable classifiers, and
we show that several well-known learning approaches,
such as decision trees or nearest-neighbor-based methods, automatically
provide justifiable classifiers. We also identify maximal subsets of observations
which must be classified in the same way by every justifiable classifier.
Finally, we illustrate with a numerical example
that using classifiers based on ``most justifiable'' rules does not seem to
lead to overfitting, even though it involves an
element of optimization.
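
As an illustrative aside (not taken from the paper): the sketch below shows a minimal
nearest-neighbor classifier on binary data, the kind of method the abstract states is
automatically justifiable. The formal definition of justifiability is given in the paper
itself, not here; all function and variable names below are hypothetical.

    # Illustrative sketch only: a 1-nearest-neighbor classifier on binary vectors.
    # The abstract notes such classifiers are automatically justifiable; the formal
    # definition of justifiability appears in the full paper.

    def hamming_distance(x, y):
        """Number of coordinates in which two binary vectors differ."""
        return sum(a != b for a, b in zip(x, y))

    def nearest_neighbor_classify(observation, positive_examples, negative_examples):
        """Classify a binary observation by its closest training example.

        The predicted label is inherited from the closest example in the
        training set, so the prediction can be traced back to specific
        observed data.
        """
        d_pos = min(hamming_distance(observation, p) for p in positive_examples)
        d_neg = min(hamming_distance(observation, n) for n in negative_examples)
        return 1 if d_pos <= d_neg else 0

    # Tiny usage example on 3-bit observations (hypothetical data).
    positives = [(1, 1, 0), (1, 0, 1)]
    negatives = [(0, 0, 0), (0, 1, 0)]
    print(nearest_neighbor_classify((1, 1, 1), positives, negatives))  # -> 1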
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2009/2009-02.pdf