DIMACS TR: 2009-02
Logical Analysis of Data: Classification with Justification
Authors: Endre Boros, Yves Crama, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan and Kazuhisa Makino
ABSTRACT
Learning from examples is a frequently arising challenge, with a large number
of algorithms proposed in the classification and data mining literature.
The evaluation of the quality of such
algorithms is usually carried out \textit{ex post}, on an experimental basis: their
performance is measured either by cross-validation on benchmark data sets
or by clinical trials. None of these approaches directly evaluates
the learning process \textit{ex ante},
on its own merits.
In this paper, we discuss a property of rule-based classifiers which we call
``justifiability", and which focuses on the type of information extracted from
the given training set in order to classify new observations.
We investigate some interesting
mathematical properties of justifiable classifiers.
In particular, we establish the existence of justifiable classifiers, and
we show that several well-known learning approaches,
such as decision trees or nearest-neighbor-based methods, automatically
provide justifiable classifiers. We also identify maximal subsets of observations
which must be classified in the same way by every justifiable classifier.
Finally, we illustrate with a numerical example
that using classifiers based on ``most justifiable'' rules does not seem to
lead to overfitting, even though it involves an
element of optimization.
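
As an illustrative aside (not taken from the paper): the sketch below shows a minimal
nearest-neighbor classifier on binary data, the kind of method the abstract states is
automatically justifiable. The formal definition of justifiability is given in the paper
itself, not here; all function and variable names below are hypothetical.

    # Illustrative sketch only: a 1-nearest-neighbor classifier on binary vectors.
    # The abstract notes such classifiers are automatically justifiable; the formal
    # definition of justifiability appears in the full paper.

    def hamming_distance(x, y):
        """Number of coordinates in which two binary vectors differ."""
        return sum(a != b for a, b in zip(x, y))

    def nearest_neighbor_classify(observation, positive_examples, negative_examples):
        """Classify a binary observation by its closest training example.

        The predicted label is inherited from the closest example in the
        training set, so the prediction can be traced back to specific
        observed data.
        """
        d_pos = min(hamming_distance(observation, p) for p in positive_examples)
        d_neg = min(hamming_distance(observation, n) for n in negative_examples)
        return 1 if d_pos <= d_neg else 0

    # Tiny usage example on 3-bit observations (hypothetical data).
    positives = [(1, 1, 0), (1, 0, 1)]
    negatives = [(0, 0, 0), (0, 1, 0)]
    print(nearest_neighbor_classify((1, 1, 1), positives, negatives))  # -> 1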
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2009/2009-02.pdf