DIMACS TR: 2009-16

Leveraging Higher Order Dependencies Between Features for Text Classification



Authors: M.C. Ganiz, N.I. Lytkin and W.M. Pottenger

ABSTRACT

Traditional machine learning methods consider only relationships between feature values within individual data instances and disregard the dependencies that link features across instances. In this work, we develop a general approach to supervised learning that leverages higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. We further generalize this framework with a novel data-driven space transformation that allows any classifier operating in a vector space to take advantage of these co-occurrence relations. Results on several benchmark text corpora show that the higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
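To make "co-occurrence relations between feature values across different instances" concrete, the following is a minimal sketch, not the HONB classifier or the space transformation described in the report. It assumes each document is represented as a set of terms. It also assumes a second-order path i - k - j exists when terms i and k co-occur in one document, k and j co-occur in a different document, and i, k and j are distinct. The function names `first_order` and `second_order` are hypothetical. The sketch shows that two terms which never share a document can still be linked through a third term.

```python
from collections import Counter
from itertools import combinations


def first_order(docs):
    """Count documents in which each unordered pair of distinct terms co-occurs."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def second_order(docs):
    """Count second-order paths i - k - j (assumed definition): terms i and k
    co-occur in one document, k and j co-occur in a different document, and
    i, k, j are distinct. Counts are keyed by the sorted term pair (i, j)."""
    sets = [set(doc) for doc in docs]
    counts = Counter()
    # Each unordered pair of distinct documents is visited once.
    for d1, d2 in combinations(sets, 2):
        # The linking term k must occur in both documents.
        for k in d1 & d2:
            for i in d1 - {k}:
                for j in d2 - {k}:
                    if i != j:
                        counts[tuple(sorted((i, j)))] += 1
    return counts


docs = [{"a", "b"}, {"b", "c"}]
print(first_order(docs)[("a", "c")])   # 0: a and c never share a document
print(second_order(docs)[("a", "c")])  # 1: a - b - c through the shared term b
```

A first-order model sees no relation between `a` and `c` here, while the second-order count links them through `b`. This illustrates the kind of cross-instance dependency the abstract refers to. How such counts enter the probability estimates of HONB is specified in the full paper and is not reproduced by this sketch.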

Paper Available at: ftp://dimacs.rutgers.edu/TechnicalReports/TechReports/2009/2009-16.pdf