DIMACS TR: 2009-16
Leveraging Higher Order Dependencies Between Features for Text Classification
Authors: M.C. Ganiz, N.I. Lytkin and W.M. Pottenger
ABSTRACT
Traditional machine learning methods consider only relationships
between feature values within individual data instances while
disregarding the dependencies that link features across instances. In
this work, we develop a general approach to supervised learning by
leveraging higher-order dependencies between features. We introduce a
novel Bayesian framework for classification named Higher Order Naive
Bayes (HONB). Unlike approaches that assume data instances are
independent, HONB leverages co-occurrence relations between feature
values across different instances. Additionally, we generalize our
framework by developing a novel data-driven space transformation that
allows any classifier operating in vector spaces to take advantage of
these co-occurrence relations. Results obtained on several benchmark
text corpora demonstrate that higher-order approaches achieve
significant improvements in classification accuracy over the baseline
(first-order) methods.
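
To make the notion of a higher-order co-occurrence concrete, the following minimal sketch finds term pairs that never appear in the same document but are linked through a shared intermediate term. It is an illustration only and not the HONB model or the space transformation described in the report; the toy corpus and the function names first_order_pairs and second_order_pairs are hypothetical.

```python
from itertools import combinations

# Hypothetical toy corpus: each document is represented as a set of terms.
docs = [
    {"bayes", "classifier"},
    {"classifier", "text"},
    {"text", "corpus"},
]


def first_order_pairs(documents):
    """Return sorted term pairs that co-occur within at least one document."""
    pairs = set()
    for doc in documents:
        for a, b in combinations(sorted(doc), 2):
            pairs.add((a, b))
    return pairs


def second_order_pairs(documents):
    """Return sorted term pairs linked only through an intermediate term.

    A pair (a, c) qualifies when both a and c co-occur with some term b,
    but a and c never co-occur directly. Because a and c share no document,
    the two links a-b and b-c necessarily come from different documents.
    """
    first = first_order_pairs(documents)

    # Build an adjacency map of direct (first-order) co-occurrences.
    neighbours = {}
    for a, b in first:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    second = set()
    for linked in neighbours.values():
        for a, c in combinations(sorted(linked), 2):
            if (a, c) not in first:
                second.add((a, c))
    return second


if __name__ == "__main__":
    print(sorted(first_order_pairs(docs)))
    print(sorted(second_order_pairs(docs)))
```

In this toy corpus, "bayes" and "text" share no document, yet both co-occur with "classifier", so a first-order method would treat them as unrelated while a second-order view links them.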
Paper Available at:
ftp://dimacs.rutgers.edu/TechnicalReports/TechReports/2009/2009-16.pdf