DIMACS Theoretical Computer Science Seminar

Title: Learning decision trees and DNF formulas in the average case

Speaker: Rocco Servedio

Date: February 8, 2005 2:00-3:00pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


The problem of average-case learning for decision trees is as follows: a decision tree T (over Boolean features x1,...,xn) is chosen at random from some natural distribution over decision trees. The learning algorithm is given access to uniform random examples which are labeled according to T. The job of the learning algorithm is to efficiently construct a hypothesis function f which closely agrees with the unknown randomly chosen tree T.

Decision trees are an expressive representation for Boolean functions which have proven hard to learn in worst-case learning models. An even more challenging problem is that of learning an unknown DNF (Disjunctive Normal Form formula, i.e. an AND of ORs of Boolean variables) or CNF in the worst case.

In this work we give the first efficient algorithm for the average-case versions of these learning problems. Our algorithm for decision trees runs in polynomial time for random trees of logarithmic depth. Our algorithm for DNF and CNF formulas runs in polynomial time for random formulas with a subquadratic number of AND gates.

The talk will be self-contained and assumes no prior background in computational learning theory.

This is joint work with Jeff Jackson.