DIMACS Theoretical Computer Science Seminar

Title: Reducing Contextual Bandits to Supervised Learning

Speaker: Daniel Hsu, Columbia University

Date: Wednesday, April 27, 2016 11:00am-12:00pm

Location: CoRE Bldg, Room 301, Rutgers University, Busch Campus, Piscataway, NJ


I'll describe a fast and simple algorithm for the contextual bandit learning problem, where the learner repeatedly takes an action in response to the observed context, observing the reward only for that action. The algorithm assumes access to an oracle for solving cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only O(T^1/2) oracle calls across all T rounds. Joint work with Alekh Agarwal, Satyen Kale, John Langford, Lihong Li, and Rob Schapire.

See: http://www.cs.rutgers.edu/~allender/theory-spring16.html