Title: Reducing Contextual Bandits to Supervised Learning
Speaker: Daniel Hsu, Columbia University
Date: Wednesday, April 27, 2016 11:00am-12:00pm
Location: CoRE Bldg, Room 301, Rutgers University, Busch Campus, Piscataway, NJ
Abstract:
I'll describe a fast and simple algorithm for the contextual bandit learning problem, where the learner repeatedly takes an action in response to the observed context, observing the reward only for that action. The algorithm assumes access to an oracle for solving cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only O(T^1/2) oracle calls across all T rounds. Joint work with Alekh Agarwal, Satyen Kale, John Langford, Lihong Li, and Rob Schapire.
See: http://www.cs.rutgers.edu/~allender/theory-spring16.html