DIMACS Theoretical Computer Science Seminar

Title: Optimization and Learning for Online Decision Making

Speaker: Shipra Agrawal, Columbia University

Date: Wednesday, November 11, 2015 11:00am-12:00pm

Location: CoRE Bldg, Room 301, Rutgers University, Busch Campus, Piscataway, NJ


In this talk, I will present techniques that combine optimization and learning for decision making in complex, uncertain, online environments. Much of this work is motivated by challenges faced in modern revenue management problems, namely, a) unknown or difficult to estimate demand distributions, b) multiple complex nonlinear constraints and objectives, c) the need for fast large-scale algorithms, and d) personalized decisions. Formulating these problem aspects into an "online stochastic convex programming" framework, we devise fast algorithms that combine primal-dual paradigm with online learning to achieve provably optimal performance bounds. When applied to the special case of online packing, our ideas yield simpler and faster algorithms with optimal competitive ratio for this widely studied problem.

I will further discuss a "bandit" property present in many revenue management problems, where the uncertain demand at each point in time is determined by the decision, and can only be observed "after" taking the decision. For example, in pay-per-click revenue model in internet advertising, a click (or no click) on an ad can be observed only after the ad is selected and displayed in response to the user query. Similar situations occur in many other applications including posted-price based revenue management, worker-task allocation problem in crowdsourcing, machine scheduling, sensor network management etc. Modeling this problem as a combination of the classic multi-armed bandit problem and online stochastic convex programming, we design algorithms that balance the inherent exploration-exploitation tradeoff to achieve asymptotically optimal regret bounds. Our results significantly improve upon several known results in online linear programming and multi-armed bandits literature, and reveal many interesting connections between convex optimization algorithms, Fenchel duality, multi-armed bandits and online learning.

This talk is based on joint work with Nikhil R. Devanur.

See: https://dragon.rutgers.edu/~mk1029/theoryseminarF15.html