DIMACS Workshop: On-Line Decision Making

July 12 - 15, 1999
Busch Student Center, Rutgers University, Busch Campus, Piscataway, NJ

ABSTRACTS


1.

Introduction and overview
Yoav Freund, AT&T Labs-Research

In this introductory talk I will try to give a roadmap for some of the
main results that will be covered in this workshop. My emphasis will be on
algorithms and their performance bounds. I will overview the work in the
following directions and some of their interrelationships:

- Universal compression, MDL, Bayesian prediction and universal portfolios.
- Learning to play repeated matrix games. 
- Competitive analysis of online algorithms. 
- The aggregating strategy of Vovk.
- The metric-based analysis of online learning of Kivinen and Warmuth.


2.

Learning in Games
Rakesh Vohra, Northwestern University and David Levine, University of
California - Los Angeles

In this talk I will provide an overview of what the `learning programme' in
Game Theory is supposed to accomplish, as well as highlight some of its
conclusions, many of which will be described in the talks to come.  I will
also describe some of the connections to computational learning theory.

What makes learning in games interesting is that, in addition to the usual
exploration/exploitation tradeoff common in computational learning theory,
there is the veiled issue of `teaching'. Actions taken on the basis of what
one has learned affect the rivals' hypotheses about one's own future
behavior.


3.

An overview of the competitive analysis of online algorithms and its
relation to game theory
Allan Borodin, University of Toronto and Ran El-Yaniv, The Technion, Israel

In this talk we will provide an overview of Online Computation and
Competitive Analysis. In particular, we will discuss the competitive
ratio measure (i.e. the multiplicative regret in the context of
repeated decision making) from the perspective of (the worst case)
complexity analysis.  We will emphasize general (upper and lower
bound) analysis techniques, models for online problems and the
relation between various problems, the role of randomization, and the
relation to more classic fields such as decision theory, game theory,
and online learning.  We will also discuss the most significant
contributions to date, and major open problems.
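
For concreteness, the competitive ratio measure can be written as follows
(a standard formulation; the notation here is ours): an online algorithm
ALG is c-competitive if, for every request sequence sigma,

    \[
      \mathrm{ALG}(\sigma) \;\le\; c \cdot \mathrm{OPT}(\sigma) + \alpha ,
    \]

where OPT(sigma) is the optimal offline cost on sigma and alpha is a
constant independent of sigma; the competitive ratio of ALG is the
infimum of such c.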


4.

The Prequential Approach to Probability and Statistics
Philip Dawid, University College London

"Prequential" means "predictive sequential".  In this talk I will give
an outline of the fundamental ideas of prequential Probability and
Statistics.  Consider a sequential game between two players,
Forecaster and Nature.  A conventional probability model can be
constructed by ascribing suitable strategies to each player.  The
prequential approach concentrates, instead, on the sequences of
choices made by each player, and attempts to see what can be said
under weak, or absent, assumptions about underlying strategies.  By
considering suitable criteria for the "agreement" between forecasts
and outcomes, we obtain a new game-theoretic construction of
Probability Theory.  Similarly, a standard statistical model can be
considered as a strategy for a parametrised collection of "experts".
Prequential Statistics focuses attention on the actual forecasts
issued, eschewing, so far as possible, further assumptions about
underlying strategies.  I shall argue that little of value is lost,
and much insight is gained, by this shift of focus.
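
As one concrete example of such an agreement criterion (a standard
calibration requirement, stated here for binary outcomes; the notation
is ours): if Forecaster announces probabilities p_i and Nature produces
outcomes x_i in {0,1}, the forecasts pass the test when

    \[
      \frac{1}{n} \sum_{i=1}^{n} (x_i - p_i) \;\longrightarrow\; 0
      \qquad (n \to \infty),
    \]

with the same convergence required along subsequences selected in terms
of the past, e.g., the rounds on which p_i is close to a fixed value.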


5.

Decision Theory of Regret for Universal Coding, Gambling, and Prediction
Andrew Barron, Yale University

  This talk presents some new aspects of an information-theoretic game
and its implications for universal coding, gambling, and prediction
using the log loss.  We are given a parameterized family of
strategies which is used to define the target level of performance,
associated in hindsight with the shortest code length, greatest
wealth, and best cumulative log loss.  The regret is the difference
between what we achieve by an on-line strategy and the ideal target
level, on the cumulative log scale.  In this talk we examine to what extent we
should restrict attention to Bayes strategies when examining expected
regret or minimax point-wise regret criteria.
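
In symbols (the notation is ours): if q denotes the on-line strategy and
{p_theta} the given family, the regret on a sequence x^n = (x_1,...,x_n) is

    \[
      R(q, x^n) \;=\; \log \frac{1}{q(x^n)} \;-\;
      \min_{\theta} \log \frac{1}{p_{\theta}(x^n)}
      \;=\; \max_{\theta} \log \frac{p_{\theta}(x^n)}{q(x^n)} ,
    \]

which is at once the excess code length, the log wealth ratio, and the
cumulative excess log loss relative to the best member of the family
chosen in hindsight.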

   In the finite sample expected regret setting, a Pythagorean
inequality of information theory is used to show that strategies that
are not information limits of Bayes strategies are beaten by a
constant amount, uniformly over the parameter space.

   In an asymptotic minimax point-wise regret setting, we consider
exponential and certain non-exponential families.  In exponential
families certain Bayes mixtures are asymptotically minimax.  However,
for non-exponential families, joint work with Jun-ichi Takeuchi shows
that to obtain an asymptotically minimax strategy the mixture cannot
be restricted to the given family, yet asymptotic minimaxity can be
achieved by mixtures over a certain enlargement of the given family.

  In the point-wise regret setting, asymptotically maximin and minimax
strategies have information divergence from the Shtarkov normalized
maximum likelihood distribution that tends to zero as the sample size
gets large, as shown in joint work with Trent Xie.
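
For reference, the Shtarkov normalized maximum likelihood distribution is

    \[
      q^{*}(x^n) \;=\;
      \frac{\max_{\theta} p_{\theta}(x^n)}{\sum_{y^n} \max_{\theta} p_{\theta}(y^n)} ,
    \]

which incurs the same point-wise regret, the log of the normalizing sum,
on every sequence and is therefore the exact minimax strategy at each
fixed sample size.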


6.

Reinforcement learning as a useful approximation: accuracy of prediction 
in experimental games.
Alvin Roth, Harvard University

One of the goals of experimental economics is to develop models of
behavior that can accurately predict the behavior of human players in
strategic environments.  This is a different task than developing
models that learn quickly to play well.  And, to the extent that models
are viewed as useful approximations (rather than as precisely true
descriptions), this raises some questions about how to evaluate models
(when they can be rejected by conventional statistical
techniques). Preliminary results suggest that simple reinforcement
models are quite useful for predicting human behavior in a wide
variety of rather simple strategic environments.  These results also
give suggestions for how to proceed in exploring more complex
strategic environments.
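
As a rough illustration of the kind of simple reinforcement model the
abstract refers to, here is a minimal sketch of a propensity-updating
rule in the spirit of Roth and Erev; the initial propensities, payoffs,
and horizon below are hypothetical, not taken from the talk.

    import random

    def choose(propensities):
        """Pick an action with probability proportional to its propensity."""
        total = sum(propensities)
        r = random.uniform(0.0, total)
        acc = 0.0
        for action, q in enumerate(propensities):
            acc += q
            if r <= acc:
                return action
        return len(propensities) - 1

    def reinforce(propensities, action, payoff):
        """Add the realized payoff to the chosen action's propensity."""
        propensities[action] += payoff

    # Hypothetical example: two actions, with action 1 paying more.
    props = [1.0, 1.0]
    for _ in range(1000):
        a = choose(props)
        payoff = 1.0 if a == 1 else 0.2
        reinforce(props, a, payoff)

Over time the propensity, and hence the choice probability, of the
better action grows, which is the qualitative pattern such models are
fitted against.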


7.

Adaptive Strategies, Regret and Approachability
Sergiu Hart, Hebrew University, Jerusalem

(joint work with Andreu Mas-Colell, Universitat Pompeu Fabra, Barcelona)

Consider strategies for the repeated play of a game (or a decision
problem) which, while simple to implement, generate desirable outcomes
(like "no-regret", "calibration", and "correlated equilibria").  An entire
class of such adaptive strategies is characterized and exhibited.  It
includes as particular cases "smooth fictitious play" and
"regret-matching".  The basic tool for the analysis is a generalization of
Blackwell's approachability strategy for games with vector payoffs.
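
To give the flavor of such strategies, here is a minimal sketch of the
unconditional (external-regret) variant of regret-matching, in which
each action is played with probability proportional to its positive
cumulative regret; the conditional-regret version analyzed in the talk,
which yields correlated equilibria, follows the same pattern.

    import random

    def regret_matching_play(cum_regret):
        """Play each action with probability proportional to its positive
        cumulative regret; mix uniformly when no regret is positive."""
        pos = [max(r, 0.0) for r in cum_regret]
        total = sum(pos)
        if total == 0.0:
            return random.randrange(len(cum_regret))
        r = random.uniform(0.0, total)
        acc = 0.0
        for action, w in enumerate(pos):
            acc += w
            if r <= acc:
                return action
        return len(cum_regret) - 1

    def update_regrets(cum_regret, counterfactual_payoffs, played):
        """counterfactual_payoffs[k] is the stage payoff action k would
        have earned this round against the opponents' realized play."""
        base = counterfactual_payoffs[played]
        for k in range(len(cum_regret)):
            cum_regret[k] += counterfactual_payoffs[k] - base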


8.

Information geometry and reconstruction problems 
Imre Csiszar, Hungarian Academy of Sciences, Budapest
 
Consider the problem of reconstructing a real valued function f(x) when
the available information specifies only a feasible set F of functions
that f belongs to.  Then, if a default model g is also given, the
closest element of F to g is accepted as the reconstruction; different
measures of distance may lead to different reconstructions.  If f is
known to be non-negative, e.g., a probability density, an intensity, or
an attenuation function, the Kullback-Leibler distance (information
divergence) is a favored choice, while for arbitrary real-valued
functions, the mean-squared distance is most preferred.  These choices
have also been distinguished by an axiomatic approach.

Geometric properties of information divergence similar to those of
Euclidean (or mean-squared) distance, particularly the Pythagorean
identity, are very useful for reconstruction.  A larger class called
Bregman distances also shares these properties.  In this talk, results
in this direction will be surveyed, covering also iterative algorithms
such as iterative scaling, generalized iterative scaling (or SMART),
and the EM algorithm.
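
For reference, the Bregman distance generated by a differentiable,
strictly convex function f is

    \[
      D_f(p, q) \;=\; f(p) - f(q) - \langle \nabla f(q),\, p - q \rangle ,
    \]

which gives the squared Euclidean distance for f(p) = \|p\|^2 and the
Kullback-Leibler distance between probability vectors for
f(p) = \sum_i p_i \log p_i; the Pythagorean-type properties mentioned
above hold throughout this class.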
  

9.

Learning to play games using multiplicative weights
Robert Schapire, AT&T Labs

We study the problem of learning to play a repeated game against any
adversary. We present a simple algorithm which is guaranteed to come close
to the best payoff achievable with respect to the sequence of plays taken
by the opponent. The bounds we obtain are not asymptotic and hold for any
finite number of rounds. We also show that the bounds are optimal in a
particular sense. The algorithm and its amortized analysis are based
directly on the ``on-line prediction'' methods first studied by Littlestone
and Warmuth. We also study a variant of this problem in which the learner 
receives very limited feedback about the game that is being played; in 
essence, this is an adversarial version of the multi-armed bandit problem. 
We present an effective algorithm for this problem, and also prove lower 
bounds on the best possible performance.

This talk includes joint work with Peter Auer, Nicolò Cesa-Bianchi, and
Yoav Freund.
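
To give the flavor of the updates involved, here is a minimal sketch of
a Hedge-style multiplicative-weights rule for choosing a mixed strategy
in a repeated game; the learning rate and the losses in the example are
hypothetical, and the tuning in the talk's algorithm may differ.

    import math

    def mw_play(weights):
        """Mixed strategy for this round: normalize the current weights."""
        total = sum(weights)
        return [w / total for w in weights]

    def mw_update(weights, losses, eta=0.1):
        """Multiplicative update: actions that suffered smaller loss this
        round gain relative weight."""
        return [w * math.exp(-eta * l) for w, l in zip(weights, losses)]

    # One hypothetical round with 3 actions and losses in [0, 1].
    w = [1.0, 1.0, 1.0]
    strategy = mw_play(w)        # commit to a mixed strategy
    losses = [0.3, 0.9, 0.5]     # losses induced by the opponent's play
    w = mw_update(w, losses)     # update after observing the losses

In the bandit variant mentioned above, only the loss of the action
actually played is observed, so unbiased estimates of the full loss
vector must be substituted before the same kind of update is applied.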


10.

Worst-case analyses of the exploration/exploitation tradeoff
Phil Long, National University of Singapore

In many natural settings, an on-line learning algorithm must trade between
taking actions that its current hypothesis suggests are good, and taking
actions to obtain information that will potentially improve its hypothesis.
I will survey some worst-case analyses of these types of problems.

11.

Tracking the Best Predictor
Mark Herbster, Royal Holloway, University of London

(joint work with Manfred Warmuth)

In this talk I will present on-line learning algorithms which are
designed for combining classifiers or ``experts'' in nonstationary
environments.  The motivating assumption of this work is as follows:
given a set of classifiers, and a data sequence whose underlying
generative process is changing over time, one expects that different
classifiers will perform well over different segments of the sequence.
I will give an algorithm whose loss is guaranteed to exceed the loss of
the ``best'' sequence of classifiers by only an additive term.
The additive term measures the degree of nonstationarity in the
sequence of classifiers.
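
A minimal sketch of a fixed-share style weight update of the kind behind
such guarantees (the parameter names and values are ours): after the
usual exponential loss update, every expert shares a small fraction of
its weight with the others, so an expert that becomes good in a later
segment can recover quickly.

    import math

    def fixed_share_update(weights, losses, eta=0.5, alpha=0.05):
        """One round for n >= 2 experts: exponential loss update followed
        by a share step that keeps some weight on every expert."""
        n = len(weights)
        v = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        pool = sum(v)
        return [(1.0 - alpha) * vi + alpha * (pool - vi) / (n - 1)
                for vi in v]

Setting alpha = 0 recovers the usual static-comparison exponential
weights; a positive alpha is what buys the tracking guarantee.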


12.

Rational Learning
Ehud Kalai, Northwestern University

Rational learning is learning guided by expected utility maximization.  It
implies Bayesian updating.  The lecture will survey concepts and some of the
results in both single person and multi-person learning. Specific topics may
include notions of merging, where Bayesian updating of subjective beliefs
leads a player to the true probabilities; being calibrated, where empirical
tests of observed data confirm the subjective beliefs; and related notions
of equilibrium.
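
For reference, the notion of merging alluded to here (in the sense of
Blackwell-Dubins and Kalai-Lehrer) can be stated as follows: a belief
mu-tilde merges with the true distribution mu if, mu-almost surely,

    \[
      \sup_{A} \bigl| \mu(A \mid h_t) - \tilde{\mu}(A \mid h_t) \bigr|
      \;\longrightarrow\; 0 \qquad (t \to \infty),
    \]

where h_t is the history through period t and the supremum runs over
future events; weak merging restricts the supremum to events about the
near-term future.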


13.

Rational Bayesian Learning in Repeated Games
John Nachbar, Washington University in St. Louis

For a large set of 2-player discounted infinitely repeated games of complete
information (stage game payoff functions are common knowledge), there does
not exist any Bayesian learning theory in which player beliefs are weakly
cautious, symmetric, and consistent.  Loosely, weak caution means that if
a repeated game strategy is in the support of a player's belief then so are
computationally trivial variants of that strategy, symmetry means that the
supports of player beliefs contain repeated game strategies of comparable 
strategic complexity, and consistency means that the support of each player's 
belief contains one of his opponent's epsilon best responses.  A similar 
impossibility theorem holds for repeated games of incomplete information 
(stage game payoff functions are private information).  These impossibility
theorems cast doubt on whether one can construct a convincing theory of
rational Bayesian learning in repeated games.


14.

"When rational learning fails"
Dean Foster, Wharton School, University of Pennsylvania and Peyton Young,
Johns Hopkins University

Consider an infinitely repeated 2-person game played by Bayesian
rational players.  Before the game begins, the payoffs of each player
are perturbed by iid random errors drawn from a distribution whose
support is a small interval containing the game Matching Pennies.  Then
with probability one some player almost surely fails to learn how to
predict the behavioral strategies of the other player.  Further, with
probability one, the players' strategies will almost surely not come
close to any Nash equilibrium of the repeated game.  A particular
consequence is that, for some games of incomplete information, the
Kalai-Lehrer hypothesis about beliefs holds with probability zero.


15.

`Impossibility' of Absolute Continuity
Chris Sanchirico, Columbia University

Two agents with different priors watch a sequence unfold over time, updating
their beliefs about the future course of the sequence with each new
observation.  Blackwell and Dubins (1962) show that the agents' opinions
about the future will converge if, roughly speaking, their priors are
absolutely continuous: i.e., if both agents agree about which events are
possible.  We show that the probability that two priors drawn at random have
any chance of weakly merging (a fortiori, merging) is zero.  As a corollary,
we establish that two randomly drawn priors will almost surely be singular
(a fortiori, not absolutely continuous).
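
For reference (standard definitions): mu is absolutely continuous with
respect to nu if nu(A) = 0 implies mu(A) = 0 for every measurable A,
while the two are mutually singular if

    \[
      \exists A: \quad \mu(A) = 1 \quad \text{and} \quad \nu(A) = 0 ,
    \]

i.e., each prior concentrates on a set the other regards as impossible.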


16.

Minimizing regret: the general case
Aldo Rustichini, Tilburg University

In repeated games with differential information on one side, the
labelling ``general case'' refers to games in which the action of the
informed player is not known to the uninformed, who can only observe a
signal which is the random outcome of his and his opponent's action. Here
we consider the problem of minimizing regret (in the sense first formulated
by Hannan) when the information available is of this type. We give a simple
condition describing the approachable set.
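
In symbols (the notation is ours): writing u for the uninformed player's
payoff, a_t for his actions and b_t for his opponent's, Hannan's
criterion asks that

    \[
      \frac{1}{T} \Bigl( \max_{a} \sum_{t=1}^{T} u(a, b_t)
      \;-\; \sum_{t=1}^{T} u(a_t, b_t) \Bigr) \;\longrightarrow\; 0 ;
    \]

in the general case considered here the b_t are never observed, only
random signals depending on (a_t, b_t), which is what makes the
approachable set nontrivial to describe.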


17.

Reasonable Learning in Non-cooperative Games and the Internet
Eric Friedman, Rutgers University, Department of Economics

In this talk I will propose a definition of what makes a model of 
learning in games "reasonable." 
Such definitions are clearly context dependent, so I will focus on 
the Internet as a particular case study, but will also consider
several other settings of interest to economists and computer
scientists.  My definition of reasonableness involves computational
complexity, the rate (and definition) of convergence, and a strong
notion of robustness.


18.

Bayesian representations of stochastic processes under learning -
de Finetti revisited
Matt Jackson, Caltech; Rann Smorodinsky, Technion; and Ehud Kalai,
Northwestern University

What are the Bayesian representations (decompositions) of a stochastic process
that are learnable and are not further refined by observation of the process?
Such a ``maximally'' learnable representation is equivalent to conditioning on
the tail field.


19.

Fast Universal Portfolios For Parameterized Target Classes
Jason Cross, Yale University

This talk generalizes and extends some of the concepts of universal
portfolios as introduced by Cover (1991). We begin in an abstract
framework, generalizing the concept of target class beyond constant
rebalanced portfolios to include parameterized target classes that may
depend on side information of a continuous form.
An analogue of Cover's universal portfolio algorithm in discrete time
is presented and is shown to be universal with respect to the
generalized classes. The problem of computing universal portfolios in
discrete time is addressed by extending results to continuous time
where we show the existence of an easily computable, continuously
updated, universal procedure for linearly parameterized classes of
portfolios. Finally, to reconcile ease of computation with
applicability, we discretize the above procedure and analyze it in
near-continuous time.  Given an appropriate schedule of increasingly
frequent rebalances, we are able to show that this discretized
portfolio remains universal with respect to its continuously traded
target class. Furthermore, if $d$ is the dimension of our associated
parameter space, we conclude that the proposed portfolio is computable
within a factor of $d^2$ steps per iteration. This is an improvement
over previous universal procedures for which calculations grow
exponentially with $d$.  
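
For orientation, here is a brute-force sketch of the baseline Cover
(1991) construction that this work speeds up, written for two assets
with the parameter integral discretized on a grid (the grid size and the
example data are ours). Its per-step cost grows with the grid and the
history, which is the kind of computational burden the
$d^2$-per-iteration result addresses.

    def crp_wealth(b, price_relatives):
        """Wealth of the constant rebalanced portfolio (b, 1-b)."""
        wealth = 1.0
        for x1, x2 in price_relatives:
            wealth *= b * x1 + (1.0 - b) * x2
        return wealth

    def universal_weight(history, grid=100):
        """Cover's universal portfolio for 2 assets: a wealth-weighted
        average of the constant rebalanced portfolios (uniform prior)."""
        bs = [i / grid for i in range(grid + 1)]
        wealths = [crp_wealth(b, history) for b in bs]
        return sum(b * w for b, w in zip(bs, wealths)) / sum(wealths)

    # Hypothetical price relatives (today's price / yesterday's price).
    history = [(1.02, 0.99), (0.97, 1.03), (1.01, 1.00)]
    b_next = universal_weight(history)   # fraction of wealth in asset 1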


20.

On-line Learning and the Metrical Task System Problem
Avrim Blum, Carnegie Mellon University

In this talk I will discuss connections between work in the
competitive analysis of online algorithms and work in online machine
learning.  In particular, I will focus on the Metrical Task System problem,
studied in the area of online algorithms, and the problem of combining
expert advice, studied in computational learning theory.  
These problems share several interesting similarities, and we will
see that algorithms designed for each can be used to achieve good
bounds and new approaches for solving the other.  I will also talk
about some intermediate problems, such as the problem of combining
online algorithms online.


21.

On-line load balancing and routing
Yossi Azar, Tel-Aviv University

We survey some new and old results on on-line load balancing and virtual
circuit routing. In the load balancing problems there are m parallel
machines and a number of independent tasks (jobs); the tasks arrive at
arbitrary times, where each task has an associated load vector and duration.
A task has to be assigned immediately to one of the machines, thereby
increasing the load on this machine by the amount specified by the
corresponding coordinate of the load vector for the duration of the task.
The decisions are made on-line, i.e., without knowing the tasks that will
arrive in the future. The goal is to minimize the maximum load of the
machines (other goals are plausible).  Various online load balancing
problems arise naturally in many applications involving allocation of
resources.  In particular, they are useful for on-line virtual circuit
routing.
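
As a simple concrete instance, here is a sketch of greedy list
scheduling for identical machines and permanent scalar loads (Graham's
rule, which is (2 - 1/m)-competitive for the makespan); the problems
surveyed in the talk involve far more general load vectors, durations,
and routing constraints.

    def greedy_assign(num_machines, jobs):
        """Assign each arriving job to the currently least loaded
        machine; return the assignment and the final maximum load."""
        loads = [0.0] * num_machines
        assignment = []
        for job_load in jobs:
            i = loads.index(min(loads))   # least loaded machine now
            loads[i] += job_load          # permanent job: no departure
            assignment.append(i)
        return assignment, max(loads)

    # Hypothetical example: 3 machines, 5 jobs arriving on-line.
    print(greedy_assign(3, [2.0, 1.0, 3.0, 2.0, 2.0]))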


22.

Average case and worst case sample complexity of learning distributions
Manfred Opper, Sheffield University

This work analyses the on-line regret of optimal algorithms for the
problem of predicting a sequence under cumulative log loss.  We analyze
both the case in which the data is generated by a distribution in the
class and the case in which there is no assumption about how the
sequence is generated.  We show that, surprisingly, the gap between the
bounds is very small even in cases where the metric dimension grows
exponentially; we also exhibit a breaking point at which the bounds
diverge.


23.

On prediction of individual sequences relative to a set of experts.
N. Cesa-Bianchi, University of Milan, Italy and Gabor Lugosi, Pompeu Fabra
University, Spain

We study a prediction model where the task is to sequentially guess the
bits of an arbitrary sequence making almost as few mistakes as the best
predictor (or ``expert'') in a given class. All predictions are allowed to
depend on the outcome of biased coin flips. Applying tools from empirical
process theory, we show upper and lower bounds on the minimax regret
(number of excess mistakes with respect to the mistakes of the best expert)
in terms of the geometry of the class of experts. For expert classes such
as Markov experts and autoregressive linear experts, our bounds determine
the exact order of magnitude of the minimax regret. We then discuss some
recent results in a related model, where each randomized prediction is
scored according to the log of the probability assigned to the actual
outcome. This has applications to problems in data compression and
portfolio selection.
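
In symbols (the notation is ours): if the predictor outputs (possibly
randomized) bits hat-x_t and F is the class of experts, the minimax
regret at horizon n is

    \[
      R_n(F) \;=\; \min_{\hat{x}} \; \max_{x^n \in \{0,1\}^n}
      \Bigl( \mathbb{E} \sum_{t=1}^{n} \mathbf{1}\{\hat{x}_t \neq x_t\}
      \;-\; \min_{f \in F} \sum_{t=1}^{n} \mathbf{1}\{f_t(x^{t-1}) \neq x_t\}
      \Bigr) ,
    \]

and the bounds described above control this quantity through the
geometry (covering properties) of F.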


