Corrections to “Satisficing in Multiarmed Bandit Problems”

2021 · Vol 66 (1) · pp. 476-478
Author(s): Paul Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard

2008 · Vol 40 (2) · pp. 377-400
Author(s): Savas Dayanik, Warren Powell, Kazutoshi Yamazaki

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
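
As a rough illustration of the index-policy template this abstract describes, the Python sketch below pulls, among the arms currently available, the one with the highest index. It is not the paper's Whittle index, whose formulas are not reproduced here: the index function is a placeholder optimistic score, and the availability probability, arm parameters, and all names are illustrative assumptions.

```python
import random

def index_policy_step(arms, index_fn):
    """One step of a generic index policy: among the currently
    available arms, pull the one with the largest index.

    `arms` is a list of dicts holding Beta-posterior counts for a
    Bernoulli arm plus an `available` flag; `index_fn` maps an arm's
    state to a scalar priority (the Whittle index in the paper; a
    placeholder score here)."""
    available = [a for a in arms if a["available"]]
    if not available:
        return None  # no arm can be pulled this round
    arm = max(available, key=index_fn)
    reward = 1 if random.random() < arm["p_true"] else 0
    # Bayesian update of the Beta(successes+1, failures+1) posterior.
    arm["successes"] += reward
    arm["failures"] += 1 - reward
    return reward

def placeholder_index(arm):
    # NOT the paper's Whittle index: a simple optimistic stand-in,
    # posterior mean plus an exploration bonus that shrinks with pulls.
    n = arm["successes"] + arm["failures"]
    mean = (arm["successes"] + 1) / (n + 2)
    return mean + 1.0 / (n + 1)

arms = [dict(p_true=p, successes=0, failures=0, available=True)
        for p in (0.3, 0.5, 0.7)]
for t in range(1000):
    for a in arms:
        # Arms flip availability independently each round; the 0.8
        # probability is illustrative, not from the paper.
        a["available"] = random.random() < 0.8
    index_policy_step(arms, placeholder_index)
print([(a["successes"], a["failures"]) for a in arms])
```

For the setting the abstract evaluates, `placeholder_index` would be replaced by the availability-aware Whittle index the paper derives for Bernoulli arms with unknown success probabilities.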


2015
Author(s): Michael Jong Kim, Andrew E.B. Lim

Author(s): Gábor Lugosi, Abbas Mehrabian

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate and, if two or more players pull the same arm, a collision occurs and the players involved receive zero reward. We consider two feedback models: one in which the players can observe whether a collision has occurred, and a more difficult one in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with logarithmic regret, and an algorithm with square-root regret that does not depend on the gaps between the arm means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.
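
To make the collision model concrete, here is a small Python simulation of the no-communication setting: in each round every player picks an arm, and any arm picked by two or more players yields zero reward to all of them. The hop-on-collision rule is a naive stand-in, not the authors' algorithm, and all names and parameter values are illustrative.

```python
import random

def simulate_collision_model(means, n_players, horizon, seed=0):
    """Toy simulation of the no-communication multiplayer model:
    each round every player picks an arm; if two or more players
    pick the same arm, all of them receive zero reward (a collision).

    Players use a naive rule (sit on an arm, hop to a uniformly
    random arm after any collision) purely to exercise the feedback
    model; it is not the algorithm from the paper."""
    rng = random.Random(seed)
    k = len(means)
    choices = [rng.randrange(k) for _ in range(n_players)]
    total = 0.0
    for t in range(horizon):
        pulls = list(choices)  # snapshot of this round's picks
        counts = {a: pulls.count(a) for a in set(pulls)}
        for p, a in enumerate(pulls):
            if counts[a] > 1:
                # Collision: zero reward; under the first feedback
                # model the player observes it and hops elsewhere.
                choices[p] = rng.randrange(k)
            else:
                total += 1.0 if rng.random() < means[a] else 0.0
    return total

print(simulate_collision_model([0.9, 0.8, 0.5, 0.2],
                               n_players=3, horizon=10000))
```

With collisions observed (the first feedback model), the hop-on-collision rule eventually spreads the players across distinct arms; in the harder second model, a player would have to infer collisions from the zero rewards alone.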

