A Marginal Productivity Index Policy for the Finite-Horizon Multiarmed Bandit Problem

Index policies for discounted bandit problems with availability constraints

Advances in Applied Probability ◽ 10.1239/aap/1214950209 ◽ 2008 ◽ Vol 40 (2) ◽ pp. 377-400 ◽ Cited By ~ 5

Author(s): Savas Dayanik ◽ Warren Powell ◽ Kazutoshi Yamazaki

Keyword(s): Bandit Problem ◽ Bandit Problems ◽ Index Policy ◽ State Action ◽ Index Policies ◽ Availability Constraints ◽ Whittle Index ◽ Multiarmed Bandit

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.

Independently Expiring Multiarmed Bandits

Probability in the Engineering and Informational Sciences ◽ 10.1017/s0269964800005325 ◽ 1998 ◽ Vol 12 (4) ◽ pp. 453-468 ◽ Cited By ~ 1

Author(s): Rhonda Righter ◽ J. George Shanthikumar

Keyword(s): Simple Proof ◽ Gittins Index ◽ Bandit Problem ◽ Index Policy ◽ Multiarmed Bandit

We give conditions on the optimality of an index policy for multiarmed bandits when arms expire independently. We also give a new simple proof of the optimality of the Gittins index policy for the classic multiarmed bandit problem.

Fast Two-Stage Computation of an Index Policy for Multi-Armed Bandits with Setup Delays

Mathematics ◽ 10.3390/math9010052 ◽ 2020 ◽ Vol 9 (1) ◽ pp. 52

Author(s): José Niño-Mora

Keyword(s): Numerical Study ◽ Arithmetic Operation ◽ Bandit Problem ◽ Index Policy ◽ Two Stage ◽ Second Stage ◽ Whittle Index ◽ Set Up ◽ Computing Method ◽ Special Case

We consider the multi-armed bandit problem with penalties for switching that include setup delays and costs, extending the former results of the author for the special case with no switching delays. A priority index for projects with setup delays that characterizes, in part, optimal policies was introduced by Asawa and Teneketzis in 1996, yet without giving a means of computing it. We present a fast two-stage index computing method, which computes the continuation index (which applies when the project has been set up) in a first stage and certain extra quantities with cubic (arithmetic-operation) complexity in the number of project states and then computes the switching index (which applies when the project is not set up), in a second stage, with quadratic complexity. The approach is based on new methodological advances on restless bandit indexation, which are introduced and deployed herein, being motivated by the limitations of previous results, exploiting the fact that the aforementioned index is the Whittle index of the project in its restless reformulation. A numerical study demonstrates substantial runtime speed-ups of the new two-stage index algorithm versus a general one-stage Whittle index algorithm. The study further gives evidence that, in a multi-project setting, the index policy is consistently nearly optimal.

Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards

IEEE Transactions on Automatic Control ◽ 10.1109/tac.1987.1104491 ◽ 1987 ◽ Vol 32 (11) ◽ pp. 968-976 ◽ Cited By ~ 102

Author(s): V. Anantharam ◽ P. Varaiya ◽ J. Walrand

Keyword(s): Efficient Allocation ◽ Bandit Problem ◽ Allocation Rules ◽ Asymptotically Efficient ◽ Multiarmed Bandit

A Marginal Productivity Index Rule for Scheduling Multiclass Queues with Setups

Lecture Notes in Computer Science - Network Control and Optimization ◽ 10.1007/978-3-642-00393-6_10 ◽ 2009 ◽ pp. 78-86

Author(s): José Niño-Mora

Keyword(s): Productivity Index ◽ Marginal Productivity ◽ Multiclass Queues

Marginal productivity index policies for scheduling multiclass delay-/loss-sensitive traffic

Next Generation Internet Networks, 2005 ◽ 10.1109/ngi.2005.1431648 ◽ 2005 ◽ Cited By ~ 4

Author(s): J. Nino-Mora

Keyword(s): Productivity Index ◽ Marginal Productivity ◽ Index Policies

Comments on “Finite-Time Analysis of the Multiarmed Bandit Problem”

2019 International Conference on Machine Learning and Cybernetics (ICMLC) ◽ 10.1109/icmlc48188.2019.8949232 ◽ 2019 ◽ Cited By ~ 1

Author(s): Lu-Ning Zhang ◽ Xin Zuo ◽ Jian-Wei Liu ◽ Wei-Min Li ◽ Nobuyasu Ito

Keyword(s): Finite Time ◽ Bandit Problem ◽ Time Analysis ◽ Multiarmed Bandit

Scheduling Jobs That Are Subject to Deterministic Due Dates and Have Deteriorating Expected Rewards

Probability in the Engineering and Informational Sciences ◽ 10.1017/s026996480000468x ◽ 1997 ◽ Vol 11 (1) ◽ pp. 65-78 ◽ Cited By ~ 3

Author(s): Takashi Ishikida ◽ Yat-wah Wan

Keyword(s): Single Server ◽ Due Dates ◽ Bandit Problem ◽ Scheduling Policy ◽ Total Reward ◽ Multiarmed Bandit

A single server processes jobs that can yield rewards but expire on predetermined dates. Expected immediate rewards from each job are deteriorating. The instance is formulated as a multiarmed bandit problem, and an index-based scheduling policy is shown to maximize the expected total reward.

Marginal Productivity Index Policies for Scheduling Multiclass Delay-/Loss-Sensitive Traffic with Delayed State Observation

2007 Next Generation Internet Networks ◽ 10.1109/ngi.2007.371218 ◽ 2007 ◽ Cited By ~ 2

Author(s): Jose Nino-Mora

Keyword(s): Productivity Index ◽ Marginal Productivity ◽ State Observation ◽ Index Policies
