Approximation algorithms for restless bandit problems

Sudipto Guha; Kamesh Munagala; Peng Shi

doi:10.1145/1870103.1870106

Approximation Algorithms for Restless Bandit Problems

Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2009. doi:10.1137/1.9781611973068.4

Cited By ~ 5

Author(s): Sudipto Guha; Kamesh Munagala; Peng Shi

Keyword(s): Approximation Algorithms; Bandit Problems; Restless Bandit


Online Pandora’s Boxes and Bandits

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 33, pp. 1885-1892, 2019. doi:10.1609/aaai.v33i01.33011885

Cited By ~ 1

Author(s): Hossein Esfandiari; MohammadTaghi HajiAghayi; Brendan Lucier; Michael Mitzenmacher

Keyword(s): Decision Making; Reinforcement Learning; Approximation Algorithms; Standard Model; Decision Process; Bandit Problems; Knapsack Constraints; Feasibility Constraints; The Cost

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
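For reference, the classic Pandora's box problem that this work generalizes is solved by Weitzman's rule: each box gets a reservation value sigma solving E[max(V - sigma, 0)] = c, boxes are opened in decreasing order of sigma, and the search stops once the best prize seen is at least every remaining sigma. The sketch below implements that classic rule for discrete value distributions. It is not the online algorithm of this paper; the function names and the fallback prize of 0 are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# A box is (possible values, their probabilities, opening cost); hypothetical format.
Box = Tuple[Sequence[float], Sequence[float], float]


def reservation_value(values: Sequence[float], probs: Sequence[float], cost: float,
                      tol: float = 1e-9) -> float:
    """Solve E[max(V - sigma, 0)] = cost for sigma by bisection (assumes cost > 0)."""
    def expected_excess(sigma: float) -> float:
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

    # expected_excess(lo) >= cost and expected_excess(hi) = 0 <= cost bracket the root.
    lo, hi = min(values) - cost, max(values)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weitzman_rule(boxes: List[Box], realized: Sequence[float]) -> Tuple[float, List[int]]:
    """Run Weitzman's rule; `realized[i]` is the prize found if box i is opened.

    Returns (best prize minus total opening cost, indices of the opened boxes).
    Assumes a fallback prize of 0, so boxes with sigma <= 0 are never opened.
    """
    sigmas = [reservation_value(v, p, c) for v, p, c in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigmas[i], reverse=True)
    best, spent, opened = 0.0, 0.0, []
    for i in order:
        if best >= sigmas[i]:
            break  # no unopened box is worth its cost any more
        spent += boxes[i][2]
        best = max(best, realized[i])
        opened.append(i)
    return best - spent, opened


if __name__ == "__main__":
    boxes = [([0.0, 10.0], [0.5, 0.5], 1.0),  # sigma = 8
             ([5.0], [1.0], 1.0)]             # sigma = 4
    print(weitzman_rule(boxes, [10.0, 5.0]))  # (9.0, [0])
```

In the example, opening the first box reveals a prize of 10, which already exceeds the second box's reservation value of 4, so the search stops after one opening.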


Some indexable families of restless bandit problems

Advances in Applied Probability, Vol 38 (3), pp. 643-672, 2006. doi:10.1239/aap/1158684996

Cited By ~ 32

Author(s): K. D. Glazebrook; D. Ruiz-Hernandez; C. Kirkbride

Keyword(s): Index Theory; Stochastic Scheduling; Gittins Index; Scheduling Problems; Bandit Problems; Index Policy; Restless Bandit; Machine Maintenance; State Evolution; Strong Performance

In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
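Whittle's Lagrangian relaxation can be made concrete for a single arm: pay a subsidy lam whenever the arm is passive. The arm is indexable if the set of states in which passivity is optimal grows monotonically as lam increases, and the Whittle index of a state is the smallest subsidy at which passivity becomes optimal there. The sketch below computes these quantities numerically for a small discounted arm by value iteration and bisection. It is a generic numerical illustration under assumed inputs (passive/active transition matrices `P0`/`P1`, reward vectors `r0`/`r1`, discount `beta`), not the Gittins-index-based formulae derived in this paper, and all function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]


def _mat_vec(P: Matrix, V: Sequence[float]) -> List[float]:
    """Expected next-state value: (P V)[s] = sum_t P[s][t] * V[t]."""
    return [sum(p * v for p, v in zip(row, V)) for row in P]


def q_values(P0: Matrix, P1: Matrix, r0: Sequence[float], r1: Sequence[float],
             lam: float, beta: float = 0.9,
             iters: int = 500) -> Tuple[List[float], List[float]]:
    """Value iteration for one arm that earns subsidy `lam` whenever it is passive.

    Returns (passive Q-values, active Q-values) for every state.
    """
    V = [0.0] * len(r0)
    q0, q1 = list(V), list(V)
    for _ in range(iters):
        ev0, ev1 = _mat_vec(P0, V), _mat_vec(P1, V)
        q0 = [r + lam + beta * e for r, e in zip(r0, ev0)]  # passive action
        q1 = [r + beta * e for r, e in zip(r1, ev1)]        # active action
        V = [max(a, b) for a, b in zip(q0, q1)]
    return q0, q1


def passive_set(P0, P1, r0, r1, lam, beta=0.9) -> Set[int]:
    """States in which being passive is (weakly) optimal under subsidy `lam`."""
    q0, q1 = q_values(P0, P1, r0, r1, lam, beta)
    return {s for s in range(len(r0)) if q0[s] >= q1[s]}


def looks_indexable(P0, P1, r0, r1, lams, beta=0.9) -> bool:
    """Grid check that passive sets grow with the subsidy.

    Passing this check is necessary, not sufficient, for indexability.
    """
    prev: Set[int] = set()
    for lam in sorted(lams):
        cur = passive_set(P0, P1, r0, r1, lam, beta)
        if not prev <= cur:
            return False
        prev = cur
    return True


def whittle_index(P0, P1, r0, r1, state, beta=0.9,
                  lo=-100.0, hi=100.0, tol=1e-6) -> float:
    """Smallest subsidy at which passivity is optimal in `state`.

    Bisection is valid only if the arm is indexable; the result is clipped to [lo, hi].
    """
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if state in passive_set(P0, P1, r0, r1, lam, beta):
            hi = lam
        else:
            lo = lam
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical arm whose transitions ignore the action, so its index is r1 - r0.
    P = [[0.5, 0.5], [0.5, 0.5]]
    r0, r1 = [0.0, 0.0], [1.0, 3.0]
    print([round(whittle_index(P, P, r0, r1, s), 3) for s in range(2)])  # [1.0, 3.0]
```

The toy arm is chosen so the answer is known in advance: when the action does not affect transitions, the future value is the same under both actions and the index reduces to the immediate reward gap. For machine-maintenance or switching-penalty arms of the kind studied here, the same bisection applies but the grid check should be run first.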


Index policies for a class of discounted restless bandits

Advances in Applied Probability, Vol 34 (04), pp. 754-774, 2002. doi:10.1017/s0001867800011903

Cited By ~ 8

Author(s): K. D. Glazebrook; J. Niño-Mora; P. S. Ansell

Keyword(s): Conservation Laws; Special Class; Computational Study; Bandit Problems; Index Policy; Restless Bandit; Restless Bandits; Index Policies; Strong Performance; Dual Speed

The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for which indexability is guaranteed. The strong performance of the index policy is confirmed by a computational study.


On the Myopic Policy for a Class of Restless Bandit Problems with Applications in Dynamic Multichannel Access

2009. doi:10.21236/ada554809

Author(s): Keqin Liu; Qing Zhao

Keyword(s): Bandit Problems; Restless Bandit; Myopic Policy; Multichannel Access


On the myopic policy for a class of restless bandit problems with applications in dynamic multichannel access

Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, 2009. doi:10.1109/cdc.2009.5400366

Cited By ~ 4

Author(s): Keqin Liu; Qing Zhao

Keyword(s): Bandit Problems; Restless Bandit; Myopic Policy; Multichannel Access


Some indexable families of restless bandit problems

Advances in Applied Probability, Vol 38 (03), pp. 643-672, 2006. doi:10.1017/s000186780000121x

Cited By ~ 4

Author(s): K. D. Glazebrook; D. Ruiz-Hernandez; C. Kirkbride

Keyword(s): Index Theory; Stochastic Scheduling; Gittins Index; Scheduling Problems; Bandit Problems; Index Policy; Restless Bandit; Machine Maintenance; State Evolution; Strong Performance


Index policies for a class of discounted restless bandits

Advances in Applied Probability, Vol 34 (4), pp. 754-774, 2002. doi:10.1239/aap/1037990952

Cited By ~ 18

Author(s): K. D. Glazebrook; J. Niño-Mora; P. S. Ansell

Keyword(s): Conservation Laws; Special Class; Computational Study; Bandit Problems; Index Policy; Restless Bandit; Restless Bandits; Index Policies; Strong Performance; Dual Speed


Learning, risk attitude and hot stoves in restless bandit problems

Journal of Mathematical Psychology, Vol 53 (3), pp. 155-167, 2009. doi:10.1016/j.jmp.2008.05.006

Cited By ~ 32

Author(s): Guido Biele; Ido Erev; Eyal Ert

Keyword(s): Risk Attitude; Bandit Problems; Restless Bandit


Indexability and Optimal Index Policies for a Class of Reinitialising Restless Bandits

Probability in the Engineering and Informational Sciences, Vol 30 (1), pp. 1-23, 2015. doi:10.1017/s026996481500025x

Cited By ~ 1

Author(s): Sofía S. Villar

Keyword(s): Closed Form; Surveillance Systems; Bandit Problems; Index Policy; Initial State; Restless Bandit; Index Policies; Markov Decision; Whittle Index; Partially Observable

Motivated by a class of Partially Observable Markov Decision Processes with application in surveillance systems, in which a set of imperfectly observed state processes is to be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an optimal policy for observing the processes that maximizes the total expected net rewards over an infinite time horizon subject to resource availability. From the Lagrangian relaxation of the original problem, an index policy can be derived, as long as the existence of the Whittle index is ensured. We demonstrate that such a class of reinitializing bandits, in which a project's state deteriorates while active and resets to its initial state when passive until its completion, possesses the structural property of indexability, and we further show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems does not achieve optimality. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and that it is further recovered by a simple tractable rule referred to as the 1-limited Round Robin rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies which illustrate, for more general instances, the performance advantages of the Whittle index rule over other simple heuristics.
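In the usual formulation, the index rules compared above share one mechanism: at each decision epoch, the M arms with the largest index in their current state are activated (here, observed), and the Whittle and Myopic rules differ only in which index they consult. A minimal sketch, assuming the per-arm index tables have already been computed; `index_rule`, `WHITTLE`, and `MYOPIC` are hypothetical names, and the table entries are placeholders, not values from the paper.

```python
import heapq
from typing import Callable, List, Sequence


def index_rule(states: Sequence[int], index: Callable[[int, int], float],
               m: int) -> List[int]:
    """Return the m arms with the largest index in their current state.

    Ties are broken in favour of the lower arm id so the rule is deterministic.
    """
    return heapq.nlargest(m, range(len(states)),
                          key=lambda i: (index(i, states[i]), -i))


# Hypothetical per-arm index tables, indexed as TABLE[arm][state]; placeholders only.
WHITTLE = [[0.0, 2.5, 4.0], [1.0, 1.5, 3.0]]
MYOPIC = [[0.0, 3.0, 3.5], [2.0, 2.2, 2.4]]

if __name__ == "__main__":
    states = [1, 2]  # arm 0 is in state 1, arm 1 is in state 2
    print(index_rule(states, lambda i, s: WHITTLE[i][s], m=1))  # [1]
    print(index_rule(states, lambda i, s: MYOPIC[i][s], m=1))   # [0]
```

Keeping the index as a pluggable function makes it easy to run different rules on the same simulated trajectories, which is the kind of comparison the numerical studies describe.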
