Index policies for a class of discounted restless bandits

The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for which indexability is guaranteed. The strong performance of the index policy is confirmed by a computational study.

Some indexable families of restless bandit problems

Advances in Applied Probability ◽

10.1239/aap/1158684996 ◽

2006 ◽

Vol 38 (3) ◽

pp. 643-672 ◽

Cited By ~ 32

Author(s):

K. D. Glazebrook ◽

D. Ruiz-Hernandez ◽

C. Kirkbride

Keyword(s):

Index Theory ◽

Stochastic Scheduling ◽

Gittins Index ◽

Scheduling Problems ◽

Bandit Problems ◽

Index Policy ◽

Restless Bandit ◽

Machine Maintenance ◽

State Evolution ◽

Strong Performance

In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
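The Lagrangian relaxation mentioned in this abstract can be sketched as follows for the discounted case. The notation is assumed for illustration and is not taken from the paper: N projects with states x_n(t), actions a_n(t) in {0, 1} (1 = active), rewards r_n, discount factor β, and m projects to be activated at each epoch.

```latex
% Original problem: exactly m projects active at every decision epoch.
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t} \sum_{n=1}^{N} r_n\bigl(x_n(t), a_n(t)\bigr)\right]
\quad \text{s.t.} \quad \sum_{n=1}^{N} a_n(t) = m \quad \text{for all } t.

% Relaxation: the constraint is only required to hold in expected discounted total.
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t} \sum_{n=1}^{N} a_n(t)\right] = \frac{m}{1-\beta}.

% Attaching a multiplier \lambda (a subsidy for passivity) decouples the
% relaxed problem into N single-project problems:
\max_{\pi_n}\; \mathbb{E}_{\pi_n}\!\left[\sum_{t=0}^{\infty} \beta^{t}
\Bigl( r_n\bigl(x_n(t), a_n(t)\bigr) + \lambda \bigl(1 - a_n(t)\bigr) \Bigr)\right].
```

Under this sketch, a project is indexable if the set of states in which passivity is optimal grows monotonically from the empty set to the whole state space as λ increases. The Whittle index of a state is then the smallest λ at which the passive action is optimal in that state.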

Some indexable families of restless bandit problems

Advances in Applied Probability ◽

10.1017/s000186780000121x ◽

2006 ◽

Vol 38 (03) ◽

pp. 643-672 ◽

Cited By ~ 4

Author(s):

K. D. Glazebrook ◽

D. Ruiz-Hernandez ◽

C. Kirkbride

Keyword(s):

Index Theory ◽

Stochastic Scheduling ◽

Gittins Index ◽

Scheduling Problems ◽

Bandit Problems ◽

Index Policy ◽

Restless Bandit ◽

Machine Maintenance ◽

State Evolution ◽

Strong Performance

INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS

Probability in the Engineering and Informational Sciences ◽

10.1017/s026996481500025x ◽

2015 ◽

Vol 30 (1) ◽

pp. 1-23 ◽

Cited By ~ 1

Author(s):

Sofía S. Villar

Keyword(s):

Closed Form ◽

Surveillance Systems ◽

Bandit Problems ◽

Index Policy ◽

Initial State ◽

Restless Bandit ◽

Index Policies ◽

Markov Decision ◽

Whittle Index ◽

Partially Observable

Motivated by a class of Partially Observable Markov Decision Processes with application in surveillance systems, in which a set of imperfectly observed state processes is to be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an optimal policy for observing the processes that maximizes the total expected net rewards over an infinite time horizon subject to the resource availability. From the Lagrangian relaxation of the original problem, an index policy can be derived, as long as the existence of the Whittle index is ensured. We consider a class of reinitializing bandits in which a project's state deteriorates while active and resets to its initial state when passive, until its completion. We demonstrate that this class possesses the structural property of indexability, and we further show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems does not achieve optimality. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and it is further recovered by a simple tractable rule referred to as the 1-limited Round Robin rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies which illustrate, for more general instances, the performance advantages of the Whittle index rule over other simple heuristics.

Index policies for discounted bandit problems with availability constraints

Advances in Applied Probability ◽

10.1017/s0001867800002573 ◽

2008 ◽

Vol 40 (02) ◽

pp. 377-400 ◽

Cited By ~ 1

Author(s):

Savas Dayanik ◽

Warren Powell ◽

Kazutoshi Yamazaki

Keyword(s):

Bandit Problem ◽

Bandit Problems ◽

Index Policy ◽

State Action ◽

Index Policies ◽

Availability Constraints ◽

Whittle Index ◽

Multiarmed Bandit

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.

Index policies for discounted bandit problems with availability constraints

Advances in Applied Probability ◽

10.1239/aap/1214950209 ◽

2008 ◽

Vol 40 (2) ◽

pp. 377-400 ◽

Cited By ~ 5

Author(s):

Savas Dayanik ◽

Warren Powell ◽

Kazutoshi Yamazaki

Keyword(s):

Bandit Problem ◽

Bandit Problems ◽

Index Policy ◽

State Action ◽

Index Policies ◽

Availability Constraints ◽

Whittle Index ◽

Multiarmed Bandit

A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

Mathematics ◽

10.3390/math8122226 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2226 ◽

Cited By ~ 1

Author(s):

José Niño-Mora

Keyword(s):

Numerical Study ◽

Index Policy ◽

State Spaces ◽

Restless Bandit ◽

Restless Bandits ◽

Pivoting Algorithm ◽

Markov Decision ◽

Whittle Index ◽

Decision Epoch ◽

Change State

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which allows special structure to be exploited to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
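To make concrete what "computing the index" means, here is a minimal sketch of a naive baseline. It is not the fast-pivoting algorithm of the paper. It assumes a discrete-time, discounted, finite-state project that is already known to be indexable, and it finds the index of one state by bisection on the passive subsidy, solving the single-project problem by value iteration at each step. The function name `whittle_index`, its parameters, and the bracket `[lo, hi]` are all hypothetical.

```python
def whittle_index(i, P0, P1, r0, r1, beta, lo, hi, tol=1e-8):
    """Whittle index of state i by bisection on the passive subsidy.

    Assumptions (not from the source): the project is indexable and the
    index of state i lies in [lo, hi]. P0/r0 are the passive transition
    matrix and rewards, P1/r1 the active ones, 0 < beta < 1 the discount.
    """
    n = len(r0)

    def advantage(lam):
        # Value iteration for the single-project problem with subsidy lam;
        # returns Q_active(i) - Q_passive(i) at (approximate) convergence.
        V = [0.0] * n
        for _ in range(100_000):
            q0 = [r0[j] + lam + beta * sum(P0[j][k] * V[k] for k in range(n))
                  for j in range(n)]
            q1 = [r1[j] + beta * sum(P1[j][k] * V[k] for k in range(n))
                  for j in range(n)]
            new_V = [max(a, b) for a, b in zip(q0, q1)]
            if max(abs(a - b) for a, b in zip(new_V, V)) < 1e-10:
                break
            V = new_V
        return q1[i] - q0[i]

    # Under indexability the active action is preferred below the index
    # and the passive action above it, so the sign changes exactly once.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if advantage(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative data: a classical bandit (state frozen, zero reward when
# passive), for which the Whittle index coincides with the Gittins index.
P_ACTIVE = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
P_PASSIVE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_ACTIVE = [1.0, 0.5, 0.0]
R_PASSIVE = [0.0, 0.0, 0.0]

indices = [whittle_index(s, P_PASSIVE, P_ACTIVE, R_PASSIVE, R_ACTIVE,
                         beta=0.9, lo=-1.0, hi=2.0) for s in range(3)]
```

Each bisection step solves a full dynamic program, so this approach scales poorly with the number of states and with the accuracy required. That cost is the kind of burden that specialised index algorithms aim to reduce.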

Restless bandits, partial conservation laws and indexability

Advances in Applied Probability ◽

10.1017/s0001867800010648 ◽

2001 ◽

Vol 33 (1) ◽

pp. 76-98 ◽

Cited By ~ 29

Author(s):

José Niño-Mora

Keyword(s):

Conservation Laws ◽

Greedy Algorithm ◽

Stochastic Scheduling ◽

Model Parameters ◽

Scheduling Problems ◽

Index Policy ◽

Restless Bandits ◽

Partial Conservation ◽

Finite State ◽

Achievable Region

We show that if performance measures in a general stochastic scheduling problem satisfy partial conservation laws (PCL), which extend the generalized conservation laws (GCL) introduced by Bertsimas and Niño-Mora (1996), then the problem is solved optimally by a priority-index policy under a range of admissible linear performance objectives, with both this range and the optimal indices being determined by a one-pass adaptive-greedy algorithm that extends Klimov's: we call such scheduling problems PCL-indexable. We further apply the PCL framework to investigate the indexability property of restless bandits (two-action finite-state Markov decision chains) introduced by Whittle, obtaining the following results: (i) we present conditions on model parameters under which a single restless bandit is PCL-indexable, and hence indexable; membership of the class of PCL-indexable bandits is tested through a single run of the adaptive-greedy algorithm, which further computes the Whittle indices when the test is positive; this provides a tractable sufficient condition for indexability; (ii) we further introduce the subclass of GCL-indexable bandits (including classical bandits), which are indexable under arbitrary linear rewards. Our analysis is based on the achievable region approach to stochastic optimization, as the results follow from deriving and exploiting a new linear programming reformulation for single restless bandits.

On transforming an index for generalised bandit problems

Journal of Applied Probability ◽

10.2307/3214927 ◽

1995 ◽

Vol 32 (1) ◽

pp. 168-182 ◽

Cited By ~ 4

Author(s):

K. D. Glazebrook ◽

S. Greatrix

Keyword(s):

Dynamic Programming ◽

Policy Evaluation ◽

Gittins Index ◽

Bandit Problem ◽

Bandit Problems ◽

Index Policies

Nash (1980) demonstrated that index policies are optimal for a class of generalised bandit problems. A transform of the index concerned has many of the attributes of the Gittins index. The transformed index is positive-valued, with maximal values yielding optimal actions. It may be characterised as the value of a restart problem and is hence computable via dynamic programming methodologies. The transformed index can also be used in procedures for policy evaluation.
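The idea of characterising an index as the value of a restart problem can be illustrated on the classical Gittins index. The sketch below does not implement the transformed index of this paper. It assumes a finite-state discounted Markov reward chain and uses the standard restart-in-state formulation: in every state one may either continue or restart as if in state i, and the index of i is (1 - beta) times the optimal value at i. The function name `gittins_indices` and its parameters are hypothetical.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Gittins indices of a finite-state bandit via restart problems.

    P is the transition matrix (list of rows), r the reward vector and
    0 < beta < 1 the discount factor. For each state i, value iteration
    solves the problem in which every state j offers two actions:
    continue from j, or restart as if the chain were in state i.
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of playing once from each state, then following V.
            q = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                 for j in range(n)]
            # Restarting means behaving as if in state i, worth q[i].
            new_V = [max(q[j], q[i]) for j in range(n)]
            diff = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if diff < tol:
                break
        # Normalise the restart value to a reward-rate scale.
        indices.append((1.0 - beta) * V[i])
    return indices


# Illustrative chain: rewards decrease as the state index increases,
# and state 2 is absorbing with zero reward.
P_EXAMPLE = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
R_EXAMPLE = [1.0, 0.5, 0.0]
example_indices = gittins_indices(P_EXAMPLE, R_EXAMPLE, beta=0.9)
```

In this example no state can reach a state with a higher reward, so each index equals the one-step reward of its state. This gives a simple sanity check on the computation.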

Opportunistic Scheduling Revisited Using Restless Bandits: Indexability and Index Policy

GLOBECOM 2017 - 2017 IEEE Global Communications Conference ◽

10.1109/glocom.2017.8254159 ◽

2017 ◽

Cited By ~ 3

Author(s):

Kehao Wang ◽

Jihong Yu ◽

Lin Chen ◽

Moe Win

Keyword(s):

Opportunistic Scheduling ◽

Index Policy ◽

Restless Bandits
