index policy
Recently Published Documents

Total documents: 67 (five years: 19)
H-index: 14 (five years: 2)

Entropy (2021), Vol. 23 (12), pp. 1572
Author(s): Yutao Chen, Anthony Ephremides

In this paper, we study a slotted-time system in which a base station must update multiple users at the same time. Because resources are limited, only a subset of the users can be updated in each time slot. We consider the problem of minimizing the Age of Incorrect Information (AoII) when only imperfect Channel State Information (CSI) is available. Leveraging the theory of Markov Decision Processes (MDPs), we obtain structural properties of the optimal policy. By introducing a relaxed version of the original problem, we develop Whittle's index policy under a simple condition. However, indexability is required to guarantee that the Whittle index exists. To sidestep the indexability requirement, we develop an indexed priority policy based on the optimal policy for the relaxed problem. Finally, numerical results showcase the application of the derived structural properties and highlight the performance of the developed scheduling policies.
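
A minimal sketch of how such an index-based schedule might operate, assuming a hypothetical surrogate index function aoii_index and simplified AoII dynamics (the paper derives the actual Whittle and indexed-priority indices from the relaxed MDP): in each slot, the k users with the largest current index values are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

def aoii_index(aoii, p_err):
    # Hypothetical surrogate index: prioritize users whose information has
    # been incorrect for longer and whose channels look reliable under the
    # (imperfect) CSI. The paper's indices come from the relaxed MDP.
    return aoii * (1.0 - p_err)

n_users, k, horizon = 8, 2, 1000
aoii = np.zeros(n_users)  # Age of Incorrect Information per user

for t in range(horizon):
    p_err = rng.uniform(0.0, 0.5, n_users)             # estimated error prob. (imperfect CSI)
    chosen = np.argsort(aoii_index(aoii, p_err))[-k:]  # update the top-k users by index
    success = np.zeros(n_users, dtype=bool)
    success[chosen] = rng.random(k) < 1.0 - p_err[chosen]  # channel outcomes
    # Simplified dynamics: AoII resets on a successful update, else grows by one.
    aoii = np.where(success, 0.0, aoii + 1.0)

print("average AoII at the horizon:", aoii.mean())
```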


Author(s): Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors, such as the impact of other arms. We propose capturing both factors with the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of the proposed method with numerical experiments.
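
A minimal numpy sketch of attention-based coordination across arms, assuming single-head scaled dot-product attention over per-arm state embeddings; the paper's actual approximator architecture and Bellman solver are not reproduced here.

```python
import numpy as np

def attention_coordination(arm_states, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over per-arm embeddings:
    each arm's output feature is a weighted mix of all arms' features, so
    spatial (cross-arm) effects enter the value approximation."""
    Q, K, V = arm_states @ W_q, arm_states @ W_k, arm_states @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_arms, d = 10, 16
states = rng.normal(size=(n_arms, d))             # hypothetical per-arm embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
coordinated = attention_coordination(states, W_q, W_k, W_v)
print(coordinated.shape)  # (10, 16): one coordinated feature vector per arm
```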


2021
Author(s): Jing Fu, Bill Moran, Peter G. Taylor

In “A Restless Bandit Model for Resource Allocation, Competition and Reservation,” J. Fu, B. Moran, and P. G. Taylor study a resource allocation problem with time-varying requests in which resources of limited capacity are shared by multiple requests. The problem is modeled as a set of heterogeneous restless multi-armed bandit problems (RMABPs) coupled by constraints imposed by resource capacity. Following Whittle’s idea of relaxing the constraints and Weber and Weiss’s proof of asymptotic optimality, the authors propose an index policy and establish conditions under which it is asymptotically optimal in a regime where both arrival rates and capacities increase. In particular, they provide a simple sufficient condition for asymptotic optimality of the policy and, in complete generality, propose a method that generates a set of candidate policies for which asymptotic optimality can be checked. Via numerical experiments, they demonstrate the effectiveness of these results even in the pre-limit case.
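
A sketch of the greedy decision rule that a capacity-constrained index policy of this kind typically induces, assuming each request carries a precomputed index and a capacity requirement (the paper's construction of the indices and its asymptotic-optimality conditions are not reproduced):

```python
def greedy_by_index(indices, demands, capacity):
    """Serve requests in decreasing index order while capacity remains.
    'indices' are assumed precomputed per request; 'demands' give each
    request's capacity requirement."""
    order = sorted(range(len(indices)), key=lambda i: -indices[i])
    chosen, used = [], 0.0
    for i in order:
        if used + demands[i] <= capacity:
            chosen.append(i)
            used += demands[i]
    return chosen

# e.g. three requests competing for 5 units of capacity
print(greedy_by_index([0.9, 0.5, 0.7], [3, 2, 4], 5))  # -> [0, 1]
```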


Mathematics (2020), Vol. 9 (1), pp. 52
Author(s): José Niño-Mora

We consider the multi-armed bandit problem with penalties for switching, including setup delays and costs, extending the author's earlier results for the special case with no switching delays. A priority index for projects with setup delays that partially characterizes optimal policies was introduced by Asawa and Teneketzis in 1996, but without a means of computing it. We present a fast two-stage index-computing method: the first stage computes the continuation index (which applies when the project is set up) together with certain auxiliary quantities, with cubic (arithmetic-operation) complexity in the number of project states; the second stage then computes the switching index (which applies when the project is not set up) with quadratic complexity. The approach is based on new methodological advances in restless bandit indexation, introduced and deployed herein, which are motivated by the limitations of previous results and exploit the fact that the aforementioned index is the Whittle index of the project in its restless reformulation. A numerical study demonstrates substantial runtime speed-ups of the new two-stage index algorithm over a general one-stage Whittle index algorithm. The study further gives evidence that, in a multi-project setting, the index policy is consistently nearly optimal.
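
A sketch of the resulting decision rule, assuming both indices have already been computed by the two-stage method: each project is scored by its continuation index if it is currently set up and by its switching index otherwise, and the highest-scoring project is engaged.

```python
def select_project(cont_index, switch_index, is_set_up):
    """cont_index[i] / switch_index[i]: hypothetical per-project index
    values for project i's current state; is_set_up[i]: whether project i
    is already set up (no setup delay or cost to engage it)."""
    scores = [c if up else s
              for c, s, up in zip(cont_index, switch_index, is_set_up)]
    return max(range(len(scores)), key=scores.__getitem__)

# e.g. project 1 is set up, so it competes with its continuation index
print(select_project([2.0, 3.5], [1.0, 2.5], [False, True]))  # -> 1
```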


2020
Author(s): Linwei Xin

Single-sourcing lost-sales inventory systems with lead times are notoriously difficult to optimize. In this paper, we propose a new family of capped base-stock policies and provide a new perspective on constructing a practical hybrid policy that combines two well-known heuristics: base-stock and constant-order policies. Each capped base-stock policy is associated with two parameters: a base-stock level and an order cap. We prove that, for any fixed order cap, the capped base-stock policy converges exponentially fast in the base-stock level to a constant-order policy, providing a theoretical foundation for a phenomenon recently observed in other work on a different but related dual-sourcing inventory model, in which a capped dual-index policy converges numerically to a tailored base-surge policy. As a consequence, there exists a sequence of capped base-stock policies that is asymptotically optimal as the lead time grows. We also numerically demonstrate the policy's superior performance in general (including for small lead times) by comparing it with other well-known heuristics.
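
The ordering rule itself is simple to state; a minimal sketch with hypothetical parameter names:

```python
def capped_base_stock_order(inventory_position, base_stock, order_cap):
    """Order up to the base-stock level, but never more than the cap."""
    return min(order_cap, max(0, base_stock - inventory_position))

# With a very large base-stock level the rule orders order_cap every
# period, i.e. it degenerates to a constant-order policy -- consistent
# with the convergence result stated above.
print(capped_base_stock_order(inventory_position=4, base_stock=10, order_cap=3))  # -> 3
```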


Mathematics (2020), Vol. 8 (12), pp. 2226
Author(s): José Niño-Mora

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch and that can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author's prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n³ + O(n²) arithmetic operations. The algorithm also draws on the parametric simplex method and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify the simplex pivoting steps and reduce their complexity. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
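
The paper's fast-pivoting algorithm is intricate; as a point of reference, a much slower textbook baseline computes the Whittle index of each state by binary search on the passive subsidy λ, solving the λ-subsidy problem by value iteration at each probe. This sketch assumes a discounted, discrete-time, indexable project (a simplification of the paper's semi-Markov setting), with hypothetical inputs P0/P1 (passive/active transition matrices) and r0/r1 (reward vectors):

```python
import numpy as np

def q_values(P0, P1, r0, r1, lam, gamma=0.95, iters=3000):
    """Q-values of the lambda-subsidy MDP: the passive action (0) earns
    an extra subsidy lam on top of its reward."""
    V = np.zeros(len(r0))
    for _ in range(iters):
        q0 = r0 + lam + gamma * (P0 @ V)   # passive
        q1 = r1 + gamma * (P1 @ V)         # active
        V = np.maximum(q0, q1)
    return q0, q1

def whittle_index(P0, P1, r0, r1, s, lo=-50.0, hi=50.0, tol=1e-5):
    """Binary-search the subsidy making state s indifferent between the
    two actions. Assumes indexability, so the preference switches once."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        q0, q1 = q_values(P0, P1, r0, r1, lam)
        if q1[s] > q0[s]:
            lo = lam   # active still preferred: the index lies above lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

# tiny 2-state example with hypothetical dynamics
P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
P1 = np.array([[0.5, 0.5], [0.6, 0.4]])
r0, r1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
print([whittle_index(P0, P1, r0, r1, s) for s in (0, 1)])
```

Each binary-search probe here costs a full value-iteration solve, which is exactly the inefficiency the paper's (2/3)n³ + O(n²) fast-pivoting algorithm avoids.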


2020, Vol. 66 (7), pp. 3029-3050
Author(s): David B. Brown, James E. Smith

We consider dynamic selection problems, in which a decision maker repeatedly selects a set of items from a larger collection of available items. A classic example is the dynamic assortment problem with demand learning, where a retailer chooses items to offer for sale subject to a display space constraint and may adjust the assortment over time in response to observed demand. These dynamic selection problems are naturally formulated as stochastic dynamic programs (DPs) but are difficult to solve because the optimal selection decisions depend on the states of all items. In this paper, we study heuristic policies for dynamic selection problems and provide upper bounds on the performance of an optimal policy that can be used to assess a heuristic policy's performance. The policies and bounds we consider are based on a Lagrangian relaxation of the DP that relaxes the constraint limiting the number of items that may be selected. We characterize the performance of the Lagrangian index policy and the associated bound and show that, under mild conditions, they are asymptotically optimal for problems with many items; mixed policies and tiebreaking play an essential role in the analysis of these index policies and can have a surprising impact on performance. We demonstrate these policies and bounds in two large-scale examples: a dynamic assortment problem with demand learning and an applicant screening problem. This paper was accepted by Yinyu Ye, optimization.
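
Since the abstract stresses that tiebreaking can materially affect performance, here is a minimal sketch of index-based selection that randomizes among items tied at the selection margin (the index values and budget m are assumed given; this is not the paper's full Lagrangian construction):

```python
import numpy as np

def select_with_tiebreaking(indices, m, rng):
    """Pick m items by index, randomizing among those tied at the cutoff."""
    idx = np.asarray(indices, dtype=float)
    cutoff = np.sort(idx)[-m]                 # the m-th largest index value
    above = np.flatnonzero(idx > cutoff)      # strictly above: always taken
    tied = np.flatnonzero(idx == cutoff)      # at the margin: randomize
    fill = rng.choice(tied, size=m - len(above), replace=False)
    return np.sort(np.concatenate([above, fill]))

rng = np.random.default_rng(0)
# item 0 is clearly in; items 1 and 2 are tied at the margin
print(select_with_tiebreaking([0.9, 0.4, 0.4, 0.1], m=2, rng=rng))
```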

