index policy
Recently Published Documents

Total documents: 67 (five years: 19)
H-index: 14 (five years: 2)

Entropy (2021), Vol. 23 (12), pp. 1572
Author(s): Yutao Chen, Anthony Ephremides

In this paper, we study a slotted-time system in which a base station must update multiple users at the same time. Because resources are limited, only a subset of the users can be updated in each time slot. We consider the problem of minimizing the Age of Incorrect Information (AoII) when only imperfect Channel State Information (CSI) is available. Leveraging the theory of Markov Decision Processes (MDPs), we obtain structural properties of the optimal policy. By introducing a relaxed version of the original problem, we develop Whittle's index policy under a simple condition. However, indexability is required to guarantee that the Whittle index exists. To sidestep the indexability requirement, we develop an indexed priority policy based on the optimal policy for the relaxed problem. Finally, numerical results showcase the application of the derived structural properties and highlight the performance of the developed scheduling policies.
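
A minimal sketch of how such an index-based schedule might operate, assuming a hypothetical surrogate index function aoii_index and simplified AoII dynamics (the paper derives the actual Whittle and indexed-priority indices from the relaxed MDP): in each slot, the k users with the largest current index values are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

def aoii_index(aoii, p_err):
    # Hypothetical surrogate index: prioritize users whose information has
    # been incorrect for longer and whose channels look reliable under the
    # (imperfect) CSI. The paper's indices come from the relaxed MDP.
    return aoii * (1.0 - p_err)

n_users, k, horizon = 8, 2, 1000
aoii = np.zeros(n_users)  # Age of Incorrect Information per user

for t in range(horizon):
    p_err = rng.uniform(0.0, 0.5, n_users)             # estimated error prob. (imperfect CSI)
    chosen = np.argsort(aoii_index(aoii, p_err))[-k:]  # update the top-k users by index
    success = np.zeros(n_users, dtype=bool)
    success[chosen] = rng.random(k) < 1.0 - p_err[chosen]  # channel outcomes
    # Simplified dynamics: AoII resets on a successful update, else grows by one.
    aoii = np.where(success, 0.0, aoii + 1.0)

print("average AoII at the horizon:", aoii.mean())
```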


Author(s): Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors, such as the impact of other arms. We propose capturing both factors with the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of the proposed method with numerical experiments.
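
A minimal numpy sketch of attention-based coordination across arms, assuming single-head scaled dot-product attention over per-arm state embeddings; the paper's actual approximator architecture and Bellman solver are not reproduced here.

```python
import numpy as np

def attention_coordination(arm_states, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over per-arm embeddings:
    each arm's output feature is a weighted mix of all arms' features, so
    spatial (cross-arm) effects enter the value approximation."""
    Q, K, V = arm_states @ W_q, arm_states @ W_k, arm_states @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_arms, d = 10, 16
states = rng.normal(size=(n_arms, d))             # hypothetical per-arm embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
coordinated = attention_coordination(states, W_q, W_k, W_v)
print(coordinated.shape)  # (10, 16): one coordinated feature vector per arm
```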


2021
Author(s): Jing Fu, Bill Moran, Peter G. Taylor

In “A Restless Bandit Model for Resource Allocation, Competition and Reservation,” J. Fu, B. Moran, and P. G. Taylor study a resource allocation problem with time-varying requests in which resources of limited capacity are shared by multiple requests. The problem is modeled as a set of heterogeneous restless multi-armed bandit problems (RMABPs) coupled by constraints imposed by resource capacity. Following Whittle’s idea of relaxing the constraints and Weber and Weiss’s proof of asymptotic optimality, the authors propose an index policy and establish conditions under which it is asymptotically optimal in a regime where both arrival rates and capacities increase. In particular, they provide a simple sufficient condition for asymptotic optimality of the policy and, in complete generality, propose a method that generates a set of candidate policies for which asymptotic optimality can be checked. Via numerical experiments, they demonstrate the effectiveness of these results even in the pre-limit case.
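
A sketch of the greedy decision rule that a capacity-constrained index policy of this kind typically induces, assuming each request carries a precomputed index and a capacity requirement (the paper's construction of the indices and its asymptotic-optimality conditions are not reproduced):

```python
def greedy_by_index(indices, demands, capacity):
    """Serve requests in decreasing index order while capacity remains.
    'indices' are assumed precomputed per request; 'demands' give each
    request's capacity requirement."""
    order = sorted(range(len(indices)), key=lambda i: -indices[i])
    chosen, used = [], 0.0
    for i in order:
        if used + demands[i] <= capacity:
            chosen.append(i)
            used += demands[i]
    return chosen

# e.g. three requests competing for 5 units of capacity
print(greedy_by_index([0.9, 0.5, 0.7], [3, 2, 4], 5))  # -> [0, 1]
```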


Mathematics (2020), Vol. 9 (1), pp. 52
Author(s): José Niño-Mora

We consider the multi-armed bandit problem with penalties for switching, including setup delays and costs, extending the author's earlier results for the special case with no switching delays. A priority index for projects with setup delays that partially characterizes optimal policies was introduced by Asawa and Teneketzis in 1996, but without a means of computing it. We present a fast two-stage index-computing method: the first stage computes the continuation index (which applies when the project is set up) together with certain auxiliary quantities, with cubic (arithmetic-operation) complexity in the number of project states; the second stage then computes the switching index (which applies when the project is not set up) with quadratic complexity. The approach is based on new methodological advances in restless bandit indexation, introduced and deployed herein, which are motivated by the limitations of previous results and exploit the fact that the aforementioned index is the Whittle index of the project in its restless reformulation. A numerical study demonstrates substantial runtime speed-ups of the new two-stage index algorithm over a general one-stage Whittle index algorithm. The study further gives evidence that, in a multi-project setting, the index policy is consistently nearly optimal.
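
A sketch of the resulting decision rule, assuming both indices have already been computed by the two-stage method: each project is scored by its continuation index if it is currently set up and by its switching index otherwise, and the highest-scoring project is engaged.

```python
def select_project(cont_index, switch_index, is_set_up):
    """cont_index[i] / switch_index[i]: hypothetical per-project index
    values for project i's current state; is_set_up[i]: whether project i
    is already set up (no setup delay or cost to engage it)."""
    scores = [c if up else s
              for c, s, up in zip(cont_index, switch_index, is_set_up)]
    return max(range(len(scores)), key=scores.__getitem__)

# e.g. project 1 is set up, so it competes with its continuation index
print(select_project([2.0, 3.5], [1.0, 2.5], [False, True]))  # -> 1
```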


2020
Author(s): Linwei Xin

Single-sourcing lost-sales inventory systems with lead times are notoriously difficult to optimize. In this paper, we propose a new family of capped base-stock policies and provide a new perspective on constructing a practical hybrid policy that combines two well-known heuristics: base-stock and constant-order policies. Each capped base-stock policy is associated with two parameters: a base-stock level and an order cap. We prove that, for any fixed order cap, the capped base-stock policy converges exponentially fast in the base-stock level to a constant-order policy, providing a theoretical foundation for a phenomenon recently observed in other work on a different but related dual-sourcing inventory model, in which a capped dual-index policy converges numerically to a tailored base-surge policy. As a consequence, there exists a sequence of capped base-stock policies that is asymptotically optimal as the lead time grows. We also numerically demonstrate the policy's superior performance in general (including for small lead times) by comparing it with other well-known heuristics.
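
The ordering rule itself is simple to state; a minimal sketch with hypothetical parameter names:

```python
def capped_base_stock_order(inventory_position, base_stock, order_cap):
    """Order up to the base-stock level, but never more than the cap."""
    return min(order_cap, max(0, base_stock - inventory_position))

# With a very large base-stock level the rule orders order_cap every
# period, i.e. it degenerates to a constant-order policy -- consistent
# with the convergence result stated above.
print(capped_base_stock_order(inventory_position=4, base_stock=10, order_cap=3))  # -> 3
```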


Mathematics (2020), Vol. 8 (12), pp. 2226
Author(s): José Niño-Mora

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch and that can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author's prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n³ + O(n²) arithmetic operations. The algorithm also draws on the parametric simplex method and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify the simplex pivoting steps and reduce their complexity. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
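
The paper's fast-pivoting algorithm is intricate; as a point of reference, a much slower textbook baseline computes the Whittle index of each state by binary search on the passive subsidy λ, solving the λ-subsidy problem by value iteration at each probe. This sketch assumes a discounted, discrete-time, indexable project (a simplification of the paper's semi-Markov setting), with hypothetical inputs P0/P1 (passive/active transition matrices) and r0/r1 (reward vectors):

```python
import numpy as np

def q_values(P0, P1, r0, r1, lam, gamma=0.95, iters=3000):
    """Q-values of the lambda-subsidy MDP: the passive action (0) earns
    an extra subsidy lam on top of its reward."""
    V = np.zeros(len(r0))
    for _ in range(iters):
        q0 = r0 + lam + gamma * (P0 @ V)   # passive
        q1 = r1 + gamma * (P1 @ V)         # active
        V = np.maximum(q0, q1)
    return q0, q1

def whittle_index(P0, P1, r0, r1, s, lo=-50.0, hi=50.0, tol=1e-5):
    """Binary-search the subsidy making state s indifferent between the
    two actions. Assumes indexability, so the preference switches once."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        q0, q1 = q_values(P0, P1, r0, r1, lam)
        if q1[s] > q0[s]:
            lo = lam   # active still preferred: the index lies above lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

# tiny 2-state example with hypothetical dynamics
P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
P1 = np.array([[0.5, 0.5], [0.6, 0.4]])
r0, r1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
print([whittle_index(P0, P1, r0, r1, s) for s in (0, 1)])
```

Each binary-search probe here costs a full value-iteration solve, which is exactly the inefficiency the paper's (2/3)n³ + O(n²) fast-pivoting algorithm avoids.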


2020, Vol. 66 (7), pp. 3029-3050
Author(s): David B. Brown, James E. Smith

We consider dynamic selection problems, in which a decision maker repeatedly selects a set of items from a larger collection of available items. A classic example is the dynamic assortment problem with demand learning, where a retailer chooses items to offer for sale subject to a display space constraint and may adjust the assortment over time in response to observed demand. These dynamic selection problems are naturally formulated as stochastic dynamic programs (DPs) but are difficult to solve because the optimal selection decisions depend on the states of all items. In this paper, we study heuristic policies for dynamic selection problems and provide upper bounds on the performance of an optimal policy that can be used to assess a heuristic policy's performance. The policies and bounds we consider are based on a Lagrangian relaxation of the DP that relaxes the constraint limiting the number of items that may be selected. We characterize the performance of the Lagrangian index policy and the associated bound and show that, under mild conditions, they are asymptotically optimal for problems with many items; mixed policies and tiebreaking play an essential role in the analysis of these index policies and can have a surprising impact on performance. We demonstrate these policies and bounds in two large-scale examples: a dynamic assortment problem with demand learning and an applicant screening problem. This paper was accepted by Yinyu Ye, optimization.
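
Since the abstract stresses that tiebreaking can materially affect performance, here is a minimal sketch of index-based selection that randomizes among items tied at the selection margin (the index values and budget m are assumed given; this is not the paper's full Lagrangian construction):

```python
import numpy as np

def select_with_tiebreaking(indices, m, rng):
    """Pick m items by index, randomizing among those tied at the cutoff."""
    idx = np.asarray(indices, dtype=float)
    cutoff = np.sort(idx)[-m]                 # the m-th largest index value
    above = np.flatnonzero(idx > cutoff)      # strictly above: always taken
    tied = np.flatnonzero(idx == cutoff)      # at the margin: randomize
    fill = rng.choice(tied, size=m - len(above), replace=False)
    return np.sort(np.concatenate([above, fill]))

rng = np.random.default_rng(0)
# item 0 is clearly in; items 1 and 2 are tied at the margin
print(select_with_tiebreaking([0.9, 0.4, 0.4, 0.1], m=2, rng=rng))
```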

