restless bandit
Recently Published Documents


TOTAL DOCUMENTS: 54 (FIVE YEARS: 8)
H-INDEX: 12 (FIVE YEARS: 1)

Author(s): Kehao Wang, Jihong Yu, Lin Chen, Pan Zhou, Moe Win

2021
Author(s): Ludwig Danwitz, David Mathar, Elke Smith, Deniz Tuzsus, Jan Peters

Multi-armed restless bandit tasks are regularly used in psychology and cognitive neuroscience to assess exploration and exploitation behavior in structured environments. These models are also readily applied to examine effects of (virtual) brain lesions on performance, and to infer neurocomputational mechanisms using neuroimaging or pharmacological approaches. However, to infer individual, psychologically meaningful parameters from such data, computational cognitive modeling is typically required. Recent studies indicate that softmax (SM) decision rule models that include a representation of environmental dynamics (e.g., the Kalman filter) and additional parameters for modeling exploration and perseveration (Kalman SMEP) fit human bandit task data better than competing models. Parameter and model recovery are two central requirements for computational models: parameter recovery refers to the ability to recover true data-generating parameters; model recovery refers to the ability to correctly identify the true data-generating model using model comparison techniques. Here we comprehensively examined parameter and model recovery of the Kalman SMEP model as well as nested model versions, i.e., models without the additional parameters, using simulation and Bayesian inference. Parameter recovery improved with increasing trial numbers, from around 0.8 for 100 trials to around 0.93 for 300 trials. Model recovery analyses likewise confirmed acceptable recovery of the Kalman SMEP model. Model recovery was lower for nested Kalman filter models as well as delta rule models with fixed learning rates. Exploratory analyses examined associations of model parameters with model-free performance metrics. Random exploration, captured by the inverse softmax temperature, was associated with lower accuracy and more switches. For the exploration-bonus parameter that models directed exploration, we confirmed an inverse-U-shaped association with accuracy, such that both an excess and a lack of directed exploration reduced accuracy. Taken together, these analyses underline that the Kalman SMEP model fulfills basic requirements of a cognitive model.
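
To make the model structure concrete, below is a minimal sketch of a Kalman-filter learning rule combined with a softmax choice rule that adds an exploration bonus and a perseveration bonus, in the spirit of the Kalman SMEP model described above. The decay-to-mean dynamics and the parameter names (beta, phi, rho, lam, theta, sigma_d, sigma_o) are illustrative assumptions, not the authors' exact specification.

```python
import numpy as np

def kalman_smep_choice_probs(mean, var, prev_choice, beta, phi, rho):
    """Softmax over posterior means plus an exploration bonus (phi * posterior SD)
    and a perseveration bonus (rho for repeating the previous choice).
    beta is the inverse softmax temperature (random exploration)."""
    bonus = phi * np.sqrt(var)
    stay = np.zeros_like(mean)
    if prev_choice is not None:
        stay[prev_choice] = 1.0
    v = beta * (mean + bonus) + rho * stay
    v -= v.max()                                  # numerical stability
    p = np.exp(v)
    return p / p.sum()

def kalman_update(mean, var, choice, reward,
                  lam=0.98, theta=50.0, sigma_d=2.8, sigma_o=4.0):
    """Kalman filter for a restless bandit: all posterior means decay toward theta
    and all variances grow (the environment drifts); the chosen arm's belief is
    then updated toward the observed reward."""
    mean = lam * mean + (1 - lam) * theta         # predicted (prior) means
    var = lam**2 * var + sigma_d**2               # predicted (prior) variances
    k = var[choice] / (var[choice] + sigma_o**2)  # Kalman gain for the chosen arm
    mean[choice] += k * (reward - mean[choice])
    var[choice] *= (1 - k)
    return mean, var
```

In a model of this form, beta captures random exploration, phi captures directed exploration, and rho captures perseveration, which is how the three behavioral tendencies discussed in the abstract map onto separate parameters.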


2021, Vol 290 (2), pp. 622-639
Author(s): Jianyu Xu, Lujie Chen, Ou Tang

2021
Author(s): Jing Fu, Bill Moran, Peter G. Taylor

In “A Restless Bandit Model for Resource Allocation, Competition and Reservation,” J. Fu, B. Moran, and P. G. Taylor study a resource allocation problem with varying requests and with resources of limited capacity shared by multiple requests. This problem is modeled as a set of heterogeneous restless multi-armed bandit problems (RMABPs) connected by constraints imposed by resource capacity. Following Whittle’s idea of relaxing the constraints and Weber and Weiss’s proof of asymptotic optimality, the authors propose an index policy and establish conditions for it to be asymptotically optimal in a regime where both arrival rates and capacities increase. In particular, they provide a simple sufficient condition for asymptotic optimality of the policy and, in complete generality, propose a method that generates a set of candidate policies for which asymptotic optimality can be checked. Via numerical experiments, they demonstrate the effectiveness of these results even in the pre-limit case.
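
As an illustration of how a priority-index policy of this kind operates, the sketch below ranks arms by an index of their current state and activates the highest-ranked arms up to a capacity limit. The index function is assumed to be supplied (for instance, a Whittle-type index); this is a generic sketch, not the authors' specific construction or their asymptotic-optimality conditions.

```python
import numpy as np

def activate_by_index(states, index_fn, capacity):
    """Generic priority-index policy for a restless multi-armed bandit with a
    capacity constraint: rank arms by the index of their current state and
    activate the top `capacity` arms (ties broken arbitrarily).

    states   : list of current states, one per arm
    index_fn : callable (arm, state) -> float, assumed given (e.g. a Whittle index)
    capacity : maximum number of arms that may be active at once
    """
    indices = np.array([index_fn(arm, s) for arm, s in enumerate(states)])
    order = np.argsort(-indices)                 # highest index first
    active = np.zeros(len(states), dtype=bool)
    active[order[:capacity]] = True
    return active
```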


Mathematics, 2020, Vol 8 (12), pp. 2226
Author(s): José Niño-Mora

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
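
For context, the Whittle index of a state can be characterized as the passive subsidy at which the active and passive actions are equally attractive in that state. The sketch below computes it naively by bisection over the subsidy, re-solving a discounted two-action dynamic program at each step; it assumes indexability and a discounted formulation, and it is far slower than the fast-pivoting algorithm described above, serving only to illustrate the quantity being computed.

```python
import numpy as np

def q_values(subsidy, P_active, P_passive, r_active, r_passive,
             gamma=0.95, iters=2000, tol=1e-10):
    """Value iteration for a two-action (active/passive) restless bandit in which
    the passive action earns an extra subsidy. Returns Q(x, passive), Q(x, active)."""
    n = len(r_active)
    V = np.zeros(n)
    for _ in range(iters):
        q_pass = r_passive + subsidy + gamma * P_passive @ V
        q_act = r_active + gamma * P_active @ V
        V_new = np.maximum(q_pass, q_act)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    q_pass = r_passive + subsidy + gamma * P_passive @ V
    q_act = r_active + gamma * P_active @ V
    return q_pass, q_act

def whittle_index_bisection(x, P_active, P_passive, r_active, r_passive,
                            lo=-100.0, hi=100.0, tol=1e-6):
    """Whittle index of state x: the passive subsidy at which being active and
    being passive are equally attractive in state x (assumes indexability, and
    that [lo, hi] brackets the index)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pass, q_act = q_values(mid, P_active, P_passive, r_active, r_passive)
        if q_act[x] > q_pass[x]:   # subsidy too small: still prefers being active
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```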


2020, Vol 119, pp. 104927
Author(s): Diego Ruiz-Hernández, Jesús M. Pinar-Pérez, David Delgado-Gómez

2019, Vol 21 (1), pp. 198-212
Author(s): Elliot Lee, Mariel S. Lavieri, Michael Volk

2018, Vol 1 (2), pp. 151-164
Author(s): Danielle J. Navarro, Peter Tran, Nicole Baz

2018
Author(s): Danielle Navarro, Peter Tran, Nicole Baz

In everyday life, people must make choices without full information about the environment, which poses an explore-exploit dilemma: one must balance the need to learn about the world against the need to obtain rewards from it. The explore-exploit dilemma is often studied using the multi-armed restless bandit task, in which people repeatedly select from multiple options, and human behaviour is modelled as a form of reinforcement learning via Kalman filters. Inspired by work in the judgment and decision-making literature, we present two experiments using multi-armed bandit tasks in both static and dynamic environments, in situations where options can become unviable and vanish if they are not pursued. A Kalman filter model using Thompson sampling provides an excellent account of human learning in a standard restless bandit task, but there are systematic departures in the vanishing bandit task. We estimate the structure of this loss-aversion signal and consider theoretical explanations for the results.
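
A minimal sketch of the standard account mentioned above, combining independent Gaussian (Kalman-filter) posteriors with Thompson sampling on a restless bandit, is shown below; the drift and noise parameters and the random-walk environment are illustrative assumptions rather than the authors' task specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_thompson_kalman(n_arms=4, n_trials=150, sigma_o=4.0, sigma_d=2.0):
    """Thompson sampling with independent Gaussian (Kalman) posteriors on a
    restless bandit whose true arm means follow random walks. Returns the
    average obtained reward."""
    true_means = rng.normal(50, 10, n_arms)           # latent payoffs (drift each trial)
    mean = np.full(n_arms, 50.0)                      # posterior means
    var = np.full(n_arms, 100.0)                      # posterior variances
    rewards = []
    for _ in range(n_trials):
        choice = int(np.argmax(rng.normal(mean, np.sqrt(var))))  # posterior sample
        reward = rng.normal(true_means[choice], sigma_o)
        var += sigma_d**2                             # uncertainty grows for every arm
        k = var[choice] / (var[choice] + sigma_o**2)  # Kalman gain for the chosen arm
        mean[choice] += k * (reward - mean[choice])
        var[choice] *= (1 - k)
        true_means += rng.normal(0, sigma_d, n_arms)  # environment drifts (restless)
        rewards.append(reward)
    return float(np.mean(rewards))
```

A vanishing-bandit variant, as studied in the abstract, would additionally remove arms that go unchosen for too long; that extension is not sketched here.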

