restless bandit
Recently Published Documents


TOTAL DOCUMENTS: 54 (FIVE YEARS: 8)
H-INDEX: 12 (FIVE YEARS: 1)

Author(s): Kehao Wang, Jihong Yu, Lin Chen, Pan Zhou, Moe Win

2021
Author(s): Ludwig Danwitz, David Mathar, Elke Smith, Deniz Tuzsus, Jan Peters

Multi-armed restless bandit tasks are regularly used in psychology and cognitive neuroscience to assess exploration and exploitation behavior in structured environments. These models are also readily applied to examine effects of (virtual) brain lesions on performance, and to infer neurocomputational mechanisms using neuroimaging or pharmacological approaches. However, to infer individual, psychologically meaningful parameters from such data, computational cognitive modeling is typically required. Recent studies indicate that softmax (SM) decision rule models that include a representation of environmental dynamics (e.g., the Kalman filter) and additional parameters for modeling exploration and perseveration (Kalman SMEP) fit human bandit task data better than competing models. Parameter and model recovery are two central requirements for computational models: parameter recovery refers to the ability to recover true data-generating parameters; model recovery refers to the ability to correctly identify the true data-generating model using model comparison techniques. Here we comprehensively examined parameter and model recovery of the Kalman SMEP model as well as nested model versions, i.e., models without the additional parameters, using simulation and Bayesian inference. Parameter recovery improved with increasing trial numbers, from around 0.8 for 100 trials to around 0.93 for 300 trials. Model recovery analyses likewise confirmed acceptable recovery of the Kalman SMEP model. Model recovery was lower for nested Kalman filter models as well as delta rule models with fixed learning rates. Exploratory analyses examined associations of model parameters with model-free performance metrics. Random exploration, captured by the inverse softmax temperature, was associated with lower accuracy and more switches. For the exploration-bonus parameter that models directed exploration, we confirmed an inverse-U-shaped association with accuracy, such that both an excess and a lack of directed exploration reduced accuracy. Taken together, these analyses underline that the Kalman SMEP model fulfills basic requirements of a cognitive model.
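
To make the model structure concrete, below is a minimal sketch of a Kalman-filter learning rule combined with a softmax choice rule that adds an exploration bonus and a perseveration bonus, in the spirit of the Kalman SMEP model described above. The decay-to-mean dynamics and the parameter names (beta, phi, rho, lam, theta, sigma_d, sigma_o) are illustrative assumptions, not the authors' exact specification.

```python
import numpy as np

def kalman_smep_choice_probs(mean, var, prev_choice, beta, phi, rho):
    """Softmax over posterior means plus an exploration bonus (phi * posterior SD)
    and a perseveration bonus (rho for repeating the previous choice).
    beta is the inverse softmax temperature (random exploration)."""
    bonus = phi * np.sqrt(var)
    stay = np.zeros_like(mean)
    if prev_choice is not None:
        stay[prev_choice] = 1.0
    v = beta * (mean + bonus) + rho * stay
    v -= v.max()                                  # numerical stability
    p = np.exp(v)
    return p / p.sum()

def kalman_update(mean, var, choice, reward,
                  lam=0.98, theta=50.0, sigma_d=2.8, sigma_o=4.0):
    """Kalman filter for a restless bandit: all posterior means decay toward theta
    and all variances grow (the environment drifts); the chosen arm's belief is
    then updated toward the observed reward."""
    mean = lam * mean + (1 - lam) * theta         # predicted (prior) means
    var = lam**2 * var + sigma_d**2               # predicted (prior) variances
    k = var[choice] / (var[choice] + sigma_o**2)  # Kalman gain for the chosen arm
    mean[choice] += k * (reward - mean[choice])
    var[choice] *= (1 - k)
    return mean, var
```

In a model of this form, beta captures random exploration, phi captures directed exploration, and rho captures perseveration, which is how the three behavioral tendencies discussed in the abstract map onto separate parameters.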


2021, Vol 290 (2), pp. 622-639
Author(s): Jianyu Xu, Lujie Chen, Ou Tang

2021
Author(s): Jing Fu, Bill Moran, Peter G. Taylor

In “A Restless Bandit Model for Resource Allocation, Competition and Reservation,” J. Fu, B. Moran, and P. G. Taylor study a resource allocation problem with varying requests and with resources of limited capacity shared by multiple requests. This problem is modeled as a set of heterogeneous restless multi-armed bandit problems (RMABPs) connected by constraints imposed by resource capacity. Following Whittle’s idea of relaxing the constraints and Weber and Weiss’s proof of asymptotic optimality, the authors propose an index policy and establish conditions for it to be asymptotically optimal in a regime where both arrival rates and capacities increase. In particular, they provide a simple sufficient condition for asymptotic optimality of the policy and, in complete generality, propose a method that generates a set of candidate policies for which asymptotic optimality can be checked. Via numerical experiments, they demonstrate the effectiveness of these results even in the pre-limit case.
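
As an illustration of how a priority-index policy of this kind operates, the sketch below ranks arms by an index of their current state and activates the highest-ranked arms up to a capacity limit. The index function is assumed to be supplied (for instance, a Whittle-type index); this is a generic sketch, not the authors' specific construction or their asymptotic-optimality conditions.

```python
import numpy as np

def activate_by_index(states, index_fn, capacity):
    """Generic priority-index policy for a restless multi-armed bandit with a
    capacity constraint: rank arms by the index of their current state and
    activate the top `capacity` arms (ties broken arbitrarily).

    states   : list of current states, one per arm
    index_fn : callable (arm, state) -> float, assumed given (e.g. a Whittle index)
    capacity : maximum number of arms that may be active at once
    """
    indices = np.array([index_fn(arm, s) for arm, s in enumerate(states)])
    order = np.argsort(-indices)                 # highest index first
    active = np.zeros(len(states), dtype=bool)
    active[order[:capacity]] = True
    return active
```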


Mathematics, 2020, Vol 8 (12), pp. 2226
Author(s): José Niño-Mora

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
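
For context, the Whittle index of a state can be characterized as the passive subsidy at which the active and passive actions are equally attractive in that state. The sketch below computes it naively by bisection over the subsidy, re-solving a discounted two-action dynamic program at each step; it assumes indexability and a discounted formulation, and it is far slower than the fast-pivoting algorithm described above, serving only to illustrate the quantity being computed.

```python
import numpy as np

def q_values(subsidy, P_active, P_passive, r_active, r_passive,
             gamma=0.95, iters=2000, tol=1e-10):
    """Value iteration for a two-action (active/passive) restless bandit in which
    the passive action earns an extra subsidy. Returns Q(x, passive), Q(x, active)."""
    n = len(r_active)
    V = np.zeros(n)
    for _ in range(iters):
        q_pass = r_passive + subsidy + gamma * P_passive @ V
        q_act = r_active + gamma * P_active @ V
        V_new = np.maximum(q_pass, q_act)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    q_pass = r_passive + subsidy + gamma * P_passive @ V
    q_act = r_active + gamma * P_active @ V
    return q_pass, q_act

def whittle_index_bisection(x, P_active, P_passive, r_active, r_passive,
                            lo=-100.0, hi=100.0, tol=1e-6):
    """Whittle index of state x: the passive subsidy at which being active and
    being passive are equally attractive in state x (assumes indexability, and
    that [lo, hi] brackets the index)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pass, q_act = q_values(mid, P_active, P_passive, r_active, r_passive)
        if q_act[x] > q_pass[x]:   # subsidy too small: still prefers being active
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```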


2020, Vol 119, pp. 104927
Author(s): Diego Ruiz-Hernández, Jesús M. Pinar-Pérez, David Delgado-Gómez

2019, Vol 21 (1), pp. 198-212
Author(s): Elliot Lee, Mariel S. Lavieri, Michael Volk

2018, Vol 1 (2), pp. 151-164
Author(s): Danielle J. Navarro, Peter Tran, Nicole Baz

2018
Author(s): Danielle Navarro, Peter Tran, Nicole Baz

In everyday life, people must make choices without full information about the environment, which poses an explore-exploit dilemma: one must balance the need to learn about the world against the need to obtain rewards from it. The explore-exploit dilemma is often studied using the multi-armed restless bandit task, in which people repeatedly select from multiple options, and human behaviour is modelled as a form of reinforcement learning via Kalman filters. Inspired by work in the judgment and decision-making literature, we present two experiments using multi-armed bandit tasks in both static and dynamic environments, in situations where options can become unviable and vanish if they are not pursued. A Kalman filter model using Thompson sampling provides an excellent account of human learning in a standard restless bandit task, but there are systematic departures in the vanishing bandit task. We estimate the structure of this loss-aversion signal and consider theoretical explanations for the results.
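
A minimal sketch of the standard account mentioned above, combining independent Gaussian (Kalman-filter) posteriors with Thompson sampling on a restless bandit, is shown below; the drift and noise parameters and the random-walk environment are illustrative assumptions rather than the authors' task specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_thompson_kalman(n_arms=4, n_trials=150, sigma_o=4.0, sigma_d=2.0):
    """Thompson sampling with independent Gaussian (Kalman) posteriors on a
    restless bandit whose true arm means follow random walks. Returns the
    average obtained reward."""
    true_means = rng.normal(50, 10, n_arms)           # latent payoffs (drift each trial)
    mean = np.full(n_arms, 50.0)                      # posterior means
    var = np.full(n_arms, 100.0)                      # posterior variances
    rewards = []
    for _ in range(n_trials):
        choice = int(np.argmax(rng.normal(mean, np.sqrt(var))))  # posterior sample
        reward = rng.normal(true_means[choice], sigma_o)
        var += sigma_d**2                             # uncertainty grows for every arm
        k = var[choice] / (var[choice] + sigma_o**2)  # Kalman gain for the chosen arm
        mean[choice] += k * (reward - mean[choice])
        var[choice] *= (1 - k)
        true_means += rng.normal(0, sigma_d, n_arms)  # environment drifts (restless)
        rewards.append(reward)
    return float(np.mean(rewards))
```

A vanishing-bandit variant, as studied in the abstract, would additionally remove arms that go unchosen for too long; that extension is not sketched here.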

