Gittins Index
Recently Published Documents

TOTAL DOCUMENTS: 50 (five years: 2)
H-INDEX: 15 (five years: 1)

2020 ◽  
Author(s):  
Daniel Russo

This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms. I consider a Gaussian multiarmed bandit problem with discount factor γ. The Gittins index of an arm is shown to equal the γ-quantile of the posterior distribution of the arm's mean plus an error term that vanishes as γ → 1. In this sense, for sufficiently patient agents, a Gittins index measures the highest plausible mean reward of an arm in a manner equivalent to an upper confidence bound.
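
The quantile characterization above is straightforward to illustrate numerically. The sketch below is my own (not from the note): it treats the γ-quantile of a Gaussian posterior over an arm's mean as a Bayesian upper-confidence-bound proxy for the Gittins index when the discount factor γ is close to 1; the posterior parameters are illustrative assumptions.

```python
# Minimal sketch: the gamma-quantile of a N(post_mean, post_std^2) posterior
# over an arm's mean, used as an upper-confidence-bound proxy for the Gittins
# index when the discount factor gamma is near 1 (per the note's result).
from scipy.stats import norm

def quantile_index(post_mean, post_std, gamma):
    """Return the gamma-quantile of the Gaussian posterior over the arm's mean."""
    return post_mean + post_std * norm.ppf(gamma)

# Example (assumed numbers): posterior N(0.3, 0.2^2), discount factor 0.99.
print(quantile_index(0.3, 0.2, 0.99))  # about 0.3 + 0.2 * 2.326 = 0.765
```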


2018 ◽  
Vol 12 (1) ◽  
pp. 110-118 ◽  
Author(s):  
Cheng Tan ◽  
Changbao Xu ◽  
Lin Yang ◽  
Wing Shing Wong

2016 ◽  
Vol 31 (2) ◽  
pp. 239-263 ◽  
Author(s):  
James Edwards ◽  
Paul Fearnhead ◽  
Kevin Glazebrook

The knowledge gradient (KG) policy was originally proposed for offline ranking and selection problems but has recently been adapted for use in online decision-making in general and multi-armed bandit problems (MABs) in particular. We study its use in a class of exponential family MABs and identify weaknesses, including a propensity to take actions which are dominated with respect to both exploitation and exploration. We propose variants of KG which avoid such errors. These new policies include an index heuristic, which deploys a KG approach to develop an approximation to the Gittins index. A numerical study shows this policy to perform well over a range of MABs, including those for which index policies are not optimal. While KG does not take dominated actions when rewards are Gaussian, it fails to be index consistent, and even when arms are correlated it does not appear to enjoy a performance advantage over competitor policies sufficient to compensate for its greater computational demands.
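
For a rough sense of how a KG score trades off exploitation and exploration, the sketch below computes a generic online-style KG score for independent Gaussian arms with known observation noise. It is not the authors' exponential-family variants or their index heuristic, and the score form (posterior mean plus a horizon-weighted KG factor) together with all parameter names are assumptions.

```python
# Hedged sketch of a knowledge-gradient (KG) score for independent Gaussian
# arms with known observation noise; not the policy studied in the paper.
import numpy as np
from scipy.stats import norm

def kg_scores(mu, sigma, noise_sd, horizon):
    """Posterior mean plus a horizon-weighted KG exploration bonus per arm."""
    mu = np.asarray(mu, float)
    sigma = np.asarray(sigma, float)
    # Std. dev. of the one-step change in the posterior mean after one pull.
    sigma_tilde = sigma**2 / np.sqrt(sigma**2 + noise_sd**2)
    scores = np.empty_like(mu)
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))          # best competing mean
        z = -abs(mu[x] - best_other) / sigma_tilde[x]  # normalised gap
        kg_factor = sigma_tilde[x] * (z * norm.cdf(z) + norm.pdf(z))
        scores[x] = mu[x] + horizon * kg_factor
    return scores

# Example (assumed numbers): the uncertain second arm outscores the arm with
# the highest posterior mean, illustrating the exploration bonus.
print(kg_scores(mu=[0.5, 0.45, 0.1], sigma=[0.05, 0.3, 0.3],
                noise_sd=1.0, horizon=20))
```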


2016 ◽  
Vol 48 (1) ◽  
pp. 112-136 ◽  
Author(s):  
Zi Ding ◽  
Ilya O. Ryzhov

We propose a novel theoretical characterization of the optimal 'Gittins index' policy in multi-armed bandit problems with non-Gaussian, infinitely divisible reward distributions. We first construct a continuous-time, conditional Lévy process which probabilistically interpolates the sequence of discrete-time rewards. When the rewards are Gaussian, this approach enables an easy connection to the convenient time-change properties of a Brownian motion. Although no such device is available in general for the non-Gaussian case, we use optimal stopping theory to characterize the value of the optimal policy as the solution to a free-boundary partial integro-differential equation (PIDE). We provide the free-boundary PIDE in explicit form under the specific settings of exponential and Poisson rewards. We also prove continuity and monotonicity properties of the Gittins index in these two problems, and discuss how the PIDE can be solved numerically to find the optimal index value of a given belief state.
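
The paper's free-boundary PIDE does not reduce to a few lines of code, but the quantity it characterizes can be approximated crudely in discrete time. The sketch below is my own toy calibration, not the authors' method: it bisects on a retirement reward to estimate the Gittins index of a Poisson-reward arm with a conjugate Gamma(a, b) prior on its rate; the discount factor and truncation constants are illustrative assumptions.

```python
# Toy discrete-time approximation of a Gittins index for Poisson rewards with
# a Gamma(a, b) prior on the rate (retirement/calibration formulation).
from functools import lru_cache
from math import lgamma, exp, log

GAMMA = 0.9   # discount factor (assumed)
DEPTH = 12    # lookahead horizon; GAMMA**DEPTH is small, so truncation is mild
K_MAX = 30    # truncation of the predictive distribution over observed counts

def neg_bin_pmf(k, a, b):
    """Predictive P(next count = k) when the Poisson rate is Gamma(a, b)."""
    return exp(lgamma(a + k) - lgamma(a) - lgamma(k + 1)
               + a * log(b / (b + 1.0)) + k * log(1.0 / (b + 1.0)))

@lru_cache(maxsize=None)
def value(a, b, lam, depth):
    """Optimal discounted value over `depth` steps with retirement reward lam."""
    if depth == 0:
        return 0.0
    retire = lam * (1.0 - GAMMA**depth) / (1.0 - GAMMA)
    cont = a / b  # expected immediate Poisson reward under the current belief
    cont += GAMMA * sum(neg_bin_pmf(k, a, b) * value(a + k, b + 1, lam, depth - 1)
                        for k in range(K_MAX))
    return max(retire, cont)

def gittins_index(a, b, lo=0.0, hi=20.0, iters=40):
    """Bisect on lam until retiring and continuing are indifferent at (a, b)."""
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        retire = lam * (1.0 - GAMMA**DEPTH) / (1.0 - GAMMA)
        if value(a, b, lam, DEPTH) > retire + 1e-12:
            lo = lam  # continuing is strictly better, so the index exceeds lam
        else:
            hi = lam  # retiring is weakly optimal, so the index is at most lam
    return 0.5 * (lo + hi)

# Example (assumed numbers): a Gamma(2, 1) belief has posterior mean rate 2.0;
# the estimated index should come out somewhat above it, reflecting the value
# of further learning.
print(gittins_index(2.0, 1.0))
```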


2014 ◽  
Vol 51 (2) ◽  
pp. 492-511 ◽  
Author(s):  
Martin Klimmek

Consider the classic infinite-horizon problem of stopping a one-dimensional diffusion to optimise between running and terminal rewards, and suppose that we are given a parametrised family of such problems. We provide a general theory of parameter dependence in infinite-horizon stopping problems for which threshold strategies are optimal. The crux of the approach is a supermodularity condition guaranteeing that the family of problems is indexable by a set-valued map, which we call the indifference map. This map is a natural generalisation of the allocation (Gittins) index, a classical quantity in the theory of dynamic allocation. Importantly, the notion of indexability leads to a framework for inverse optimal stopping problems.
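
As a toy illustration of an indifference map, the sketch below (my own discretised example, not the paper's diffusion construction; the grid, discount factor, terminal reward g(x) = x, and the role of theta as a running reward are all assumptions) bisects, state by state, on the parameter value at which stopping and continuing are equally attractive.

```python
# Toy indifference map: a symmetric random walk on {0, ..., N} stands in for a
# one-dimensional diffusion; stopping pays g(x) = x, continuing pays a running
# reward theta per step, discounted by BETA.
import numpy as np

BETA = 0.95   # per-step discount factor (assumed)
N = 50        # grid size approximating the diffusion's state space

def g(x):
    return float(x)  # terminal reward

STOP = np.array([g(x) for x in range(N + 1)])

def solve_value(theta, sweeps=1000):
    """Value iteration for the stopping problem with running reward theta."""
    v = STOP.copy()
    for _ in range(sweeps):
        cont = theta + BETA * 0.5 * (np.roll(v, 1) + np.roll(v, -1))
        cont[0] = theta + BETA * v[1]      # reflect at the lower boundary
        cont[N] = theta + BETA * v[N - 1]  # reflect at the upper boundary
        v = np.maximum(STOP, cont)
    return v

def indifference(x, lo=-5.0, hi=5.0, iters=30):
    """Bisect on theta until stopping and continuing are indifferent at x."""
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        v = solve_value(theta)
        cont = theta + BETA * 0.5 * (v[x - 1] + v[x + 1])
        if cont > g(x):
            hi = theta  # continuing already preferred: indifference value is lower
        else:
            lo = theta
    return 0.5 * (lo + hi)

# The indifference value rises with the state, consistent with the threshold
# structure: for a fixed theta the walk is stopped once it climbs high enough.
print([round(indifference(x), 3) for x in (10, 25, 40)])
```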

