Bandit problems with arbitrary side observations

Author(s):  
Chih-Chun Wang ◽  
S.R. Kulkarni ◽  
H.V. Poor
1995 ◽  
Vol 32 (1) ◽  
pp. 168-182 ◽  
Author(s):  
K. D. Glazebrook ◽  
S. Greatrix

Nash (1980) demonstrated that index policies are optimal for a class of generalised bandit problems. A transform of the index concerned has many of the attributes of the Gittins index. The transformed index is positive-valued, with maximal values yielding optimal actions. It may be characterised as the value of a restart problem and is hence computable via dynamic programming methods. The transformed index can also be used in procedures for policy evaluation.
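To illustrate what "the value of a restart problem, computable via dynamic programming" means, here is a minimal sketch for the classical Gittins index rather than Nash's generalised index or the transform studied above. It assumes a single arm with a finite state space, a known transition matrix `P`, known one-step rewards `r`, and a discount factor `beta`. It uses the restart-in-state formulation usually credited to Katehakis and Veinott: in the restart-in-i problem, every state offers a choice between continuing from that state and "restarting", which means earning `r[i]` and moving as if in state i. The index of i is then `(1 - beta)` times the value of i in that problem. The function name `gittins_index_restart` is hypothetical.

```python
from typing import List


def gittins_index_restart(P: List[List[float]], r: List[float], beta: float,
                          tol: float = 1e-10, max_iter: int = 100_000) -> List[float]:
    """Classical Gittins index of every state of one discounted Markov arm.

    P[x][y]: probability of moving from state x to state y when the arm is played.
    r[x]: one-step reward collected in state x.
    beta: discount factor, assumed to lie in (0, 1).
    """
    n = len(r)
    indices = []
    for i in range(n):
        # Value iteration for the restart-in-i problem.
        v = [0.0] * n
        for _ in range(max_iter):
            # "Restart": earn r[i] and transition as if the arm were in state i.
            restart = r[i] + beta * sum(P[i][y] * v[y] for y in range(n))
            # In each state x, take the better of continuing from x or restarting.
            new_v = [max(r[x] + beta * sum(P[x][y] * v[y] for y in range(n)), restart)
                     for x in range(n)]
            diff = max(abs(a - b) for a, b in zip(new_v, v))
            v = new_v
            if diff < tol:
                break
        # Convert the restart value of state i into a per-period reward rate.
        indices.append((1.0 - beta) * v[i])
    return indices
```

For example, `gittins_index_restart([[0.0, 1.0], [0.0, 1.0]], [1.0, 0.0], 0.9)` returns approximately `[1.0, 0.0]`: a state that pays 1 once and then falls into a worthless absorbing state has index 1, and the absorbing state has index 0. Each state's index is at least its immediate reward, because stopping after one play is always allowed.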


Econometrica ◽  
2007 ◽  
Vol 75 (6) ◽  
pp. 1591-1611 ◽  
Author(s):  
Dinah Rosenberg ◽  
Eilon Solan ◽  
Nicolas Vieille

2021 ◽  
Vol 66 (1) ◽  
pp. 476-478
Author(s):  
Paul Reverdy ◽  
Vaibhav Srivastava ◽  
Naomi Ehrich Leonard

Author(s):  
Hossein Esfandiari ◽  
MohammadTaghi HajiAghayi ◽  
Brendan Lucier ◽  
Michael Mitzenmacher

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
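For reference, the classic problem that this abstract generalizes has a well-known index solution due to Weitzman: each box i gets a reservation value σ_i solving E[(V_i − σ_i)^+] = c_i, boxes are opened in decreasing order of σ_i, and search stops once the best prize seen is at least the largest reservation value among unopened boxes. The sketch below implements only that offline rule, not the online algorithms of the paper. It assumes independent boxes, discrete prize distributions, fixed known costs (unlike the joint value/cost model above), and an outside option worth 0. The names `reservation_value` and `weitzman` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Dist = Sequence[Tuple[float, float]]  # (value, probability) pairs of a discrete prize distribution


def reservation_value(dist: Dist, cost: float, iters: int = 200) -> float:
    """Solve E[(V - sigma)^+] = cost for sigma by bisection (assumes cost >= 0)."""
    def excess(sigma: float) -> float:
        return sum(p * max(v - sigma, 0.0) for v, p in dist)

    mean = sum(p * v for v, p in dist)
    lo = mean - cost                 # excess(lo) >= cost, since (v - s)^+ >= v - s
    hi = max(v for v, _ in dist)     # excess(hi) == 0 <= cost
    for _ in range(iters):           # excess is non-increasing in sigma
        mid = (lo + hi) / 2
        if excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return hi


def weitzman(boxes: Sequence[Tuple[Dist, float]],
             open_box: Callable[[int], float]) -> Tuple[float, List[int]]:
    """Weitzman's rule for the classic Pandora's box problem.

    boxes[i] = (prize distribution, opening cost); open_box(i) reveals box i's prize.
    Returns (best prize kept minus total opening cost, indices of opened boxes).
    """
    sigma = [reservation_value(d, c) for d, c in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigma[i], reverse=True)
    best, spent, opened = 0.0, 0.0, []  # best starts at the outside option 0
    for i in order:
        if best >= sigma[i]:  # best prize beats every remaining reservation value
            break
        spent += boxes[i][1]
        opened.append(i)
        best = max(best, open_box(i))
    return best - spent, opened
```

With `boxes = [([(0.0, 0.5), (10.0, 0.5)], 1.0), ([(6.0, 1.0)], 0.0)]`, the reservation values are 8 and 6, so box 0 is opened first. If it reveals 10, search stops with net payoff 9; if it reveals 0, box 1 is also opened and the net payoff is 6 − 1 = 5.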

