Bandit problems with arbitrary side observations

Nash (1980) demonstrated that index policies are optimal for a class of generalised bandit problems. A transform of the index concerned has many of the attributes of the Gittins index. The transformed index is positive-valued, with maximal values yielding optimal actions. It may be characterised as the value of a restart problem and is hence computable via dynamic programming. The transformed index can also be used in procedures for policy evaluation.
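To make the restart characterisation concrete, here is a minimal sketch for the classical discounted Gittins index only, not the generalised index of Nash (1980) or its transform. It assumes a finite-state arm with a known transition matrix and reward vector, and uses the standard restart-in-state formulation, where the index of a state is (1 - beta) times the value of a problem in which one may either continue or restart from that state. The function name and parameters are illustrative and do not come from the abstract.

```python
def gittins_index_restart(P, r, beta, state, tol=1e-10, max_iter=100_000):
    """Classical Gittins index of `state` via the restart-in-state problem.

    P     : list of rows, P[j][k] = transition probability from j to k
    r     : list of one-step rewards, r[j]
    beta  : discount factor in (0, 1)
    state : the state whose index is wanted (also the restart state)
    """
    n = len(r)
    V = [0.0] * n
    for _ in range(max_iter):
        # Value of continuing the arm from each state j.
        cont = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                for j in range(n)]
        # Value of restarting: act as if the arm were in `state`.
        restart = cont[state]
        new_V = [max(c, restart) for c in cont]
        converged = max(abs(a - b) for a, b in zip(new_V, V)) < tol
        V = new_V
        if converged:
            break
    # The index is the restart value at `state`, scaled to a per-step rate.
    return (1.0 - beta) * V[state]


# Toy arm (an assumption for illustration): state 0 pays 1 once and then
# moves to the absorbing state 1, which pays 0 forever.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [1.0, 0.0]
index_0 = gittins_index_restart(P, r, beta=0.9, state=0)
index_1 = gittins_index_restart(P, r, beta=0.9, state=1)
```

In this toy arm the index of state 0 is close to 1 and the index of state 1 is close to 0, which matches the intuition that the best stopping rule from state 0 collects the single reward and then stops.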

Gaussian multi-armed bandit problems with multiple objectives

2016 American Control Conference (ACC) ◽

10.1109/acc.2016.7526494 ◽

2016 ◽

Cited By ~ 2

Author(s):

Paul Reverdy

Keyword(s):

Multiple Objectives ◽

Bandit Problems

Social Learning in One-Arm Bandit Problems

Econometrica ◽

10.1111/j.1468-0262.2007.00807.x ◽

2007 ◽

Vol 75 (6) ◽

pp. 1591-1611 ◽

Cited By ~ 43

Author(s):

Dinah Rosenberg ◽

Eilon Solan ◽

Nicolas Vieille

Keyword(s):

Social Learning ◽

Bandit Problems

Foraging decisions as multi-armed bandit problems: Applying reinforcement learning algorithms to foraging data

Journal of Theoretical Biology ◽

10.1016/j.jtbi.2019.02.002 ◽

2019 ◽

Vol 467 ◽

pp. 48-56 ◽

Cited By ~ 2

Author(s):

Juliano Morimoto

Keyword(s):

Reinforcement Learning ◽

Learning Algorithms ◽

Bandit Problems ◽

Foraging Decisions

Multi-armed bandit problems with heavy-tailed reward distributions

2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) ◽

10.1109/allerton.2011.6120206 ◽

2011 ◽

Cited By ~ 6

Author(s):

Keqin Liu ◽

Qing Zhao

Keyword(s):

Bandit Problems ◽

Heavy Tailed

Approximate Indexability and Bandit Problems with Concave Rewards and Delayed Feedback

Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - Lecture Notes in Computer Science ◽

10.1007/978-3-642-40328-6_14 ◽

2013 ◽

pp. 189-204

Author(s):

Sudipto Guha ◽

Kamesh Munagala

Keyword(s):

Delayed Feedback ◽

Bandit Problems

One- and Two-Armed Bandit Problems

Encyclopedia of Statistical Sciences ◽

10.1002/0471667196.ess1852.pub2 ◽

2006 ◽

Author(s):

Donald A. Berry

Keyword(s):

Bandit Problems

Corrections to “Satisficing in Multiarmed Bandit Problems”

IEEE Transactions on Automatic Control ◽

10.1109/tac.2020.2981433 ◽

2021 ◽

Vol 66 (1) ◽

pp. 476-478

Author(s):

Paul Reverdy ◽

Vaibhav Srivastava ◽

Naomi Ehrich Leonard

Keyword(s):

Bandit Problems ◽

Multiarmed Bandit

Online Pandora’s Boxes and Bandits

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33011885 ◽

2019 ◽

Vol 33 ◽

pp. 1885-1892 ◽

Cited By ~ 1

Author(s):

Hossein Esfandiari ◽

MohammadTaghi HajiAghayi ◽

Brendan Lucier ◽

Michael Mitzenmacher

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Approximation Algorithms ◽

Standard Model ◽

Decision Process ◽

Bandit Problems ◽

Knapsack Constraints ◽

Feasibility Constraints ◽

The Cost

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
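As background for the offline benchmark mentioned in the abstract, the sketch below computes the reservation value used in the classic Pandora's box solution (Weitzman 1979): for a box with opening cost c and random prize v, it is the threshold sigma satisfying E[max(v - sigma, 0)] = c. This illustrates only the classic offline index, not the online algorithms of the paper. The discrete prize distribution, the function name, and the bisection tolerance are assumptions made for illustration.

```python
def reservation_value(values, probs, cost, tol=1e-9):
    """Solve E[max(v - sigma, 0)] = cost for sigma by bisection.

    values, probs : a discrete prize distribution (probs sum to 1)
    cost          : strictly positive cost of opening the box
    """
    def expected_surplus(sigma):
        # Expected gain from opening when a prize of sigma is already in hand.
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

    # expected_surplus is non-increasing in sigma; it is at least `cost`
    # at `lo` and equal to zero at `hi`, so the root lies in between.
    lo = min(values) - cost
    hi = max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_surplus(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical box: prize 0 or 10 with equal probability, opening cost 1.
# Solving 0.5 * (10 - sigma) = 1 by hand gives sigma = 8.
sigma = reservation_value([0.0, 10.0], [0.5, 0.5], cost=1.0)
```

In the classic offline problem, boxes are opened in decreasing order of reservation value, and the search stops once the best prize found so far exceeds every remaining reservation value.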
