bandit problems
Recently Published Documents

TOTAL DOCUMENTS: 166 (FIVE YEARS: 23)
H-INDEX: 23 (FIVE YEARS: 2)

Author(s):  
David Simchi-Levi ◽  
Yunzong Xu

We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class [Formula: see text]. We design a fast and simple algorithm that achieves the statistically optimal regret with only [Formula: see text] calls to an offline regression oracle across all T rounds. The number of oracle calls can be further reduced to [Formula: see text] if T is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.
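To illustrate how an offline regression oracle can drive online action selection, here is a minimal sketch of inverse-gap weighting, a probabilistic selection rule used in this line of work. The function names, the `gamma` scale, and the toy estimates are illustrative assumptions, not the authors' exact algorithm.

```python
import random

def inverse_gap_weighting(predict, actions, gamma):
    """Turn offline-regression estimates into an action distribution.

    predict(a) is the oracle's reward estimate for action a; gamma
    controls how sharply probability concentrates on the greedy action
    (both are hypothetical names for this sketch).
    """
    k = len(actions)
    best = max(actions, key=predict)
    probs = {}
    for a in actions:
        if a != best:
            # probability inversely proportional to the estimated gap
            probs[a] = 1.0 / (k + gamma * (predict(best) - predict(a)))
    # remaining mass goes to the empirically best action
    probs[best] = 1.0 - sum(probs.values())
    return probs

# usage: reward estimates for 3 actions from a (hypothetical) fitted regressor
estimates = {0: 0.9, 1: 0.5, 2: 0.2}
p = inverse_gap_weighting(estimates.get, [0, 1, 2], gamma=10.0)
arm = random.choices(list(p), weights=list(p.values()))[0]
```

Larger `gamma` shifts mass toward the greedy action, trading exploration for exploitation.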


Author(s):  
Gábor Lugosi ◽  
Abbas Mehrabian

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret and an algorithm with a square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.
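A single round of the collision model described above can be sketched as follows; the function and parameter names are assumptions for illustration, not the authors' algorithm.

```python
import random

def play_round(choices, means, rng=random.random):
    """One round of the multiplayer collision model.

    choices[i] is the arm pulled by player i; means[a] is the Bernoulli
    mean of arm a. Players who collide (two or more on the same arm)
    receive zero reward; a lone player gets a Bernoulli draw.
    """
    rewards = []
    for arm in choices:
        if choices.count(arm) > 1:          # collision: everyone on this arm gets 0
            rewards.append(0.0)
        else:
            rewards.append(1.0 if rng() < means[arm] else 0.0)
    return rewards
```

In the first feedback model each player would also observe the collision indicator (`choices.count(arm) > 1`); in the second, harder model only the reward itself is revealed.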


2021 ◽  
Author(s):  
Shengyi Wang ◽  
Hongbo Sun ◽  
Kyeong Jin Kim ◽  
Jianlin Guo ◽  
Daniel Nikovski

PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252122
Author(s):  
Tsutomu Harada

Although it is commonly held that two heads are better than one, related studies have argued that groups rarely outperform their best members. This study examined not only whether two heads are better than one but also whether three heads are better than two or one, in the context of two-armed bandit problems where learning plays an instrumental role in achieving high performance. The research revealed a U-shaped relationship between performance and group size: performance was highest for individuals and triads and lowest for dyads. Moreover, the study estimated learning parameters and found that a high inverse temperature (exploitation) accounted for high performance. In particular, the group effect on the inverse temperature in dyads did not produce values surpassing the average of the two group members, whereas triads gave rise to inverse temperatures higher than the averages of their individual members. These results are consistent with our proposed hypothesis that learning coherence is likely to emerge in individuals and triads, but not in dyads, which in turn leads to higher performance. The hypothesis is based on the classical argument by Simmel that while dyads tend to involve more emotion and generate greater variability, triads are the smallest structure that tends to constrain emotions, reduce individuality, and generate behavioral convergence or uniformity because of "two against one" social pressures. As a result, three heads or one head were better than two in our study.
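The inverse-temperature parameter estimated in this study comes from the standard softmax (logit) choice rule combined with a delta-rule value update. A minimal sketch, with illustrative names (`beta` for inverse temperature, `alpha` for learning rate):

```python
import math
import random

def softmax_choice(q, beta, rng=random.random):
    """Pick an arm with probability proportional to exp(beta * Q[arm]).

    Higher inverse temperature beta means stronger exploitation of the
    currently higher-valued arm; beta near zero means random choice.
    """
    weights = [math.exp(beta * v) for v in q]
    r = rng() * sum(weights)
    for arm, w in enumerate(weights):
        r -= w
        if r <= 0:
            return arm
    return len(q) - 1

def update(q, arm, reward, alpha):
    """Delta-rule update: move Q[arm] toward the observed reward."""
    q[arm] += alpha * (reward - q[arm])
```

With a large `beta`, an agent almost always exploits the arm with the higher value estimate, which is the behavior the study associates with high-performing individuals and triads.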


2021 ◽  
Vol 66 (1) ◽  
pp. 476-478
Author(s):  
Paul Reverdy ◽  
Vaibhav Srivastava ◽  
Naomi Ehrich Leonard

Econometrica ◽  
2021 ◽  
Vol 89 (4) ◽  
pp. 1717-1751
Author(s):  
Olivier Gossner ◽  
Jakub Steiner ◽  
Colin Stewart

We study the impact of manipulating the attention of a decision‐maker who learns sequentially about a number of items before making a choice. Under natural assumptions on the decision‐maker's strategy, directing attention toward one item increases its likelihood of being chosen regardless of its value. This result applies when the decision‐maker can reject all items in favor of an outside option with known value; if no outside option is available, the direction of the effect of manipulation depends on the value of the item. A similar result applies to manipulation of choices in bandit problems.


Author(s):  
Jean Walrand

We have explored a number of topics motivated by concrete applications. It is time to stitch these ideas together into a complete panorama; in addition, we provide some complements. Section 15.1 discusses the general question of inference: what can one deduce from observations? Section 15.2 explains the important notion of a sufficient statistic: what is the relevant data in a set of observations? Section 15.3 presents the theory of Markov chains with an infinite number of states. Section 15.4 explains the Poisson process. Section 15.5 discusses the boosting algorithm for choosing among experts. Which drug should one research further; which noisy channel should one use? These are examples of multi-armed bandit problems, in which one faces a trade-off between exploiting known possibilities and exploring potentially more rewarding but less well understood alternatives. Section 15.6 explains a key result for such multi-armed bandit problems. Information Theory studies the limits of communication systems: how fast can one transmit bits reliably over a noisy channel? How many bits should be transmitted to convey some information? Section 15.7 introduces some key concepts and results of Information Theory. When estimating the likelihood of errors or the reliability of estimates, one usually has to bound the probability that a random variable exceeds a given value; Section 15.8 discusses some useful probability bounds. Section 15.9 explains the main ideas of the theory of martingales and shows how it yields a proof of the law of large numbers.
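The exploration-exploitation trade-off mentioned above is classically resolved by index rules such as UCB1, which adds a shrinking exploration bonus to each arm's empirical mean. A minimal sketch (this is the standard UCB1 rule, not necessarily the specific result presented in the chapter):

```python
import math

def ucb1_index(mean, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus that
    shrinks as the arm accumulates pulls."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

def choose_arm(means, pulls, t):
    """Pull each arm once first, then pick the arm with the largest index.

    means[a] is the empirical mean reward of arm a, pulls[a] its pull
    count, and t the current round (names are illustrative).
    """
    for a, n in enumerate(pulls):
        if n == 0:
            return a
    scores = [ucb1_index(means[a], pulls[a], t) for a in range(len(means))]
    return max(range(len(means)), key=scores.__getitem__)
```

Rarely pulled arms keep a large bonus and so are revisited, while well-sampled arms are judged mostly by their empirical mean.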

