bandit problems
Recently Published Documents

TOTAL DOCUMENTS: 166 (FIVE YEARS: 23)
H-INDEX: 23 (FIVE YEARS: 2)

Author(s):  
David Simchi-Levi ◽  
Yunzong Xu

We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class [Formula: see text]. We design a fast and simple algorithm that achieves the statistically optimal regret with only [Formula: see text] calls to an offline regression oracle across all T rounds. The number of oracle calls can be further reduced to [Formula: see text] if T is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.
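To illustrate how an offline regression oracle can drive online action selection, here is a minimal sketch of inverse-gap weighting, a probabilistic selection rule used in this line of work. The function names, the `gamma` scale, and the toy estimates are illustrative assumptions, not the authors' exact algorithm.

```python
import random

def inverse_gap_weighting(predict, actions, gamma):
    """Turn offline-regression estimates into an action distribution.

    predict(a) is the oracle's reward estimate for action a; gamma
    controls how sharply probability concentrates on the greedy action
    (both are hypothetical names for this sketch).
    """
    k = len(actions)
    best = max(actions, key=predict)
    probs = {}
    for a in actions:
        if a != best:
            # probability inversely proportional to the estimated gap
            probs[a] = 1.0 / (k + gamma * (predict(best) - predict(a)))
    # remaining mass goes to the empirically best action
    probs[best] = 1.0 - sum(probs.values())
    return probs

# usage: reward estimates for 3 actions from a (hypothetical) fitted regressor
estimates = {0: 0.9, 1: 0.5, 2: 0.2}
p = inverse_gap_weighting(estimates.get, [0, 1, 2], gamma=10.0)
arm = random.choices(list(p), weights=list(p.values()))[0]
```

Larger `gamma` shifts mass toward the greedy action, trading exploration for exploitation.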


Author(s):  
Gábor Lugosi ◽  
Abbas Mehrabian

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret and an algorithm with a square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.
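A single round of the collision model described above can be sketched as follows; the function and parameter names are assumptions for illustration, not the authors' algorithm.

```python
import random

def play_round(choices, means, rng=random.random):
    """One round of the multiplayer collision model.

    choices[i] is the arm pulled by player i; means[a] is the Bernoulli
    mean of arm a. Players who collide (two or more on the same arm)
    receive zero reward; a lone player gets a Bernoulli draw.
    """
    rewards = []
    for arm in choices:
        if choices.count(arm) > 1:          # collision: everyone on this arm gets 0
            rewards.append(0.0)
        else:
            rewards.append(1.0 if rng() < means[arm] else 0.0)
    return rewards
```

In the first feedback model each player would also observe the collision indicator (`choices.count(arm) > 1`); in the second, harder model only the reward itself is revealed.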


2021 ◽  
Author(s):  
Shengyi Wang ◽  
Hongbo Sun ◽  
Kyeong Jin Kim ◽  
Jianlin Guo ◽  
Daniel Nikovski

PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252122
Author(s):  
Tsutomu Harada

Although it is commonly held that two heads are better than one, related studies have argued that groups rarely outperform their best members. This study examined not only whether two heads are better than one but also whether three heads are better than two or one, in the context of two-armed bandit problems where learning plays an instrumental role in achieving high performance. The research revealed a U-shaped relationship between performance and group size: performance was highest for individuals and triads and lowest for dyads. Moreover, the study estimated learning parameters and found that a high inverse temperature (exploitation) accounted for high performance. In particular, the group effect on the inverse temperature in dyads did not produce values surpassing the average of the two group members, whereas triads gave rise to inverse temperatures higher than the averages of their individual members. These results are consistent with our proposed hypothesis that learning coherence is likely to emerge in individuals and triads, but not in dyads, which in turn leads to higher performance. The hypothesis is based on the classical argument by Simmel that while dyads tend to involve more emotion and generate greater variability, triads are the smallest structure that tends to constrain emotions, reduce individuality, and generate behavioral convergence or uniformity because of "two against one" social pressures. As a result, three heads or one head were better than two in our study.
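The inverse-temperature parameter estimated in this study comes from the standard softmax (logit) choice rule combined with a delta-rule value update. A minimal sketch, with illustrative names (`beta` for inverse temperature, `alpha` for learning rate):

```python
import math
import random

def softmax_choice(q, beta, rng=random.random):
    """Pick an arm with probability proportional to exp(beta * Q[arm]).

    Higher inverse temperature beta means stronger exploitation of the
    currently higher-valued arm; beta near zero means random choice.
    """
    weights = [math.exp(beta * v) for v in q]
    r = rng() * sum(weights)
    for arm, w in enumerate(weights):
        r -= w
        if r <= 0:
            return arm
    return len(q) - 1

def update(q, arm, reward, alpha):
    """Delta-rule update: move Q[arm] toward the observed reward."""
    q[arm] += alpha * (reward - q[arm])
```

With a large `beta`, an agent almost always exploits the arm with the higher value estimate, which is the behavior the study associates with high-performing individuals and triads.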


2021 ◽  
Vol 66 (1) ◽  
pp. 476-478
Author(s):  
Paul Reverdy ◽  
Vaibhav Srivastava ◽  
Naomi Ehrich Leonard

Econometrica ◽  
2021 ◽  
Vol 89 (4) ◽  
pp. 1717-1751
Author(s):  
Olivier Gossner ◽  
Jakub Steiner ◽  
Colin Stewart

We study the impact of manipulating the attention of a decision‐maker who learns sequentially about a number of items before making a choice. Under natural assumptions on the decision‐maker's strategy, directing attention toward one item increases its likelihood of being chosen regardless of its value. This result applies when the decision‐maker can reject all items in favor of an outside option with known value; if no outside option is available, the direction of the effect of manipulation depends on the value of the item. A similar result applies to manipulation of choices in bandit problems.


Author(s):  
Jean Walrand

We have explored a number of topics motivated by concrete applications. It is time to stitch these ideas together into a complete panorama; in addition, we provide some complements. Section 15.1 discusses the general question of inference: what can one deduce from observations? Section 15.2 explains the important notion of a sufficient statistic: what is the relevant data in a set of observations? Section 15.3 presents the theory of Markov chains with an infinite number of states. Section 15.4 explains the Poisson process. Section 15.5 discusses the boosting algorithm for choosing among experts. Which drug should one research further; which noisy channel should one use? These are examples of multi-armed bandit problems, in which one faces a trade-off between exploiting known possibilities and exploring potentially more rewarding but less well understood alternatives. Section 15.6 explains a key result for such multi-armed bandit problems. Information Theory studies the limits of communication systems: how fast can one transmit bits reliably over a noisy channel? How many bits should be transmitted to convey some information? Section 15.7 introduces some key concepts and results of Information Theory. When estimating the likelihood of errors or the reliability of estimates, one usually has to bound the probability that a random variable exceeds a given value; Section 15.8 discusses some useful probability bounds. Section 15.9 explains the main ideas of the theory of martingales and shows how it yields a proof of the law of large numbers.
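The exploration-exploitation trade-off mentioned above is classically resolved by index rules such as UCB1, which adds a shrinking exploration bonus to each arm's empirical mean. A minimal sketch (this is the standard UCB1 rule, not necessarily the specific result presented in the chapter):

```python
import math

def ucb1_index(mean, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus that
    shrinks as the arm accumulates pulls."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

def choose_arm(means, pulls, t):
    """Pull each arm once first, then pick the arm with the largest index.

    means[a] is the empirical mean reward of arm a, pulls[a] its pull
    count, and t the current round (names are illustrative).
    """
    for a, n in enumerate(pulls):
        if n == 0:
            return a
    scores = [ucb1_index(means[a], pulls[a], t) for a in range(len(means))]
    return max(range(len(means)), key=scores.__getitem__)
```

Rarely pulled arms keep a large bonus and so are revisited, while well-sampled arms are judged mostly by their empirical mean.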

