Multiplayer Bandits Without Observing Collision Information

Author(s):  
Gábor Lugosi ◽  
Abbas Mehrabian

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret and an algorithm with a square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.
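The collision rule and the two feedback models described above can be illustrated with a minimal sketch of a single round. This is not the authors' algorithm; it only simulates the environment. The function names `play_round` and `collided` are hypothetical, and Bernoulli rewards are an assumption made for illustration (the abstract only says the problem is stochastic).

```python
import random
from collections import Counter


def play_round(choices, means, rng):
    """Return each player's reward for one round under the collision rule.

    choices[i] is the arm pulled by player i. means[k] is the mean of
    arm k; rewards are assumed Bernoulli here purely for illustration.
    """
    counts = Counter(choices)
    rewards = []
    for arm in choices:
        if counts[arm] > 1:
            # Collision: every involved player receives zero reward.
            rewards.append(0.0)
        else:
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
    return rewards


def collided(choices):
    """Per-player collision indicators (visible only in the first model)."""
    counts = Counter(choices)
    return [counts[arm] > 1 for arm in choices]


if __name__ == "__main__":
    rng = random.Random(0)
    choices = [0, 0, 1]          # players 0 and 1 pull the same arm
    means = [0.9, 0.5]
    print(play_round(choices, means, rng))
    print(collided(choices))
```

In the second feedback model a player sees only the output of `play_round`, so a zero reward may come either from a collision or from an unlucky draw; that ambiguity is what makes the setting harder.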

2008 ◽  
Vol 5 (4) ◽  
pp. 365-382 ◽  
Author(s):  
Haralampos Tsaknakis ◽  
Paul G. Spirakis

Author(s):  
Amir Ali Ahmadi ◽  
Jeffrey Zhang

We explore the power of semidefinite programming (SDP) for finding additive ε-approximate Nash equilibria in bimatrix games. We introduce an SDP relaxation for a quadratic programming formulation of the Nash equilibrium problem and provide a number of valid inequalities to improve the quality of the relaxation. If a rank-1 solution to this SDP is found, then an exact Nash equilibrium can be recovered. We show that, for a strictly competitive game, our SDP is guaranteed to return a rank-1 solution. We propose two algorithms based on the iterative linearization of smooth nonconvex objective functions whose global minima by design coincide with rank-1 solutions. Empirically, we demonstrate that these algorithms often recover solutions of rank at most 2 and ε close to zero. Furthermore, we prove that if a rank-2 solution to our SDP is found, then a [Formula: see text]-Nash equilibrium can be recovered for any game, or a [Formula: see text]-Nash equilibrium for a symmetric game. We then show how our SDP approach can address two (NP-hard) problems of economic interest: finding the maximum welfare achievable under any Nash equilibrium, and testing whether there exists a Nash equilibrium where a particular set of strategies is not played. Finally, we show the connection between our SDP and the first level of the Lasserre/sum of squares hierarchy.
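To make the notion of an additive ε-approximate Nash equilibrium concrete, the sketch below computes the smallest ε for which a given pair of mixed strategies qualifies, using the standard definition: neither player can gain more than ε by deviating unilaterally. It does not implement the SDP relaxation from the abstract; the function names `expected_payoff` and `nash_epsilon` are hypothetical, and payoff matrices are assumed to be plain lists of rows.

```python
def expected_payoff(M, x, y):
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(
        x[i] * M[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def nash_epsilon(A, B, x, y):
    """Smallest eps such that (x, y) is an additive eps-Nash equilibrium.

    A is the row player's payoff matrix, B the column player's.
    """
    # Best pure-strategy payoff available to the row player against y.
    row_best = max(
        sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))
    )
    # Best pure-strategy payoff available to the column player against x.
    col_best = max(
        sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))
    )
    row_regret = row_best - expected_payoff(A, x, y)
    col_regret = col_best - expected_payoff(B, x, y)
    return max(row_regret, col_regret)


if __name__ == "__main__":
    # Matching pennies: a zero-sum game with a unique mixed equilibrium.
    A = [[1, -1], [-1, 1]]
    B = [[-1, 1], [1, -1]]
    print(nash_epsilon(A, B, [0.5, 0.5], [0.5, 0.5]))
    print(nash_epsilon(A, B, [1.0, 0.0], [1.0, 0.0]))
```

A return value of zero means the pair is an exact Nash equilibrium. Checking pure deviations suffices because the best response to a fixed mixed strategy is always attained at a pure strategy.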


Author(s):  
Constantinos Daskalakis ◽  
Aranyak Mehta ◽  
Christos Papadimitriou

2021 ◽  
Vol 66 (1) ◽  
pp. 476-478 ◽  
Author(s):  
Paul Reverdy ◽  
Vaibhav Srivastava ◽  
Naomi Ehrich Leonard

2013 ◽  
Vol 9 (4) ◽  
pp. 384-405 ◽  
Author(s):  
Susanne Albers ◽  
Pascal Lenzner

2008 ◽  
Vol 40 (2) ◽  
pp. 377-400 ◽  
Author(s):  
Savas Dayanik ◽  
Warren Powell ◽  
Kazutoshi Yamazaki

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.

