Multiplayer Bandits Without Observing Collision Information

We study multiplayer stochastic multiarmed bandit problems in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with a logarithmic regret and an algorithm with a square-root regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games.
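The reward rule and the two feedback models described above can be illustrated with a minimal sketch. It is not the algorithm from the paper; the function name `play_round`, the use of Bernoulli arms, and the returned `collided` flags are assumptions made purely for illustration.

```python
import random


def play_round(means, choices, rng):
    """Simulate one round of the collision model (illustrative sketch).

    means[k]   -- assumed Bernoulli mean of arm k
    choices[j] -- index of the arm pulled by player j
    Returns (rewards, collided), one entry per player.
    """
    # Count how many players pulled each arm.
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    rewards, collided = [], []
    for arm in choices:
        if counts[arm] > 1:
            # Two or more players on the same arm: everyone involved gets zero.
            rewards.append(0.0)
            collided.append(True)
        else:
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
            collided.append(False)
    return rewards, collided


rng = random.Random(0)
rewards, collided = play_round([1.0, 1.0, 0.0], [0, 0, 2], rng)
# First feedback model: each player sees its own reward and its own collided flag.
# Second feedback model: each player sees only its reward, so the third player's
# zero (a bad arm) looks the same as the first two players' zeros (a collision).
```

In this sketch the second feedback model simply corresponds to discarding `collided`, which is what makes a zero reward ambiguous for the player.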

An Optimization Approach for Approximate Nash Equilibria

Internet Mathematics ◽

10.1080/15427951.2008.10129172 ◽

2008 ◽

Vol 5 (4) ◽

pp. 365-382 ◽

Cited By ~ 31

Author(s):

Haralampos Tsaknakis ◽

Paul G. Spirakis

Keyword(s):

Nash Equilibria ◽

Optimization Approach ◽

Semidefinite Programming and Nash Equilibria in Bimatrix Games

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.0960 ◽

2020 ◽

Author(s):

Amir Ali Ahmadi ◽

Jeffrey Zhang

Keyword(s):

Nash Equilibrium ◽

Semidefinite Programming ◽

Nash Equilibria ◽

Valid Inequalities ◽

Bimatrix Games ◽

Competitive Game ◽

Approximate Nash Equilibria ◽

Hard Problems ◽

Rank 2

We explore the power of semidefinite programming (SDP) for finding additive ε-approximate Nash equilibria in bimatrix games. We introduce an SDP relaxation for a quadratic programming formulation of the Nash equilibrium problem and provide a number of valid inequalities to improve the quality of the relaxation. If a rank-1 solution to this SDP is found, then an exact Nash equilibrium can be recovered. We show that, for a strictly competitive game, our SDP is guaranteed to return a rank-1 solution. We propose two algorithms based on the iterative linearization of smooth nonconvex objective functions whose global minima by design coincide with rank-1 solutions. Empirically, we demonstrate that these algorithms often recover solutions of rank at most 2 and ε close to zero. Furthermore, we prove that if a rank-2 solution to our SDP is found, then a 5/11-Nash equilibrium can be recovered for any game, or a 1/3-Nash equilibrium for a symmetric game. We then show how our SDP approach can address two (NP-hard) problems of economic interest: finding the maximum welfare achievable under any Nash equilibrium, and testing whether there exists a Nash equilibrium where a particular set of strategies is not played. Finally, we show the connection between our SDP and the first level of the Lasserre/sum of squares hierarchy.
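To make the notion of an additive approximate Nash equilibrium concrete, the sketch below computes the smallest ε for which a given pair of mixed strategies is an ε-approximate equilibrium of a bimatrix game. It only evaluates the standard definition and is not the SDP method of the paper; the function name `nash_gap` and the example payoff matrices (scaled to [0, 1]) are assumptions for illustration.

```python
def nash_gap(A, B, x, y):
    """Smallest additive epsilon for which (x, y) is an epsilon-Nash equilibrium.

    A, B -- payoff matrices (lists of rows) of the row and column player
    x, y -- mixed strategies of the row and column player
    """
    m, n = len(A), len(A[0])
    # Expected payoff of each pure row strategy against y.
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    # Expected payoff of each pure column strategy against x.
    xB = [sum(x[i] * B[i][j] for i in range(m)) for j in range(n)]
    row_payoff = sum(x[i] * Ay[i] for i in range(m))
    col_payoff = sum(xB[j] * y[j] for j in range(n))
    # Each player's gain from switching to a best response; epsilon is the larger.
    return max(max(Ay) - row_payoff, max(xB) - col_payoff)


# Hypothetical example: a matching-pennies-style game with payoffs in [0, 1].
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
gap_uniform = nash_gap(A, B, [0.5, 0.5], [0.5, 0.5])  # exact equilibrium
gap_pure = nash_gap(A, B, [1, 0], [1, 0])             # column player can deviate
```

A gap of zero means neither player can gain by deviating, so the strategy pair is an exact Nash equilibrium.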

Approximate Nash Equilibria for Multi-player Games

Algorithmic Game Theory - Lecture Notes in Computer Science ◽

10.1007/978-3-540-79309-0_24 ◽

2008 ◽

pp. 267-278 ◽

Cited By ~ 13

Author(s):

Sébastien Hémon ◽

Michel de Rougemont ◽

Miklos Santha

Keyword(s):

Nash Equilibria ◽

A Note on Approximate Nash Equilibria

Lecture Notes in Computer Science - Internet and Network Economics ◽

10.1007/11944874_27 ◽

2006 ◽

pp. 297-306 ◽

Cited By ~ 46

Author(s):

Constantinos Daskalakis ◽

Aranyak Mehta ◽

Christos Papadimitriou

Keyword(s):

Nash Equilibria ◽

Corrections to “Satisficing in Multiarmed Bandit Problems”

IEEE Transactions on Automatic Control ◽

10.1109/tac.2020.2981433 ◽

2021 ◽

Vol 66 (1) ◽

pp. 476-478

Author(s):

Paul Reverdy ◽

Vaibhav Srivastava ◽

Naomi Ehrich Leonard

Keyword(s):

Bandit Problems ◽

New Algorithms for Approximate Nash Equilibria in Bimatrix Games

Lecture Notes in Computer Science - Internet and Network Economics ◽

10.1007/978-3-540-77105-0_6 ◽

2007 ◽

pp. 17-29 ◽

Cited By ~ 23

Author(s):

Hartwig Bosse ◽

Jaroslaw Byrka ◽

Evangelos Markakis

Keyword(s):

Nash Equilibria ◽

Bimatrix Games ◽

Approximate Nash Equilibria ◽

New Algorithms

On Approximate Nash Equilibria in Network Design

Internet Mathematics ◽

10.1080/15427951.2012.754800 ◽

2013 ◽

Vol 9 (4) ◽

pp. 384-405 ◽

Cited By ~ 4

Author(s):

Susanne Albers ◽

Pascal Lenzner

Keyword(s):

Network Design ◽

Nash Equilibria ◽

Adversarial Multiarmed Bandit Problems in Gradually Evolving Worlds

Advances in Smart Vehicular Technology, Transportation, Communication and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-3-319-70730-3_36 ◽

2017 ◽

pp. 305-311

Author(s):

Chia-Jung Lee ◽

Yalei Yang ◽

Sheng-Hui Meng ◽

Tien-Wen Sung

Keyword(s):

Bandit Problems ◽

Query Complexity of Approximate Nash Equilibria

Proceedings of the 46th Annual ACM Symposium on Theory of Computing - STOC '14 ◽

10.1145/2591796.2591829 ◽

2014 ◽

Cited By ~ 13

Author(s):

Yakov Babichenko

Keyword(s):

Nash Equilibria ◽

Query Complexity ◽

Index policies for discounted bandit problems with availability constraints

Advances in Applied Probability ◽

10.1017/s0001867800002573 ◽

2008 ◽

Vol 40 (02) ◽

pp. 377-400 ◽

Cited By ~ 1

Author(s):

Savas Dayanik ◽

Warren Powell ◽

Kazutoshi Yamazaki

Keyword(s):

Bandit Problem ◽

Bandit Problems ◽

Index Policy ◽

State Action ◽

Index Policies ◽

Availability Constraints ◽

Whittle Index ◽

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.