A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

2021 · Vol 15 (5)
Author(s): Li Zhang, Yuxuan Chen, Wei Wang, Ziliang Han, Shijian Li, ...

Games · 2021 · Vol 12 (2) · pp. 47
Author(s): Sam Ganzfried

Successful algorithms have been developed for computing Nash equilibrium in a variety of finite game classes. However, solving continuous games—in which the pure strategy space is (potentially uncountably) infinite—is far more challenging. Nonetheless, many real-world domains have continuous action spaces, e.g., where actions refer to an amount of time, money, or other resource that is naturally modeled as being real-valued as opposed to integral. We present a new algorithm for approximating Nash equilibrium strategies in continuous games. In addition to two-player zero-sum games, our algorithm also applies to multiplayer games and games with imperfect information. We experiment with our algorithm on a continuous imperfect-information Blotto game, in which two players distribute resources over multiple battlefields. Blotto games have frequently been used to model national security scenarios and have also been applied to electoral competition and auction theory. Experiments show that our algorithm is able to quickly compute close approximations of Nash equilibrium strategies for this game.
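The abstract does not spell out the authors' algorithm, but the kind of equilibrium approximation it targets can be illustrated with a minimal fictitious-play sketch on a discretised Blotto game. The grid resolution, payoff convention, and iteration count below are illustrative assumptions, not the paper's method.

```python
import itertools
import numpy as np

# Hypothetical discretised Blotto: 3 battlefields, a unit budget split in
# steps of 1/10. Each pure strategy is an allocation summing to the budget.
STEPS, FIELDS = 10, 3
strategies = [np.array(a) / STEPS
              for a in itertools.product(range(STEPS + 1), repeat=FIELDS)
              if sum(a) == STEPS]

def payoff(x, y):
    """Player 1's payoff: battlefields won minus battlefields lost."""
    return float(np.sign(x - y).sum())

# Precompute the zero-sum payoff matrix over the discretised strategy grid.
A = np.array([[payoff(x, y) for y in strategies] for x in strategies])

# Fictitious play: each player best-responds to the opponent's empirical mixture.
n = len(strategies)
counts1, counts2 = np.zeros(n), np.zeros(n)
counts1[0] = counts2[0] = 1.0
for _ in range(2000):
    br1 = np.argmax(A @ (counts2 / counts2.sum()))   # best response of player 1
    br2 = np.argmin((counts1 / counts1.sum()) @ A)   # best response of player 2
    counts1[br1] += 1.0
    counts2[br2] += 1.0

# Exploitability of the averaged strategies (0 at an exact equilibrium).
p, q = counts1 / counts1.sum(), counts2 / counts2.sum()
expl = np.max(A @ q) - np.min(p @ A)
print(f"approximate exploitability: {expl:.3f}")
```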


2021 · Vol 66 (2) · pp. 51
Author(s): T.-V. Pricope

Imperfect-information games describe many practical applications found in the real world, since the information space is rarely fully available. This set of problems is challenging due to randomness, which makes even adaptive methods fail to model the problem correctly and find the best solution. Neural Fictitious Self-Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only crude data as input, and its most successful experiment was on the limit version of Texas Hold'em Poker. In this paper, we develop a new variant of NFSP that combines established fictitious self-play with neural gradient play in an attempt to improve performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold'em Poker using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold'em Poker, the agents trained through self-play outperformed those that used fictitious play with a normal-form single-step approach to the game. Moreover, we show that our algorithm converges close to a Nash equilibrium within a limited training process and on very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human play.
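NFSP's core structure, which the paper builds on, combines a best-response policy learned by reinforcement with an average policy learned from the recorded best responses, mixed by an anticipatory parameter. A minimal sketch of that loop on a toy zero-sum game follows; this is only a tabular stand-in for the paper's neural agents, and the game, the learning rates, and the count-based average policy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy zero-sum game (rock-paper-scissors) standing in for poker.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
n_actions = 3
eta, alpha = 0.1, 0.05          # anticipatory and learning rates (assumed)

Q = [np.zeros(n_actions), np.zeros(n_actions)]         # best-response values
avg_counts = [np.ones(n_actions), np.ones(n_actions)]  # average-policy counts

for t in range(20000):
    actions = []
    for i in range(2):
        if rng.random() < eta:       # act with the current best response
            a = int(np.argmax(Q[i]))
            avg_counts[i][a] += 1    # count-based stand-in for the SL buffer
        else:                        # act with the average policy
            a = rng.choice(n_actions, p=avg_counts[i] / avg_counts[i].sum())
        actions.append(a)
    r = A[actions[0], actions[1]]
    rewards = [r, -r]
    for i in range(2):               # bandit-style Q-learning update
        Q[i][actions[i]] += alpha * (rewards[i] - Q[i][actions[i]])

for i in range(2):
    print(f"player {i} average policy:",
          np.round(avg_counts[i] / avg_counts[i].sum(), 2))
```

On this toy game the average policies drift toward the uniform equilibrium, the same convergence behaviour the paper measures on poker.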


2005 · Vol 50 (165) · pp. 121-144
Author(s): Bozo Stojanovic

Market processes can be analyzed by means of dynamic games. In a number of dynamic games, multiple Nash equilibria appear. These equilibria often involve noncredible threats, the implementation of which is not in the interest of the players making them. The concept of subgame perfect equilibrium rules out these situations by requiring that a reasonable solution to a game cannot involve players believing and acting upon noncredible threats or promises. A simple way of finding the subgame perfect Nash equilibrium of a dynamic game is the principle of backward induction. To explain how this equilibrium concept is applied, we analyze dynamic entry games.
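A minimal sketch of backward induction on the kind of entry game the paper analyzes: the entrant moves first, the incumbent then chooses to fight or accommodate. The payoff numbers are illustrative assumptions; the point is that the incumbent's threat to fight drops out of the subgame perfect solution.

```python
def backward_induction(node):
    """Return (payoff_vector, plan) for a game tree given as nested dicts."""
    if "payoffs" in node:                      # terminal node
        return node["payoffs"], []
    mover = node["player"]
    best = None
    for action, child in node["children"].items():
        payoffs, plan = backward_induction(child)
        if best is None or payoffs[mover] > best[0][mover]:
            best = (payoffs, [(mover, action)] + plan)
    return best

entry_game = {
    "player": 0,                               # 0 = entrant, 1 = incumbent
    "children": {
        "stay out": {"payoffs": (0, 2)},
        "enter": {
            "player": 1,
            "children": {
                "fight":       {"payoffs": (-1, -1)},
                "accommodate": {"payoffs": (1, 1)},
            },
        },
    },
}

payoffs, plan = backward_induction(entry_game)
print("subgame perfect path:", plan, "payoffs:", payoffs)
# -> entrant enters, incumbent accommodates: the threat to fight is not credible.
```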


2000 · Vol 02 (02n03) · pp. 229-248
Author(s): Josef Shinar, Tal Shima, Valery Y. Glizer

A linear pursuit-evasion game with first-order acceleration dynamics and bounded controls is considered. In this game, the pursuer has to estimate the state variables of the game, including the lateral acceleration of the evader, based on noise-corrupted measurements of the relative position vector. The estimation process inherently involves some delay, rendering the information structure of the pursuer imperfect. If the pursuer implements the optimal strategy of the perfect-information game, an evader with perfect information can take advantage of the estimation delay. However, the performance degradation is minimised if the pursuer compensates for its own estimation delay by implementing the optimal strategy derived from the solution of the imperfect (delayed) information game. In this paper, the analytical solution of the delayed-information game is presented, which allows the value of the game to be predicted. The theoretical results are tested in a noise-corrupted scenario by Monte Carlo simulations, using a Kalman-filter-type estimator. The simulation results confirm the substantial improvement achieved by the new pursuer strategy.
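The analytical delayed-information solution is not reproduced here, but the estimation problem it compensates for can be sketched: a standard Kalman filter reconstructing the evader's lateral acceleration from noise-corrupted position measurements, where the estimate trails a sudden manoeuvre. All dynamics and noise parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D relative kinematics with a first-order evader acceleration,
# estimated from noisy position measurements only (assumed parameters).
dt, tau, sigma = 0.01, 0.2, 5.0        # step, evader time constant, meas. noise
F = np.array([[1, dt, 0],
              [0, 1, dt],
              [0, 0, 1 - dt / tau]])   # state: [position, velocity, accel]
H = np.array([[1.0, 0.0, 0.0]])        # only relative position is measured
Q = np.diag([0.0, 0.0, 1.0]) * dt      # process noise (accel command unknown)
R = np.array([[sigma ** 2]])

x_true = np.zeros(3)
x_hat, P = np.zeros(3), np.eye(3) * 10.0
for k in range(500):
    # The evader switches to a constant lateral command halfway through.
    cmd = 30.0 if k > 250 else 0.0
    x_true = F @ x_true
    x_true[2] += (dt / tau) * cmd
    z = H @ x_true + rng.normal(0.0, sigma, size=1)

    # Standard Kalman predict/update cycle.
    x_hat = F @ x_hat
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = (np.eye(3) - K @ H) @ P

# The acceleration estimate trails the manoeuvre: this lag is the
# "imperfect information" the delayed-information strategy accounts for.
print(f"true accel {x_true[2]:.1f}, estimate {x_hat[2]:.1f}")
```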


Author(s): Karl Tuyls, Julien Perolat, Marc Lanctot, Edward Hughes, Richard Everett, ...

This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for the empirical game-theoretic analysis of complex multi-agent interactions. In doing so, we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a sufficiently close approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state of the art has only considered evolutionary dynamics of symmetric HPTs, in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture-the-flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.
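The paper's sample-complexity bounds do not fit in a snippet, but its central object can be sketched: an empirical meta-game, estimated from noisy payoff samples, whose evolutionary analysis approaches that of the true game as the number of samples grows. The 3-strategy game, the noise model, and the replicator step sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed stand-in for a symmetric meta-game (a generalised
# rock-paper-scissors with a stable interior equilibrium).
true_game = np.array([[0.0, -1.0, 2.0],
                      [2.0, 0.0, -1.0],
                      [-1.0, 2.0, 0.0]])

def empirical_estimate(game, samples, noise=0.5):
    """Average `samples` noisy observations of each payoff entry."""
    obs = game[None] + rng.normal(0.0, noise, size=(samples,) + game.shape)
    return obs.mean(axis=0)

def replicator(game, x, steps=2000, dt=0.01):
    """Discrete-time replicator dynamics over the strategy simplex."""
    for _ in range(steps):
        fitness = game @ x
        x = x + dt * x * (fitness - x @ fitness)
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x

# With more samples, the estimated meta-game's dynamics approach the truth.
x0 = np.array([0.6, 0.3, 0.1])
for samples in (10, 100, 1000):
    est = empirical_estimate(true_game, samples)
    print(samples, "samples ->", np.round(replicator(est, x0), 3))
print("true game    ->", np.round(replicator(true_game, x0), 3))
```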

