A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

2021 · Vol 15 (5)
Author(s): Li Zhang, Yuxuan Chen, Wei Wang, Ziliang Han, Shijian Li, ...

Games · 2021 · Vol 12 (2) · pp. 47
Author(s): Sam Ganzfried

Successful algorithms have been developed for computing Nash equilibrium in a variety of finite game classes. However, solving continuous games—in which the pure strategy space is (potentially uncountably) infinite—is far more challenging. Nonetheless, many real-world domains have continuous action spaces, e.g., where actions refer to an amount of time, money, or other resource that is naturally modeled as being real-valued as opposed to integral. We present a new algorithm for approximating Nash equilibrium strategies in continuous games. In addition to two-player zero-sum games, our algorithm also applies to multiplayer games and games with imperfect information. We experiment with our algorithm on a continuous imperfect-information Blotto game, in which two players distribute resources over multiple battlefields. Blotto games have frequently been used to model national security scenarios and have also been applied to electoral competition and auction theory. Experiments show that our algorithm is able to quickly compute close approximations of Nash equilibrium strategies for this game.
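The abstract does not spell out the authors' algorithm, but the kind of equilibrium approximation it targets can be illustrated with a minimal fictitious-play sketch on a discretised Blotto game. The grid resolution, payoff convention, and iteration count below are illustrative assumptions, not the paper's method.

```python
import itertools
import numpy as np

# Hypothetical discretised Blotto: 3 battlefields, a unit budget split in
# steps of 1/10. Each pure strategy is an allocation summing to the budget.
STEPS, FIELDS = 10, 3
strategies = [np.array(a) / STEPS
              for a in itertools.product(range(STEPS + 1), repeat=FIELDS)
              if sum(a) == STEPS]

def payoff(x, y):
    """Player 1's payoff: battlefields won minus battlefields lost."""
    return float(np.sign(x - y).sum())

# Precompute the zero-sum payoff matrix over the discretised strategy grid.
A = np.array([[payoff(x, y) for y in strategies] for x in strategies])

# Fictitious play: each player best-responds to the opponent's empirical mixture.
n = len(strategies)
counts1, counts2 = np.zeros(n), np.zeros(n)
counts1[0] = counts2[0] = 1.0
for _ in range(2000):
    br1 = np.argmax(A @ (counts2 / counts2.sum()))   # best response of player 1
    br2 = np.argmin((counts1 / counts1.sum()) @ A)   # best response of player 2
    counts1[br1] += 1.0
    counts2[br2] += 1.0

# Exploitability of the averaged strategies (0 at an exact equilibrium).
p, q = counts1 / counts1.sum(), counts2 / counts2.sum()
expl = np.max(A @ q) - np.min(p @ A)
print(f"approximate exploitability: {expl:.3f}")
```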


2021 · Vol 66 (2) · pp. 51
Author(s): T.-V. Pricope

Imperfect-information games describe many practical applications found in the real world, since the information space is rarely fully available. This set of problems is challenging due to randomness, which makes even adaptive methods fail to model the problem correctly and find the best solution. Neural Fictitious Self-Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only crude data as input, and its most successful experiment was on the limit version of Texas Hold'em Poker. In this paper, we develop a new variant of NFSP that combines established fictitious self-play with neural gradient play in an attempt to improve performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold'em Poker using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold'em Poker, the agents trained through self-play outperformed those that used fictitious play with a normal-form single-step approach to the game. Moreover, we show that our algorithm converges close to a Nash equilibrium within a limited training process and on very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human play.
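NFSP's core structure, which the paper builds on, combines a best-response policy learned by reinforcement with an average policy learned from the recorded best responses, mixed by an anticipatory parameter. A minimal sketch of that loop on a toy zero-sum game follows; this is only a tabular stand-in for the paper's neural agents, and the game, the learning rates, and the count-based average policy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy zero-sum game (rock-paper-scissors) standing in for poker.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
n_actions = 3
eta, alpha = 0.1, 0.05          # anticipatory and learning rates (assumed)

Q = [np.zeros(n_actions), np.zeros(n_actions)]         # best-response values
avg_counts = [np.ones(n_actions), np.ones(n_actions)]  # average-policy counts

for t in range(20000):
    actions = []
    for i in range(2):
        if rng.random() < eta:       # act with the current best response
            a = int(np.argmax(Q[i]))
            avg_counts[i][a] += 1    # count-based stand-in for the SL buffer
        else:                        # act with the average policy
            a = rng.choice(n_actions, p=avg_counts[i] / avg_counts[i].sum())
        actions.append(a)
    r = A[actions[0], actions[1]]
    rewards = [r, -r]
    for i in range(2):               # bandit-style Q-learning update
        Q[i][actions[i]] += alpha * (rewards[i] - Q[i][actions[i]])

for i in range(2):
    print(f"player {i} average policy:",
          np.round(avg_counts[i] / avg_counts[i].sum(), 2))
```

On this toy game the average policies drift toward the uniform equilibrium, the same convergence behaviour the paper measures on poker.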


2005 · Vol 50 (165) · pp. 121-144
Author(s): Bozo Stojanovic

Market processes can be analyzed by means of dynamic games. In a number of dynamic games, multiple Nash equilibria appear. These equilibria often involve noncredible threats, the implementation of which is not in the interest of the players making them. The concept of subgame perfect equilibrium rules out these situations by requiring that a reasonable solution to a game cannot involve players believing and acting upon noncredible threats or promises. A simple way of finding the subgame perfect Nash equilibrium of a dynamic game is the principle of backward induction. To explain how this equilibrium concept is applied, we analyze dynamic entry games.
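A minimal sketch of backward induction on the kind of entry game the paper analyzes: the entrant moves first, the incumbent then chooses to fight or accommodate. The payoff numbers are illustrative assumptions; the point is that the incumbent's threat to fight drops out of the subgame perfect solution.

```python
def backward_induction(node):
    """Return (payoff_vector, plan) for a game tree given as nested dicts."""
    if "payoffs" in node:                      # terminal node
        return node["payoffs"], []
    mover = node["player"]
    best = None
    for action, child in node["children"].items():
        payoffs, plan = backward_induction(child)
        if best is None or payoffs[mover] > best[0][mover]:
            best = (payoffs, [(mover, action)] + plan)
    return best

entry_game = {
    "player": 0,                               # 0 = entrant, 1 = incumbent
    "children": {
        "stay out": {"payoffs": (0, 2)},
        "enter": {
            "player": 1,
            "children": {
                "fight":       {"payoffs": (-1, -1)},
                "accommodate": {"payoffs": (1, 1)},
            },
        },
    },
}

payoffs, plan = backward_induction(entry_game)
print("subgame perfect path:", plan, "payoffs:", payoffs)
# -> entrant enters, incumbent accommodates: the threat to fight is not credible.
```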


2000 · Vol 02 (02n03) · pp. 229-248
Author(s): Josef Shinar, Tal Shima, Valery Y. Glizer

A linear pursuit-evasion game with first-order acceleration dynamics and bounded controls is considered. In this game, the pursuer has to estimate the state variables of the game, including the lateral acceleration of the evader, based on noise-corrupted measurements of the relative position vector. The estimation process inherently involves some delay, rendering the information structure of the pursuer imperfect. If the pursuer implements the optimal strategy of the perfect-information game, an evader with perfect information can take advantage of the estimation delay. However, the performance degradation is minimised if the pursuer compensates for its own estimation delay by implementing the optimal strategy derived from the solution of the imperfect (delayed) information game. In this paper, the analytical solution of the delayed-information game is presented, which allows the value of the game to be predicted. The theoretical results are tested in a noise-corrupted scenario by Monte Carlo simulations, using a Kalman-filter-type estimator. The simulation results confirm the substantial improvement achieved by the new pursuer strategy.
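The analytical delayed-information solution is not reproduced here, but the estimation problem it compensates for can be sketched: a standard Kalman filter reconstructing the evader's lateral acceleration from noise-corrupted position measurements, where the estimate trails a sudden manoeuvre. All dynamics and noise parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D relative kinematics with a first-order evader acceleration,
# estimated from noisy position measurements only (assumed parameters).
dt, tau, sigma = 0.01, 0.2, 5.0        # step, evader time constant, meas. noise
F = np.array([[1, dt, 0],
              [0, 1, dt],
              [0, 0, 1 - dt / tau]])   # state: [position, velocity, accel]
H = np.array([[1.0, 0.0, 0.0]])        # only relative position is measured
Q = np.diag([0.0, 0.0, 1.0]) * dt      # process noise (accel command unknown)
R = np.array([[sigma ** 2]])

x_true = np.zeros(3)
x_hat, P = np.zeros(3), np.eye(3) * 10.0
for k in range(500):
    # The evader switches to a constant lateral command halfway through.
    cmd = 30.0 if k > 250 else 0.0
    x_true = F @ x_true
    x_true[2] += (dt / tau) * cmd
    z = H @ x_true + rng.normal(0.0, sigma, size=1)

    # Standard Kalman predict/update cycle.
    x_hat = F @ x_hat
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = (np.eye(3) - K @ H) @ P

# The acceleration estimate trails the manoeuvre: this lag is the
# "imperfect information" the delayed-information strategy accounts for.
print(f"true accel {x_true[2]:.1f}, estimate {x_hat[2]:.1f}")
```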


Author(s): Karl Tuyls, Julien Perolat, Marc Lanctot, Edward Hughes, Richard Everett, ...

This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for the empirical game-theoretic analysis of complex multi-agent interactions. In doing so, we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a sufficiently close approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state of the art has only considered evolutionary dynamics of symmetric HPTs, in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture-the-flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.
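The paper's sample-complexity bounds do not fit in a snippet, but its central object can be sketched: an empirical meta-game, estimated from noisy payoff samples, whose evolutionary analysis approaches that of the true game as the number of samples grows. The 3-strategy game, the noise model, and the replicator step sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed stand-in for a symmetric meta-game (a generalised
# rock-paper-scissors with a stable interior equilibrium).
true_game = np.array([[0.0, -1.0, 2.0],
                      [2.0, 0.0, -1.0],
                      [-1.0, 2.0, 0.0]])

def empirical_estimate(game, samples, noise=0.5):
    """Average `samples` noisy observations of each payoff entry."""
    obs = game[None] + rng.normal(0.0, noise, size=(samples,) + game.shape)
    return obs.mean(axis=0)

def replicator(game, x, steps=2000, dt=0.01):
    """Discrete-time replicator dynamics over the strategy simplex."""
    for _ in range(steps):
        fitness = game @ x
        x = x + dt * x * (fitness - x @ fitness)
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x

# With more samples, the estimated meta-game's dynamics approach the truth.
x0 = np.array([0.6, 0.3, 0.1])
for samples in (10, 100, 1000):
    est = empirical_estimate(true_game, samples)
    print(samples, "samples ->", np.round(replicator(est, x0), 3))
print("true game    ->", np.round(replicator(true_game, x0), 3))
```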

