Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

Author(s):  
Yue Guan ◽  
Qifan Zhang ◽  
Panagiotis Tsiotras

We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that under certain conditions, by updating the entropy regularization, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm's ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyper-parameter scheduling scheme to further expedite convergence. Empirical results applied to a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium, while exhibiting a major speed-up over existing algorithms.
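
As a rough illustration of what an entropy-regularized soft policy looks like, the sketch below computes a Boltzmann (softmax) policy over a vector of Q-values. This is not the authors' algorithm: the function name `soft_policy`, the inverse-temperature parameter `beta`, and the sample Q-values are assumptions introduced only for this example. The softmax is the distribution that maximizes expected Q-value plus an entropy bonus weighted by `1 / beta`, so a larger `beta` (weaker regularization) moves the policy toward the greedy choice.

```python
import math


def soft_policy(q_values, beta):
    """Boltzmann (softmax) policy: pi(a) is proportional to exp(beta * Q(a)).

    beta is a hypothetical inverse-temperature parameter; it is an
    assumption of this sketch, not a quantity taken from the paper.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical Q-values for three actions in a single state.
q = [1.0, 2.0, 0.5]
for beta in (0.0, 1.0, 50.0):
    print(beta, [round(p, 3) for p in soft_policy(q, beta)])
```

With `beta = 0` the policy is uniform, and as `beta` grows the probability mass concentrates on the action with the largest Q-value.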

2020 ◽  
Vol 40 (1) ◽  
pp. 71-85
Author(s):  
HK Das ◽  
T Saha

This paper proposes a heuristic algorithm for computing Nash equilibria of a bi-matrix game, which extends the idea of the single payoff matrix used in two-person zero-sum games. For comparison, we also introduce the well-known definition of Nash equilibrium and a mathematical construction via a set-valued map for finding the Nash equilibrium, and illustrate them. An important feature of our algorithm is that it finds a perfect equilibrium when, at the start, all actions are played. Furthermore, we can find all Nash equilibria by repeated use of this algorithm. Our illustrative examples and extensive experiments on the current phenomenon show that some games have a single Nash equilibrium, some possess no Nash equilibrium, and others have many Nash equilibria. These suggest that our proposed algorithm is capable of solving all types of problems. Finally, we explore the economic behaviour of game theory and its social implications to draw a conclusion stating the advantages of our algorithm. GANIT J. Bangladesh Math. Soc. Vol. 40 (2020) 71-85


2003 ◽  
Vol 05 (04) ◽  
pp. 375-384 ◽  
Author(s):  
GRAZIANO PIERI ◽  
ANNA TORRE

We give a suitable definition of Hadamard well-posedness for Nash equilibria of a game, that is, the stability of Nash equilibrium point with respect to perturbations of payoff functions. Our definition generalizes the analogous notion for minimum problems. For a game with continuous payoff functions, we restrict ourselves to Hadamard well-posedness with respect to uniform convergence and compare this notion with Tykhonov well-posedness of the same game. The main results are: Hadamard implies Tykhonov well-posedness and the converse is true if the payoff functions are bounded. For a zero-sum game the two notions are equivalent.


1987 ◽  
Vol 24 (02) ◽  
pp. 386-401 ◽  
Author(s):  
John W. Mamer

We consider the extension of optimal stopping problems to non-zero-sum strategic settings called stopping games. By imposing a monotone structure on the pay-offs of the game we establish the existence of a Nash equilibrium in non-randomized stopping times. As a corollary, we identify a class of games for which there are Nash equilibria in myopic stopping times. These games satisfy the strategic equivalent of the classical 'monotone case' assumptions of the optimal stopping problem.


2021 ◽  
Vol 14 ◽  
pp. 290-301
Author(s):  
Dmitrii Lozovanu ◽  
Stefan Pickl

In this paper we consider the problem of the existence and determination of stationary Nash equilibria for switching controller stochastic games with discounted and average payoffs. The set of states and the set of actions in the considered games are assumed to be finite. For a switching controller stochastic game with discounted payoffs we show that all stationary equilibria can be found by using an auxiliary continuous noncooperative static game in normal form in which the payoffs are quasi-monotonic (quasi-convex and quasi-concave) with respect to the corresponding strategies of the players. Based on this we propose an approach for determining the optimal stationary strategies of the players. In the case of average payoffs for a switching controller stochastic game we also formulate an auxiliary noncooperative static game in normal form with quasi-monotonic payoffs and show that such a game possesses a Nash equilibrium if the corresponding switching controller stochastic game has a stationary Nash equilibrium.


2021 ◽  
pp. 232102222110243
Author(s):  
M. Punniyamoorthy ◽  
Sarin Abraham ◽  
Jose Joy Thoppan

A non-zero sum bimatrix game may yield numerous Nash equilibrium solutions while solving the game. The selection of a good Nash equilibrium from among the many options poses a dilemma. In this article, three methods have been proposed to select a good Nash equilibrium. The first approach identifies the most payoff-dominant Nash equilibrium, while the second method selects the most risk-dominant Nash equilibrium. The third method combines risk dominance and payoff dominance by giving due weights to the two criteria. A sensitivity analysis is performed by changing the relative weights of criteria to check its effect on the ranks of the multiple Nash equilibria, infusing more confidence in deciding the best Nash equilibrium. JEL Codes: C7, C72, D81


2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Athanasios Kehagias

In this short note we study a class of multi-player, turn-based games with deterministic state transitions and reachability / safety objectives (this class contains as special cases "classic" two-player reachability and safety games as well as multi-player "stay-in-a-set" and "reach-a-set" games). Quantitative and qualitative versions of the objectives are presented and for both cases we prove the existence of a deterministic and memoryless Nash equilibrium; the proof is short and simple, using only Fink's classic result about the existence of Nash equilibria for multi-player discounted stochastic games.


Author(s):  
Jordi Grau-Moya ◽  
Felix Leibfried ◽  
Haitham Bou-Ammar

Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning to stochastic games, in which more than one agent interacts strategically. We contribute both theoretically and empirically. On the theory side, we show that games with soft Q-learning exhibit a unique value and generalise team games and zero-sum games far beyond these two extremes to cover a continuous spectrum of gaming behaviour. Experimentally, we show how tuning agents' constraints affects performance and demonstrate, through a neural network architecture, how to reliably balance games with high-dimensional representations.


2012 ◽  
Vol 2 (2) ◽  
Author(s):  
Urszula Boryczka ◽  
Przemyslaw Juszczuk

In this paper, we present the application of the Differential Evolution (DE) algorithm to the problem of finding approximate Nash equilibria in non-zero-sum matrix games for two players with a finite number of strategies. Nash equilibrium is one of the main concepts in game theory. It may be classified as a continuous problem, where two probability distributions over the sets of strategies of both players should be found. Every deviation from the global optimum is interpreted as a Nash approximation and called an ε-Nash equilibrium. The main advantage of the proposed algorithm is its self-adaptive mutation operator, which directs the search process. The approach used in this article is based on the probability of choosing a single pure strategy. In an optimal mixed strategy, every strategy has some probability of being chosen. Our goal is to determine this probability and maximize the payoff for a single player.
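
To make the ε-Nash notion concrete, the sketch below measures how far a pair of mixed strategies is from equilibrium in a two-player bimatrix game: ε is the largest payoff gain either player could obtain by switching to a best pure strategy. This is only the quantity an approximate search would try to drive toward zero, not the Differential Evolution procedure described in the abstract. The function names and the example payoff matrices (a small coordination game) are assumptions made for illustration.

```python
def expected_payoff(matrix, x, y):
    """Expected payoff x^T M y for row strategy x and column strategy y."""
    return sum(
        x[i] * matrix[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def nash_epsilon(A, B, x, y):
    """Largest gain either player can get by deviating unilaterally.

    A holds the row player's payoffs and B the column player's. Checking
    pure deviations is enough because a player's expected payoff is linear
    in that player's own mixed strategy.
    """
    row_value = expected_payoff(A, x, y)
    col_value = expected_payoff(B, x, y)
    best_row = max(
        sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))
    )
    best_col = max(
        sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))
    )
    return max(best_row - row_value, best_col - col_value)


# Hypothetical 2x2 non-zero-sum game used only for this example.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]

print(nash_epsilon(A, B, [1, 0], [1, 0]))          # pure profile (row 0, column 0)
print(nash_epsilon(A, B, [0.5, 0.5], [0.5, 0.5]))  # uniform mixing by both players
```

A profile with ε equal to zero is an exact Nash equilibrium; any positive value says how much a player could still gain by deviating.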

