Off-policy Q-learning: Solving Nash equilibrium of multi-player games with network-induced delay and unmeasured state

Automatica ◽  
2022 ◽  
Vol 136 ◽  
pp. 110076
Author(s):  
Jinna Li ◽  
Zhenfei Xiao ◽  
Jialu Fan ◽  
Tianyou Chai ◽  
Frank L. Lewis


Author(s):
Yue Guan ◽  
Qifan Zhang ◽  
Panagiotis Tsiotras

We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning-type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that, under certain conditions and by updating the entropy regularization, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm's ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyper-parameter scheduling scheme to further expedite convergence. Empirical results on a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium while exhibiting a major speed-up over existing algorithms.
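As a rough illustration of the core mechanism described above, the sketch below computes an entropy-regularized (soft) equilibrium of a zero-sum matrix game by iterating damped softmax responses and annealing the temperature, mirroring how soft policies approximate the Nash policy as the regularization is updated. The game matrix, step size, and annealing schedule are illustrative assumptions, not the authors' actual algorithm or benchmarks.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def soft_equilibrium(A, tau, iters=5000, lr=0.1):
    """Entropy-regularized equilibrium of the zero-sum matrix game A
    (row player maximizes, column player minimizes): iterate damped
    softmax responses at temperature tau until they settle."""
    m, n = A.shape
    x = np.ones(m) / m           # row player's mixed strategy
    y = np.ones(n) / n           # column player's mixed strategy
    for _ in range(iters):
        x_soft = softmax(A @ y / tau)     # soft response of the row player
        y_soft = softmax(-A.T @ x / tau)  # soft response of the column player
        x = (1 - lr) * x + lr * x_soft    # damping keeps the iteration stable
        y = (1 - lr) * y + lr * y_soft
    return x, y

# A 2x2 zero-sum game whose unique Nash equilibrium is x* = y* = (0.4, 0.6).
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
for tau in (1.0, 0.1, 0.01):     # annealing tau tightens the soft approximation
    x, y = soft_equilibrium(A, tau)
    print(f"tau={tau}: x={x.round(3)}, y={y.round(3)}")
```

As tau shrinks, the soft policies should move from near-uniform toward the Nash mixture, which is the sense in which updating the entropy regularization drives convergence.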


Author(s):  
Johann Bauer ◽  
Mark Broom ◽  
Eduardo Alonso

The multi-population replicator dynamics is a dynamic approach to coevolving populations and multi-player games and is related to Cross learning. In general, not every equilibrium of the dynamics is a Nash equilibrium of the underlying game, and convergence is not guaranteed. In particular, no interior equilibrium can be asymptotically stable in the multi-population replicator dynamics, resulting, for example, in cyclic orbits around a single interior Nash equilibrium. We introduce a new notion of equilibria of replicator dynamics, called mutation limits, based on a naturally arising, simple form of mutation that is invariant under the specific choice of mutation parameters. We prove the existence of mutation limits for a large class of games and consider a particularly interesting subclass called attracting mutation limits. Attracting mutation limits are approximated in every (mutation-)perturbed replicator dynamics, hence they offer an approximate dynamic solution to the underlying game even if the original dynamics do not converge. Thus, mutation stabilizes the system in certain cases and makes attracting mutation limits nearly attainable. Hence, attracting mutation limits are relevant as a dynamic solution concept of games. We observe that they have some similarity to Q-learning in multi-agent reinforcement learning. Attracting mutation limits do not exist in all games, however, which raises the question of their characterization.
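To make the stabilizing effect of mutation concrete, here is a minimal sketch (not the paper's formal construction) of two-population replicator dynamics with the kind of simple, uniform mutation the abstract alludes to: each population is pulled slightly toward the barycenter of its strategy simplex. The game, mutation rate, and Euler integration scheme are illustrative assumptions.

```python
import numpy as np

def replicator_with_mutation(A, B, x, y, mu, dt=0.001, steps=100000):
    """Two-population replicator dynamics for payoff matrices A (row
    population) and B (column population), with a uniform mutation term
    of rate mu pulling each population toward the simplex barycenter."""
    m, n = A.shape
    for _ in range(steps):
        fx, fy = A @ y, B.T @ x                      # fitness vectors
        dx = x * (fx - x @ fx) + mu * (1.0 / m - x)  # replicator + mutation
        dy = y * (fy - y @ fy) + mu * (1.0 / n - y)
        x, y = x + dt * dx, y + dt * dy
        x = np.clip(x, 1e-12, None); x /= x.sum()    # numerical safeguard:
        y = np.clip(y, 1e-12, None); y /= y.sum()    # stay on the simplex
    return x, y

# Matching pennies: A for the row population, B = -A for the column population.
# The single interior Nash equilibrium (1/2, 1/2) is not asymptotically stable
# without mutation (orbits cycle around it); a small mu > 0 makes a nearby rest
# point attracting, in the spirit of an attracting mutation limit.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x0, y0 = np.array([0.9, 0.1]), np.array([0.2, 0.8])
for mu in (0.0, 0.05):
    x, y = replicator_with_mutation(A, B, x0.copy(), y0.copy(), mu)
    print(f"mu={mu}: x={x.round(3)}, y={y.round(3)}")
```

With mu = 0 the final point is just somewhere on the cycle; with a small positive mu the trajectory spirals into a rest point near (1/2, 1/2), and the rest point is insensitive to the exact mutation rate, which is the invariance the mutation-limit notion captures.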


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Cecilia Lindig-León ◽  
Gerrit Schmid ◽  
Daniel A. Braun

The Nash equilibrium concept has previously been shown to be an important tool to understand human sensorimotor interactions, where different actors vie to minimize their respective effort while engaging in a multi-agent motor task. However, it is not clear how such equilibria are reached. Here, we compare different reinforcement learning models to human behavior in sensorimotor interactions with haptic feedback, based on three classic games: the prisoner's dilemma and the symmetric and asymmetric matching pennies games. We find that a discrete analysis that reduces the continuous sensorimotor interaction to binary choices, as in classical matrix games, does not allow us to distinguish between the different learning algorithms, but that a more detailed continuous analysis, with continuous formulations of the learning algorithms and the game-theoretic solutions, affords different predictions. In particular, we find that Q-learning with intrinsic costs that disfavor deviations from average behavior explains the observed data best, even though all learning algorithms equally converge to admissible Nash equilibrium solutions. We therefore conclude that it is important to study different learning algorithms for understanding sensorimotor interactions: such behavior cannot be inferred from a game-theoretic analysis alone that focuses simply on the Nash equilibrium concept, because different learning algorithms impose preferences on the set of possible equilibrium solutions through their inherent learning dynamics.
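Since the study above hinges on "Q-learning with intrinsic costs that disfavor deviations from average behavior", here is one hedged, discretized reading of that idea (the paper's best-fitting models are continuous): two stateless Q-learners repeatedly play a matrix game, and each agent's reward is shifted by a penalty on actions that deviate from its own running-average behavior. The penalty form, the reduction to a 2x2 game, and all parameters are assumptions for illustration, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta):
    z = beta * (q - q.max())     # beta controls exploration; shift for stability
    e = np.exp(z)
    return e / e.sum()

def play(A, episodes=20000, alpha=0.1, beta=5.0, lam=0.2):
    """Two independent stateless Q-learners in a 2x2 zero-sum game. Each
    agent's reward carries an intrinsic cost penalizing actions that deviate
    from its own running-average behavior (hypothetical cost form)."""
    Q1, Q2 = np.zeros(2), np.zeros(2)
    freq1, freq2 = np.ones(2) / 2, np.ones(2) / 2   # empirical action frequencies
    for t in range(1, episodes + 1):
        a1 = rng.choice(2, p=softmax(Q1, beta))
        a2 = rng.choice(2, p=softmax(Q2, beta))
        r1, r2 = A[a1, a2], -A[a1, a2]              # zero-sum payoffs
        r1 -= lam * (1.0 - freq1[a1])               # intrinsic cost: rare actions
        r2 -= lam * (1.0 - freq2[a2])               # are penalized
        Q1[a1] += alpha * (r1 - Q1[a1])             # stateless Q-updates
        Q2[a2] += alpha * (r2 - Q2[a2])
        freq1 += (np.eye(2)[a1] - freq1) / t        # update running averages
        freq2 += (np.eye(2)[a2] - freq2) / t
    return softmax(Q1, beta), softmax(Q2, beta)

# Symmetric matching pennies, one of the three games used in the study;
# its Nash equilibrium is (1/2, 1/2) for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(play(A))
```

The point of the sketch is the selection effect the abstract describes: the intrinsic cost biases the learner toward habitual behavior, so among the admissible equilibria the dynamics prefer those consistent with each agent's average play.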


2011 ◽  
pp. 65-87 ◽  
Author(s):  
A. Rubinstein

The article considers some aspects of the theory of patronized goods with respect to efficient and inefficient equilibria. The author analyzes specific features of patronized goods, as well as their connection with market failures, and conjectures that these failures are related to the emergence of Pareto-inefficient Nash equilibria. The key problem is the analysis of opportunities for transforming an inefficient Nash equilibrium into a Pareto-optimal one for patronized goods by modifying the institutional environment. The paper analyzes the social motivation for institutional modernization and the equilibrium conditions in the generalized Wicksell-Lindahl model for patronized goods. The author also considers some applications of the theory of patronized goods to social policy issues.

