Nash equilibria in human sensorimotor interactions explained by Q-Learning

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Cecilia Lindig-León ◽  
Gerrit Schmid ◽  
Daniel A. Braun

Abstract The Nash equilibrium concept has previously been shown to be an important tool for understanding human sensorimotor interactions, where different actors vie to minimize their respective effort while engaging in a multi-agent motor task. However, it is not clear how such equilibria are reached. Here, we compare different reinforcement learning models to human behavior engaged in sensorimotor interactions with haptic feedback based on three classic games: the prisoner’s dilemma, and the symmetric and asymmetric matching pennies games. We find that a discrete analysis that reduces the continuous sensorimotor interaction to binary choices, as in classical matrix games, does not allow us to distinguish between the different learning algorithms, but that a more detailed continuous analysis, with continuous formulations of both the learning algorithms and the game-theoretic solutions, affords different predictions. In particular, we find that Q-learning with intrinsic costs that disfavor deviations from average behavior explains the observed data best, even though all learning algorithms converge equally to admissible Nash equilibrium solutions. We therefore conclude that studying different learning algorithms is important for understanding sensorimotor interactions: such behavior cannot be inferred from a game-theoretic analysis alone that simply focuses on the Nash equilibrium concept, because different learning algorithms impose preferences on the set of possible equilibrium solutions through their inherent learning dynamics.
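
To illustrate the kind of model being compared, below is a minimal, discrete sketch of Q-learning with an intrinsic cost that penalizes deviations from an agent's own average behavior, played on symmetric matching pennies. The payoffs, learning rates, and cost weight LAM are illustrative assumptions; the paper's actual models operate on continuous sensorimotor trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent Q-learners play symmetric matching pennies: player 0
# wins on a match, player 1 on a mismatch. An intrinsic cost (weight LAM,
# an assumption here) penalizes deviations from the agent's running average.
ALPHA, BETA, LAM, T = 0.1, 3.0, 0.5, 20000

Q = np.zeros((2, 2))    # Q[player, action], stateless (bandit-style) values
avg = np.full(2, 0.5)   # running average of each player's action

def softmax_policy(q):
    p = np.exp(BETA * q - np.max(BETA * q))
    return p / p.sum()

for t in range(T):
    a0 = rng.choice(2, p=softmax_policy(Q[0]))
    a1 = rng.choice(2, p=softmax_policy(Q[1]))
    match = 1.0 if a0 == a1 else -1.0
    r = np.array([match, -match])              # zero-sum game payoffs
    for i, a in enumerate((a0, a1)):
        r[i] -= LAM * abs(a - avg[i])          # intrinsic deviation cost
        Q[i, a] += ALPHA * (r[i] - Q[i, a])    # Q-learning update
        avg[i] += 0.01 * (a - avg[i])          # slowly track average behavior

# At the mixed Nash equilibrium both players randomize close to 50/50.
print("P(heads):", softmax_policy(Q[0])[1], softmax_policy(Q[1])[1])
```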


2012 ◽  
Vol 27 (1) ◽  
pp. 1-31 ◽  
Author(s):  
Laetitia Matignon ◽  
Guillaume J. Laurent ◽  
Nadine Le Fort-Piat

Abstract In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn-fast policy hill climbing. An overview of the learning algorithms’ strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
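
As a concrete instance of one of the evaluated variants, here is a minimal sketch of hysteretic Q-learning on the climbing game, a standard cooperative matrix game with a shadowed optimal equilibrium. The payoff matrix is the textbook one; the learning rates and exploration schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Climbing game (fully cooperative): both agents receive the same reward.
# The optimum (0, 0) is "shadowed" by the -30 penalties, so plain
# decentralized Q-learning often settles on a safer, suboptimal joint action.
R = np.array([[ 11, -30,   0],
              [-30,   7,   6],
              [  0,   0,   5]], dtype=float)

ALPHA, BETA_NEG, EPS, T = 0.1, 0.01, 0.1, 30000   # BETA_NEG << ALPHA

Q = np.zeros((2, 3))   # independent learners: Q[agent, own action]

def eps_greedy(q):
    return rng.integers(3) if rng.random() < EPS else int(np.argmax(q))

for t in range(T):
    a0, a1 = eps_greedy(Q[0]), eps_greedy(Q[1])
    r = R[a0, a1]
    for i, a in enumerate((a0, a1)):
        delta = r - Q[i, a]
        # Hysteresis: learn fast from good news, slowly from bad news,
        # which filters out penalties caused by the partner's exploration.
        lr = ALPHA if delta >= 0 else BETA_NEG
        Q[i, a] += lr * delta

print("greedy joint action:", np.argmax(Q[0]), np.argmax(Q[1]))  # expect 0 0
```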


2021 ◽  
Vol 22 (2) ◽  
pp. 1-38
Author(s):  
Julian Gutierrez ◽  
Paul Harrenstein ◽  
Giuseppe Perelli ◽  
Michael Wooldridge

We define and investigate a novel notion of expressiveness for temporal logics that is based on game-theoretic equilibria of multi-agent systems. We use iterated Boolean games as our abstract model of multi-agent systems [Gutierrez et al. 2013, 2015a]. In such a game, each agent i has a goal γ_i, represented using (a fragment of) Linear Temporal Logic (LTL). The goal γ_i captures agent i’s preferences, in the sense that the models of γ_i represent system behaviours that would satisfy i. Each player i controls a subset of Boolean variables Φ_i, and at each round in the game, player i is at liberty to choose values for the variables in Φ_i in any way that she sees fit. Play continues for an infinite sequence of rounds, and so as players act they collectively trace out a model for LTL, which for every player will either satisfy or fail to satisfy their goal. Players are assumed to act strategically, taking into account the goals of other players, in an attempt to bring about computations satisfying their goal. In this setting, we apply the standard game-theoretic concept of (pure) Nash equilibria. The (possibly empty) set of Nash equilibria of an iterated Boolean game can be understood as inducing a set of computations, each computation representing one way the system could evolve if players chose strategies that together constitute a Nash equilibrium. Such a set of equilibrium computations expresses a temporal property, which may or may not be expressible within a particular fragment of LTL. The new notion of expressiveness that we formally define and investigate is then as follows: What temporal properties are characterised by the Nash equilibria of games in which agent goals are expressed in specific fragments of LTL? We formally define and investigate this notion of expressiveness for a range of LTL fragments. For example, a very natural question is the following: Suppose we have an iterated Boolean game in which every goal is represented using a particular fragment L of LTL: is it then always the case that the equilibria of the game can be characterised within L? We show that this is not true in general.
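
The following toy sketch illustrates the setting in heavily simplified form: strategies are restricted to constant valuations, so each LTL goal reduces to a propositional check on the single repeated valuation, and pure Nash equilibria can be enumerated by brute force. The two goals below are assumptions, chosen to encode a matching-pennies conflict and show that the equilibrium set can be empty.

```python
from itertools import product

# Two players; player 0 controls variable p, player 1 controls q.
# Restricting to constant strategies, every play is v, v, v, ... for one
# valuation v, so goals like "always(p <-> q)" reduce to propositional
# checks on v. These goals are illustrative, not from the paper.
goal = [
    lambda p, q: p == q,   # player 0: "always p <-> q" on constant plays
    lambda p, q: p != q,   # player 1: "always p xor q" (matching pennies)
]

def is_nash(p, q):
    # A profile is a pure Nash equilibrium iff no unsatisfied player can
    # satisfy their goal by unilaterally flipping their own variable.
    if not goal[0](p, q) and goal[0](not p, q):
        return False
    if not goal[1](p, q) and goal[1](p, not q):
        return False
    return True

equilibria = [(p, q) for p, q in product([False, True], repeat=2) if is_nash(p, q)]
print("pure Nash equilibria over constant strategies:", equilibria)  # expect []
```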


2021 ◽  
Author(s):  
Michael Richter ◽  
Ariel Rubinstein

Abstract Each member of a group chooses a position and has preferences regarding his chosen position. The group’s harmony depends on the profile of chosen positions meeting a specific condition. We analyse a solution concept (Richter and Rubinstein, 2020) based on a permissible set of individual positions, which plays a role analogous to that of prices in competitive equilibrium. Given the permissible set, members choose their most preferred position. The set is tightened if the chosen positions are inharmonious and relaxed if the restrictions are unnecessary. This new equilibrium concept yields more attractive outcomes than does Nash equilibrium in the corresponding game.
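
A toy numerical sketch of the tighten/relax dynamics, under strong simplifying assumptions that are not from the paper: positions are non-negative reals, the permissible set is an interval [0, cap], and harmony means the chosen positions fit a fixed budget.

```python
# Each member most prefers their ideal position but must choose inside the
# permissible set [0, cap]; the profile is harmonious iff positions sum to
# at most BUDGET. Tighten when inharmonious, relax when the restriction is
# unnecessarily severe, here implemented via bisection on the cap.
ideals = [4.0, 3.0, 5.0, 2.0]   # hypothetical preferred positions
BUDGET = 9.0                    # hypothetical harmony condition

lo, hi = 0.0, max(ideals)
cap = hi
for _ in range(60):
    chosen = [min(x, cap) for x in ideals]   # most preferred permissible point
    if sum(chosen) > BUDGET:
        hi = cap      # inharmonious: tighten the permissible set
    else:
        lo = cap      # harmonious: restriction may be unnecessary; relax
    cap = (lo + hi) / 2

chosen = [round(min(x, cap), 3) for x in ideals]
print(f"equilibrium cap = {cap:.3f}, positions = {chosen}")  # sum ~ BUDGET
```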


Author(s):  
Shihab Shamma ◽  
Prachi Patel ◽  
Shoutik Mukherjee ◽  
Guilhem Marion ◽  
Bahar Khalighinejad ◽  
...  

Abstract Action and perception are closely linked in many behaviors, necessitating close coordination between sensory and motor neural processes so as to achieve well-integrated, smoothly evolving task performance. To investigate the detailed nature of these sensorimotor interactions, and their role in learning and executing the skilled motor task of speaking, we analyzed ECoG recordings of responses in the high-γ band (70–150 Hz) in human subjects while they listened to, spoke, or silently articulated speech. We found elaborate spectrotemporally modulated neural activity projecting in both forward (motor-to-sensory) and inverse directions between the higher-order auditory and motor cortical regions engaged during speaking. Furthermore, mathematical simulations demonstrate a key role for the forward projection in learning to control the vocal tract, beyond its commonly postulated predictive role during execution. These results therefore offer a broader view of the functional role of the ubiquitous forward projection as an important ingredient in learning, rather than just control, of skilled sensorimotor tasks.
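
The simulations' core idea, that a forward model enables learning of control, can be caricatured with a classic distal-teacher sketch; the linear plant and all parameters below are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Distal-teacher cartoon: the plant ("vocal tract") maps motor command u to
# output y; the learner never sees dE/du directly, but a learned forward
# model supplies that gradient for training the controller.
def plant(u):
    return 2.0 * u + 1.0       # ground truth, unknown to the learner

# 1) Motor babbling: fit a linear forward model y_hat = a*u + b.
U = rng.uniform(-1, 1, 200)
a, b = np.polyfit(U, plant(U), 1)

# 2) Train a linear inverse controller u = w*y_target + v by pushing the
#    target error back through the forward model (dy_hat/du = a).
w, v, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    y_t = rng.uniform(0, 3)            # desired output
    u = w * y_t + v
    err = (a * u + b) - y_t            # error predicted by the forward model
    w -= lr * err * a * y_t            # chain rule: dE/dw = err * a * y_t
    v -= lr * err * a

print("controller test: target 2.0 ->", plant(w * 2.0 + v))  # ~2.0
```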


2009 ◽  
Vol 23 (03) ◽  
pp. 477-480 ◽  
Author(s):  
ZHILI TANG

The Taguchi robust design concept is combined with a multi-objective deterministic optimization method to overcome single-point design problems in aerodynamics. Starting from a statistical definition of stability, the method simultaneously finds Nash equilibrium solutions for performance and its stability.
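
A minimal sketch of the underlying idea, with purely illustrative quadratic objectives: one player tunes its variables for performance, the other for stability, and iterated best responses converge to a Nash equilibrium.

```python
# Toy Nash game in the spirit of the performance/stability split; the
# coupled quadratic objectives are assumptions chosen so that each
# best response has a closed form and the iteration is a contraction.
def best_x(y):   # argmin_x (x - 1)**2 + 0.5*x*y  (performance player)
    return 1.0 - 0.25 * y

def best_y(x):   # argmin_y (y + 1)**2 + 0.5*x*y  (stability player)
    return -1.0 - 0.25 * x

x, y = 0.0, 0.0
for _ in range(50):
    x, y = best_x(y), best_y(x)   # simultaneous (Jacobi) best responses

print(f"Nash point: x = {x:.4f}, y = {y:.4f}")   # ~ (1.3333, -1.3333)
```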


2005 ◽  
Vol 50 (165) ◽  
pp. 121-144
Author(s):  
Bozo Stojanovic

Market processes can be analyzed by means of dynamic games. In a number of dynamic games, multiple Nash equilibria appear. These equilibria often involve non-credible threats, whose implementation is not in the interest of the players making them. The concept of subgame perfect equilibrium rules out these situations by stating that a reasonable solution to a game cannot involve players believing and acting upon non-credible threats or promises. A simple way of finding the subgame perfect Nash equilibrium of a dynamic game is by using the principle of backward induction. To explain how this equilibrium concept is applied, we analyze dynamic entry games.
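
Backward induction is easy to mechanize. The sketch below applies it to the textbook entry-deterrence game, where the incumbent's threat to fight entry is non-credible, so the subgame perfect outcome is entry followed by accommodation. The payoff numbers are the usual illustrative ones.

```python
# A node is either a payoff tuple or (player_index, {action: subtree}).
tree = (0, {                       # player 0: the entrant
    "Out": (0, 2),
    "In": (1, {                    # player 1: the incumbent
        "Fight": (-1, -1),
        "Accommodate": (1, 1),
    }),
})

def backward_induction(node):
    """Return (payoffs, play) for the subgame perfect solution of node."""
    if not isinstance(node[1], dict):
        return node, []                       # leaf: payoffs, empty play
    player, actions = node
    best = None
    for action, child in actions.items():
        payoffs, play = backward_induction(child)
        # The player moving at this node picks the action maximizing
        # their own payoff, given optimal play in every subgame below.
        if best is None or payoffs[player] > best[0][player]:
            best = (payoffs, [action] + play)
    return best

payoffs, play = backward_induction(tree)
print("subgame perfect path:", " -> ".join(play), "| payoffs:", payoffs)
# subgame perfect path: In -> Accommodate | payoffs: (1, 1)
```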

