Time horizon and equilibrium selection in tacit coordination games: Experimental results

1998 ◽  
Vol 37 (2) ◽  
pp. 231-248 ◽  
Author(s):  
Siegfried K Berninghaus ◽  
Karl-Martin Ehrhart


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While such methods hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current approaches. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-time-horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work on several Atari games.
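
As a rough illustration of the idea, here is a minimal Python sketch of a curiosity signal built from reconstruction error, with a fast (instantaneous) component and a slow (long-horizon moving-average) component. The linear reconstruction model, the exponential averaging, and the mixing weight are illustrative assumptions, not the authors' exact architecture.

```python
import numpy as np

class CuriosityReward:
    """Toy curiosity signal: the error of a linear model that reconstructs an
    observation from its context (here, the previous observation). The 'fast'
    term is the instantaneous error; the 'slow' term is a long-horizon moving
    average of that error. Illustrative sketch, not the paper's formulation."""

    def __init__(self, obs_dim, lr=1e-2, slow_decay=0.999, beta=0.5):
        self.W = np.zeros((obs_dim, obs_dim))  # linear reconstruction model
        self.lr = lr                            # learning rate of the model
        self.slow = 0.0                         # slow (long-horizon) error average
        self.slow_decay = slow_decay
        self.beta = beta                        # mix between fast and slow terms

    def intrinsic_reward(self, context, obs):
        pred = self.W @ context
        err = obs - pred
        fast = float(err @ err)  # local (fast) curiosity: reconstruction error
        self.slow = self.slow_decay * self.slow + (1.0 - self.slow_decay) * fast
        # one SGD step so that familiar transitions stop being rewarded
        self.W += self.lr * np.outer(err, context)
        return self.beta * fast + (1.0 - self.beta) * self.slow

# usage (hypothetical): r_total = r_extrinsic + curiosity.intrinsic_reward(prev_obs, obs)
```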


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7026
Author(s):  
Dor Mizrahi ◽  
Inon Zuckerman ◽  
Ilan Laufer

In recent years, collaborative robots have become major market drivers in Industry 5.0, which aims to incorporate them alongside humans in a wide array of settings ranging from welding to rehabilitation. Improving human–machine collaboration entails using computational algorithms that save processing as well as communication costs. In this study we constructed an agent that can choose when to cooperate using an optimal strategy. The agent was designed to operate in the context of divergent-interest tacit coordination games, in which communication between the players is not possible and the payoff is not symmetric. The agent’s model was based on a behavioral model that can predict the probability of a player converging on prominent solutions with salient features (e.g., focal points), based on the player’s Social Value Orientation (SVO) and the specific game features. SVO theory pertains to the preferences of decision makers when allocating joint resources between themselves and another player in the context of behavioral game theory. The agent selected stochastically between one of two possible policies, a greedy or a cooperative policy, based on the probability of the player converging on a focal point. The distribution of the number of points obtained by the autonomous agent incorporating the SVO in its model was better than that obtained by the human players who played against each other (i.e., the distribution associated with the agent had a higher mean value). Moreover, the distribution of points gained by the agent was better than that of either of the separate strategies the agent could choose from, namely, always choosing the greedy or the focal-point solution. To the best of our knowledge, this is the first attempt to construct an intelligent agent that maximizes its utility by incorporating the belief system of the player in the context of tacit bargaining. This reward-maximizing strategy selection process based on the SVO could also be applied in other human–machine contexts, including multiagent systems.
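
A minimal sketch of the strategy-selection step described above, assuming the behavioral model yields a probability p_focal that the human co-player converges on the focal-point solution; the function name and the simple Bernoulli draw are illustrative, not the authors' exact decision rule.

```python
import random

def choose_policy(p_focal, rng=random):
    """Stochastically pick between the cooperative (focal-point) and greedy
    policies, given the modelled probability p_focal that the co-player
    converges on the focal point (e.g., predicted from SVO and game features).
    Illustrative sketch, not the paper's exact decision rule."""
    return "cooperative" if rng.random() < p_focal else "greedy"

# usage: a higher p_focal makes the cooperative policy more likely to be chosen
print([choose_policy(0.8) for _ in range(5)])
```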


2011 ◽  
Vol 101 (6) ◽  
pp. 2562-2589 ◽  
Author(s):  
Roy Chen ◽  
Yan Chen

When does a common group identity improve efficiency in coordination games? To answer this question, we propose a group-contingent social preference model and derive conditions under which social identity changes equilibrium selection. We test our predictions in the minimum-effort game in the laboratory, under parameter configurations that lead to an inefficient low-effort equilibrium for subjects with no group identity. For those with a salient group identity, consistent with our theory, we find that learning leads to in-group coordination on the efficient high-effort equilibrium. Additionally, our theoretical framework reconciles findings from a number of coordination game experiments. (JEL C71, C91, D71)
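
For context, the coordination problem in the minimum-effort game comes from its many Pareto-ranked equilibria: every common effort level is a Nash equilibrium, but higher ones pay more. A small Python check under the standard payoff form; the parameter values here are illustrative assumptions, not the configuration used in the experiment.

```python
def min_effort_payoff(efforts, i, a=0.2, b=0.1, c=0.6):
    """Standard minimum-effort game payoff: pi_i = a * min(e) - b * e_i + c.
    Parameter values are illustrative assumptions, not the experiment's."""
    return a * min(efforts) - b * efforts[i] + c

def is_nash(efforts, levels=range(1, 8)):
    """Check whether an effort profile is a pure-strategy Nash equilibrium."""
    for i in range(len(efforts)):
        base = min_effort_payoff(efforts, i)
        for dev in levels:
            alt = list(efforts)
            alt[i] = dev
            if min_effort_payoff(alt, i) > base + 1e-12:
                return False
    return True

# with a > b > 0, every common effort level is an equilibrium, ranked by payoff
print([is_nash((e, e)) for e in range(1, 8)])  # all True
```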


2021 ◽  
Author(s):  
Tomasz Raducha ◽  
Maxi San Miguel

We study the role of local effects and finite-size effects in reaching coordination and in equilibrium selection in two-player coordination games. We investigate three update rules: the replicator dynamics (RD), the best response (BR), and unconditional imitation (UI). For the pure coordination game with two equivalent strategies, we find a transition from a disordered state to coordination at a critical value of connectivity. The transition is system-size-independent for the BR and RD update rules. For UI it is system-size-dependent, but coordination can always be reached below the connectivity of a complete graph. We also consider the general coordination game, which covers a range of games such as the stag hunt. For these games there is a payoff-dominant strategy and a risk-dominant strategy, each with an associated state of equilibrium coordination. We analyse equilibrium selection analytically and numerically. For the RD and BR update rules, mean-field predictions agree with simulations, and the risk-dominant strategy is evolutionarily favoured independently of local effects. When players use unconditional imitation, however, we observe coordination on the payoff-dominant strategy. Surprisingly, selection of the payoff-dominant equilibrium occurs only below a critical value of the network connectivity and disappears in complete graphs. As we show, it is the combination of local effects and the update rule that allows coordination on the payoff-dominant strategy.
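
A toy Python sketch of one of these dynamics: unconditional imitation on an Erdős–Rényi random graph with pure coordination payoffs. The graph model, payoff matrix, synchronous update, and sizes are illustrative assumptions, not the paper's exact setup.

```python
import random

def simulate_ui(n=200, avg_degree=8, payoff=((1, 0), (0, 1)), steps=50, seed=0):
    """Two-strategy coordination game on a random graph with the unconditional
    imitation (UI) rule: each player copies the strategy of the neighbour
    (itself included) that earned the highest payoff in the previous round.
    Illustrative sketch only; all parameters are assumptions."""
    rng = random.Random(seed)
    p_edge = avg_degree / (n - 1)
    nbrs = [[] for _ in range(n)]
    for i in range(n):                        # build an Erdős–Rényi random graph
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                nbrs[i].append(j)
                nbrs[j].append(i)
    strat = [rng.randrange(2) for _ in range(n)]   # random initial strategies
    for _ in range(steps):
        pay = [sum(payoff[strat[i]][strat[j]] for j in nbrs[i]) for i in range(n)]
        strat = [strat[max(nbrs[i] + [i], key=lambda k: pay[k])]  # synchronous UI update
                 for i in range(n)]
    return sum(strat) / n                     # fraction playing strategy 1

print(simulate_ui())  # close to 0 or 1 once the population coordinates
```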

