MFVFD: A Multi-Agent Q-Learning Approach to Cooperative and Non-Cooperative Tasks

Author(s): Tianhao Zhang, Qiwei Ye, Jiang Bian, Guangming Xie, Tie-Yan Liu

Value function decomposition (VFD) methods under the popular centralized-training, decentralized-execution (CTDE) paradigm have advanced multi-agent reinforcement learning. However, existing VFD methods decompose only a group's value function and therefore address only cooperative tasks. Building instead on individual value function decomposition, we propose MFVFD, a novel multi-agent Q-learning approach based on mean-field theory that handles both cooperative and non-cooperative tasks. Our analysis of the Hawk-Dove and Nonmonotonic Cooperation matrix games evaluates MFVFD's convergent solution. Empirical studies on challenging mixed cooperative-competitive tasks in which hundreds of agents coexist demonstrate that MFVFD significantly outperforms existing baselines.
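
The abstract does not spell out MFVFD's update rule. For orientation only, a minimal tabular sketch of a mean-field-style Q-update, in which each agent conditions its individual Q-function on a discretised mean action of its neighbours, might look like this (all names, sizes and parameters are illustrative assumptions, not the authors' method):

```python
import numpy as np

# Illustrative mean-field-style Q-update (a sketch, not MFVFD itself):
# each agent keeps a table Q(s, a, bin(mean_a)), where mean_a is the
# mean action of its neighbours, discretised into n_bins bins.
n_states, n_actions, n_bins = 10, 5, 5
alpha, gamma = 0.1, 0.95
Q = np.zeros((n_states, n_actions, n_bins))

def mean_action_bin(neighbour_actions):
    """Discretise the neighbours' mean action into one of n_bins bins."""
    m = np.mean(neighbour_actions) / (n_actions - 1)  # normalise to [0, 1]
    return min(int(m * n_bins), n_bins - 1)

def update(s, a, neighbour_actions, r, s_next, next_neighbour_actions):
    """One tabular Q-learning step for a single agent, conditioned on the mean action."""
    b = mean_action_bin(neighbour_actions)
    b_next = mean_action_bin(next_neighbour_actions)
    target = r + gamma * Q[s_next, :, b_next].max()
    Q[s, a, b] += alpha * (target - Q[s, a, b])
```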

2014, Vol. 2014, pp. 1-12
Author(s): Yaofeng Zhang, Renbin Xiao

As social contradictions intensify, outbreaks of vent collective behavior are becoming more frequent. The essence of vent collective behavior is the emergence of synchronization. To explore the threshold of consensus synchronization in vent collective behavior, a mathematical model and a corresponding multi-agent simulation model are proposed. Mean-field analysis and simulation experiments show the following. (1) In a globally coupled, homogeneous group there is a threshold K_c for consensus synchronization: when the system parameter K is greater than K_c, consensus synchronization emerges; otherwise the system cannot achieve synchronization. This conclusion is verified by further multi-agent simulation. (2) Compared with the globally coupled case, the synchronization process is delayed in a locally coupled, homogeneous group. (3) In a locally coupled, heterogeneous group, consensus dissemination can achieve synchronization only when the parameter values meet the threshold requirements of consensus synchronization.
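
The abstract does not state the underlying dynamics, but the coupling-threshold language (K versus K_c) matches the classic mean-field Kuramoto model of synchronization. Under that assumption, a minimal simulation sketch of the threshold effect (all parameter values illustrative):

```python
import numpy as np

# Mean-field Kuramoto sketch (an assumption: the abstract's K / K_c
# threshold matches this classic globally coupled oscillator model).
rng = np.random.default_rng(0)
N, K, dt, steps = 1000, 2.5, 0.05, 2000
omega = rng.normal(0.0, 1.0, N)        # heterogeneous natural frequencies
theta = rng.uniform(0, 2 * np.pi, N)   # initial opinion phases

for _ in range(steps):
    # order parameter r*exp(i*psi): r near 1 means consensus synchronization
    z = np.mean(np.exp(1j * theta))
    r, psi = np.abs(z), np.angle(z)
    theta += dt * (omega + K * r * np.sin(psi - theta))

print(f"order parameter r = {np.abs(np.mean(np.exp(1j * theta))):.3f}")
# For standard-normal frequencies, K_c = 2 / (pi * g(0)) ~ 1.596, so
# K = 2.5 should synchronize while K = 1.0 should not.
```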


2020, Vol. 8 (6), pp. 5251-5255

Exploiting the efficiency and stability of dynamic crowds, this paper proposes a hybrid, multi-agent crowd simulation algorithm that focuses mainly on identifying which crowd to simulate. An efficient measure for both static and dynamic crowd simulation is applied in tracking and transportation applications. The proposed Hybrid Agent Reinforcement Learning (HARL) algorithm combines the off-policy value function of Q-learning with the on-policy value function of SARSA and is applied to a dynamic crowd-evacuation scenario. HARL maintains multiple value functions and combines the policy value functions derived from the agents to improve performance. The algorithm's efficiency is demonstrated across varied crowd sizes. Two kinds of reinforcement learning applications, tracking and transportation monitoring, are used to predict crowd sizes.
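
The abstract says HARL combines the Q-learning and SARSA value functions but does not give the mixing rule. One plausible reading, a convex blend of the off-policy and on-policy targets, is sketched below; the weight beta and all sizes are assumptions for illustration:

```python
import numpy as np

# Hedged sketch of a hybrid Q-learning / SARSA update. The convex
# weight `beta` is an assumption; the paper's exact rule is not given.
n_states, n_actions = 20, 4
alpha, gamma, beta = 0.1, 0.95, 0.5
Q = np.zeros((n_states, n_actions))

def hybrid_update(s, a, r, s_next, a_next):
    """Blend the off-policy (max) target with the on-policy (chosen-action) target."""
    q_learning_target = r + gamma * Q[s_next].max()   # off-policy
    sarsa_target = r + gamma * Q[s_next, a_next]      # on-policy
    target = beta * q_learning_target + (1 - beta) * sarsa_target
    Q[s, a] += alpha * (target - Q[s, a])
```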


2012, Vol. 27 (1), pp. 1-31
Author(s): Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat

In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn-fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
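
Of the surveyed variants, hysteretic Q-learning (introduced by the same authors) gives a concrete flavour of how these challenges are attacked: it uses a smaller learning rate for negative temporal-difference errors, so an independent learner stays optimistic and does not unlearn a good joint action merely because its teammates were exploring. A minimal tabular sketch, with illustrative sizes and rates:

```python
import numpy as np

# Tabular hysteretic Q-learning: two learning rates, alpha for positive
# TD errors and a smaller beta for negative ones, which keeps independent
# learners optimistic against teammates' exploration noise.
n_states, n_actions = 10, 3
alpha, beta, gamma = 0.1, 0.01, 0.9   # beta < alpha => optimistic updates
Q = np.zeros((n_states, n_actions))

def hysteretic_update(s, a, r, s_next):
    delta = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += (alpha if delta >= 0 else beta) * delta
```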


