multiagent reinforcement learning
Recently Published Documents

TOTAL DOCUMENTS: 182 (last five years: 72)
H-INDEX: 20 (last five years: 3)

2022 ◽  
Vol 119 (3) ◽  
pp. e2106028118
Author(s):  
Raphael Köster ◽  
Dylan Hadfield-Menell ◽  
Richard Everett ◽  
Laura Weidinger ◽  
Gillian K. Hadfield ◽  
...  

How do societies learn and maintain social norms? Here we use multiagent reinforcement learning to investigate the learning dynamics of enforcement and compliance behaviors. Artificial agents populate a foraging environment and need to learn to avoid a poisonous berry. Agents learn to avoid eating poisonous berries better when doing so is taboo, meaning the behavior is punished by other agents. The taboo helps overcome a credit assignment problem in discovering delayed health effects. Critically, introducing an additional taboo, which results in punishment for eating a harmless berry, further improves overall returns. This “silly rule” counterintuitively has a positive effect because it gives agents more practice in learning rule enforcement. By probing what individual agents have learned, we demonstrate that normative behavior relies on a sequence of learned skills. Learning rule compliance builds upon prior learning of rule enforcement by other agents. Our results highlight the benefit of employing a multiagent reinforcement learning computational model focused on learning to implement complex actions.
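No code accompanies the abstract, so the following is a minimal toy sketch of the mechanism it describes, with every environment detail (delay length, reward values, action names) invented for illustration. It shows how an immediate, socially delivered punishment gives a simple learner a usable signal where a delayed health cost does not, because the delayed cost lands on whatever action happens to be current when it arrives:

```python
import random

random.seed(0)

ACTIONS = ["safe", "poisonous"]
DELAY = 5          # steps before the health cost of a poisonous berry lands
HEALTH_COST = -2.0
TABOO_FINE = -1.0  # immediate sanction from other agents (hypothetical value)

def run(taboo, episodes=3000, eps=0.1, alpha=0.1):
    q = {a: 0.0 for a in ACTIONS}
    pending = []  # [steps_left, cost] entries for delayed health effects
    for _ in range(episodes):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(q, key=q.get)
        r = 1.0  # every berry is immediately nutritious
        if a == "poisonous":
            pending.append([DELAY, HEALTH_COST])
            if taboo:
                r += TABOO_FINE  # the punishment arrives right away
        # delayed costs are naively credited to the *current* action,
        # which is exactly the credit-assignment failure the taboo sidesteps
        for p in pending:
            p[0] -= 1
        r += sum(cost for steps, cost in pending if steps <= 0)
        pending = [p for p in pending if p[0] > 0]
        q[a] += alpha * (r - q[a])
    return q

print("no taboo:", run(False))  # poisonous berry can look deceptively good
print("taboo:   ", run(True))   # immediate fine makes it clearly worse
```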


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Baolai Wang ◽  
Shengang Li ◽  
Xianzhong Gao ◽  
Tao Xie

With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has attracted many researchers' attention. However, the situation a UAV swarm faces is highly uncertain and dynamic, and the state and action spaces grow exponentially with the number of UAVs, which makes autonomous decision-making in a confrontation environment difficult. In this paper, a multiagent reinforcement learning method with macro actions and human expertise is proposed for autonomous decision-making of UAVs. The UAV swarm is modeled as a large multiagent system (MAS) with each individual UAV as an agent, and the sequential decision-making problem in swarm confrontation is modeled as a Markov decision process. Agents are trained on macro actions, which effectively mitigates the problems of sparse and delayed rewards and of the large state and action spaces. The key to the success of this method is generating macro actions that allow the high-level policy to find a near-optimal solution; here, we leverage human expertise to design a set of good macro actions. Extensive experiments in our constructed swarm confrontation environment show that our method outperforms the other algorithms.
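As a loose illustration of the macro-action idea (the paper's simulator, macro set, and rewards are not public, so all names and values below are assumptions), a high-level learner can treat each hand-designed macro as one temporally extended action and receive its aggregated reward, decoupling its decisions from the sparse per-step signal:

```python
import random

random.seed(1)

# hypothetical human-designed macros, each a short script of primitive actions
MACROS = {
    "pursue":  ["accelerate", "turn_to_target", "fire"],
    "evade":   ["turn_away", "accelerate", "climb"],
    "regroup": ["turn_to_leader", "hold_speed"],
}
MEAN_REWARD = {"pursue": 0.2, "evade": 0.0, "regroup": 0.1}  # made-up values

def primitive_step(macro):
    # stand-in for one primitive-action step of the confrontation simulator
    return random.gauss(MEAN_REWARD[macro], 0.1)

q = {m: 0.0 for m in MACROS}

def high_level_step(eps=0.1, alpha=0.05):
    m = random.choice(list(MACROS)) if random.random() < eps else max(q, key=q.get)
    # a macro runs all of its primitive steps before the next decision,
    # so the high-level learner sees one aggregated, denser reward
    r = sum(primitive_step(m) for _ in MACROS[m])
    q[m] += alpha * (r - q[m])

for _ in range(2000):
    high_level_step()
print(q)  # "pursue" should score highest under these toy rewards
```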


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Siyuan Ding ◽  
Shengxiang Li ◽  
Guangyi Liu ◽  
Ou Li ◽  
Ke Ke ◽  
...  

The exponential explosion of joint actions and massive data collection are two main challenges for multiagent reinforcement learning algorithms with centralized training. To overcome these problems, in this paper we propose a model-free, fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. The agents are assumed to be placed in a time-varying communication network. Each agent observes only part of the global state and joint action, so it needs to obtain and share information with others over the network. In the proposed algorithm, agents hold local estimates of the global state and joint action and update them using local observations and messages received from neighbors. Under the assumption that the global value function decomposes, the gradient of the global objective function with respect to an individual agent is derived. The convergence of the proposed algorithm with linear function approximation is guaranteed by stochastic approximation theory. In the experiments, the proposed algorithm was applied to a multiagent passive localization task and achieved superior performance compared to state-of-the-art algorithms.
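A minimal sketch of the message-diffusion ingredient, not the full actor-critic algorithm: agents repeatedly average their local estimates with their neighbors' messages over a randomly changing graph, which drives all estimates to a common value even though no agent ever sees the whole network. The topology below (a ring plus one random extra edge per step) is an invented example of a time-varying network:

```python
import random

random.seed(2)
N = 6  # number of agents

def random_neighbors():
    # time-varying topology: a ring plus one random extra edge each step
    edges = [(i, (i + 1) % N) for i in range(N)]
    edges.append(tuple(random.sample(range(N), 2)))
    nbrs = {i: {i} for i in range(N)}  # each agent also keeps its own value
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs

# each agent's local estimate of a shared global quantity
estimates = [random.uniform(0.0, 10.0) for _ in range(N)]
print("before:", [round(e, 2) for e in estimates])
for _ in range(50):
    nbrs = random_neighbors()
    # every agent averages its estimate with its current neighbors' messages
    estimates = [sum(estimates[j] for j in nbrs[i]) / len(nbrs[i])
                 for i in range(N)]
print("after: ", [round(e, 2) for e in estimates])  # near-identical values
```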


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Tong Zhu ◽  
Xiaohu Li ◽  
Wei Fan ◽  
Changshuai Wang ◽  
Haoxue Liu ◽  
...  

Work zones are frequently congested sections that act as freeway bottlenecks. Connected and autonomous vehicle (CAV) trajectory optimization can improve operating efficiency in bottleneck areas by harmonizing vehicle maneuvers. This study presents a joint trajectory optimization of cooperative lane-changing, merging, and car-following actions for CAV control at a local merging point together with upstream points. A multiagent reinforcement learning (MARL) method is applied: one agent provides a merging advisory service at the merging point and controls the inner-lane vehicles' headway so that outer-lane vehicles can merge smoothly, while the other agents provide lane-changing advisory services at upstream lane-changing points, controlling how vehicles change lanes in advance and make the corresponding headway adjustments, similarly to and jointly with the merging advisory service. To unite all agents, the coordination graph (CG) method is applied to seek the global optimum, overcoming the exponential growth of the joint action space in MARL. An online simulation platform is established using MATLAB and the VISSIM COM interface. The simulation results show that MARL is effective for online computation with timely responses. More importantly, comparisons of the results obtained in various scenarios demonstrate that the proposed system yields smoother vehicle trajectories in all controlled sections, not only in the merging area, indicating that it can achieve better traffic conditions throughout freeway work zone areas.
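As a toy illustration of the coordination-graph decomposition (the agent names, actions, and payoffs below are invented): when the global payoff is a sum of per-edge payoffs between neighboring advisors, a joint action can be scored edge by edge instead of through one monolithic table over the exponential joint-action space. Real CG methods maximize with variable elimination or max-plus; the brute-force enumeration here is only for clarity at three agents:

```python
from itertools import product

# hypothetical advisory agents along the approach to the work zone
AGENTS = ["merge_advisor", "lc_advisor_1", "lc_advisor_2"]
ACTIONS = ["slow_down", "hold", "open_gap"]
EDGES = [("merge_advisor", "lc_advisor_1"), ("lc_advisor_1", "lc_advisor_2")]

def edge_payoff(a, b):
    # toy pairwise utility: adjacent advisors should not both brake traffic,
    # and at least one of them opening a gap helps merging
    if a == b == "slow_down":
        return -1.0
    return 1.0 if "open_gap" in (a, b) else 0.5

best, best_val = None, float("-inf")
for joint in product(ACTIONS, repeat=len(AGENTS)):
    assign = dict(zip(AGENTS, joint))
    # global payoff decomposes over the edges of the coordination graph
    val = sum(edge_payoff(assign[u], assign[v]) for u, v in EDGES)
    if val > best_val:
        best, best_val = assign, val
print(best, best_val)
```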


2021 ◽  
Vol 6 (4) ◽  
pp. 7461-7468
Author(s):  
Guanglin Ji ◽  
Junyan Yan ◽  
Jingxin Du ◽  
Wanquan Yan ◽  
Jibiao Chen ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Wei Dai ◽  
Wei Wang ◽  
Zhongtian Mao ◽  
Ruwen Jiang ◽  
Fudong Nian ◽  
...  

The main objective of multiagent reinforcement learning is to achieve a globally optimal policy, but the value function is difficult to evaluate in a high-dimensional state space. We therefore transform the multiagent reinforcement learning problem into a distributed optimization problem with constraint terms, in which all agents share the state and action spaces but each agent observes only its own local reward. We then propose a distributed optimization method with fractional-order dynamics to solve this problem. Moreover, we prove the convergence of the proposed algorithm and illustrate its effectiveness with a numerical example.
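The abstract gives no equations, so the sketch below is only one plausible reading of "distributed optimization with fractional-order dynamics": distributed gradient descent on local quadratic losses, with the usual first-order difference replaced by a truncated Grünwald-Letnikov fractional difference of order ALPHA. All problem data, the ring graph, and the step sizes are invented:

```python
import numpy as np

ALPHA, H, STEPS, MEM = 0.8, 0.1, 300, 50
b = np.array([1.0, 2.0, 3.0, 4.0])  # local targets; team optimum is mean(b) = 2.5
N = len(b)

W = np.zeros((N, N))                # fixed ring graph, uniform mixing weights
for i in range(N):
    for j in (i, (i + 1) % N, (i - 1) % N):
        W[i, j] = 1.0 / 3.0

# Gruenwald-Letnikov coefficients c_j = (-1)^j * binom(ALPHA, j)
c = [1.0]
for j in range(1, MEM + 1):
    c.append(c[-1] * (1.0 - (ALPHA + 1.0) / j))

x = np.zeros(N)
mixed_hist = []
for _ in range(STEPS):
    mixed = W @ x                   # consensus: mix with neighbors' estimates
    mixed_hist.append(mixed)
    grad = 2.0 * (mixed - b)        # each agent uses only its local gradient
    mem = sum(cj * m for cj, m in zip(c[1:], reversed(mixed_hist[-MEM:])))
    x = -mem - (H ** ALPHA) * grad  # ALPHA = 1 recovers ordinary distributed GD
print(np.round(x, 2))               # entries cluster near the team optimum 2.5
```

With ALPHA = 1 the coefficient recursion leaves only c_1 = -1, and the update collapses to the standard consensus-plus-gradient iteration; fractional orders below 1 spread the memory over many past iterates.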


2021 ◽  
Vol 18 (5) ◽  
pp. 172988142110449
Author(s):  
Haolin Wu ◽  
Hui Li ◽  
Jianwei Zhang ◽  
Zhuang Wang ◽  
Jianeng Zhang

Multiagent reinforcement learning holds considerable promise for cooperative multiagent tasks. Unfortunately, a single global reward shared by all agents can lead to the lazy-agent problem in cooperative tasks. To cope with this problem, we propose an algorithm that generates individual intrinsic rewards: an intrinsic reward encoder produces an individual intrinsic reward for each agent, and hypernetworks serve as the decoder to help estimate the individual action values of value-decomposition methods based on the generated intrinsic rewards. Experimental results on the StarCraft II micromanagement benchmark show that the proposed algorithm increases learning efficiency and improves policy performance.
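As a toy sketch of the encoder/hypernetwork wiring (the actual method trains neural networks on StarCraft II; the linear maps, dimensions, and names here are stand-ins): a hypernetwork turns the global state into the weights of a per-agent decoder, so individual value estimates are conditioned on the global state even though each agent feeds in only its own observation:

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, STATE_DIM = 3, 4, 8

# "encoder": maps each agent's observation to a scalar intrinsic reward
enc_W = rng.normal(0.0, 0.1, (1, OBS_DIM))

# hypernetwork: maps the global state to the weights of a small decoder,
# so the decoder that estimates individual action values is itself
# conditioned on the global state (a QMIX-style conditioning trick)
hyper_W = rng.normal(0.0, 0.1, (OBS_DIM, STATE_DIM))

state = rng.normal(size=STATE_DIM)           # global state (training only)
obs = rng.normal(size=(N_AGENTS, OBS_DIM))   # one local observation per agent

intrinsic_rewards = obs @ enc_W.T            # one intrinsic reward per agent
decoder_w = hyper_W @ state                  # generated decoder weights
individual_values = obs @ decoder_w          # one value estimate per agent

print("intrinsic rewards:", intrinsic_rewards.ravel())
print("individual values:", individual_values)
```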

