multiagent reinforcement learning
Recently Published Documents

TOTAL DOCUMENTS: 182 (last five years: 72)
H-INDEX: 20 (last five years: 3)

2022 ◽  
Vol 119 (3) ◽  
pp. e2106028118
Author(s):  
Raphael Köster ◽  
Dylan Hadfield-Menell ◽  
Richard Everett ◽  
Laura Weidinger ◽  
Gillian K. Hadfield ◽  
...  

How do societies learn and maintain social norms? Here we use multiagent reinforcement learning to investigate the learning dynamics of enforcement and compliance behaviors. Artificial agents populate a foraging environment and need to learn to avoid a poisonous berry. Agents learn to avoid eating poisonous berries better when doing so is taboo, meaning the behavior is punished by other agents. The taboo helps overcome a credit assignment problem in discovering delayed health effects. Critically, introducing an additional taboo, which results in punishment for eating a harmless berry, further improves overall returns. This “silly rule” counterintuitively has a positive effect because it gives agents more practice in learning rule enforcement. By probing what individual agents have learned, we demonstrate that normative behavior relies on a sequence of learned skills. Learning rule compliance builds upon prior learning of rule enforcement by other agents. Our results highlight the benefit of employing a multiagent reinforcement learning computational model focused on learning to implement complex actions.
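No code accompanies the abstract, so the following is a minimal toy sketch of the mechanism it describes, with every environment detail (delay length, reward values, action names) invented for illustration. It shows how an immediate, socially delivered punishment gives a simple learner a usable signal where a delayed health cost does not, because the delayed cost lands on whatever action happens to be current when it arrives:

```python
import random

random.seed(0)

ACTIONS = ["safe", "poisonous"]
DELAY = 5          # steps before the health cost of a poisonous berry lands
HEALTH_COST = -2.0
TABOO_FINE = -1.0  # immediate sanction from other agents (hypothetical value)

def run(taboo, episodes=3000, eps=0.1, alpha=0.1):
    q = {a: 0.0 for a in ACTIONS}
    pending = []  # [steps_left, cost] entries for delayed health effects
    for _ in range(episodes):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(q, key=q.get)
        r = 1.0  # every berry is immediately nutritious
        if a == "poisonous":
            pending.append([DELAY, HEALTH_COST])
            if taboo:
                r += TABOO_FINE  # the punishment arrives right away
        # delayed costs are naively credited to the *current* action,
        # which is exactly the credit-assignment failure the taboo sidesteps
        for p in pending:
            p[0] -= 1
        r += sum(cost for steps, cost in pending if steps <= 0)
        pending = [p for p in pending if p[0] > 0]
        q[a] += alpha * (r - q[a])
    return q

print("no taboo:", run(False))  # poisonous berry can look deceptively good
print("taboo:   ", run(True))   # immediate fine makes it clearly worse
```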


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Baolai Wang ◽  
Shengang Li ◽  
Xianzhong Gao ◽  
Tao Xie

With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has attracted many researchers' attention. However, the situation a UAV swarm faces is highly uncertain and dynamic, and the state and action spaces grow exponentially with the number of UAVs, which makes autonomous decision-making in a confrontation environment difficult. In this paper, a multiagent reinforcement learning method with macro actions and human expertise is proposed for autonomous decision-making of UAVs. The UAV swarm is modeled as a large multiagent system (MAS) with each individual UAV as an agent, and the sequential decision-making problem in swarm confrontation is modeled as a Markov decision process. Agents are trained on macro actions, which effectively mitigates the problems of sparse and delayed rewards and of the large state and action spaces. The key to the success of this method is generating macro actions that allow the high-level policy to find a near-optimal solution; here, we leverage human expertise to design a set of good macro actions. Extensive experiments in our constructed swarm confrontation environment show that our method outperforms the other algorithms.
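As a loose illustration of the macro-action idea (the paper's simulator, macro set, and rewards are not public, so all names and values below are assumptions), a high-level learner can treat each hand-designed macro as one temporally extended action and receive its aggregated reward, decoupling its decisions from the sparse per-step signal:

```python
import random

random.seed(1)

# hypothetical human-designed macros, each a short script of primitive actions
MACROS = {
    "pursue":  ["accelerate", "turn_to_target", "fire"],
    "evade":   ["turn_away", "accelerate", "climb"],
    "regroup": ["turn_to_leader", "hold_speed"],
}
MEAN_REWARD = {"pursue": 0.2, "evade": 0.0, "regroup": 0.1}  # made-up values

def primitive_step(macro):
    # stand-in for one primitive-action step of the confrontation simulator
    return random.gauss(MEAN_REWARD[macro], 0.1)

q = {m: 0.0 for m in MACROS}

def high_level_step(eps=0.1, alpha=0.05):
    m = random.choice(list(MACROS)) if random.random() < eps else max(q, key=q.get)
    # a macro runs all of its primitive steps before the next decision,
    # so the high-level learner sees one aggregated, denser reward
    r = sum(primitive_step(m) for _ in MACROS[m])
    q[m] += alpha * (r - q[m])

for _ in range(2000):
    high_level_step()
print(q)  # "pursue" should score highest under these toy rewards
```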


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Siyuan Ding ◽  
Shengxiang Li ◽  
Guangyi Liu ◽  
Ou Li ◽  
Ke Ke ◽  
...  

The exponential explosion of joint actions and massive data collection are two main challenges for multiagent reinforcement learning algorithms with centralized training. To overcome these problems, in this paper we propose a model-free, fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. The agents are assumed to be placed in a time-varying communication network. Each agent observes only part of the global state and joint action, so it needs to obtain and share information with others over the network. In the proposed algorithm, agents hold local estimates of the global state and joint action and update them using local observations and messages received from neighbors. Under the assumption that the global value function decomposes, the gradient of the global objective function with respect to an individual agent is derived. The convergence of the proposed algorithm with linear function approximation is guaranteed by stochastic approximation theory. In the experiments, the proposed algorithm was applied to a multiagent passive localization task and achieved superior performance compared to state-of-the-art algorithms.
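A minimal sketch of the message-diffusion ingredient, not the full actor-critic algorithm: agents repeatedly average their local estimates with their neighbors' messages over a randomly changing graph, which drives all estimates to a common value even though no agent ever sees the whole network. The topology below (a ring plus one random extra edge per step) is an invented example of a time-varying network:

```python
import random

random.seed(2)
N = 6  # number of agents

def random_neighbors():
    # time-varying topology: a ring plus one random extra edge each step
    edges = [(i, (i + 1) % N) for i in range(N)]
    edges.append(tuple(random.sample(range(N), 2)))
    nbrs = {i: {i} for i in range(N)}  # each agent also keeps its own value
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs

# each agent's local estimate of a shared global quantity
estimates = [random.uniform(0.0, 10.0) for _ in range(N)]
print("before:", [round(e, 2) for e in estimates])
for _ in range(50):
    nbrs = random_neighbors()
    # every agent averages its estimate with its current neighbors' messages
    estimates = [sum(estimates[j] for j in nbrs[i]) / len(nbrs[i])
                 for i in range(N)]
print("after: ", [round(e, 2) for e in estimates])  # near-identical values
```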


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Tong Zhu ◽  
Xiaohu Li ◽  
Wei Fan ◽  
Changshuai Wang ◽  
Haoxue Liu ◽  
...  

Work zones are frequently congested sections that act as freeway bottlenecks. Connected and autonomous vehicle (CAV) trajectory optimization can improve operating efficiency in bottleneck areas by harmonizing vehicle maneuvers. This study presents a joint trajectory optimization of cooperative lane-changing, merging, and car-following actions for CAV control at a local merging point together with upstream points. A multiagent reinforcement learning (MARL) method is applied: one agent provides a merging advisory service at the merging point and controls the inner-lane vehicles' headway so that outer-lane vehicles can merge smoothly, while the other agents provide lane-changing advisory services at upstream lane-changing points, controlling how vehicles change lanes in advance and make the corresponding headway adjustments, similarly to and jointly with the merging advisory service. To unite all agents, the coordination graph (CG) method is applied to seek the global optimum, overcoming the exponential growth of the joint action space in MARL. An online simulation platform is established using MATLAB and the VISSIM COM interface. The simulation results show that MARL is effective for online computation with timely responses. More importantly, comparisons of the results obtained in various scenarios demonstrate that the proposed system yields smoother vehicle trajectories in all controlled sections, not only in the merging area, indicating that it can achieve better traffic conditions throughout freeway work zone areas.
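As a toy illustration of the coordination-graph decomposition (the agent names, actions, and payoffs below are invented): when the global payoff is a sum of per-edge payoffs between neighboring advisors, a joint action can be scored edge by edge instead of through one monolithic table over the exponential joint-action space. Real CG methods maximize with variable elimination or max-plus; the brute-force enumeration here is only for clarity at three agents:

```python
from itertools import product

# hypothetical advisory agents along the approach to the work zone
AGENTS = ["merge_advisor", "lc_advisor_1", "lc_advisor_2"]
ACTIONS = ["slow_down", "hold", "open_gap"]
EDGES = [("merge_advisor", "lc_advisor_1"), ("lc_advisor_1", "lc_advisor_2")]

def edge_payoff(a, b):
    # toy pairwise utility: adjacent advisors should not both brake traffic,
    # and at least one of them opening a gap helps merging
    if a == b == "slow_down":
        return -1.0
    return 1.0 if "open_gap" in (a, b) else 0.5

best, best_val = None, float("-inf")
for joint in product(ACTIONS, repeat=len(AGENTS)):
    assign = dict(zip(AGENTS, joint))
    # global payoff decomposes over the edges of the coordination graph
    val = sum(edge_payoff(assign[u], assign[v]) for u, v in EDGES)
    if val > best_val:
        best, best_val = assign, val
print(best, best_val)
```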


2021 ◽  
Vol 6 (4) ◽  
pp. 7461-7468
Author(s):  
Guanglin Ji ◽  
Junyan Yan ◽  
Jingxin Du ◽  
Wanquan Yan ◽  
Jibiao Chen ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Wei Dai ◽  
Wei Wang ◽  
Zhongtian Mao ◽  
Ruwen Jiang ◽  
Fudong Nian ◽  
...  

The main objective of multiagent reinforcement learning is to achieve a globally optimal policy, but the value function is difficult to evaluate in a high-dimensional state space. We therefore transform the multiagent reinforcement learning problem into a distributed optimization problem with constraint terms, in which all agents share the state and action spaces but each agent observes only its own local reward. We then propose a distributed optimization method with fractional-order dynamics to solve this problem. Moreover, we prove the convergence of the proposed algorithm and illustrate its effectiveness with a numerical example.
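The abstract gives no equations, so the sketch below is only one plausible reading of "distributed optimization with fractional-order dynamics": distributed gradient descent on local quadratic losses, with the usual first-order difference replaced by a truncated Grünwald-Letnikov fractional difference of order ALPHA. All problem data, the ring graph, and the step sizes are invented:

```python
import numpy as np

ALPHA, H, STEPS, MEM = 0.8, 0.1, 300, 50
b = np.array([1.0, 2.0, 3.0, 4.0])  # local targets; team optimum is mean(b) = 2.5
N = len(b)

W = np.zeros((N, N))                # fixed ring graph, uniform mixing weights
for i in range(N):
    for j in (i, (i + 1) % N, (i - 1) % N):
        W[i, j] = 1.0 / 3.0

# Gruenwald-Letnikov coefficients c_j = (-1)^j * binom(ALPHA, j)
c = [1.0]
for j in range(1, MEM + 1):
    c.append(c[-1] * (1.0 - (ALPHA + 1.0) / j))

x = np.zeros(N)
mixed_hist = []
for _ in range(STEPS):
    mixed = W @ x                   # consensus: mix with neighbors' estimates
    mixed_hist.append(mixed)
    grad = 2.0 * (mixed - b)        # each agent uses only its local gradient
    mem = sum(cj * m for cj, m in zip(c[1:], reversed(mixed_hist[-MEM:])))
    x = -mem - (H ** ALPHA) * grad  # ALPHA = 1 recovers ordinary distributed GD
print(np.round(x, 2))               # entries cluster near the team optimum 2.5
```

With ALPHA = 1 the coefficient recursion leaves only c_1 = -1, and the update collapses to the standard consensus-plus-gradient iteration; fractional orders below 1 spread the memory over many past iterates.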


2021 ◽  
Vol 18 (5) ◽  
pp. 172988142110449
Author(s):  
Haolin Wu ◽  
Hui Li ◽  
Jianwei Zhang ◽  
Zhuang Wang ◽  
Jianeng Zhang

Multiagent reinforcement learning holds considerable promise for cooperative multiagent tasks. Unfortunately, a single global reward shared by all agents can lead to the lazy-agent problem in cooperative tasks. To cope with this problem, we propose an algorithm that generates individual intrinsic rewards: an intrinsic reward encoder produces an individual intrinsic reward for each agent, and hypernetworks serve as the decoder to help estimate the individual action values of value-decomposition methods based on the generated intrinsic rewards. Experimental results on the StarCraft II micromanagement benchmark show that the proposed algorithm increases learning efficiency and improves policy performance.
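As a toy sketch of the encoder/hypernetwork wiring (the actual method trains neural networks on StarCraft II; the linear maps, dimensions, and names here are stand-ins): a hypernetwork turns the global state into the weights of a per-agent decoder, so individual value estimates are conditioned on the global state even though each agent feeds in only its own observation:

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, STATE_DIM = 3, 4, 8

# "encoder": maps each agent's observation to a scalar intrinsic reward
enc_W = rng.normal(0.0, 0.1, (1, OBS_DIM))

# hypernetwork: maps the global state to the weights of a small decoder,
# so the decoder that estimates individual action values is itself
# conditioned on the global state (a QMIX-style conditioning trick)
hyper_W = rng.normal(0.0, 0.1, (OBS_DIM, STATE_DIM))

state = rng.normal(size=STATE_DIM)           # global state (training only)
obs = rng.normal(size=(N_AGENTS, OBS_DIM))   # one local observation per agent

intrinsic_rewards = obs @ enc_W.T            # one intrinsic reward per agent
decoder_w = hyper_W @ state                  # generated decoder weights
individual_values = obs @ decoder_w          # one value estimate per agent

print("intrinsic rewards:", intrinsic_rewards.ravel())
print("individual values:", individual_values)
```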

