Deep Reinforcement Learning Algorithms for Multiple Arc-Welding Robots

2021, Vol 2
Author(s): Lei-Xin Xu, Yang-Yang Chen

Applications of deep reinforcement learning to arc welding by multi-robot systems are presented, where the states and actions of each robot are continuous and obstacles are present in the welding environment. To adapt to the time-varying welding task and to the purely local information available to each robot, a multi-agent deep deterministic policy gradient (MADDPG) algorithm is designed with a new set of rewards. Following the paradigm of centralized training with decentralized execution, the proposed MADDPG algorithm runs in a distributed manner. Simulation results demonstrate the effectiveness of the proposed method.
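
As a rough illustration of the centralized-training, decentralized-execution pattern that MADDPG-style methods rely on, the sketch below pairs per-robot actors with one centralized critic. It is a minimal sketch, not the paper's code; the module names, network sizes, and dimensions are all illustrative assumptions.

```python
# Minimal sketch of MADDPG's training pattern: decentralized actors,
# one centralized critic. All sizes and names are assumed, not from the paper.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # assumed toy dimensions

class Actor(nn.Module):
    """Decentralized actor: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all agents."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralizedCritic()

# Training sees everything; execution uses only each actor's local observation.
batch_obs = torch.randn(32, N_AGENTS, OBS_DIM)
joint_acts = torch.cat([a(batch_obs[:, i]) for i, a in enumerate(actors)], dim=-1)
q_values = critic(batch_obs.flatten(1), joint_acts)  # shape: (32, 1)
```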

Author(s): Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, ...

Despite recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal against the other agents' current policies. In this paper, we focus on training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
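
The sketch below is my reading of the minimax step such a method adds on top of MADDPG: before evaluating agent i's critic, the other agents' actions are perturbed one gradient step in the direction that lowers Q_i, cheaply approximating the worst case in a continuous action space. It is an assumption-laden illustration, not the authors' MAAL implementation; the toy critic and shapes are invented for the example.

```python
# Minimal sketch (my reading of the idea, not the authors' code): a one-step
# adversarial perturbation of the other agents' actions to approximate the
# worst case for agent i. The stand-in critic and dimensions are assumptions.
import torch
import torch.nn as nn

N_AGENTS, ACT_DIM = 3, 2
critic = nn.Linear(N_AGENTS * ACT_DIM, 1)  # stand-in for Q_i(s, a_1..a_N)

def worst_case_actions(joint_acts, agent_idx, eps=0.1):
    """One signed-gradient step on the other agents' actions to lower Q_i."""
    acts = joint_acts.clone().detach().requires_grad_(True)
    q = critic(acts.flatten(1)).sum()
    (grad,) = torch.autograd.grad(q, acts)
    mask = torch.ones_like(acts)
    mask[:, agent_idx] = 0.0              # agent i's own action is untouched
    return (acts - eps * mask * grad.sign()).detach()

joint = torch.randn(32, N_AGENTS, ACT_DIM)
adversarial = worst_case_actions(joint, agent_idx=0)
```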


IEEE Access, 2021, Vol 9, pp. 129728-129741
Author(s): Hafiz Muhammad Raza Ur Rehman, Byung-Won On, Devarani Devi Ningombam, Sungwon Yi, Gyu Sang Choi

1995, Vol 2, pp. 475-500
Author(s): A. Schaerf, Y. Shoham, M. Tennenholtz

We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first define a precise framework in which to study adaptive load balancing, important features of which are its stochastic nature and the purely local information available to individual agents. Given this framework, we show illuminating results on the interplay between basic adaptive behavior parameters and their effect on system efficiency. We then investigate the properties of adaptive load balancing in heterogeneous populations, and address the issue of exploration vs. exploitation in that context. Finally, we show that naive use of communication may not improve, and might even harm, system efficiency.
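
A minimal sketch of the kind of agent this setting implies: each agent keeps its own running efficiency estimate per resource, learned only from its own jobs, and trades off exploration against exploitation when choosing. The update rule and parameters below are assumed for illustration, not the paper's exact scheme.

```python
# Minimal sketch (assumed form, not the paper's exact rule): an adaptive
# load-balancing agent using purely local information and no communication.
import random

class LoadBalancingAgent:
    def __init__(self, n_resources, lr=0.2, explore=0.1):
        self.estimates = [1.0] * n_resources  # optimistic initial efficiency
        self.lr, self.explore = lr, explore

    def choose(self):
        if random.random() < self.explore:            # explore occasionally
            return random.randrange(len(self.estimates))
        return max(range(len(self.estimates)),        # else exploit best estimate
                   key=lambda r: self.estimates[r])

    def update(self, resource, observed_efficiency):
        # Local information only: the agent sees just its own job's outcome.
        e = self.estimates[resource]
        self.estimates[resource] = (1 - self.lr) * e + self.lr * observed_efficiency
```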


2019, Vol 16 (6), pp. 172988141989354
Author(s): Shijie Zhang, Yi Cao

In this article, the consensus problem is considered for networked multi-robot systems in which each robot's dynamics are non-holonomic and nonlinear. In the multi-robot system, each robot updates its current state and receives the states of its neighboring robots. Under the assumption that the network graph is bidirectional, a robust state-feedback controller based on local information is designed to guarantee convergence of the individual robots' states to a common value. Finally, the effectiveness of the presented method is illustrated by simulation results for a group of four mobile robots.
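
For orientation, the sketch below shows the basic local-information consensus update on a bidirectional graph: each robot moves its state toward its neighbors', which converges to a common value on a connected undirected graph. This is only the underlying idea; the paper's robust controller additionally handles the non-holonomic, nonlinear dynamics, and the gain and graph here are assumptions.

```python
# Minimal sketch (illustrative, not the paper's controller): synchronous
# consensus iteration using only neighbors' states on a bidirectional graph.
import numpy as np

def consensus_step(states, neighbors, gain=0.2):
    """states: (n, d) array; neighbors: dict robot id -> list of neighbor ids."""
    new = states.copy()
    for i, nbrs in neighbors.items():
        for j in nbrs:
            new[i] += gain * (states[j] - states[i])
    return new

states = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # bidirectional ring
for _ in range(50):
    states = consensus_step(states, graph)
# after iterating, the rows of `states` are approximately equal
```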


Entropy, 2021, Vol 23 (4), pp. 461
Author(s): Jeongho Park, Juwon Lee, Taehwan Kim, Inkyung Ahn, Jooyoung Park

The problem of finding adequate population models in ecology is important for understanding essential aspects of their dynamic nature. Since analyzing and accurately predicting the intelligent adaptation of multiple species is difficult due to their complex interactions, the study of population dynamics remains a challenging task in computational biology. In this paper, we use a modern deep reinforcement learning (RL) approach to explore a new avenue for understanding predator-prey ecosystems. Recently, reinforcement learning methods have achieved impressive results in areas such as games and robotics. RL agents generally focus on building strategies for taking actions in an environment in order to maximize their expected returns. Here we frame the co-evolution of predators and prey in an ecosystem as a multi-agent reinforcement learning problem, allowing agents to learn and evolve toward better policies. Recent significant advances in reinforcement learning allow for new perspectives on these types of ecological questions. Our simulation results show that, across the scenarios with RL agents, predators can achieve a reasonable level of sustainability, along with their prey.
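
To make the multi-agent framing concrete, a minimal sketch of the opposing reward structure such a predator-prey formulation implies is given below; the radius, reward values, and shaping terms are illustrative assumptions, not the paper's environment.

```python
# Minimal sketch (an illustrative framing, not the paper's environment):
# opposing rewards make predator-prey co-evolution a multi-agent RL problem.
import numpy as np

def step_rewards(predator_pos, prey_pos, catch_radius=1.0):
    caught = np.linalg.norm(predator_pos - prey_pos) < catch_radius
    r_predator = 1.0 if caught else -0.05   # small per-step cost encourages pursuit
    r_prey = -1.0 if caught else 0.05       # small per-step bonus rewards survival
    return r_predator, r_prey
```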


2022, pp. 1-20
Author(s): D. Xu, G. Chen

In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Since current UAV clusters are still at the program-control stage, fully autonomous and intelligent cooperative combat has not yet been realised. To enable the UAV cluster to plan autonomously in a changing environment and cooperate to complete the combat goal, we propose a new MARL framework. It adopts the policy of centralised training with decentralised execution, and uses an actor-critic network to select actions and then evaluate them. The new algorithm makes three key improvements on the basis of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first improves the learning framework, making the computed Q value more accurate. The second adds a collision-avoidance setting, which increases the operational safety factor. The third adjusts the reward mechanism, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is clearly improved, and the operational safety factor is further increased compared with the previous algorithm.
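
The collision-avoidance and cooperation improvements are reward-level changes, so a shaped reward of the general kind described is sketched below. It is a minimal sketch under assumed weights and terms, not the paper's actual reward function.

```python
# Minimal sketch (assumed shaping, not the paper's exact reward): goal
# progress plus a collision-avoidance penalty and a shared cooperation bonus.
import numpy as np

def uav_reward(pos, goal, teammates, team_progress,
               safe_dist=2.0, w_goal=1.0, w_coll=5.0, w_coop=0.5):
    r = -w_goal * np.linalg.norm(goal - pos)          # progress toward the goal
    for other in teammates:                           # collision avoidance
        d = np.linalg.norm(other - pos)
        if d < safe_dist:
            r -= w_coll * (safe_dist - d)             # penalize close approaches
    r += w_coop * team_progress                       # shared cooperation term
    return r
```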


2020, Vol 34 (10), pp. 13779-13780
Author(s): Elhadji Amadou Oury Diallo, Toshiharu Sugawara

We propose a decentralized multi-agent deep reinforcement learning architecture to investigate pattern formation under the local information provided by the agents' sensors. It consists of tasking a large number of homogeneous agents to move to a set of specified goal locations, addressing both the assignment and trajectory planning sub-problems concurrently. We then show that agents trained on random patterns can organize themselves into very complex shapes.


Author(s): Woojun Kim, Myungsik Cho, Youngchul Sung

In this paper, we propose a new learning technique named message-dropout to improve performance in multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents, and 2) centralized training with decentralized execution. In the first scenario, where direct message communication among agents is allowed, the message-dropout technique drops the received messages from other agents in a block-wise manner with a certain probability during the training phase, and compensates for this effect by multiplying the weights of the dropped-out block units by a correction probability. The message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication, and makes learning robust against communication errors in the execution phase. In the second scenario, centralized training with decentralized execution, we particularly consider applying the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique on several games, and numerical results show that message-dropout with a proper dropout rate significantly improves reinforcement learning performance in terms of both training speed and steady-state performance in the execution phase.
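
A minimal sketch of the described block-wise drop is given below. One detail is swapped in as an assumption: the paper compensates at execution time by scaling the weights of the dropped-out blocks, while this sketch uses the equivalent inverted-dropout formulation that rescales surviving messages during training so no correction is needed at execution; shapes are also assumed.

```python
# Minimal sketch (my rendering of the technique, not the authors' code):
# block-wise dropout over the messages an agent receives. Each sender's whole
# message is zeroed with probability p during training; survivors are rescaled
# by 1/(1-p) (inverted dropout) in place of test-time weight correction.
import torch

def message_dropout(messages, p=0.5, training=True):
    """messages: (batch, n_senders, msg_dim); drops whole sender blocks."""
    if not training or p == 0.0:
        return messages
    keep = (torch.rand(messages.shape[0], messages.shape[1], 1) > p).float()
    return messages * keep / (1.0 - p)

own_obs = torch.randn(32, 8)
msgs = message_dropout(torch.randn(32, 3, 4))      # messages from 3 other agents
policy_input = torch.cat([own_obs, msgs.flatten(1)], dim=-1)
```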


Author(s): Ionela Prodan, Sorin Olaru, Cristina Stoica, Silviu-Iulian Niculescu

This paper addresses a predictive control strategy for a particular class of multi-agent formations with a time-varying topology. The goal is to guarantee tracking capabilities with respect to a reference trajectory pre-specified for an agent designated as the leader. The remaining agents, designated as followers, then track the position and orientation of the leader. In real time, a predictive control strategy enhanced with the potential-field methodology is used to derive a feedback control action based only on local information within the group of agents. The main concern is that the interconnections between the agents are time-varying, affecting the neighborhood around each agent. The proposed method's effective performance is validated through illustrative examples.
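
As a small illustration of the potential-field ingredient, the sketch below computes the classic repulsive force an agent derives from nearby neighbors using only local information; the gain, influence radius, and function form are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch (illustrative, not the paper's formulation): a repulsive
# potential-field term that a tracking controller can add so each follower
# reacts only to agents inside a local influence radius.
import numpy as np

def repulsive_force(pos, neighbor_positions, radius=3.0, k=1.0):
    force = np.zeros_like(pos)
    for q in neighbor_positions:
        diff = pos - q
        d = np.linalg.norm(diff)
        if 0.0 < d < radius:
            # Classic potential-field gradient: grows as agents get closer.
            force += k * (1.0 / d - 1.0 / radius) * diff / d**3
    return force
```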

