Deep Reinforcement Learning Algorithms for Multiple Arc-Welding Robots

2021, Vol 2
Author(s): Lei-Xin Xu, Yang-Yang Chen

Applications of deep reinforcement learning to arc welding by multi-robot systems are presented, where the states and actions of each robot are continuous and obstacles are present in the welding environment. To adapt to the time-varying welding task and to the purely local information available to each robot, a multi-agent deep deterministic policy gradient (MADDPG) algorithm is designed with a new set of rewards. Following the paradigm of centralized training with decentralized execution, the proposed MADDPG algorithm runs in a distributed manner. Simulation results demonstrate the effectiveness of the proposed method.
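
As a rough illustration of the centralized-training, decentralized-execution pattern that MADDPG-style methods rely on, the sketch below pairs per-robot actors with one centralized critic. It is a minimal sketch, not the paper's code; the module names, network sizes, and dimensions are all illustrative assumptions.

```python
# Minimal sketch of MADDPG's training pattern: decentralized actors,
# one centralized critic. All sizes and names are assumed, not from the paper.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # assumed toy dimensions

class Actor(nn.Module):
    """Decentralized actor: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all agents."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralizedCritic()

# Training sees everything; execution uses only each actor's local observation.
batch_obs = torch.randn(32, N_AGENTS, OBS_DIM)
joint_acts = torch.cat([a(batch_obs[:, i]) for i, a in enumerate(actors)], dim=-1)
q_values = critic(batch_obs.flatten(1), joint_acts)  # shape: (32, 1)
```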

Author(s): Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, ...

Despite recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal against the other agents' current policies. In this paper, we focus on training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
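
The sketch below is my reading of the minimax step such a method adds on top of MADDPG: before evaluating agent i's critic, the other agents' actions are perturbed one gradient step in the direction that lowers Q_i, cheaply approximating the worst case in a continuous action space. It is an assumption-laden illustration, not the authors' MAAL implementation; the toy critic and shapes are invented for the example.

```python
# Minimal sketch (my reading of the idea, not the authors' code): a one-step
# adversarial perturbation of the other agents' actions to approximate the
# worst case for agent i. The stand-in critic and dimensions are assumptions.
import torch
import torch.nn as nn

N_AGENTS, ACT_DIM = 3, 2
critic = nn.Linear(N_AGENTS * ACT_DIM, 1)  # stand-in for Q_i(s, a_1..a_N)

def worst_case_actions(joint_acts, agent_idx, eps=0.1):
    """One signed-gradient step on the other agents' actions to lower Q_i."""
    acts = joint_acts.clone().detach().requires_grad_(True)
    q = critic(acts.flatten(1)).sum()
    (grad,) = torch.autograd.grad(q, acts)
    mask = torch.ones_like(acts)
    mask[:, agent_idx] = 0.0              # agent i's own action is untouched
    return (acts - eps * mask * grad.sign()).detach()

joint = torch.randn(32, N_AGENTS, ACT_DIM)
adversarial = worst_case_actions(joint, agent_idx=0)
```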


IEEE Access, 2021, Vol 9, pp. 129728-129741
Author(s): Hafiz Muhammad Raza Ur Rehman, Byung-Won On, Devarani Devi Ningombam, Sungwon Yi, Gyu Sang Choi

1995, Vol 2, pp. 475-500
Author(s): A. Schaerf, Y. Shoham, M. Tennenholtz

We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first define a precise framework in which to study adaptive load balancing, important features of which are its stochastic nature and the purely local information available to individual agents. Given this framework, we show illuminating results on the interplay between basic adaptive behavior parameters and their effect on system efficiency. We then investigate the properties of adaptive load balancing in heterogeneous populations, and address the issue of exploration vs. exploitation in that context. Finally, we show that naive use of communication may not improve, and might even harm, system efficiency.
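
A minimal sketch of the kind of agent this setting implies: each agent keeps its own running efficiency estimate per resource, learned only from its own jobs, and trades off exploration against exploitation when choosing. The update rule and parameters below are assumed for illustration, not the paper's exact scheme.

```python
# Minimal sketch (assumed form, not the paper's exact rule): an adaptive
# load-balancing agent using purely local information and no communication.
import random

class LoadBalancingAgent:
    def __init__(self, n_resources, lr=0.2, explore=0.1):
        self.estimates = [1.0] * n_resources  # optimistic initial efficiency
        self.lr, self.explore = lr, explore

    def choose(self):
        if random.random() < self.explore:            # explore occasionally
            return random.randrange(len(self.estimates))
        return max(range(len(self.estimates)),        # else exploit best estimate
                   key=lambda r: self.estimates[r])

    def update(self, resource, observed_efficiency):
        # Local information only: the agent sees just its own job's outcome.
        e = self.estimates[resource]
        self.estimates[resource] = (1 - self.lr) * e + self.lr * observed_efficiency
```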


2019, Vol 16 (6), pp. 172988141989354
Author(s): Shijie Zhang, Yi Cao

In this article, the consensus problem is considered for networked multi-robot systems in which each robot's dynamics are non-holonomic and nonlinear. In the multi-robot system, each robot updates its current state and receives the states of its neighboring robots. Under the assumption that the network graph is bidirectional, a robust state-feedback controller based on local information is designed to guarantee convergence of the individual robots' states to a common value. Finally, the effectiveness of the presented method is illustrated by simulation results for a group of four mobile robots.
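
For orientation, the sketch below shows the basic local-information consensus update on a bidirectional graph: each robot moves its state toward its neighbors', which converges to a common value on a connected undirected graph. This is only the underlying idea; the paper's robust controller additionally handles the non-holonomic, nonlinear dynamics, and the gain and graph here are assumptions.

```python
# Minimal sketch (illustrative, not the paper's controller): synchronous
# consensus iteration using only neighbors' states on a bidirectional graph.
import numpy as np

def consensus_step(states, neighbors, gain=0.2):
    """states: (n, d) array; neighbors: dict robot id -> list of neighbor ids."""
    new = states.copy()
    for i, nbrs in neighbors.items():
        for j in nbrs:
            new[i] += gain * (states[j] - states[i])
    return new

states = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # bidirectional ring
for _ in range(50):
    states = consensus_step(states, graph)
# after iterating, the rows of `states` are approximately equal
```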


Entropy, 2021, Vol 23 (4), pp. 461
Author(s): Jeongho Park, Juwon Lee, Taehwan Kim, Inkyung Ahn, Jooyoung Park

The problem of finding adequate population models in ecology is important for understanding essential aspects of their dynamic nature. Since analyzing and accurately predicting the intelligent adaptation of multiple species is difficult due to their complex interactions, the study of population dynamics remains a challenging task in computational biology. In this paper, we use a modern deep reinforcement learning (RL) approach to explore a new avenue for understanding predator-prey ecosystems. Recently, reinforcement learning methods have achieved impressive results in areas such as games and robotics. RL agents generally focus on building strategies for taking actions in an environment in order to maximize their expected returns. Here we frame the co-evolution of predators and prey in an ecosystem as a multi-agent reinforcement learning problem, allowing agents to learn and evolve toward better policies. Recent significant advances in reinforcement learning allow for new perspectives on these types of ecological questions. Our simulation results show that, across the scenarios with RL agents, predators can achieve a reasonable level of sustainability, along with their prey.
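
To make the multi-agent framing concrete, a minimal sketch of the opposing reward structure such a predator-prey formulation implies is given below; the radius, reward values, and shaping terms are illustrative assumptions, not the paper's environment.

```python
# Minimal sketch (an illustrative framing, not the paper's environment):
# opposing rewards make predator-prey co-evolution a multi-agent RL problem.
import numpy as np

def step_rewards(predator_pos, prey_pos, catch_radius=1.0):
    caught = np.linalg.norm(predator_pos - prey_pos) < catch_radius
    r_predator = 1.0 if caught else -0.05   # small per-step cost encourages pursuit
    r_prey = -1.0 if caught else 0.05       # small per-step bonus rewards survival
    return r_predator, r_prey
```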


2022, pp. 1-20
Author(s): D. Xu, G. Chen

In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Since current UAV clusters are still at the program-control stage, fully autonomous and intelligent cooperative combat has not yet been realised. To enable the UAV cluster to plan autonomously in a changing environment and cooperate to complete the combat goal, we propose a new MARL framework. It adopts the policy of centralised training with decentralised execution, and uses an actor-critic network to select actions and then evaluate them. The new algorithm makes three key improvements on the basis of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first improves the learning framework, making the computed Q value more accurate. The second adds a collision-avoidance setting, which increases the operational safety factor. The third adjusts the reward mechanism, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is clearly improved, and the operational safety factor is further increased compared with the previous algorithm.
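
The collision-avoidance and cooperation improvements are reward-level changes, so a shaped reward of the general kind described is sketched below. It is a minimal sketch under assumed weights and terms, not the paper's actual reward function.

```python
# Minimal sketch (assumed shaping, not the paper's exact reward): goal
# progress plus a collision-avoidance penalty and a shared cooperation bonus.
import numpy as np

def uav_reward(pos, goal, teammates, team_progress,
               safe_dist=2.0, w_goal=1.0, w_coll=5.0, w_coop=0.5):
    r = -w_goal * np.linalg.norm(goal - pos)          # progress toward the goal
    for other in teammates:                           # collision avoidance
        d = np.linalg.norm(other - pos)
        if d < safe_dist:
            r -= w_coll * (safe_dist - d)             # penalize close approaches
    r += w_coop * team_progress                       # shared cooperation term
    return r
```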


2020, Vol 34 (10), pp. 13779-13780
Author(s): Elhadji Amadou Oury Diallo, Toshiharu Sugawara

We propose a decentralized multi-agent deep reinforcement learning architecture to investigate pattern formation under the local information provided by the agents' sensors. It consists of tasking a large number of homogeneous agents to move to a set of specified goal locations, addressing both the assignment and trajectory planning sub-problems concurrently. We then show that agents trained on random patterns can organize themselves into very complex shapes.


Author(s): Woojun Kim, Myungsik Cho, Youngchul Sung

In this paper, we propose a new learning technique named message-dropout to improve performance in multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents, and 2) centralized training with decentralized execution. In the first scenario, where direct message communication among agents is allowed, the message-dropout technique drops the received messages from other agents in a block-wise manner with a certain probability during the training phase, and compensates for this effect by multiplying the weights of the dropped-out block units by a correction probability. The message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication, and makes learning robust against communication errors in the execution phase. In the second scenario, centralized training with decentralized execution, we particularly consider applying the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique on several games, and numerical results show that message-dropout with a proper dropout rate significantly improves reinforcement learning performance in terms of both training speed and steady-state performance in the execution phase.
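
A minimal sketch of the described block-wise drop is given below. One detail is swapped in as an assumption: the paper compensates at execution time by scaling the weights of the dropped-out blocks, while this sketch uses the equivalent inverted-dropout formulation that rescales surviving messages during training so no correction is needed at execution; shapes are also assumed.

```python
# Minimal sketch (my rendering of the technique, not the authors' code):
# block-wise dropout over the messages an agent receives. Each sender's whole
# message is zeroed with probability p during training; survivors are rescaled
# by 1/(1-p) (inverted dropout) in place of test-time weight correction.
import torch

def message_dropout(messages, p=0.5, training=True):
    """messages: (batch, n_senders, msg_dim); drops whole sender blocks."""
    if not training or p == 0.0:
        return messages
    keep = (torch.rand(messages.shape[0], messages.shape[1], 1) > p).float()
    return messages * keep / (1.0 - p)

own_obs = torch.randn(32, 8)
msgs = message_dropout(torch.randn(32, 3, 4))      # messages from 3 other agents
policy_input = torch.cat([own_obs, msgs.flatten(1)], dim=-1)
```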


Author(s): Ionela Prodan, Sorin Olaru, Cristina Stoica, Silviu-Iulian Niculescu

This paper addresses a predictive control strategy for a particular class of multi-agent formations with a time-varying topology. The goal is to guarantee tracking capabilities with respect to a reference trajectory pre-specified for an agent designated as the leader. The remaining agents, designated as followers, then track the position and orientation of the leader. In real time, a predictive control strategy enhanced with the potential-field methodology is used to derive a feedback control action based only on local information within the group of agents. The main concern is that the interconnections between the agents are time-varying, affecting the neighborhood around each agent. The proposed method's effective performance is validated through illustrative examples.
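
As a small illustration of the potential-field ingredient, the sketch below computes the classic repulsive force an agent derives from nearby neighbors using only local information; the gain, influence radius, and function form are illustrative assumptions, not the paper's formulation.

```python
# Minimal sketch (illustrative, not the paper's formulation): a repulsive
# potential-field term that a tracking controller can add so each follower
# reacts only to agents inside a local influence radius.
import numpy as np

def repulsive_force(pos, neighbor_positions, radius=3.0, k=1.0):
    force = np.zeros_like(pos)
    for q in neighbor_positions:
        diff = pos - q
        d = np.linalg.norm(diff)
        if 0.0 < d < radius:
            # Classic potential-field gradient: grows as agents get closer.
            force += k * (1.0 / d - 1.0 / radius) * diff / d**3
    return force
```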

