Autonomous and cooperative control of UAV cluster with multi-agent reinforcement learning

The Aeronautical Journal ◽

10.1017/aer.2021.112 ◽

2022 ◽

pp. 1-20

Author(s):

D. Xu ◽

G. Chen

Keyword(s):

Reinforcement Learning ◽

Safety Factor ◽

Cooperative Control ◽

Learning Framework ◽

Control Stage ◽

Autonomous Planning ◽

Operational Safety ◽

Policy Gradient ◽

Multi Agent ◽

Reward Mechanism

Abstract In this paper, we expolore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) cluster. Considering that the current UAV cluster is still in the program control stage, the fully autonomous and intelligent cooperative combat has not been realised. In order to realise the autonomous planning of the UAV cluster according to the changing environment and cooperate with each other to complete the combat goal, we propose a new MARL framework. It adopts the policy of centralised training with decentralised execution, and uses Actor-Critic network to select the execution action and then to make the corresponding evaluation. The new algorithm makes three key improvements on the basis of Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first is to improve learning framework; it makes the calculated Q value more accurate. The second is to add collision avoidance setting, which can increase the operational safety factor. And the third is to adjust reward mechanism; it can effectively improve the cluster’s cooperative ability. Then the improved MADDPG algorithm is tested by performing two conventional combat missions. The simulation results show that the learning efficiency is obviously improved, and the operational safety factor is further increased compared with the previous algorithm.

Download Full-text

Building a Connected Communication Network for UAV Clusters Using DE-MADDPG

Symmetry ◽

10.3390/sym13081537 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1537

Author(s):

Zixiong Zhu ◽

Nianhao Xie ◽

Kang Zong ◽

Lei Chen

Keyword(s):

Communication Network ◽

Reinforcement Learning ◽

Control Method ◽

Control Mechanisms ◽

Motion Information ◽

Reward Function ◽

Learning Framework ◽

Policy Gradient ◽

Multi Agent ◽

Virtual Leader

Clusters of unmanned aerial vehicles (UAVs) are often used to perform complex tasks. In such clusters, the reliability of the communication network connecting the UAVs is an essential factor in their collective efficiency. Due to the complex wireless environment, however, communication malfunctions within the cluster are likely during the flight of UAVs. In such cases, it is important to control the cluster and rebuild the connected network. The asymmetry of the cluster topology also increases the complexity of the control mechanisms. The traditional control methods based on cluster consistency often rely on the motion information of the neighboring UAVs. The motion information, however, may become unavailable because of the interrupted communications. UAV control algorithms based on deep reinforcement learning have achieved outstanding results in many fields. Here, we propose a cluster control method based on the Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) to rebuild a communication network for UAV clusters. The DE-MADDPG improves the framework of the traditional multi-agent deep deterministic policy gradient (MADDPG) algorithm by decomposing the reward function. We further introduce the reward reshaping function to facilitate the convergence of the algorithm in sparse reward environments. To address the instability of the state-space in the reinforcement learning framework, we also propose the notion of the virtual leader–follower model. Extensive simulations show that the success rate of the DE-MADDPG is higher than that of the MADDPG algorithm, confirming the effectiveness of the proposed method.

Download Full-text

A Multi-Agent Reinforcement Learning Framework with Recurrent Communication Module for Traffic Light Control

10.1109/iciscae52414.2021.9590701 ◽

2021 ◽

Author(s):

Bo Qin ◽

Wei He ◽

Bin Zhang ◽

Jingchen Li

Keyword(s):

Reinforcement Learning ◽

Light Control ◽

Traffic Light ◽

Learning Framework ◽

Traffic Light Control ◽

Communication Module ◽

Multi Agent

Download Full-text

An Insulin Bolus Advisor for Type 1 Diabetes Using Deep Reinforcement Learning

Sensors ◽

10.3390/s20185058 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5058 ◽

Cited By ~ 1

Author(s):

Taiyu Zhu ◽

Kezhi Li ◽

Lei Kuang ◽

Pau Herrero ◽

Pantelis Georgiou

Keyword(s):

Type 1 Diabetes ◽

Reinforcement Learning ◽

Glucose Monitoring ◽

Insulin Delivery ◽

Target Range ◽

Average Percentage ◽

Learning Framework ◽

Policy Gradient ◽

Bolus Insulin

(1) Background: People living with type 1 diabetes (T1D) require self-management to maintain blood glucose (BG) levels in a therapeutic range through the delivery of exogenous insulin. However, due to the various variability, uncertainty and complex glucose dynamics, optimizing the doses of insulin delivery to minimize the risk of hyperglycemia and hypoglycemia is still an open problem. (2) Methods: In this work, we propose a novel insulin bolus advisor which uses deep reinforcement learning (DRL) and continuous glucose monitoring to optimize insulin dosing at mealtime. In particular, an actor-critic model based on deep deterministic policy gradient is designed to compute mealtime insulin doses. The proposed system architecture uses a two-step learning framework, in which a population model is first obtained and then personalized by subject-specific data. Prioritized memory replay is adopted to accelerate the training process in clinical practice. To validate the algorithm, we employ a customized version of the FDA-accepted UVA/Padova T1D simulator to perform in silico trials on 10 adult subjects and 10 adolescent subjects. (3) Results: Compared to a standard bolus calculator as the baseline, the DRL insulin bolus advisor significantly improved the average percentage time in target range (70–180 mg/dL) from 74.1%±8.4% to 80.9%±6.9% (p<0.01) and 54.9%±12.4% to 61.6%±14.1% (p<0.01) in the the adult and adolescent cohorts, respectively, while reducing hypoglycemia. (4) Conclusions: The proposed algorithm has the potential to improve mealtime bolus insulin delivery in people with T1D and is a feasible candidate for future clinical validation.

Download Full-text

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014213 ◽

2019 ◽

Vol 33 ◽

pp. 4213-4220 ◽

Cited By ~ 12

Author(s):

Shihui Li ◽

Yi Wu ◽

Xinyue Cui ◽

Honghua Dong ◽

Fei Fang ◽

...

Keyword(s):

Reinforcement Learning ◽

Gradient Algorithm ◽

Training Environment ◽

Local Optima ◽

Continuous Action ◽

Agent Learning ◽

Policy Gradient ◽

Multi Agent ◽

Continuous Actions ◽

Computational Intractability

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in the multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optima w.r.t. its training partners – the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting so that the trained agents can still generalize when its opponents’ policies alter. To tackle this problem, we proposed a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG), for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments and the agents trained by our method significantly outperforms existing baselines.

Download Full-text

A cooperative multi-agent deep reinforcement learning framework for real-time residential load scheduling

Proceedings of the International Conference on Internet of Things Design and Implementation - IoTDI '19 ◽

10.1145/3302505.3310069 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chi Zhang ◽

Sanmukh R. Kuppannagari ◽

Chuanxiu Xiong ◽

Rajgopal Kannan ◽

Viktor K. Prasanna

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Load Scheduling ◽

Learning Framework ◽

Multi Agent

Download Full-text

QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning

IEEE Access ◽

10.1109/access.2021.3113350 ◽

2021 ◽

Vol 9 ◽

pp. 129728-129741

Author(s):

Hafiz Muhammad Raza Ur Rehman ◽

Byung-Won On ◽

Devarani Devi Ningombam ◽

Sungwon Yi ◽

Gyu Sang Choi

Keyword(s):

Reinforcement Learning ◽

Policy Gradient ◽

Multi Agent

Download Full-text

DTDE: A new cooperative multi-agent reinforcement learning framework

The Innovation ◽

10.1016/j.xinn.2021.100162 ◽

2021 ◽

pp. 100162

Author(s):

Guanghui Wen ◽

Junjie Fu ◽

Pengcheng Dai ◽

Jialing Zhou

Keyword(s):

Reinforcement Learning ◽

Learning Framework ◽

Multi Agent

Download Full-text

Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework

Transportation Research Part C Emerging Technologies ◽

10.1016/j.trc.2020.102661 ◽

2020 ◽

Vol 116 ◽

pp. 102661 ◽

Cited By ~ 6

Author(s):

Jiawei Wang ◽

Lijun Sun

Keyword(s):

Reinforcement Learning ◽

Learning Framework ◽

Bus Bunching ◽

Multi Agent

Download Full-text

Reinforcement Learning under Threats

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019939 ◽

2019 ◽

Vol 33 ◽

pp. 9939-9940 ◽

Cited By ~ 1

Author(s):

Victor Gallego ◽

Roi Naveiro ◽

David Rios Insua

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Potential Threat ◽

Q Learning ◽

Learning Framework ◽

Opponent Modeling ◽

Theoretical Approaches ◽

New Learning ◽

Markov Decision ◽

Multi Agent

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, when non-stationary environments as such are considered, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we shall face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.

Download Full-text

Graph neural network and reinforcement learning for multi‐agent cooperative control of connected autonomous vehicles

Computer-Aided Civil and Infrastructure Engineering ◽

10.1111/mice.12702 ◽

2021 ◽

Vol 36 (7) ◽

pp. 838-857

Author(s):

Sikai Chen ◽

Jiqian Dong ◽

Paul (Young Joun) Ha ◽

Yujie Li ◽

Samuel Labi

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Autonomous Vehicles ◽

Cooperative Control ◽

Multi Agent

Download Full-text