Maneuver Strategy Generation of UCAV for within Visual Range Air Combat Based on Multi-Agent Reinforcement Learning and Target Position Prediction

With the development of unmanned combat air vehicles (UCAVs) and artificial intelligence (AI), within visual range (WVR) air combat between intelligent UCAVs is expected to be widely used in future air combat. Because highly dynamic and uncertain WVR engagements cannot feasibly be controlled from the UCAV's ground station, an algorithm is needed that can generate highly intelligent air combat strategies so that the UCAV can complete air combat missions independently. In this paper, a 1-vs.-1 WVR air combat strategy generation algorithm is proposed using the multi-agent deep deterministic policy gradient (MADDPG). A 1-vs.-1 WVR air combat is modeled as a two-player zero-sum Markov game (ZSMG). A target position prediction method is introduced into the model to enable the UCAV to predict the target's actions and position. Moreover, the action space is treated as continuous so that the UCAV is not limited by the constraints of a basic fighter maneuver (BFM) library. In addition, a potential-based reward shaping method is proposed to improve the efficiency of the air combat strategy generation algorithm. Finally, the efficiency of the algorithm and the intelligence level of the resulting strategy are verified through simulation experiments. The results show that an air combat strategy using target position prediction outperforms one that does not.
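
The potential-based form of reward shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward. These terms telescope along a trajectory, so the optimal policy is unchanged while a sparse air combat reward becomes denser. The sketch below is a minimal illustration under stated assumptions. The planar `EngagementState`, the angle-and-range `potential`, and the parameters `d_opt` and `d_scale` are hypothetical choices, not the potential function used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class EngagementState:
    """Hypothetical planar engagement state (metres, radians), for illustration only."""
    x: float
    y: float
    heading: float  # own heading, radians
    tx: float       # target x position
    ty: float       # target y position


def potential(s: EngagementState, d_opt: float = 1000.0, d_scale: float = 5000.0) -> float:
    """Hypothetical potential Phi(s): highest (0) when the target is on the nose at range d_opt."""
    dx, dy = s.tx - s.x, s.ty - s.y
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Antenna train angle: angle between own heading and line of sight, wrapped to [0, pi].
    ata = abs((bearing - s.heading + math.pi) % (2 * math.pi) - math.pi)
    return -ata / math.pi - abs(dist - d_opt) / d_scale


def shaped_reward(r: float, s: EngagementState, s_next: EngagementState,
                  gamma: float = 0.99, terminal: bool = False) -> float:
    """Return r + F(s, s') with F = gamma * Phi(s') - Phi(s); Phi is taken as 0 at terminal states."""
    phi_next = 0.0 if terminal else potential(s_next)
    return r + gamma * phi_next - potential(s)
```

For example, suppose the aircraft turns from a tail-on geometry (target directly behind) to a nose-on geometry at the preferred range. That transition yields a shaping bonus of 1 even when the environment reward r is 0.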

Air combat autonomous maneuver decision for one-on-one within visual range engagement base on robust multi-agent reinforcement learning

2020 IEEE 16th International Conference on Control & Automation (ICCA), 2020. DOI: 10.1109/icca51439.2020.9264567

Author(s): Weiren KONG, Deyun ZHOU, Kai ZHANG, Zhen YANG

Keyword(s): Reinforcement Learning; Air Combat; Multi Agent; Visual Range

Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play

Engineering Applications of Artificial Intelligence, Vol 98, pp. 104112, 2021. DOI: 10.1016/j.engappai.2020.104112

Author(s): Zhixiao Sun, Haiyin Piao, Zhen Yang, Yiyang Zhao, Guang Zhan, ...

Keyword(s): Air Combat; Policy Gradient; Multi Agent

UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning

Electronics, Vol 9 (7), pp. 1121, 2020. DOI: 10.3390/electronics9071121. Cited by ~2

Author(s): Weiren Kong, Deyun Zhou, Zhen Yang, Yiyang Zhao, Kai Zhang

Keyword(s): Reinforcement Learning; High Performance; Learning Algorithm; Gradient Algorithm; Observation Error; Inverse Reinforcement Learning; Generation Algorithm; Air Combat; Policy Gradient; Aerial Combat

With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs will be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) has been limited by simplifying assumptions and limited robustness, and it has ignored sensor errors. In this paper, to account for aircraft sensor errors, we model WVR aerial combat as a state-adversarial Markov decision process (SA-MDP). This model introduces small adversarial perturbations on state observations; these perturbations do not alter the environment directly but can mislead the agent into making suboptimal decisions. Meanwhile, we propose a novel high-performance, high-robustness autonomous aerial combat maneuver strategy generation algorithm based on the state-adversarial deep deterministic policy gradient algorithm (SA-DDPG). SA-DDPG adds to the actor network a robustness regularizer related to an upper bound on performance loss. At the same time, a reward shaping method based on the maximum entropy (MaxEnt) inverse reinforcement learning (IRL) algorithm is proposed to improve the efficiency of the aerial combat strategy generation algorithm. Finally, the efficiency of the algorithm and the performance and robustness of the resulting aerial combat strategy are verified by simulation experiments. Our main contributions are threefold. First, to introduce UAV observation errors, we model air combat as an SA-MDP. Second, to make the air combat maneuver strategy network more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to address the sparsity of the air combat reward function, we use MaxEnt IRL to design a shaping reward that accelerates the convergence of SA-DDPG.
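
A robustness regularizer of the kind described above penalizes how far the actor's action moves when its observation is perturbed within a small ℓ∞ ball of radius ε. We assume the squared-ℓ2 form R = E_s[max over ŝ in B_ε(s) of ||μ(s) − μ(ŝ)||²]; the paper's exact regularizer and inner solver may differ. The sketch is a minimal illustration. The linear-tanh actor, the random-sampling approximation of the inner maximum, and the weight `kappa` are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def linear_tanh_policy(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Policy:
    """Hypothetical deterministic actor mu(s) = tanh(W s + b), for illustration only."""
    def act(state: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                for row, b in zip(weights, bias)]
    return act


def sa_regularizer(policy: Policy, states: Sequence[Sequence[float]], eps: float,
                   n_samples: int = 64, seed: int = 0) -> float:
    """Mean over states of max_{||d||_inf <= eps} ||mu(s) - mu(s + d)||^2.

    The inner max is approximated by random sampling in the eps-box, which only
    gives a lower bound; it stands in for a proper adversarial solver.
    """
    rng = random.Random(seed)
    total = 0.0
    for s in states:
        clean = policy(s)
        worst = 0.0
        for _ in range(n_samples):
            perturbed = [x + rng.uniform(-eps, eps) for x in s]
            diff = sum((a - b) ** 2 for a, b in zip(clean, policy(perturbed)))
            worst = max(worst, diff)
        total += worst
    return total / len(states)


def robust_actor_loss(q_value: float, reg: float, kappa: float = 0.1) -> float:
    """Actor loss: -Q(s, mu(s)) plus kappa times the robustness regularizer."""
    return -q_value + kappa * reg
```

A larger `kappa` trades some nominal Q-value for smoother actions under observation noise.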

Improving Maneuver Strategy in Air Combat by Alternate Freeze Games with a Deep Reinforcement Learning Algorithm

Mathematical Problems in Engineering, Vol 2020, pp. 1-17, 2020. DOI: 10.1155/2020/7180639

Author(s): Zhuang Wang, Hui Li, Haolin Wu, Zhaoxin Wu

Keyword(s): Reinforcement Learning; Learning Algorithm; Adaptive Strategy; Simulation Software; Learning Agents; Combat Simulation; Air Combat; Reward Shaping; Flight Level; The One

In a one-on-one air combat game, the opponent's maneuver strategy is usually not deterministic, so a variety of opponent strategies must be considered when designing our own maneuver strategy. In this paper, an alternate freeze game framework based on deep reinforcement learning is proposed to generate the maneuver strategy in an air combat pursuit. The maneuver strategy agents that guide the aircraft of both sides are designed for a one-on-one air combat scenario at a fixed flight level with fixed velocity. Middleware connecting the agents and the air combat simulation software is developed to provide a reinforcement learning environment for agent training. A reward shaping approach is used, which increases training speed and improves the performance of the generated trajectory. Agents are trained by alternate freeze games with a deep reinforcement learning algorithm to deal with nonstationarity. A league system is adopted to avoid the red queen effect in a game where both sides implement adaptive strategies. Simulation results show that the proposed approach can be applied to maneuver guidance in air combat, and that deep reinforcement learning agents can learn typical angle fight tactics. When training against an opponent with an adaptive strategy, the winning rate can exceed 50%, and the losing rate can be reduced below 15%. In a competition against all opponents, the winning rate of the strategic agent selected by the league system is more than 44%, and its probability of not losing is about 75%.
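
The alternate freeze idea and the league system can be illustrated on a toy zero-sum matrix game, with an exact best response standing in for deep reinforcement learning training. This is a hypothetical analogue, not the paper's procedure. Rock-paper-scissors, `best_response`, and the uniform-mixture league are assumptions. Training only against the opponent's latest frozen snapshot cycles, which is a red queen effect. Training against a mixture of all past snapshots is fictitious play, whose empirical mixture is known to converge to equilibrium in two-player zero-sum games.

```python
from typing import List

# Row player's payoff in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
# The game is symmetric zero-sum, so the same matrix serves both sides.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def best_response(opponent_mix: List[float]) -> int:
    """Pure action with the highest expected payoff against a mixed opponent (ties: lowest index)."""
    values = [sum(PAYOFF[a][b] * p for b, p in enumerate(opponent_mix)) for a in range(3)]
    return max(range(3), key=lambda a: values[a])


def mixture(snapshots: List[int]) -> List[float]:
    """Uniform mixture over a list of frozen pure-policy snapshots."""
    return [snapshots.count(a) / len(snapshots) for a in range(3)]


def alternate_freeze(rounds: int, use_league: bool) -> List[int]:
    """Alternately 'train' one side (best response) while the other side stays frozen.

    Without a league, the learner only faces the opponent's latest snapshot;
    with a league, it faces a uniform mixture of all of the opponent's past snapshots.
    Returns red's sequence of snapshots.
    """
    red, blue = [0], [0]  # both sides start from a frozen "rock" policy
    for t in range(rounds):
        learner, frozen = (red, blue) if t % 2 == 0 else (blue, red)
        opponent = frozen if use_league else frozen[-1:]
        learner.append(best_response(mixture(opponent)))
    return red
```

Here `alternate_freeze(6, use_league=False)` returns `[0, 1, 0, 2]`. Red keeps switching to whatever beats blue's newest snapshot, so no snapshot stays good for long.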

Autonomous maneuver strategy of swarm air combat based on DDPG

Autonomous Intelligent Systems, Vol 1 (1), 2021. DOI: 10.1007/s43684-021-00013-z

Author(s): Luhe Wang, Jinwen Hu, Zhao Xu, Chunhui Zhao

Keyword(s): Decision Making; Unmanned Aerial Vehicles; Strategy Training; Motion Model; High Complexity; Aerial Vehicles; Air Combat; Policy Gradient; Online Strategy; Visual Range

Unmanned aerial vehicles (UAVs) have become significantly important in air combat, where intelligent UAVs and UAV swarms will be able to tackle tasks of high complexity and dynamics. The key to giving UAVs such capability is autonomous maneuver decision making. In this paper, an autonomous maneuver strategy for UAV swarms in beyond visual range air combat based on reinforcement learning is proposed. First, based on the air combat process and the constraints of the swarm, the UAV motion model and the multi-to-one air combat model are established. Second, a two-stage maneuver strategy based on air combat principles is designed, which includes inter-vehicle collaboration and target-vehicle confrontation. Then, a swarm air combat algorithm based on the deep deterministic policy gradient (DDPG) is proposed for online strategy training. Finally, the effectiveness of the proposed algorithm is validated by multi-scene simulations. The results show that the algorithm is suitable for UAV swarms of different scales.
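
DDPG, which the swarm algorithm above builds on, trains a deterministic actor and a Q critic using slowly tracking target networks. Its two core update rules can be sketched without any deep learning library. Here networks are represented by flat parameter lists and plain callables. The defaults τ = 0.005 and γ = 0.99 are assumptions, not values from the paper.

```python
from typing import Callable, List, Sequence


def soft_update(target: List[float], online: Sequence[float], tau: float = 0.005) -> None:
    """In-place Polyak averaging of target parameters: theta' <- tau*theta + (1 - tau)*theta'."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = tau * o + (1.0 - tau) * t


def critic_target(reward: float, next_state: Sequence[float], done: bool,
                  target_actor: Callable[[Sequence[float]], List[float]],
                  target_critic: Callable[[Sequence[float], List[float]], float],
                  gamma: float = 0.99) -> float:
    """TD target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * target_critic(next_state, target_actor(next_state))
```

After each gradient step on the online actor and critic, `soft_update` is applied to both sets of parameters. A small τ keeps the targets, and therefore y, slowly varying, which stabilizes critic training.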

Air Combat Strategies Generation of CGF Based on MADDPG and Reward Shaping

2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL), 2020. DOI: 10.1109/cvidl51233.2020.000-7

Author(s): Weiren KONG, Deyun ZHOU, Zhen YANG

Keyword(s): Air Combat; Reward Shaping

Situation assessment for beyond-visual-range air combat situation assessment based on dynamic Bayesian network

2013 25th Chinese Control and Decision Conference (CCDC), 2013. DOI: 10.1109/ccdc.2013.6560993. Cited by ~1

Author(s): Li Fu, Jianbo Liu, Feihu Chang, Guanglei Meng

Keyword(s): Bayesian Network; Dynamic Bayesian Network; Situation Assessment; Air Combat; Visual Range

A Fault Data Generation Algorithm Based on GAN and Policy Gradient Mechanism

2021. DOI: 10.1109/bmsb53066.2021.9547152

Author(s): Yonghua Huo, Yingjun Shang, Bo Xu, Yuting Li, Yang Yang

Keyword(s): Data Generation; Generation Algorithm; Policy Gradient

UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning

Journal of Systems Engineering and Electronics, Vol 32 (6), pp. 1421-1438, 2021. DOI: 10.23919/jsee.2021.000121

Author(s): Zhang Jiandong, Yang Qiming, Shi Guoqing, Lu Yi, Wu Yong

Keyword(s): Reinforcement Learning; Air Combat; Multi Agent

A Distributed Approach for Accessibility and Maintainability Check With a Manikin

Volume 1: 25th Design Automation Conference, 1999. DOI: 10.1115/detc99/dac-8677

Author(s): Patrick Chedmail, Christophe Le Roy

Keyword(s): Virtual Reality; Original Method; Direct Manipulation; Agent Architecture; Constrained Optimisation; Cluttered Environment; Distributed Approach; Multi Agent; The One; Algorithmic Approaches

The validation of accessibility, maintainability, and mounting/dismantling simulation in a cluttered environment is a key problem during the design process of a mechanical system. On the one hand, research in path planning has led to automatic trajectory definition; these systems are efficient for simple problems. On the other hand, direct manipulation is possible thanks to robotic CAD systems. Direct manipulation is also possible with common virtual reality tools that immerse the designer in a whole mechanical environment, where the designer can handle an object to check its accessibility. By using a multi-agent architecture, we greatly improve the effectiveness of virtual reality tools while coupling algorithmic approaches and direct manipulation. This original method is a solution to a multi-criteria constrained optimisation problem. Theoretical and practical aspects are presented.
