Maneuver Strategy Generation of UCAV for within Visual Range Air Combat Based on Multi-Agent Reinforcement Learning and Target Position Prediction

2020 ◽  
Vol 10 (15) ◽  
pp. 5198
Author(s):  
Weiren Kong ◽  
Deyun Zhou ◽  
Zhen Yang ◽  
Kai Zhang ◽  
Lina Zeng

With the development of unmanned combat air vehicles (UCAVs) and artificial intelligence (AI), within visual range (WVR) air combat using intelligent UCAVs is expected to be widely used in future air combat. Because controlling highly dynamic and uncertain WVR air combat from the ground station of the UCAV is not feasible, an algorithm is needed that can generate highly intelligent air combat strategies so that the UCAV can complete air combat missions independently. In this paper, a 1-vs.-1 WVR air combat strategy generation algorithm is proposed using the multi-agent deep deterministic policy gradient (MADDPG). A 1-vs.-1 WVR air combat is modeled as a two-player zero-sum Markov game (ZSMG). A method for predicting the position of the target is introduced into the model to enable the UCAV to predict the target’s actions and position. Moreover, to ensure that the UCAV is not limited by the constraints of the basic fighter maneuver (BFM) library, the action space is treated as continuous. In addition, a potential-based reward shaping method is proposed to improve the efficiency of the air combat strategy generation algorithm. Finally, the efficiency of the air combat strategy generation algorithm and the intelligence level of the resulting strategy are verified through simulation experiments. The results show that an air combat strategy that uses target position prediction is superior to one that does not.
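Potential-based reward shaping, as named in the abstract above, adds a term of the form F(s, s') = γΦ(s') − Φ(s) to the environment reward. The following is a minimal sketch of that general form only. The potential function (negative distance to the target), the state layout, and the discount value are assumptions chosen for illustration and are not taken from the paper.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, not from the paper)


def potential(state):
    """Hypothetical potential: larger (less negative) when the UCAV is closer to the target."""
    (x, y), (tx, ty) = state["own_pos"], state["target_pos"]
    return -math.hypot(tx - x, ty - y)


def shaped_reward(env_reward, state, next_state, gamma=GAMMA):
    """Return env_reward plus the shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return env_reward + gamma * potential(next_state) - potential(state)


# Illustrative states: the UCAV closes from distance 5 to distance 4.
s0 = {"own_pos": (0.0, 0.0), "target_pos": (3.0, 4.0)}
s1 = {"own_pos": (3.0, 0.0), "target_pos": (3.0, 4.0)}
print(shaped_reward(0.0, s0, s1))  # shaping bonus for approaching the target
```

Because the shaping term is a difference of potentials, it rewards progress toward states with higher potential without requiring a dense hand-written reward for every step.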

2021 ◽  
Vol 98 ◽  
pp. 104112
Author(s):  
Zhixiao Sun ◽  
Haiyin Piao ◽  
Zhen Yang ◽  
Yiyang Zhao ◽  
Guang Zhan ◽  
...  

Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1121 ◽  
Author(s):  
Weiren Kong ◽  
Deyun Zhou ◽  
Zhen Yang ◽  
Yiyang Zhao ◽  
Kai Zhang

With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs will be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) has limitations due to simplifying assumptions, limited robustness, and ignoring sensor errors. In this paper, in order to account for the error of the aircraft sensors, we model WVR aerial combat as a state-adversarial Markov decision process (SA-MDP), which introduces small adversarial perturbations on state observations. These perturbations do not alter the environment directly, but can mislead the agent into making suboptimal decisions. Meanwhile, we propose a novel autonomous aerial combat maneuver strategy generation algorithm with high performance and high robustness based on the state-adversarial deep deterministic policy gradient algorithm (SA-DDPG), which adds a robustness regularizer related to an upper bound on performance loss at the actor network. At the same time, a reward shaping method based on the maximum entropy (MaxEnt) inverse reinforcement learning (IRL) algorithm is proposed to improve the efficiency of the aerial combat strategy generation algorithm. Finally, the efficiency of the aerial combat strategy generation algorithm and the performance and robustness of the resulting aerial combat strategy are verified by simulation experiments. Our main contributions are three-fold. First, to introduce the observation errors of the UAV, we model air combat as an SA-MDP. Second, to make the strategy network of air combat maneuver more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to solve the problem that air combat’s reward function is too sparse, we use MaxEnt IRL to design a shaping reward to accelerate the convergence of SA-DDPG.
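The idea of penalizing an actor for changing its action under small observation perturbations can be sketched as follows. This is not the paper's regularizer: the linear policy, the infinity-norm perturbation set, and the random-sampling approximation of the worst case are all assumptions made for illustration. Random sampling only gives a lower estimate of the true worst-case deviation.

```python
import random


def policy(weights, obs):
    """Toy deterministic linear policy (a stand-in for an actor network)."""
    return sum(w * o for w, o in zip(weights, obs))


def robustness_penalty(weights, obs, eps, n_samples=64, rng=None):
    """Estimate max over |delta_i| <= eps of (pi(obs) - pi(obs + delta))^2 by sampling."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    clean_action = policy(weights, obs)
    worst = 0.0
    for _ in range(n_samples):
        perturbed = [o + rng.uniform(-eps, eps) for o in obs]
        deviation = (clean_action - policy(weights, perturbed)) ** 2
        worst = max(worst, deviation)
    return worst


weights = [1.0, -2.0]   # hypothetical actor parameters
obs = [0.5, 0.25]       # hypothetical clean observation
penalty = robustness_penalty(weights, obs, eps=0.1)
print(penalty)  # would be added, with some weight, to the actor loss
```

In a training loop, a term like this would be scaled by a coefficient and added to the actor objective, so that the policy is discouraged from reacting sharply to sensor-level noise.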


2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Zhuang Wang ◽  
Hui Li ◽  
Haolin Wu ◽  
Zhaoxin Wu

In a one-on-one air combat game, the opponent’s maneuver strategy is usually not deterministic, which leads us to consider a variety of opponent strategies when designing our maneuver strategy. In this paper, an alternate freeze game framework based on deep reinforcement learning is proposed to generate the maneuver strategy in an air combat pursuit. The maneuver strategy agents for aircraft guidance of both sides are designed within a flight level, at fixed velocity, for the one-on-one air combat scenario. Middleware that connects the agents and air combat simulation software is developed to provide a reinforcement learning environment for agent training. A reward shaping approach is used, which increases the training speed and improves the performance of the generated trajectory. Agents are trained by alternate freeze games with a deep reinforcement learning algorithm to deal with nonstationarity. A league system is adopted to avoid the Red Queen effect in the game where both sides implement adaptive strategies. Simulation results show that the proposed approach can be applied to maneuver guidance in air combat, and that typical angle fight tactics can be learnt by the deep reinforcement learning agents. For the training of an opponent with the adaptive strategy, the winning rate can reach more than 50%, and the losing rate can be reduced to less than 15%. In a competition with all opponents, the winning rate of the strategic agent selected by the league system is more than 44%, and the probability of not losing is about 75%.
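The alternate freeze idea described above, in which one side learns while the other side's policy is held fixed and the roles then swap, can be sketched as a training schedule. The agent names, the integer "policy version" stand-in, and the `update_fn` callback are hypothetical; the paper's league system and actual learning algorithm are not modeled here.

```python
def alternate_freeze_schedule(n_rounds, update_fn):
    """Train two agents in turns: each round one side learns, the other stays frozen."""
    policies = {"red": 0, "blue": 0}  # stand-in policy versions, not real networks
    history = []
    for round_idx in range(n_rounds):
        learner = "red" if round_idx % 2 == 0 else "blue"
        frozen = "blue" if learner == "red" else "red"
        # Only the learner's policy changes; the frozen policy is read-only input.
        policies[learner] = update_fn(policies[learner], policies[frozen])
        history.append((learner, dict(policies)))
    return policies, history


def toy_update(own_version, opponent_version):
    """Hypothetical update: pretend one training phase yields the next policy version."""
    return own_version + 1


final_policies, history = alternate_freeze_schedule(4, toy_update)
print(final_policies)
print([learner for learner, _ in history])
```

Freezing the opponent during each phase means the learner faces a stationary environment within that phase, which is the motivation the abstract gives for dealing with nonstationarity.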


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Luhe Wang ◽  
Jinwen Hu ◽  
Zhao Xu ◽  
Chunhui Zhao

Abstract Unmanned aerial vehicles (UAVs) have become significantly important in air combat, where intelligent UAV swarms will be able to tackle tasks of high complexity and dynamics. The key to giving UAVs this capability is autonomous maneuver decision making. In this paper, an autonomous maneuver strategy for UAV swarms in beyond visual range air combat based on reinforcement learning is proposed. First, based on the process of air combat and the constraints of the swarm, the motion model of the UAV and the multi-to-one air combat model are established. Second, a two-stage maneuver strategy based on air combat principles is designed, which includes inter-vehicle collaboration and target-vehicle confrontation. Then, a swarm air combat algorithm based on the deep deterministic policy gradient (DDPG) strategy is proposed for online strategy training. Finally, the effectiveness of the proposed algorithm is validated by multi-scene simulations. The results show that the algorithm is suitable for UAV swarms of different scales.


2021 ◽  
Author(s):  
Yonghua Huo ◽  
Yingjun Shang ◽  
Bo Xu ◽  
Yuting Li ◽  
Yang Yang

2021 ◽  
Vol 32 (6) ◽  
pp. 1421-1438
Author(s):  
Zhang Jiandong ◽  
Yang Qiming ◽  
Shi Guoqing ◽  
Lu Yi ◽  
Wu Yong

Author(s):  
Patrick Chedmail ◽  
Christophe Le Roy

Abstract The validation of accessibility, maintainability, and mounting/dismantling simulation in a cluttered environment is a key problem during the design process of a mechanical system. On the one hand, research in path planning has led to automatic trajectory definition. These systems are efficient for simple problems. On the other hand, direct manipulation is possible thanks to robotic CAD systems. Another form of direct manipulation is possible with common virtual reality tools that allow the designer to be immersed in a whole mechanical environment. In such an environment the designer can handle an object in order to check its accessibility. By using a multi-agent architecture, we greatly improve the effectiveness of virtual reality tools while coupling algorithmic approaches and direct manipulation. This original method is a solution of a multi-criteria constrained optimisation problem. Theoretical and practical aspects are presented.

