UAV maneuvering decision -making algorithm based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

Aiming at intelligent decision-making of UAV based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of UAV is established by Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient(TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation experiment results show that compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence speed, and is more suitable forsolving combat problems. The algorithm proposed in this paper enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, adjust their actions to approach and successfully strike the enemy, providing a new method for UAVs to make intelligent maneuvering decisions during air combat.

Download Full-text

Intelligent decision making for overtaking maneuver using mixed observable Markov decision process

Journal of Intelligent Transportation Systems ◽

10.1080/15472450.2017.1334558 ◽

2017 ◽

Vol 22 (3) ◽

pp. 201-217 ◽

Cited By ~ 5

Author(s):

Volkan Sezer

Keyword(s):

Decision Making ◽

Markov Decision Process ◽

Decision Process ◽

Intelligent Decision Making ◽

Intelligent Decision ◽

Markov Decision

Download Full-text

Intelligent Decision-Making of MAV/UAV in Air Combat Based on DDPG Algorithm

10.1007/978-981-15-8155-7_403 ◽

2021 ◽

pp. 4881-4891

Author(s):

Yue Li ◽

Wei Han ◽

Weiguo Zhong ◽

Jiazheng Ji ◽

Wanhui Mu

Keyword(s):

Decision Making ◽

Intelligent Decision Making ◽

Intelligent Decision ◽

Air Combat

Download Full-text

Research on Air Combat Maneuver Decision-Making Method Based on Reinforcement Learning

Electronics ◽

10.3390/electronics7110279 ◽

2018 ◽

Vol 7 (11) ◽

pp. 279 ◽

Cited By ~ 6

Author(s):

Xianbing Zhang ◽

Guoqing Liu ◽

Chaojie Yang ◽

Jiang Wu

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Algorithm ◽

Signal Design ◽

Training Environment ◽

Strategy Space ◽

Intelligent Decision Making ◽

Combat Training ◽

Network Method ◽

Air Combat

With the development of information technology, the degree of intelligence in air combat is increasing, and the demand for automated intelligent decision-making systems is becoming more intense. Based on the characteristics of over-the-horizon air combat, this paper constructs a super-horizon air combat training environment, which includes aircraft model modeling, air combat scene design, enemy aircraft strategy design, and reward and punishment signal design. In order to improve the efficiency of the reinforcement learning algorithm for the exploration of strategy space, this paper proposes a heuristic Q-Network method that integrates expert experience, and uses expert experience as a heuristic signal to guide the search process. At the same time, heuristic exploration and random exploration are combined. Aiming at the over-the-horizon air combat maneuver decision problem, the heuristic Q-Network method is adopted to train the neural network model in the over-the-horizon air combat training environment. Through continuous interaction with the environment, self-learning of the air combat maneuver strategy is realized. The efficiency of the heuristic Q-Network method and effectiveness of the air combat maneuver strategy are verified by simulation experiments.

Download Full-text

Intelligent Decision-Making for 3-Dimensional Dynamic Obstacle Avoidance of UAV Based on Deep Reinforcement Learning

2019 11th International Conference on Wireless Communications and Signal Processing (WCSP) ◽

10.1109/wcsp.2019.8928110 ◽

2019 ◽

Author(s):

Xiao Han ◽

Jing Wang ◽

Jiayin Xue ◽

Qinyu Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Obstacle Avoidance ◽

3 Dimensional ◽

Intelligent Decision Making ◽

Intelligent Decision ◽

Dynamic Obstacle Avoidance ◽

Dynamic Obstacle

Download Full-text

Deep reinforcement learning based intelligent decision making for multi-player sequential game with uncertain irrational players (Conference Presentation)

Sensors and Systems for Space Applications XIII ◽

10.1117/12.2556224 ◽

2020 ◽

Author(s):

Zejian Zhou ◽

Hao Xu

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Sequential Game ◽

Intelligent Decision Making ◽

Intelligent Decision

Download Full-text

UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning

Electronics ◽

10.3390/electronics9071121 ◽

2020 ◽

Vol 9 (7) ◽

pp. 1121 ◽

Cited By ~ 2

Author(s):

Weiren Kong ◽

Deyun Zhou ◽

Zhen Yang ◽

Yiyang Zhao ◽

Kai Zhang

Keyword(s):

Reinforcement Learning ◽

High Performance ◽

Learning Algorithm ◽

Gradient Algorithm ◽

Observation Error ◽

Inverse Reinforcement Learning ◽

Generation Algorithm ◽

Air Combat ◽

Policy Gradient ◽

Aerial Combat

With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, Intelligent UAV will be widely used in future autonomous aerial combat. Previous researches on autonomous aerial combat within visual range (WVR) have limitations due to simplifying assumptions, limited robustness, and ignoring sensor errors. In this paper, in order to consider the error of the aircraft sensors, we model the aerial combat WVR as a state-adversarial Markov decision process (SA-MDP), which introduce the small adversarial perturbations on state observations and these perturbations do not alter the environment directly, but can mislead the agent into making suboptimal decisions. Meanwhile, we propose a novel autonomous aerial combat maneuver strategy generation algorithm with high-performance and high-robustness based on state-adversarial deep deterministic policy gradient algorithm (SA-DDPG), which add a robustness regularizers related to an upper bound on performance loss at the actor-network. At the same time, a reward shaping method based on maximum entropy (MaxEnt) inverse reinforcement learning algorithm (IRL) is proposed to improve the aerial combat strategy generation algorithm’s efficiency. Finally, the efficiency of the aerial combat strategy generation algorithm and the performance and robustness of the resulting aerial combat strategy is verified by simulation experiments. Our main contributions are three-fold. First, to introduce the observation errors of UAV, we are modeling air combat as SA-MDP. Second, to make the strategy network of air combat maneuver more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to solve the problem that air combat’s reward function is too sparse, we use MaxEnt IRL to design a shaping reward to accelerate the convergence of SA-DDPG.

Download Full-text

Deep Reinforcement Learning Based Intelligent Decision Making for Two-player Sequential Game with Uncertain Irrational Player

2019 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci44817.2019.9002811 ◽

2019 ◽

Author(s):

Zejian Zhou ◽

Hao Xu

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Sequential Game ◽

Intelligent Decision Making ◽

Intelligent Decision

Download Full-text

Intelligent Decision-Making of Scheduling for Dynamic Permutation Flowshop via Deep Reinforcement Learning

Sensors ◽

10.3390/s21031019 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1019

Author(s):

Shengluo Yang ◽

Zhigang Xu ◽

Junyi Wang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Real Time ◽

Dynamic Scheduling ◽

Scheduling Problems ◽

Permutation Flowshop ◽

Scheduling System ◽

Single Action ◽

Intelligent Decision Making ◽

Intelligent Decision

Dynamic scheduling problems have been receiving increasing attention in recent years due to their practical implications. To realize real-time and the intelligent decision-making of dynamic scheduling, we studied dynamic permutation flowshop scheduling problem (PFSP) with new job arrival using deep reinforcement learning (DRL). A system architecture for solving dynamic PFSP using DRL is proposed, and the mathematical model to minimize total tardiness cost is established. Additionally, the intelligent scheduling system based on DRL is modeled, with state features, actions, and reward designed. Moreover, the advantage actor-critic (A2C) algorithm is adapted to train the scheduling agent. The learning curve indicates that the scheduling agent learned to generate better solutions efficiently during training. Extensive experiments are carried out to compare the A2C-based scheduling agent with every single action, other DRL algorithms, and meta-heuristics. The results show the well performance of the A2C-based scheduling agent considering solution quality, CPU times, and generalization. Notably, the trained agent generates a scheduling action only in 2.16 ms on average, which is almost instantaneous and can be used for real-time scheduling. Our work can help to build a self-learning, real-time optimizing, and intelligent decision-making scheduling system.

Download Full-text

An intelligent decision-making method for anti-jamming communication based on deep reinforcement learning

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University ◽

10.1051/jnwpu/20213930641 ◽

2021 ◽

Vol 39 (3) ◽

pp. 641-649

Author(s):

Bailin Song ◽

Hua Xu ◽

Lei Jiang ◽

Ning Rao

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Success Rate ◽

Intelligent Decision Making ◽

Decision Network ◽

Intelligent Decision ◽

Experience Replay ◽

Average Success Rate ◽

Convergent Algorithm ◽

Fast Decision

In order to solve the problem of intelligent anti-jamming decision-making in battlefield communication, this paper designs an intelligent decision-making method for communication anti-jamming based on deep reinforcement learning. Introducing experience replay and dynamic epsilon mechanism based on PHC under the framework of DQN algorithm, a dynamic epsilon-DQN intelligent decision-making method is proposed. The algorithm can better select the value of epsilon according to the state of the decision network and improve the convergence speed and decision success rate. During the decision-making process, the jamming signals of all communication frequencies are detected, and the results are input into the decision-making algorithm as jamming discriminant information, so that we can effectively avoid being jammed under the condition of no prior jamming information. The experimental results show that the proposed method adapts to various communication models, has a fast decision-making speed, and the average success rate of the convergent algorithm can reach more than 95%, which has a great advantage over the existing decision-making methods.

Download Full-text