Research on Air Combat Maneuver Decision-Making Method Based on Reinforcement Learning

Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 279 ◽  
Author(s):  
Xianbing Zhang ◽  
Guoqing Liu ◽  
Chaojie Yang ◽  
Jiang Wu

With the development of information technology, the degree of intelligence in air combat is increasing, and the demand for automated intelligent decision-making systems is growing more urgent. Based on the characteristics of over-the-horizon air combat, this paper constructs an over-the-horizon air combat training environment comprising aircraft dynamics modeling, air combat scene design, enemy aircraft strategy design, and reward and punishment signal design. To improve the efficiency with which the reinforcement learning algorithm explores the strategy space, this paper proposes a heuristic Q-Network method that integrates expert experience, using that experience as a heuristic signal to guide the search process and combining heuristic exploration with random exploration. For the over-the-horizon air combat maneuver decision problem, the heuristic Q-Network method is used to train a neural network model in the over-the-horizon air combat training environment; through continuous interaction with the environment, the air combat maneuver strategy is learned autonomously. Simulation experiments verify the efficiency of the heuristic Q-Network method and the effectiveness of the resulting air combat maneuver strategy.
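
As a rough illustration of the exploration scheme the abstract describes, the sketch below blends expert-guided, random, and greedy action selection; the names (`q_net`, `expert_action`) and the probability split are assumptions for illustration, not the authors' implementation.

```python
import random

import torch

def select_action(q_net, state, expert_action, eps_expert=0.3, eps_random=0.1):
    """Choose a maneuver by mixing heuristic, random, and greedy selection."""
    r = random.random()
    if r < eps_expert:
        # Heuristic exploration: follow the expert-experience signal.
        return expert_action(state)
    if r < eps_expert + eps_random:
        # Random exploration of the remaining strategy space.
        return random.randrange(q_net.n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    # Greedy exploitation of the learned Q-values.
    return int(q_values.argmax().item())
```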

Author(s):  
Xingxing Liang ◽  
Li Chen ◽  
Yanghe Feng ◽  
Zhong Liu ◽  
Yang Ma ◽  
...  

Reinforcement learning, as an effective method for solving complex sequential decision-making problems, plays an important role in areas such as intelligent decision-making and behavioral cognition. It is well known that the experience replay mechanism contributes to current deep reinforcement learning by reusing past samples to improve sample efficiency. However, the existing prioritized experience replay mechanism changes the sample distribution in the replay buffer, because higher sampling frequencies are assigned to specific transitions, and it cannot be applied to actor-critic and other on-policy reinforcement learning algorithms. To address this, we propose an adaptive factor based on the TD-error, which further increases sample utilization by assigning larger attention weights to samples with larger TD-errors, and which embeds flexibly into the original Deep Q-Network and Advantage Actor-Critic algorithms to improve their performance. We then evaluate the proposed architecture on CartPole-V1 and on six Atari game environments. Under both fixed-temperature and annealing-temperature conditions, the improved algorithms outperform the vanilla DQN and the original A2C in cumulative reward and learning speed.
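
A minimal sketch of how such a TD-error-based adaptive factor might weight a batch loss, assuming a temperature-scaled softmax over |TD-error|; the exact form of the authors' factor is not specified in the abstract, so this is an assumption.

```python
import torch

def td_error_weighted_loss(td_errors, tau=1.0):
    # Attention weights grow with |TD-error|; tau is the temperature,
    # which can be held fixed or annealed over the course of training.
    weights = torch.softmax(td_errors.abs().detach() / tau, dim=0)
    # Weights are detached so they scale the loss without adding a
    # gradient path of their own.
    return (weights * td_errors.pow(2)).sum()

# Usage in a DQN-style update (names illustrative):
# td = q_pred - (reward + gamma * (1 - done) * q_next.max(dim=1).values)
# loss = td_error_weighted_loss(td, tau=0.5)
```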


Author(s):  
Shuangxia Bai ◽  
Shaomei Song ◽  
Shiyang Liang ◽  
Jianmei Wang ◽  
Bo Li ◽  
...  

Aiming at intelligent decision-making for UAVs based on situation information in air combat, this paper proposes a novel maneuvering decision method based on deep reinforcement learning. The autonomous maneuvering model of the UAV is established as a Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with DDPG, the TD3 algorithm has stronger decision-making performance and faster convergence, making it better suited to solving air combat problems. The proposed algorithm enables UAVs to make maneuvering decisions autonomously based on situation information such as position, speed, and relative azimuth, and to adjust their actions to approach and successfully strike the enemy, providing a new method for intelligent UAV maneuvering decisions in air combat.
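
For reference, the standard TD3 critic-target computation below shows two of the ingredients, twin critics and target-policy smoothing, that distinguish TD3 from DDPG (the third, delayed actor updates, lives in the training loop); the variable names are illustrative, not taken from the paper.

```python
import torch

def td3_target(actor_t, critic1_t, critic2_t, next_state, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Compute the TD3 critic target for a batch of transitions."""
    with torch.no_grad():
        a_next = actor_t(next_state)
        # Target-policy smoothing: clipped Gaussian noise on the action.
        noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
        a_next = (a_next + noise).clamp(-act_limit, act_limit)
        # Clipped double-Q: take the minimum of the twin target critics.
        target_q = torch.min(critic1_t(next_state, a_next),
                             critic2_t(next_state, a_next))
        return reward + gamma * (1.0 - done) * target_q
```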


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps a UAV choose the correct action in each state according to its policy. In an unknown environment, hand-crafted rules for choosing actions are not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by our algorithm chooses the optimal action with greater probability than one following classic Q-learning. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail, and the experimental results show that the proposed algorithm efficiently learns the optimal policy for UAVs in the agricultural plant protection environment.
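
A minimal sketch of the similar-state-matching idea, assuming a tabular Q-function keyed by state tuples and Euclidean distance as the similarity measure; the paper's actual matching criterion may differ.

```python
import numpy as np

def similar_state_q(q_table, state, n_actions):
    """Return Q-values for `state`, falling back to the most similar known state."""
    key = tuple(state)
    if key in q_table:
        return q_table[key]
    if not q_table:
        return np.zeros(n_actions)
    # Match the nearest previously visited state (Euclidean here;
    # the paper's similarity measure may differ).
    nearest = min(q_table, key=lambda s: np.linalg.norm(np.subtract(s, state)))
    return q_table[nearest]
```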


2021 ◽  
pp. 4881-4891
Author(s):  
Yue Li ◽  
Wei Han ◽  
Weiguo Zhong ◽  
Jiazheng Ji ◽  
Wanhui Mu

Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is one of the key parts of research on longitudinal autonomous driving, and considering human driver behavior when designing autonomous driving decision-making strategies is a current research hotspot. Among longitudinal autonomous driving decision-making strategies, traditional rule-based strategies are difficult to apply to complex scenarios, while current reinforcement learning and deep reinforcement learning methods construct reward functions from safety, comfort, and economy, so the resulting decision strategies still differ considerably from those of human drivers. Addressing these problems, this paper uses driver behavior data to design the reward function of a deep reinforcement learning algorithm by fitting a BP neural network, and applies the DQN and DDPG algorithms to establish two driver-like longitudinal autonomous driving decision-making models. A simulation experiment compares the decisions of the two models against the driver curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behavior, and thus performs better, than the DQN algorithm.
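
A sketch of the reward-fitting step under stated assumptions: a small BP (back-propagation) network regresses the recorded driver response onto the driving state, and its output then shapes the agent's reward. The network size and the reward form are illustrative guesses, not the paper's design.

```python
import torch
import torch.nn as nn

class DriverRewardNet(nn.Module):
    """BP network fitted to recorded driver behaviour data."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))  # predicted driver response, e.g. acceleration

    def forward(self, state):
        return self.net(state)

# One assumed form of the learned reward: penalize the gap between the
# agent's longitudinal action and the fitted driver response.
# reward = -abs(agent_accel - driver_net(state_tensor).item())
```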


Author(s):  
Ruolan Zhang ◽  
Masao Furusho

Owing to quality issues and errors in the data itself, historical automatic identification system (AIS) data is insufficient for predicting navigation risk at sea, but it is adequate for training decision-making neural networks. This paper presents a real-AIS ship navigation environment with rule-based and neural-based decision processes using frame motion, and trains the decision network with a deep reinforcement learning algorithm. Rule-based decision-making has numerous applications in adaptive systems, expert systems, and decision support systems, including general ship navigation, which is regulated by the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs). However, fully unmanned ship navigation at open sea, without any remote control, cannot be achieved by a rule-based decision-making system alone. With the growing amount of data, complex sea environments, and varied collision scenarios, agent-based decision-making has come to play an important role in transportation; for ships, combining rule-based and neural-based decision-making is the only viable option, and satisfying the requirements of autonomous decision-making development has become progressively more challenging. This study uses deep reinforcement learning to evaluate decision-making efficiency under different AIS data input shapes. The results show that the decision neural network trained with AIS data is robust and highly capable of collision avoidance. Furthermore, the same methodology offers instructive guidance for processing radar, camera, ENC, and other data to address different risk perception tasks in different scenarios, which has important implications for fully unmanned navigation.
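
A minimal sketch of a decision network over stacked AIS frames, with the input shape (k time steps of own-ship and target-ship kinematics), layer sizes, and the three-action rudder command set all assumed for illustration; the paper compares several input shapes, none of which the abstract specifies.

```python
import torch
import torch.nn as nn

class AISDecisionNet(nn.Module):
    """Q-network over stacked AIS frames (input shape is an assumption)."""
    def __init__(self, k=4, features=6, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # [batch, k, features] -> [batch, k*features]
            nn.Linear(k * features, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions))         # e.g. port / hold course / starboard

    def forward(self, frames):
        return self.net(frames)                # one Q-value per rudder action
```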


2020 ◽  
Vol 17 (3) ◽  
pp. 172988142091696
Author(s):  
Xiaoli Liu

This article studies a multi-agent reinforcement learning algorithm based on agent action prediction. In a multi-agent system, the action a learning agent selects is inevitably affected by the actions of other agents, so the reinforcement learning system must consider the joint state and joint action of the agents. The method is applied to cooperative strategy learning for soccer robots, so that the multi-agent system can learn through interaction with the environment. To realize division of labour and cooperation among multiple robots, interactive learning is used to master the behaviour strategy. Combining the characteristics of soccer robot decision-making, this article analyses role transformation and experience sharing in multi-agent reinforcement learning, applies the algorithm to the soccer robots' local attack strategy, uses it to learn the action selection strategy of the team's main robot, and verifies it through simulation on the Matlab platform. The experimental results prove the effectiveness of the research method, and its superiority is validated by comparison with several simple methods.
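
A minimal tabular sketch of Q-learning over joint actions with an action-prediction step, assuming a `predict` function that anticipates the other robot's move; the names and update form are illustrative, not the article's exact formulation.

```python
def joint_q_update(Q, n_actions, state, my_action, other_action,
                   reward, next_state, predict, alpha=0.1, gamma=0.95):
    """One tabular update of Q over joint actions.

    Q maps (state, my_action, other_action) -> value, and predict(state)
    returns the anticipated action of the other agent.
    """
    key = (state, my_action, other_action)
    next_other = predict(next_state)  # action-prediction step
    # Best own action assuming the other agent takes its predicted action.
    next_best = max(Q.get((next_state, a, next_other), 0.0)
                    for a in range(n_actions))
    Q[key] = Q.get(key, 0.0) + alpha * (reward + gamma * next_best - Q.get(key, 0.0))
```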

