Intelligent Decision-Making for 3-Dimensional Dynamic Obstacle Avoidance of UAV Based on Deep Reinforcement Learning

Dynamic scheduling problems have been receiving increasing attention in recent years due to their practical implications. To realize real-time and the intelligent decision-making of dynamic scheduling, we studied dynamic permutation flowshop scheduling problem (PFSP) with new job arrival using deep reinforcement learning (DRL). A system architecture for solving dynamic PFSP using DRL is proposed, and the mathematical model to minimize total tardiness cost is established. Additionally, the intelligent scheduling system based on DRL is modeled, with state features, actions, and reward designed. Moreover, the advantage actor-critic (A2C) algorithm is adapted to train the scheduling agent. The learning curve indicates that the scheduling agent learned to generate better solutions efficiently during training. Extensive experiments are carried out to compare the A2C-based scheduling agent with every single action, other DRL algorithms, and meta-heuristics. The results show the well performance of the A2C-based scheduling agent considering solution quality, CPU times, and generalization. Notably, the trained agent generates a scheduling action only in 2.16 ms on average, which is almost instantaneous and can be used for real-time scheduling. Our work can help to build a self-learning, real-time optimizing, and intelligent decision-making scheduling system.

Download Full-text

UAV maneuvering decision -making algorithm based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

Journal of Artificial Intelligence and Technology ◽

10.37965/jait.2021.12003 ◽

2021 ◽

Author(s):

Shuangxia Bai ◽

Shaomei Song ◽

Shiyang Liang ◽

Jianmei Wang ◽

Bo Li ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Simulation Experiment ◽

Gradient Algorithm ◽

Intelligent Decision Making ◽

Intelligent Decision ◽

Air Combat ◽

Policy Gradient ◽

Markov Decision ◽

Combat Problems

Aiming at intelligent decision-making of UAV based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of UAV is established by Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient(TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation experiment results show that compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence speed, and is more suitable forsolving combat problems. The algorithm proposed in this paper enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, adjust their actions to approach and successfully strike the enemy, providing a new method for UAVs to make intelligent maneuvering decisions during air combat.

Download Full-text

An intelligent decision-making method for anti-jamming communication based on deep reinforcement learning

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University ◽

10.1051/jnwpu/20213930641 ◽

2021 ◽

Vol 39 (3) ◽

pp. 641-649

Author(s):

Bailin Song ◽

Hua Xu ◽

Lei Jiang ◽

Ning Rao

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Success Rate ◽

Intelligent Decision Making ◽

Decision Network ◽

Intelligent Decision ◽

Experience Replay ◽

Average Success Rate ◽

Convergent Algorithm ◽

Fast Decision

In order to solve the problem of intelligent anti-jamming decision-making in battlefield communication, this paper designs an intelligent decision-making method for communication anti-jamming based on deep reinforcement learning. Introducing experience replay and dynamic epsilon mechanism based on PHC under the framework of DQN algorithm, a dynamic epsilon-DQN intelligent decision-making method is proposed. The algorithm can better select the value of epsilon according to the state of the decision network and improve the convergence speed and decision success rate. During the decision-making process, the jamming signals of all communication frequencies are detected, and the results are input into the decision-making algorithm as jamming discriminant information, so that we can effectively avoid being jammed under the condition of no prior jamming information. The experimental results show that the proposed method adapts to various communication models, has a fast decision-making speed, and the average success rate of the convergent algorithm can reach more than 95%, which has a great advantage over the existing decision-making methods.

Download Full-text