Heuristic Q-learning based on experience replay for three-dimensional path planning of the unmanned aerial vehicle

2019 ◽  
Vol 103 (1) ◽  
pp. 003685041987902 ◽  
Author(s):  
Ronglei Xie ◽  
Zhijun Meng ◽  
Yaoming Zhou ◽  
Yunpeng Ma ◽  
Zhe Wu

To address the convergence difficulties that existing reinforcement learning algorithms face in the large state space of three-dimensional unmanned aerial vehicle path planning, this article proposes a reinforcement learning algorithm that combines a heuristic function with an experience replay mechanism based on the maximum average reward. Knowledge of track performance is introduced to construct a heuristic function that guides the unmanned aerial vehicle's action selection and reduces useless exploration. The experience replay mechanism based on the maximum average reward increases the utilization rate of high-quality samples and accelerates convergence. Simulation results show that the proposed three-dimensional path planning algorithm learns efficiently, with significantly improved convergence speed and training performance.
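The abstract combines two ideas: a heuristic term added to greedy action selection, and a replay buffer ranked by average episode reward. Below is a minimal tabular sketch of that combination; the goal-distance heuristic, the weighting parameter `xi`, and the buffer cap are illustrative assumptions, not the paper's actual track-performance heuristic.

```python
import random
from collections import defaultdict

class HeuristicQAgent:
    """Tabular Q-learning with a heuristic action bias and a replay
    buffer ranked by average episode reward (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1, xi=1.0):
        self.Q = defaultdict(float)          # Q[(state, action)]
        self.actions = actions               # e.g. 3-D move deltas
        self.alpha, self.gamma, self.eps, self.xi = alpha, gamma, eps, xi
        self.replay = []                     # (avg_reward, episode) pairs

    def heuristic(self, state, action, goal):
        # Assumption: favor actions that shrink the squared 3-D distance
        # to the goal; the paper instead derives its heuristic from
        # track-performance knowledge.
        nxt = tuple(s + a for s, a in zip(state, action))
        d_now = sum((g - s) ** 2 for g, s in zip(goal, state))
        d_nxt = sum((g - s) ** 2 for g, s in zip(goal, nxt))
        return 1.0 if d_nxt < d_now else 0.0

    def act(self, state, goal):
        if random.random() < self.eps:       # epsilon-greedy exploration
            return random.choice(self.actions)
        # Rank actions by learned value plus the weighted heuristic term.
        return max(self.actions, key=lambda a: self.Q[(state, a)]
                   + self.xi * self.heuristic(state, a, goal))

    def update(self, s, a, r, s2):
        best = max(self.Q[(s2, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best - self.Q[(s, a)])

    def store_episode(self, transitions):
        # Rank finished episodes by average reward; keep the best 100.
        avg_r = sum(t[2] for t in transitions) / len(transitions)
        self.replay.append((avg_r, transitions))
        self.replay.sort(key=lambda x: -x[0])
        del self.replay[100:]

    def replay_best(self, k=5):
        # Re-learn from the k highest-average-reward episodes.
        for _, episode in self.replay[:k]:
            for s, a, r, s2 in episode:
                self.update(s, a, r, s2)
```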

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yongqiang Qi ◽  
Shuai Li ◽  
Yi Ke

In this paper, a three-dimensional path planning problem of an unmanned aerial vehicle under constant thrust is studied based on the artificial fluid method. The effect of obstacles on the original fluid field is quantified by a perturbation matrix, the resulting streamlines are regarded as the planned path, and the tangential vector and disturbance matrix of the artificial fluid method are improved. In particular, a novel constant-thrust fitting algorithm based on impulse compensation is proposed, and a constant-thrust switching control scheme based on the isochronous interpolation method is then given. It is proved that the planned path avoids all obstacles smoothly and swiftly and eventually reaches the destination. Simulation results demonstrate the effectiveness of this method.
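A simplified sketch of the general streamline idea: an unperturbed flow field points at the destination, each obstacle contributes a modulation matrix that suppresses the velocity component directed into it, and integrating the modulated field traces the planned path. The decay law and matrix form below are assumptions; the paper's improved tangential vector and disturbance matrix are not reproduced here.

```python
import numpy as np

def original_flow(p, goal, v=1.0):
    """Unperturbed flow: unit-speed field pointing at the destination."""
    d = goal - p
    return v * d / (np.linalg.norm(d) + 1e-9)

def perturbation_matrix(p, center, radius, rho=1.0):
    """Modulation matrix for one spherical obstacle (illustrative form).
    Far from the obstacle it approaches identity; near the surface it
    removes the velocity component pointing into the obstacle."""
    r = p - center
    dist = np.linalg.norm(r)
    n = r / (dist + 1e-9)                    # outward surface normal
    # Influence decays with distance (rho sets the reaction range).
    w = (radius / max(dist, radius)) ** (1.0 / rho)
    return np.eye(3) - w * np.outer(n, n)

def plan_streamline(start, goal, obstacles, step=0.05, max_iter=5000):
    """Integrate the perturbed field; the streamline is the planned path."""
    p = np.asarray(start, float)
    path = [p.copy()]
    for _ in range(max_iter):
        u = original_flow(p, goal)
        M = np.eye(3)
        for c, rad in obstacles:             # compose obstacle effects
            M = M @ perturbation_matrix(p, np.asarray(c, float), rad)
        p = p + step * (M @ u)
        path.append(p.copy())
        if np.linalg.norm(goal - p) < step:  # close enough to destination
            break
    return np.array(path)

# Example: one spherical obstacle of radius 1.5 between start and goal.
path = plan_streamline([0, 0, 0], np.array([10.0, 8.0, 5.0]),
                       obstacles=[((5.0, 4.0, 2.5), 1.5)])
```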


2015 ◽  
Vol 28 (1) ◽  
pp. 229-239 ◽  
Author(s):  
Honglun Wang ◽  
Wentao Lyu ◽  
Peng Yao ◽  
Xiao Liang ◽  
Chang Liu

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 132
Author(s):  
Jianfeng Zheng ◽  
Shuren Mao ◽  
Zhenyu Wu ◽  
Pengcheng Kong ◽  
Hao Qiang

To address the poor exploration ability and slow convergence of traditional deep reinforcement learning in navigating a patrol robot along specified indoor routes, this paper proposes an improved deep reinforcement learning algorithm based on Pan/Tilt/Zoom (PTZ) image information. The symmetric image information and target position information are taken as the network input, the robot's velocity is output as the next action, and a bounded circular route serves as the test scenario. An improved reward and punishment function is designed to speed up convergence and optimize the path, so that the robot plans a safer route while prioritizing obstacle avoidance. Compared with the Deep Q-Network (DQN) algorithm, the improved algorithm converges about 40% faster and its loss function is more stable.
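The abstract's key ingredient is the reshaped reward and punishment function. A hedged sketch of one such shaping follows: progress toward the target is rewarded, a collision ends the episode with a large penalty, and proximity to obstacles is penalized so the learned path keeps a safety margin. All constants are assumptions, not the paper's values.

```python
def shaped_reward(dist_to_goal, prev_dist_to_goal, min_obstacle_dist,
                  collided, reached, safe_radius=0.5):
    """Illustrative reward shaping in the spirit of the abstract."""
    if collided:
        return -100.0                        # terminal crash penalty
    if reached:
        return +100.0                        # terminal success bonus
    r = 10.0 * (prev_dist_to_goal - dist_to_goal)    # progress term
    if min_obstacle_dist < safe_radius:              # safety term
        r -= 5.0 * (safe_radius - min_obstacle_dist) / safe_radius
    return r - 0.01                          # small per-step cost
```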


2017 ◽  
Vol 266 ◽  
pp. 445-457 ◽  
Author(s):  
Chen YongBo ◽  
Mei YueSong ◽  
Yu JianQiao ◽  
Su XiaoLong ◽  
Xu Nuo

Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6499
Author(s):  
Shuyang Li ◽  
Xiaohui Hu ◽  
Yongwen Du

Computation offloading technology extends cloud computing to the edge of the access network, close to users, bringing many benefits to terminal devices with limited battery and computational resources. Nevertheless, existing computation offloading approaches are difficult to apply in certain scenarios, such as a dense distribution of end-users combined with a sparse distribution of network infrastructure. The technological revolution in the unmanned aerial vehicle (UAV) and chip industries has granted UAVs more computing resources and promoted the emergence of UAV-assisted mobile edge computing (MEC) technology, which can be applied to those scenarios. However, in an MEC system with multiple users and multiple servers, making reasonable offloading decisions and allocating system resources remains a severe challenge. This paper studies the offloading decision and resource allocation problem in a UAV-assisted MEC environment with multiple users and servers. To ensure the quality of service for end-users, we set the weighted total cost of delay, energy consumption, and the size of discarded tasks as our optimization objective. We further formulate the joint optimization problem as a Markov decision process and apply the soft actor–critic (SAC) deep reinforcement learning algorithm to optimize the offloading policy. Numerical simulation results show that the offloading policy optimized by our proposed SAC-based dynamic computing offloading (SACDCO) algorithm effectively reduces the delay, energy consumption, and size of discarded tasks in the UAV-assisted MEC system. Compared with the fixed local-UAV scheme in the specific simulation setting, our proposed approach reduces system delay and energy consumption by approximately 50% and 200%, respectively.
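The optimization objective described, the weighted total cost of delay, energy consumption, and discarded task size, translates directly into the per-step reward of the Markov decision process that SAC maximizes. A minimal sketch, with assumed weights and no normalization:

```python
def offloading_cost(delay, energy, dropped_bits,
                    w_delay=0.4, w_energy=0.4, w_drop=0.2):
    """Weighted total cost from the abstract's optimization objective.
    The weights are illustrative assumptions, not the paper's values."""
    return w_delay * delay + w_energy * energy + w_drop * dropped_bits

def step_reward(delay, energy, dropped_bits):
    # SAC maximizes expected return, so the MDP reward is the negated cost.
    return -offloading_cost(delay, energy, dropped_bits)
```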


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2233 ◽  
Author(s):  
Ke Li ◽  
Kun Zhang ◽  
Zhenchong Zhang ◽  
Zekun Liu ◽  
Shuai Hua ◽  
...  

Operating an unmanned aerial vehicle (UAV) safely and efficiently in an interactive environment is challenging. A large amount of research has been devoted to improving the intelligence of a UAV while it performs a mission, and finding an optimal maneuver decision-making policy has become one of the key issues in enabling UAV autonomy. In this paper, we propose a maneuver decision-making algorithm based on deep reinforcement learning, which generates efficient maneuvers for a UAV agent to execute an airdrop mission autonomously in an interactive environment. In particular, the training set of the learning algorithm is constructed with Prioritized Experience Replay, which accelerates the convergence of the decision network during training. Extensive experimental results show that a desirable and effective maneuver decision-making policy can be found.
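Prioritized Experience Replay (Schaul et al.) samples transitions in proportion to their temporal-difference error rather than uniformly, which is the mechanism the abstract credits for faster convergence of the decision network. A minimal sketch of the proportional variant; the hyperparameters are the commonly used defaults, not necessarily the paper's:

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional Prioritized Experience Replay buffer."""

    def __init__(self, capacity=10000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:  # FIFO eviction when full
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prios)
        p = p / p.sum()                      # sampling probabilities
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # Importance-sampling weights correct the non-uniform sampling bias.
        w = (len(self.data) * p[idx]) ** (-beta)
        w = w / w.max()
        return [self.data[i] for i in idx], idx, w

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after each learning step with new TD errors.
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + self.eps) ** self.alpha
```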

