Heuristic Q-learning based on experience replay for three-dimensional path planning of the unmanned aerial vehicle

2019 ◽  
Vol 103 (1) ◽  
pp. 003685041987902 ◽  
Author(s):  
Ronglei Xie ◽  
Zhijun Meng ◽  
Yaoming Zhou ◽  
Yunpeng Ma ◽  
Zhe Wu

To address the convergence difficulties that existing reinforcement learning algorithms face in the large state space of three-dimensional unmanned aerial vehicle path planning, this article proposes a reinforcement learning algorithm that combines a heuristic function with an experience replay mechanism based on the maximum average reward. Knowledge of track performance is introduced to construct a heuristic function that guides the unmanned aerial vehicle's action selection and reduces useless exploration. The experience replay mechanism based on the maximum average reward increases the utilization rate of high-quality samples and accelerates convergence. Simulation results show that the proposed three-dimensional path planning algorithm learns efficiently, with significantly improved convergence speed and training performance.
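The abstract combines two ideas: a heuristic term added to greedy action selection, and a replay buffer ranked by average episode reward. Below is a minimal tabular sketch of that combination; the goal-distance heuristic, the weighting parameter `xi`, and the buffer cap are illustrative assumptions, not the paper's actual track-performance heuristic.

```python
import random
from collections import defaultdict

class HeuristicQAgent:
    """Tabular Q-learning with a heuristic action bias and a replay
    buffer ranked by average episode reward (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1, xi=1.0):
        self.Q = defaultdict(float)          # Q[(state, action)]
        self.actions = actions               # e.g. 3-D move deltas
        self.alpha, self.gamma, self.eps, self.xi = alpha, gamma, eps, xi
        self.replay = []                     # (avg_reward, episode) pairs

    def heuristic(self, state, action, goal):
        # Assumption: favor actions that shrink the squared 3-D distance
        # to the goal; the paper instead derives its heuristic from
        # track-performance knowledge.
        nxt = tuple(s + a for s, a in zip(state, action))
        d_now = sum((g - s) ** 2 for g, s in zip(goal, state))
        d_nxt = sum((g - s) ** 2 for g, s in zip(goal, nxt))
        return 1.0 if d_nxt < d_now else 0.0

    def act(self, state, goal):
        if random.random() < self.eps:       # epsilon-greedy exploration
            return random.choice(self.actions)
        # Rank actions by learned value plus the weighted heuristic term.
        return max(self.actions, key=lambda a: self.Q[(state, a)]
                   + self.xi * self.heuristic(state, a, goal))

    def update(self, s, a, r, s2):
        best = max(self.Q[(s2, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best - self.Q[(s, a)])

    def store_episode(self, transitions):
        # Rank finished episodes by average reward; keep the best 100.
        avg_r = sum(t[2] for t in transitions) / len(transitions)
        self.replay.append((avg_r, transitions))
        self.replay.sort(key=lambda x: -x[0])
        del self.replay[100:]

    def replay_best(self, k=5):
        # Re-learn from the k highest-average-reward episodes.
        for _, episode in self.replay[:k]:
            for s, a, r, s2 in episode:
                self.update(s, a, r, s2)
```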

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yongqiang Qi ◽  
Shuai Li ◽  
Yi Ke

In this paper, a three-dimensional path planning problem of an unmanned aerial vehicle under constant thrust is studied based on the artificial fluid method. The effect of obstacles on the original fluid field is quantified by a perturbation matrix, the resulting streamlines are regarded as the planned path, and the tangential vector and disturbance matrix of the artificial fluid method are improved. In particular, a novel constant-thrust fitting algorithm based on impulse compensation is proposed, and a constant-thrust switching control scheme based on the isochronous interpolation method is then given. It is proved that the planned path avoids all obstacles smoothly and swiftly and eventually reaches the destination. Simulation results demonstrate the effectiveness of this method.
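A simplified sketch of the general streamline idea: an unperturbed flow field points at the destination, each obstacle contributes a modulation matrix that suppresses the velocity component directed into it, and integrating the modulated field traces the planned path. The decay law and matrix form below are assumptions; the paper's improved tangential vector and disturbance matrix are not reproduced here.

```python
import numpy as np

def original_flow(p, goal, v=1.0):
    """Unperturbed flow: unit-speed field pointing at the destination."""
    d = goal - p
    return v * d / (np.linalg.norm(d) + 1e-9)

def perturbation_matrix(p, center, radius, rho=1.0):
    """Modulation matrix for one spherical obstacle (illustrative form).
    Far from the obstacle it approaches identity; near the surface it
    removes the velocity component pointing into the obstacle."""
    r = p - center
    dist = np.linalg.norm(r)
    n = r / (dist + 1e-9)                    # outward surface normal
    # Influence decays with distance (rho sets the reaction range).
    w = (radius / max(dist, radius)) ** (1.0 / rho)
    return np.eye(3) - w * np.outer(n, n)

def plan_streamline(start, goal, obstacles, step=0.05, max_iter=5000):
    """Integrate the perturbed field; the streamline is the planned path."""
    p = np.asarray(start, float)
    path = [p.copy()]
    for _ in range(max_iter):
        u = original_flow(p, goal)
        M = np.eye(3)
        for c, rad in obstacles:             # compose obstacle effects
            M = M @ perturbation_matrix(p, np.asarray(c, float), rad)
        p = p + step * (M @ u)
        path.append(p.copy())
        if np.linalg.norm(goal - p) < step:  # close enough to destination
            break
    return np.array(path)

# Example: one spherical obstacle of radius 1.5 between start and goal.
path = plan_streamline([0, 0, 0], np.array([10.0, 8.0, 5.0]),
                       obstacles=[((5.0, 4.0, 2.5), 1.5)])
```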


2015 ◽  
Vol 28 (1) ◽  
pp. 229-239 ◽  
Author(s):  
Honglun Wang ◽  
Wentao Lyu ◽  
Peng Yao ◽  
Xiao Liang ◽  
Chang Liu

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 132
Author(s):  
Jianfeng Zheng ◽  
Shuren Mao ◽  
Zhenyu Wu ◽  
Pengcheng Kong ◽  
Hao Qiang

To address the poor exploration ability and slow convergence of traditional deep reinforcement learning in navigating a patrol robot along specified indoor routes, this paper proposes an improved deep reinforcement learning algorithm based on Pan/Tilt/Zoom (PTZ) image information. The symmetric image information and target position information are taken as the network input, the robot's velocity is output as the next action, and a bounded circular route serves as the test scenario. An improved reward and punishment function is designed to speed up convergence and optimize the path, so that the robot plans a safer route while prioritizing obstacle avoidance. Compared with the Deep Q-Network (DQN) algorithm, the improved algorithm converges about 40% faster and its loss function is more stable.
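The abstract's key ingredient is the reshaped reward and punishment function. A hedged sketch of one such shaping follows: progress toward the target is rewarded, a collision ends the episode with a large penalty, and proximity to obstacles is penalized so the learned path keeps a safety margin. All constants are assumptions, not the paper's values.

```python
def shaped_reward(dist_to_goal, prev_dist_to_goal, min_obstacle_dist,
                  collided, reached, safe_radius=0.5):
    """Illustrative reward shaping in the spirit of the abstract."""
    if collided:
        return -100.0                        # terminal crash penalty
    if reached:
        return +100.0                        # terminal success bonus
    r = 10.0 * (prev_dist_to_goal - dist_to_goal)    # progress term
    if min_obstacle_dist < safe_radius:              # safety term
        r -= 5.0 * (safe_radius - min_obstacle_dist) / safe_radius
    return r - 0.01                          # small per-step cost
```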


2017 ◽  
Vol 266 ◽  
pp. 445-457 ◽  
Author(s):  
Chen YongBo ◽  
Mei YueSong ◽  
Yu JianQiao ◽  
Su XiaoLong ◽  
Xu Nuo

Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6499
Author(s):  
Shuyang Li ◽  
Xiaohui Hu ◽  
Yongwen Du

Computation offloading technology extends cloud computing to the edge of the access network, close to users, bringing many benefits to terminal devices with limited battery and computational resources. Nevertheless, existing computation offloading approaches are difficult to apply in certain scenarios, such as a dense distribution of end-users combined with a sparse distribution of network infrastructure. The technological revolution in the unmanned aerial vehicle (UAV) and chip industries has granted UAVs more computing resources and promoted the emergence of UAV-assisted mobile edge computing (MEC) technology, which can be applied to those scenarios. However, in an MEC system with multiple users and multiple servers, making reasonable offloading decisions and allocating system resources remains a severe challenge. This paper studies the offloading decision and resource allocation problem in a UAV-assisted MEC environment with multiple users and servers. To ensure the quality of service for end-users, we set the weighted total cost of delay, energy consumption, and the size of discarded tasks as our optimization objective. We further formulate the joint optimization problem as a Markov decision process and apply the soft actor–critic (SAC) deep reinforcement learning algorithm to optimize the offloading policy. Numerical simulation results show that the offloading policy optimized by our proposed SAC-based dynamic computing offloading (SACDCO) algorithm effectively reduces the delay, energy consumption, and size of discarded tasks in the UAV-assisted MEC system. Compared with the fixed local-UAV scheme in the specific simulation setting, our proposed approach reduces system delay and energy consumption by approximately 50% and 200%, respectively.
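The optimization objective described, the weighted total cost of delay, energy consumption, and discarded task size, translates directly into the per-step reward of the Markov decision process that SAC maximizes. A minimal sketch, with assumed weights and no normalization:

```python
def offloading_cost(delay, energy, dropped_bits,
                    w_delay=0.4, w_energy=0.4, w_drop=0.2):
    """Weighted total cost from the abstract's optimization objective.
    The weights are illustrative assumptions, not the paper's values."""
    return w_delay * delay + w_energy * energy + w_drop * dropped_bits

def step_reward(delay, energy, dropped_bits):
    # SAC maximizes expected return, so the MDP reward is the negated cost.
    return -offloading_cost(delay, energy, dropped_bits)
```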


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2233 ◽  
Author(s):  
Ke Li ◽  
Kun Zhang ◽  
Zhenchong Zhang ◽  
Zekun Liu ◽  
Shuai Hua ◽  
...  

Operating an unmanned aerial vehicle (UAV) safely and efficiently in an interactive environment is challenging. A large amount of research has been devoted to improving the intelligence of a UAV while it performs a mission, and finding an optimal maneuver decision-making policy has become one of the key issues in enabling UAV autonomy. In this paper, we propose a maneuver decision-making algorithm based on deep reinforcement learning, which generates efficient maneuvers for a UAV agent to execute an airdrop mission autonomously in an interactive environment. In particular, the training set of the learning algorithm is constructed with Prioritized Experience Replay, which accelerates the convergence of the decision network during training. Extensive experimental results show that a desirable and effective maneuver decision-making policy can be found.
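Prioritized Experience Replay (Schaul et al.) samples transitions in proportion to their temporal-difference error rather than uniformly, which is the mechanism the abstract credits for faster convergence of the decision network. A minimal sketch of the proportional variant; the hyperparameters are the commonly used defaults, not necessarily the paper's:

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional Prioritized Experience Replay buffer."""

    def __init__(self, capacity=10000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:  # FIFO eviction when full
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prios)
        p = p / p.sum()                      # sampling probabilities
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # Importance-sampling weights correct the non-uniform sampling bias.
        w = (len(self.data) * p[idx]) ** (-beta)
        w = w / w.max()
        return [self.data[i] for i in idx], idx, w

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after each learning step with new TD errors.
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + self.eps) ** self.alpha
```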

