Path planning in an unknown environment based on deep reinforcement learning with prior knowledge

2021 ◽  
pp. 1-17
Author(s):  
Ping Lou ◽  
Kun Xu ◽  
Xuemei Jiang ◽  
Zheng Xiao ◽  
Junwei Yan

Path planning in an unknown environment is a fundamental capability that mobile robots need in order to complete their tasks. As a typical deep reinforcement learning method, the deep Q-network (DQN) algorithm has gained wide popularity in path planning because of its self-learning ability and adaptability to complex environments. However, most DQN-based path planning algorithms spend a great deal of time on model training, and the learned policy depends only on the information observed by the sensors, which leads to poor generalization to new tasks and wasted time retraining the model. Therefore, a new deep reinforcement learning method combining DQN with prior knowledge is proposed to reduce training time and enhance generalization capability. In this method, a fuzzy logic controller is designed to avoid obstacles and keep the robot from exploring blindly, which reduces the training time. A target-driven approach addresses the lack of generalization: the learned policy depends on the fusion of observed information and target information. Extensive experiments show that the proposed algorithm converges faster than the DQN algorithm in path planning tasks and that the target can be reached without retraining when the path planning task changes.
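The fuzzy-logic prior described above can be pictured with a minimal sketch. This is not the authors' controller; the membership thresholds, rule base, and three-beam sensor layout are all assumptions introduced for illustration of how fuzzy rules could replace uniformly random exploration:

```python
def fuzzy_membership(distance, near=0.5, far=2.0):
    """Degree (0..1) to which a range reading counts as 'near an obstacle'.

    Readings at or below `near` are fully 'near' (1.0); at or above `far`
    they are fully 'far' (0.0); in between, membership falls off linearly.
    """
    if distance <= near:
        return 1.0
    if distance >= far:
        return 0.0
    return (far - distance) / (far - near)

def fuzzy_avoidance_action(left, front, right):
    """Choose turn_left / forward / turn_right from three range readings.

    A hand-written rule base: steer away from whichever side has the
    strongest 'near' activation, instead of exploring at random.
    """
    n_left, n_front, n_right = (fuzzy_membership(d) for d in (left, front, right))
    if n_front > 0.5:  # obstacle ahead: turn toward the more open side
        return "turn_left" if n_left < n_right else "turn_right"
    if n_left > n_right:
        return "turn_right"
    if n_right > n_left:
        return "turn_left"
    return "forward"
```

In a DQN training loop, such a controller would be consulted during the exploration phase (where plain DQN would pick a random action), so early episodes avoid obstacle collisions and the replay buffer fills with more informative transitions.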

2021 ◽  
pp. 229-236
Author(s):  
Xinshun Ning ◽  
Hongyong Yang ◽  
Zhilin Fan ◽  
Yilin Han

2020 ◽  
Vol 1576 ◽  
pp. 012009
Author(s):  
Yang Wang ◽  
Yilin Fang ◽  
Ping Lou ◽  
Junwei Yan ◽  
Nianyun Liu

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Liang Huang ◽  
Xuequn Wu ◽  
Qiuzhi Peng ◽  
Xueqin Yu

Tobacco grown in plateau mountain areas is characterized by fragmented planting, uneven growth, and mixed cropping or interplanting, which makes it difficult for object-oriented image analysis methods to extract effective features and accurately delineate tobacco planting areas. To this end, this paper relies on the self-learning ability of deep features and proposes an accurate extraction method for tobacco planting areas based on deep semantic segmentation of unmanned aerial vehicle (UAV) remote sensing images of plateau mountains. Firstly, a tobacco semantic segmentation dataset is built with Labelme. Four deep semantic segmentation models, DeeplabV3+, PSPNet, SegNet, and U-Net, are trained on the sample data; to reduce training time, the lightweight MobileNet family of networks replaces the original backbones of the four models. Finally, the trained networks segment the prediction images, and the mean Intersection over Union (mIoU) is used to evaluate accuracy. The experimental results show that, on 71 prediction images, DeeplabV3+, PSPNet, SegNet, and U-Net achieve mIoU scores of 0.9436, 0.9118, 0.9392, and 0.9473, respectively, i.e., high segmentation accuracy. This verifies the feasibility of deep semantic segmentation for extracting tobacco planting areas from UAV remote sensing images, and the method can serve as a reference for subsequent automatic extraction of tobacco planting areas.
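The mIoU metric used to compare the four models above is standard, and a minimal NumPy implementation looks like the following (the function name and the choice to skip classes absent from both maps are this sketch's own, not taken from the paper):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred, target: integer arrays of identical shape holding class indices.
    Classes that appear in neither map are skipped so they do not
    distort the average.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        inter = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Per-image mIoU values like the 0.94 range reported above mean that, on average, each class's predicted region overlaps its ground-truth region by roughly 94% relative to their union.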


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Xiaoyun Lei ◽  
Zhian Zhang ◽  
Peifang Dong

Dynamic path planning in unknown environments has always been a challenge for mobile robots. In this paper, we apply double deep Q-network (DDQN) reinforcement learning, proposed by DeepMind in 2016, to dynamic path planning in unknown environments. The reward-and-punishment function and the training method are designed to cope with the instability of the training stage and the sparsity of the environment's state space. In different training stages, we dynamically adjust the starting and target positions; as the neural network is updated and the greedy-rule probability increases, the local space searched by the agent expands. The Pygame module in Python is used to build the dynamic environments. Taking lidar signals and the local target position as inputs, convolutional neural networks (CNNs) generalize the environmental state, and the Q-learning algorithm enhances the agent's dynamic obstacle avoidance and local planning in the environment. The results show that, after training in different dynamic environments and testing in a new one, the agent can successfully reach the local target position in an unknown dynamic environment.
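The core of DDQN, as distinct from plain DQN, is the bootstrap target: the online network selects the next action, while the target network evaluates it, which reduces Q-value overestimation. A minimal sketch of that target computation (function name and array layout are illustrative assumptions, not the paper's code):

```python
import numpy as np

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double-DQN bootstrap targets for a batch of transitions.

    q_online_next, q_target_next: (batch, n_actions) Q-value arrays for
    the next states from the online and target networks respectively.
    The online net *selects* the action, the target net *evaluates* it.
    """
    next_actions = np.argmax(q_online_next, axis=1)
    next_values = q_target_next[np.arange(len(next_actions)), next_actions]
    # Terminal transitions (done == 1) bootstrap to zero.
    return rewards + gamma * (1.0 - dones) * next_values
```

Plain DQN would instead take `q_target_next.max(axis=1)`, letting the same network both select and evaluate, which is the overestimation DDQN avoids.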


Author(s):  
Jean Phelipe De Oliveira Lima ◽  
Raimundo Correa de Oliveira ◽  
Cleinaldo de Almeida Costa

Autonomous vehicle path planning aims to allow safe and rapid movement through an environment without human interference. Recently, reinforcement learning methods have been applied to this problem with satisfactory results. This work uses deep reinforcement learning for the path planning of autonomous vehicles via trajectory simulation, defining routes that offer greater safety (no collisions) and a shorter distance between two points. A method for creating simulation environments was developed to analyze the performance of the proposed models under circumstances of varying difficulty. The decision-making strategy is based on multilayer perceptron artificial neural networks whose parameters and hyperparameters were determined by a grid search. The models were evaluated by the reward curves produced during learning, in two phases: an isolated evaluation, in which the models were placed in the environment without prior knowledge, and an incremental evaluation, in which models were placed in unknown environments carrying intelligence previously accumulated under other conditions. The results obtained are competitive with state-of-the-art works and highlight the adaptive character of the models, which, when inserted into environments with prior knowledge, can reduce convergence time by up to 89.47% compared with related works.
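The hyperparameter grid search mentioned above follows a generic pattern that can be sketched briefly. The parameter names, value ranges, and scoring function below are illustrative assumptions, not the configuration used in the work:

```python
from itertools import product

# Hypothetical search space for an MLP policy network.
param_grid = {
    "hidden_units": [64, 128],
    "learning_rate": [1e-3, 1e-4],
    "gamma": [0.95, 0.99],
}

def grid_search(evaluate, grid):
    """Exhaustively try every combination in `grid`.

    evaluate(config) -> scalar score (e.g. mean episode reward);
    returns the best-scoring config and its score.
    """
    keys = sorted(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In the incremental-evaluation setting described above, the winning configuration's trained weights would then be reused as the starting point in a new environment, which is what yields the reported reduction in convergence time.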

