Dynamic Path Planning of Unknown Environment Based on Deep Reinforcement Learning

2018
Vol 2018
pp. 1-10
Author(s):
Xiaoyun Lei
Zhian Zhang
Peifang Dong

Dynamic path planning in unknown environments has always been a challenge for mobile robots. In this paper, we apply double deep Q-network (DDQN) deep reinforcement learning, proposed by DeepMind in 2016, to dynamic path planning in unknown environments. A reward and punishment function and a training method are designed to address the instability of the training stage and the sparsity of the environment state space. In different training stages, we dynamically adjust the starting position and the target position. As the neural network is updated and the greedy-rule probability increases, the local space searched by the agent expands. The Pygame module in Python is used to build the dynamic environments. Taking the lidar signal and the local target position as inputs, convolutional neural networks (CNNs) are used to generalize the environmental state. The Q-learning algorithm enhances the agent's dynamic obstacle avoidance and local planning abilities in the environment. The results show that, after training in different dynamic environments and testing in a new environment, the agent is able to reach the local target position successfully in unknown dynamic environments.
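
For context, a minimal sketch of the Double DQN target computation described above is given below, with the lidar scan and local target position as network inputs. This is an illustrative PyTorch implementation: the network architecture, layer sizes, and the QNetwork/ddqn_target names are assumptions, not the authors' exact design.

# Illustrative Double DQN target with lidar + local-target inputs (assumed
# PyTorch setup; layer sizes and hyperparameters are placeholders).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small CNN over a 1-D lidar scan, concatenated with the local target position."""
    def __init__(self, n_beams: int, n_actions: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = self.conv(torch.zeros(1, 1, n_beams)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(conv_out + 2, 128), nn.ReLU(),  # +2 for (dx, dy) to the local target
            nn.Linear(128, n_actions),
        )

    def forward(self, scan, target):
        return self.head(torch.cat([self.conv(scan), target], dim=1))

def ddqn_target(online, target_net, reward, next_scan, next_target, done, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        next_a = online(next_scan, next_target).argmax(dim=1, keepdim=True)
        next_q = target_net(next_scan, next_target).gather(1, next_a).squeeze(1)
        return reward + gamma * next_q * (1.0 - done)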

Symmetry
2022
Vol 14 (1)
pp. 132
Author(s):
Jianfeng Zheng
Shuren Mao
Zhenyu Wu
Pengcheng Kong
Hao Qiang

To address the poor exploration ability and slow convergence of traditional deep reinforcement learning in the navigation task of a patrol robot on specified indoor routes, this paper proposes an improved deep reinforcement learning algorithm based on Pan/Tilt/Zoom (PTZ) image information. The obtained symmetric image information and target position information are taken as the input of the network, the robot's speed is taken as the output for the next action, and a bounded circular route is used as the test case. An improved reward and punishment function is designed to speed up convergence and optimize the path, so that the robot plans a safer path while prioritizing obstacle avoidance. Compared with the Deep Q-Network (DQN) algorithm, the improved algorithm converges about 40% faster and its loss function is more stable.
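
As a rough illustration of the kind of improved reward and punishment function described above, the sketch below shapes the reward with goal progress and an obstacle-safety penalty; the weights, thresholds, and the shaped_reward name are assumptions rather than the authors' published values.

# Hedged sketch of a shaped reward-and-punishment function (weights and
# thresholds are illustrative assumptions, not the paper's exact values).
def shaped_reward(dist_to_goal, prev_dist_to_goal, min_obstacle_dist,
                  reached_goal, collided,
                  safe_dist=0.5, w_progress=10.0, w_safety=2.0):
    """Reward progress toward the goal; penalize approaching obstacles."""
    if collided:
        return -100.0                                 # large punishment on collision
    if reached_goal:
        return 100.0                                  # large bonus on reaching the target
    progress = prev_dist_to_goal - dist_to_goal       # positive when moving closer to the goal
    safety_penalty = max(0.0, safe_dist - min_obstacle_dist)  # grows inside the safety margin
    return w_progress * progress - w_safety * safety_penalty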


2021
Vol 2021
pp. 1-10
Author(s):
Peng Li
Xiangcheng Ding
Hongfang Sun
Shiquan Zhao
Ricardo Cajo

To address the low success rate and slow learning speed of the DDPG algorithm in path planning for a mobile robot in a dynamic environment, an improved DDPG algorithm is designed. In this article, the RAdam algorithm replaces the neural network optimizer in DDPG and is combined with a curiosity-driven exploration module to improve the success rate and convergence speed. On top of the improved algorithm, prioritized experience replay is added and transfer learning is introduced to improve the training effect. A dynamic simulation environment is built with the Robot Operating System (ROS) and the Gazebo simulator, and the improved DDPG algorithm is compared with the original DDPG algorithm. For the dynamic path planning task of the mobile robot, the simulation results show that the convergence speed of the improved DDPG algorithm is increased by 21% and the success rate rises to 90% compared with the original DDPG algorithm. The improved algorithm performs well on dynamic path planning for mobile robots with a continuous action space.
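
A minimal sketch of how RAdam can replace the usual Adam optimizer inside a DDPG update is shown below, using PyTorch's torch.optim.RAdam. The network call signatures, hyperparameters, and function names are placeholders, not the authors' exact implementation, and the curiosity module and prioritized replay are omitted for brevity.

# Minimal DDPG update with RAdam optimizers (illustrative; network
# definitions and hyperparameters are placeholders).
import torch
import torch.nn as nn

def make_optimizers(actor: nn.Module, critic: nn.Module, lr=1e-3):
    # RAdam (rectified Adam) in place of the usual Adam optimizer
    return (torch.optim.RAdam(actor.parameters(), lr=lr),
            torch.optim.RAdam(critic.parameters(), lr=lr))

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    s, a, r, s2, done = batch  # tensors sampled from the replay buffer

    # Critic: regress Q(s, a) toward the bootstrapped target value
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, actor(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()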


IEEE Access
2021
pp. 1-1
Author(s):
Jianxin Feng
Jingze Zhang
Geng Zhang
Shuang Xie
Yuanming Ding
...  

Author(s):  
Junior Sundar ◽  
Sumith Yesudasan ◽  
Sibi Chacko

This investigation explores a novel path-planning and optimization strategy for multiple cooperative robotic agents operating in a fully observable, dynamically changing obstacle field. Current dynamic path-planning strategies typically apply static algorithms over incremental time steps. We propose a cooperative multi-agent (CMA) algorithm inspired by the natural flocking of animals and implemented with vector operations. It is preferred over common graph-search algorithms such as A* because it can be applied directly to dynamic environments. The CMA algorithm performs obstacle avoidance using potential fields around obstacles that scale with relative motion. Optimization strategies, including interpolation and Bézier curves, are applied to the algorithm. To validate its effectiveness, the CMA algorithm is compared with A* on static obstacles, owing to the lack of equivalent algorithms for dynamic environments; CMA performs comparably to A*, with differences ranging from -0.2% to 1.3%. The CMA algorithm is also applied experimentally and achieves comparable performance, with an error range of -0.5% to 5.2%; these errors are attributed to the limitations of the Kinect V1 sensor used for obstacle detection. Finally, the algorithm is implemented in a 3D simulated space, indicating that it could also be applied to drones. This algorithm shows promise for warehouse and inventory automation, especially when the workspace is fully observable.
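
A hedged sketch of the vector-based, motion-scaled potential-field repulsion described above is given below; the repulsion function, its parameters, and the combination with goal attraction are illustrative assumptions, not the paper's exact formulation.

# Illustrative potential-field repulsion whose strength scales with the
# relative closing speed (function and parameter names are assumptions).
import numpy as np

def repulsion(agent_pos, agent_vel, obstacle_pos, obstacle_vel,
              influence_radius=2.0, gain=1.0):
    """Return a repulsive velocity vector pushing the agent away from an obstacle."""
    offset = agent_pos - obstacle_pos
    dist = np.linalg.norm(offset)
    if dist >= influence_radius or dist == 0.0:
        return np.zeros(2)                     # outside the field's influence
    # Closing speed: positive when the obstacle is approaching the agent
    closing_speed = max(0.0, np.dot(obstacle_vel - agent_vel, offset) / dist)
    strength = gain * (1.0 / dist - 1.0 / influence_radius) * (1.0 + closing_speed)
    return strength * offset / dist

# Example combination: velocity command = goal attraction + summed obstacle repulsion
# v_cmd = k_goal * (goal - agent_pos) + sum(repulsion(...) for each obstacle)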

