Robot obstacle avoidance system using deep reinforcement learning

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiaojun Zhu ◽  
Yinghao Liang ◽  
Hanxu Sun ◽  
Xueqian Wang ◽  
Bin Ren

Purpose: Most manufacturing plants take the simple route of completely separating human operators from robots to prevent accidents, but this dramatically reduces the quality and speed expected from human–robot collaboration. Ensuring human safety once a person has entered a robot's workspace is not an easy task, and the unstructured nature of such working environments makes it even harder. The purpose of this paper is to propose a real-time robot collision avoidance method to alleviate this problem.
Design/methodology/approach: A model is trained to learn direct control commands from raw depth images through a self-supervised reinforcement learning algorithm. To mitigate sample inefficiency and safety risks during initial training, a virtual reality platform is used to simulate a natural working environment and generate obstacle avoidance data for training. To ensure a smooth transfer to a real robot, automatic domain randomization is used to generate randomly distributed environmental parameters during the obstacle avoidance simulation of virtual robots in the virtual environment, contributing to better performance in the natural environment.
Findings: The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot safety-aware and learn how to divert its trajectory to avoid accidents with humans within the workspace.
Research limitations/implications: The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot aware of safety and learn how to change its trajectory to avoid accidents with persons within the workspace.
Originality/value: This paper provides a novel collision avoidance framework that allows robots to work alongside human operators in unstructured and complex environments. The method uses end-to-end policy training to extract the optimal path directly from the visual inputs of the scene.
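As a loose illustration of the automatic domain randomization step described above, the sketch below widens the sampling ranges of simulated environment parameters as the policy's success rate grows. The parameter names, thresholds, and step sizes are illustrative assumptions, not values from the paper.

```python
import random

class ADRSampler:
    """Minimal automatic domain randomization sketch: each parameter is drawn from
    a range whose half-width grows while the simulated policy keeps succeeding."""

    def __init__(self):
        # nominal value, current half-width and growth step for each parameter (assumed names)
        self.params = {
            "light_intensity": {"center": 1.0, "width": 0.0, "step": 0.05},
            "obstacle_speed":  {"center": 0.3, "width": 0.0, "step": 0.02},
            "camera_noise":    {"center": 0.0, "width": 0.0, "step": 0.01},
        }

    def sample(self):
        """Draw one randomized environment configuration."""
        return {name: random.uniform(p["center"] - p["width"], p["center"] + p["width"])
                for name, p in self.params.items()}

    def update(self, success_rate, threshold=0.9):
        """Widen the ranges when the policy is succeeding, shrink them otherwise."""
        for p in self.params.values():
            if success_rate >= threshold:
                p["width"] += p["step"]
            else:
                p["width"] = max(0.0, p["width"] - p["step"])

sampler = ADRSampler()
env_cfg = sampler.sample()          # configure one simulated obstacle-avoidance episode
sampler.update(success_rate=0.93)   # ranges grow after a successful evaluation batch
```

In a full pipeline, each sampled configuration would drive one simulated obstacle-avoidance episode before the ranges are updated.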

2021 ◽  
Vol 9 (11) ◽  
pp. 1166
Author(s):  
Jianya Yuan ◽  
Hongjian Wang ◽  
Honghan Zhang ◽  
Changjian Lin ◽  
Dan Yu ◽  
...  

In a complex underwater environment, finding a viable, collision-free path for an autonomous underwater vehicle (AUV) is a challenging task. The purpose of this paper is to establish a safe, real-time, and robust method of collision avoidance that improves the autonomy of AUVs. We propose a method based on active sonar, which utilizes a deep reinforcement learning algorithm to learn from the processed sonar information and navigate the AUV in an uncertain environment. We compare the performance of the double deep Q-network algorithm with that of a genetic algorithm and deep learning. We also propose a line-of-sight guidance method to mitigate abrupt changes in the yaw direction and smooth the heading changes when the AUV switches trajectory. Experimental results show that the double deep Q-network algorithm ensures excellent collision avoidance performance. The effectiveness of the proposed algorithm was verified in three environments: random static, mixed static, and complex dynamic. The results show that the proposed algorithm has significant advantages over the other algorithms in terms of success rate, collision avoidance performance, and generalization ability, and that it is superior to the genetic algorithm and deep learning in terms of running time, total path length, performance in avoiding collisions with moving obstacles, and planning time per step. After being trained in a simulated environment, the algorithm can continue to learn online from environmental information after deployment and adjust the network weights in real time. These results demonstrate that the proposed approach has significant potential for practical applications.
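For readers unfamiliar with the double deep Q-network idea used above, the sketch below shows the standard double DQN bootstrap target, in which the online network selects the next action and the target network evaluates it. The observation layout and the stand-in networks are assumptions for illustration only, not the paper's sonar-based model.

```python
import numpy as np

def ddqn_target(reward, next_obs, done, q_online, q_target, gamma=0.99):
    """Double DQN target for one transition: the online net picks the action,
    the target net provides its value."""
    if done:
        return reward
    a_star = int(np.argmax(q_online(next_obs)))          # action selected by online network
    return reward + gamma * q_target(next_obs)[a_star]   # value evaluated by target network

# Toy usage with stand-in "networks" over 5 discrete heading commands (illustrative only)
rng = np.random.default_rng(0)
q_online = lambda obs: rng.normal(size=5)
q_target = lambda obs: rng.normal(size=5)
y = ddqn_target(reward=-0.1, next_obs=np.zeros(16), done=False,
                q_online=q_online, q_target=q_target)
```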


2020 ◽  
Vol 39 (7) ◽  
pp. 856-892 ◽  
Author(s):  
Tingxiang Fan ◽  
Pinxin Long ◽  
Wenxi Liu ◽  
Jia Pan

Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in decentralized scenarios where each robot generates its paths with limited observation of other robots’ states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or agent-level features to plan a local collision-free action, which is not robust and is computationally prohibitive. In addition, the performance of these methods is not comparable with their centralized counterparts in practice. In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent’s steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement-learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy’s robustness and effectiveness. We validate the learned sensor-level collision-avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamic characteristics that are different from the simulated agents, in order to demonstrate the controller’s robustness against the simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. Videos are available at https://sites.google.com/view/hybridmrca .
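A minimal sketch of what a sensor-level policy of this kind can look like, assuming a 1D laser scan, a 2D relative goal, and the current velocity as inputs mapped to a velocity command. The layer sizes and input layout are illustrative assumptions, not the architecture from the article.

```python
import torch
import torch.nn as nn

class SensorLevelPolicy(nn.Module):
    """Maps raw scan + relative goal + current velocity directly to a velocity command."""

    def __init__(self, scan_size=512):
        super().__init__()
        self.scan_encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                    # infer flattened feature size
            feat = self.scan_encoder(torch.zeros(1, 1, scan_size)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat + 4, 128), nn.ReLU(), # +4: relative goal (2) + current velocity (2)
            nn.Linear(128, 2),                   # output: linear and angular velocity
        )

    def forward(self, scan, goal, vel):
        z = self.scan_encoder(scan.unsqueeze(1))
        return self.head(torch.cat([z, goal, vel], dim=1))

policy = SensorLevelPolicy()
cmd = policy(torch.rand(1, 512), torch.rand(1, 2), torch.rand(1, 2))  # -> (1, 2) velocity command
```

In the training framework described above, such a network would be optimized with a policy-gradient method over many robots simultaneously; the sketch only shows the sensor-to-command mapping.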


2020 ◽  
Author(s):  
Josias G. Batista ◽  
Felipe J. S. Vasconcelos ◽  
Kaio M. Ramos ◽  
Darielson A. Souza ◽  
José L. N. Silva

Industrial robots have grown in use over the years, making production systems more and more efficient and creating the need for efficient trajectory generation algorithms that optimize and, if possible, generate collision-free trajectories without interrupting the production process. This work presents the use of Reinforcement Learning (RL), based on the Q-Learning algorithm, for the trajectory generation of a robotic manipulator, together with a comparison of its use with and without constraints on the manipulator kinematics, in order to generate collision-free trajectories. Simulation results are presented regarding the efficiency of the algorithm and its use in trajectory generation; a comparison of the computational cost of using the constraints is also presented.
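To make the Q-Learning component concrete, here is a minimal tabular sketch on a discretized 10x10 workspace with a few collision cells. The grid, rewards, and obstacle cells are illustrative assumptions, and the paper's own state/action encoding for the manipulator kinematics may differ.

```python
import numpy as np

n_states, n_actions = 100, 4             # 10x10 grid; actions: up, down, left, right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
goal, obstacles = 99, {44, 45, 54, 55}   # hypothetical goal cell and collision cells

def step(s, a):
    """Apply action a in cell s; collisions are penalized and block the move."""
    r, c = divmod(s, 10)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    r, c = min(max(r + dr, 0), 9), min(max(c + dc, 0), 9)
    s2 = r * 10 + c
    if s2 in obstacles:
        return s, -10.0, False
    return s2, (100.0 if s2 == goal else -1.0), s2 == goal

rng = np.random.default_rng(0)
for _ in range(500):                      # training episodes
    s, done, steps = 0, False, 0
    while not done and steps < 500:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])  # Q-Learning update
        s, steps = s2, steps + 1

path = [0]                                # greedy rollout of the learned trajectory
while path[-1] != goal and len(path) < 100:
    path.append(step(path[-1], int(np.argmax(Q[path[-1]])))[0])
```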


2020 ◽  
Vol 8 (6) ◽  
pp. 4333-4338

This paper presents a thorough comparative analysis of various reinforcement learning algorithms used by autonomous mobile robots for optimal path finding, and we propose a new algorithm, Iterative SARSA, for the same task. The main objective of the paper is to differentiate between Q-learning and SARSA and to modify the latter. These algorithms use either the on-policy or off-policy methods of reinforcement learning. For the on-policy method we use the SARSA algorithm, and for the off-policy method the Q-learning algorithm. These algorithms also directly affect the robot's ability to find the shortest possible path. Based on the results obtained, we conclude how our algorithm improves on the current standard reinforcement learning algorithms.
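The on-policy/off-policy distinction discussed above comes down to which action value bootstraps the update. A minimal sketch of the two standard update rules follows; the proposed Iterative SARSA modification is not reproduced here.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.95):
    """On-policy: bootstrap with the action a2 actually chosen by the behavior policy in s2."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.95):
    """Off-policy: bootstrap with the greedy action value in s2, regardless of what is executed."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
```

SARSA backs up the value of the action the behavior policy actually takes next, while Q-learning backs up the greedy value; this is the core difference the paper's comparison rests on.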


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiaohuan Liu ◽  
Degan Zhang ◽  
Ting Zhang ◽  
Jie Zhang ◽  
Jiaxu Wang

Purpose: To solve the path planning problem of intelligent driving vehicles, this paper designs a hybrid path planning algorithm based on optimized reinforcement learning (RL) and improved particle swarm optimization (PSO).
Design/methodology/approach: First, the authors optimized the hyper-parameters of RL to make it converge quickly and learn more efficiently. Then the authors designed a pre-set operation for PSO to reduce the calculation of invalid particles. Finally, the authors proposed a correction variable that can be obtained from the cumulative reward of RL; this revises the fitness of the individual optimal particle and the global optimal position of PSO to achieve an efficient path planning result. The authors also designed a selection parameter system to help select the optimal path.
Findings: Simulation analysis and experimental test results proved that the proposed algorithm has advantages in terms of practicability and efficiency. This research also foreshadows the research prospects of RL in path planning, which is the authors' next research direction.
Originality/value: The authors designed a pre-set operation to reduce the participation of invalid particles in the PSO calculation, designed a method to optimize hyper-parameters to improve the learning efficiency of RL, and then used the RL-trained PSO to plan the path. The authors also proposed an optimal path evaluation system. This research also foreshadows the research prospects of RL in path planning, which is the authors' next research direction.
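As background for the PSO half of the hybrid, the sketch below shows a standard particle swarm update loop with a hook where a correction term, for example one derived from an RL cumulative reward as the authors describe, could adjust particle fitness. The objective, gains, and correction are illustrative assumptions, not the paper's design.

```python
import numpy as np

def fitness(x, correction=0.0):
    """Toy objective plus an optional RL-derived correction term (illustrative only)."""
    return np.sum(x ** 2) + correction

rng = np.random.default_rng(0)
n, dim, w, c1, c2 = 20, 2, 0.7, 1.5, 1.5
x = rng.uniform(-5, 5, (n, dim))              # particle positions
v = np.zeros((n, dim))                        # particle velocities
pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
gbest = pbest[np.argmin(pbest_f)]

for _ in range(100):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # standard PSO velocity update
    x = x + v
    f = np.array([fitness(p) for p in x])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]
```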


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Zheng Fang ◽  
Xifeng Liang

Purpose: The results of obstacle avoidance path planning for a manipulator using the artificial potential field (APF) method contain a large number of path nodes, which reduces the efficiency of manipulators. This paper aims to propose a new intelligent obstacle avoidance path planning method for a picking robot to improve the efficiency of manipulators.
Design/methodology/approach: To improve the efficiency of the robot, this paper proposes a new intelligent obstacle avoidance path planning method for a picking robot. In this method, we present a snake-tongue algorithm based on a slope-type potential field and combine it with a genetic algorithm (GA) and reinforcement learning (RL) to reduce the path length and the number of path nodes in the path planning results.
Findings: Simulation experiments were conducted with a tomato string picking manipulator. The results showed that, after the APF method was combined with GA and RL, the path length was reduced from 4.1 to 2.979 m, the number of nodes was reduced from 31 to 3 and the working time of the robot was reduced from 87.35 to 37.12 s.
Originality/value: This paper proposes a new improved APF method and combines it with GA and RL. The experimental results show that the new intelligent obstacle avoidance path planning method proposed in this paper is beneficial for improving the efficiency of the robotic arm.
Graphical abstract: According to principles of bionics, we propose a new path search method, the snake-tongue algorithm, based on a slope-type potential field. At the same time, we use a genetic algorithm to strengthen the path searching ability of the artificial potential field method, so that it can complete path searching in a variety of complex obstacle distributions with shorter search results. Reinforcement learning is used to reduce the number of path nodes, which helps improve the efficiency of the robot's work. The use of the genetic algorithm and reinforcement learning lays the foundation for intelligent control.
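For context on the APF baseline that the method improves, here is a minimal sketch of the classical attractive/repulsive potential field forces. The slope-type field and snake-tongue search from the paper are not reproduced, and the gains and influence radius are illustrative assumptions.

```python
import numpy as np

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Resultant classical APF force at 2D configuration q."""
    force = k_att * (goal - q)                       # attractive term toward the goal
    for obs in obstacles:
        d = np.linalg.norm(q - obs)
        if 1e-6 < d < d0:                            # repulsion acts only inside radius d0
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (q - obs) / d
    return force

q = np.array([0.0, 0.0])
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.0, 2.5])]
for _ in range(200):                                 # simple gradient-descent stepping along the field
    q = q + 0.05 * apf_force(q, goal, obstacles)
```

Each integration step of this kind produces one path node, which is why an unmodified APF planner tends to output many nodes; the paper's GA and RL stages are aimed at shortening the path and pruning those nodes.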


2018 ◽  
Vol 15 (3) ◽  
pp. 172988141877584 ◽  
Author(s):  
Amir Ramezani Dooraki ◽  
Deok Jin Lee

In the near future, robots will be seen in almost every area of our lives, in different shapes and with different objectives such as entertainment, surveillance, rescue, and navigation. In any shape and with any objective, it is necessary for them to be capable of successful exploration. They should be able to explore efficiently and to adapt to changes in their environment. For successful navigation, it is necessary to recognize the difference between similar places in an environment, and having a memory is crucial to achieve this without increasing the capability of the sensors. In this article, an algorithm for autonomous exploration and obstacle avoidance in an unknown environment is proposed. To make the algorithm self-learning, a memory-based reinforcement learning method using a multilayer neural network is used, with the aim of creating an agent that has an efficient exploration and obstacle avoidance policy. Furthermore, this agent can automatically adapt itself to changes in its environment. Finally, to test the capability of our algorithm, we have implemented it on a robot modeled after a real platform, simulated in the robust physics engine of Gazebo.
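One way to realize the memory the abstract argues for is a recurrent value network; the sketch below uses an LSTM hidden state so that visually similar places can be distinguished by their history. This is an assumed realization for illustration, and the observation size, hidden size, and action count are not taken from the article.

```python
import torch
import torch.nn as nn

class MemoryQNet(nn.Module):
    """Recurrent Q-network: the LSTM hidden state acts as the agent's memory."""

    def __init__(self, obs_dim=24, hidden=64, n_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.q = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)        # state carries memory across calls
        return self.q(out[:, -1]), state              # Q-values for the latest observation

net = MemoryQNet()
hidden = None
for t in range(5):                                    # feed range readings one step at a time
    obs = torch.rand(1, 1, 24)                        # hypothetical 24-beam range observation
    q_values, hidden = net(obs, hidden)
    action = int(torch.argmax(q_values, dim=1))
```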


Author(s):  
Shuhuan Wen ◽  
Xueheng Hu ◽  
Zhen Li ◽  
Hak Keung Lam ◽  
Fuchun Sun ◽  
...  

Purpose: This paper aims to propose a novel active SLAM framework to realize obstacle avoidance and autonomous navigation in indoor environments.
Design/methodology/approach: An improved fuzzy optimized Q-Learning (FOQL) algorithm is used to solve the robot's obstacle avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. The localization of the robot is based on the FastSLAM algorithm.
Findings: Simulation results of obstacle avoidance using the traditional Q-learning algorithm, the optimized Q-learning algorithm and the FOQL algorithm are compared. The simulation results show that the improved FOQL algorithm learns faster than the other two algorithms. To verify the simulation results, the FOQL algorithm was implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-Learning obstacle avoidance algorithm is feasible and effective.
Originality/value: An improved fuzzy optimized Q-Learning (FOQL) algorithm is used to solve the robot's obstacle avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. To verify the simulation results, the FOQL algorithm was implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-Learning obstacle avoidance algorithm is feasible and effective.
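As a loose illustration of how fuzzy inference can modulate a Q-Learning update in the general spirit of FOQL (the paper's actual design is not reproduced), the sketch below scales the learning rate by fuzzy memberships of an obstacle-distance input. The membership functions, rates, and the choice to modulate the learning rate are illustrative assumptions.

```python
def mu_near(d):
    """Membership of 'obstacle is near': 1 at d=0, falling to 0 at d=1 (assumed shape)."""
    return max(0.0, min(1.0, 1.0 - d))

def mu_far(d):
    """Membership of 'obstacle is far': 0 at d=0.5, rising to 1 at d=1.5 (assumed shape)."""
    return max(0.0, min(1.0, d - 0.5))

def fuzzy_learning_rate(d):
    """Weighted-average defuzzification: closer obstacles give a larger learning rate."""
    w_near, w_far = mu_near(d), mu_far(d)
    return (w_near * 0.5 + w_far * 0.1) / (w_near + w_far)

def foql_style_update(Q, s, a, r, s2, obstacle_dist, gamma=0.9):
    """Standard Q-Learning backup with a fuzzy-modulated learning rate."""
    alpha = fuzzy_learning_rate(obstacle_dist)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

Q = [[0.0] * 4 for _ in range(100)]       # toy table: 100 states x 4 actions
foql_style_update(Q, s=0, a=1, r=-1.0, s2=10, obstacle_dist=0.4)
```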

