Robot obstacle avoidance system using deep reinforcement learning

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiaojun Zhu ◽  
Yinghao Liang ◽  
Hanxu Sun ◽  
Xueqian Wang ◽  
Bin Ren

Purpose: Most manufacturing plants take the simple route of completely separating human operators from robots to prevent accidents, but this dramatically reduces the quality and speed expected from human–robot collaboration. Ensuring human safety once a person has entered a robot's workspace is not an easy task, and the unstructured nature of such working environments makes it even harder. The purpose of this paper is to propose a real-time robot collision avoidance method to alleviate this problem.
Design/methodology/approach: A model is trained to learn direct control commands from raw depth images through a self-supervised reinforcement learning algorithm. To mitigate sample inefficiency and safety risks during initial training, a virtual reality platform is used to simulate a natural working environment and generate obstacle avoidance data for training. To ensure a smooth transfer to a real robot, automatic domain randomization is used to generate randomly distributed environmental parameters during the obstacle avoidance simulation of virtual robots in the virtual environment, contributing to better performance in the natural environment.
Findings: The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot safety-aware and learn how to divert its trajectory to avoid accidents with humans within the workspace.
Research limitations/implications: The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot aware of safety and learn how to change its trajectory to avoid accidents with persons within the workspace.
Originality/value: This paper provides a novel collision avoidance framework that allows robots to work alongside human operators in unstructured and complex environments. The method uses end-to-end policy training to extract the optimal path directly from the visual inputs of the scene.
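As a loose illustration of the automatic domain randomization step described above, the sketch below widens the sampling ranges of simulated environment parameters as the policy's success rate grows. The parameter names, thresholds, and step sizes are illustrative assumptions, not values from the paper.

```python
import random

class ADRSampler:
    """Minimal automatic domain randomization sketch: each parameter is drawn from
    a range whose half-width grows while the simulated policy keeps succeeding."""

    def __init__(self):
        # nominal value, current half-width and growth step for each parameter (assumed names)
        self.params = {
            "light_intensity": {"center": 1.0, "width": 0.0, "step": 0.05},
            "obstacle_speed":  {"center": 0.3, "width": 0.0, "step": 0.02},
            "camera_noise":    {"center": 0.0, "width": 0.0, "step": 0.01},
        }

    def sample(self):
        """Draw one randomized environment configuration."""
        return {name: random.uniform(p["center"] - p["width"], p["center"] + p["width"])
                for name, p in self.params.items()}

    def update(self, success_rate, threshold=0.9):
        """Widen the ranges when the policy is succeeding, shrink them otherwise."""
        for p in self.params.values():
            if success_rate >= threshold:
                p["width"] += p["step"]
            else:
                p["width"] = max(0.0, p["width"] - p["step"])

sampler = ADRSampler()
env_cfg = sampler.sample()          # configure one simulated obstacle-avoidance episode
sampler.update(success_rate=0.93)   # ranges grow after a successful evaluation batch
```

In a full pipeline, each sampled configuration would drive one simulated obstacle-avoidance episode before the ranges are updated.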

2021 ◽  
Vol 9 (11) ◽  
pp. 1166
Author(s):  
Jianya Yuan ◽  
Hongjian Wang ◽  
Honghan Zhang ◽  
Changjian Lin ◽  
Dan Yu ◽  
...  

In a complex underwater environment, finding a viable, collision-free path for an autonomous underwater vehicle (AUV) is a challenging task. The purpose of this paper is to establish a safe, real-time, and robust method of collision avoidance that improves the autonomy of AUVs. We propose a method based on active sonar, which utilizes a deep reinforcement learning algorithm to learn from the processed sonar information and navigate the AUV in an uncertain environment. We compare the performance of the double deep Q-network algorithm with that of a genetic algorithm and deep learning. We also propose a line-of-sight guidance method to mitigate abrupt changes in the yaw direction and smooth the heading changes when the AUV switches trajectory. Experimental results show that the double deep Q-network algorithm ensures excellent collision avoidance performance. The effectiveness of the proposed algorithm was verified in three environments: random static, mixed static, and complex dynamic. The results show that the proposed algorithm has significant advantages over the other algorithms in terms of success rate, collision avoidance performance, and generalization ability, and that it is superior to the genetic algorithm and deep learning in terms of running time, total path length, performance in avoiding collisions with moving obstacles, and planning time per step. After being trained in a simulated environment, the algorithm can continue to learn online from environmental information after deployment and adjust the network weights in real time. These results demonstrate that the proposed approach has significant potential for practical applications.
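For readers unfamiliar with the double deep Q-network idea used above, the sketch below shows the standard double DQN bootstrap target, in which the online network selects the next action and the target network evaluates it. The observation layout and the stand-in networks are assumptions for illustration only, not the paper's sonar-based model.

```python
import numpy as np

def ddqn_target(reward, next_obs, done, q_online, q_target, gamma=0.99):
    """Double DQN target for one transition: the online net picks the action,
    the target net provides its value."""
    if done:
        return reward
    a_star = int(np.argmax(q_online(next_obs)))          # action selected by online network
    return reward + gamma * q_target(next_obs)[a_star]   # value evaluated by target network

# Toy usage with stand-in "networks" over 5 discrete heading commands (illustrative only)
rng = np.random.default_rng(0)
q_online = lambda obs: rng.normal(size=5)
q_target = lambda obs: rng.normal(size=5)
y = ddqn_target(reward=-0.1, next_obs=np.zeros(16), done=False,
                q_online=q_online, q_target=q_target)
```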


2020 ◽  
Vol 39 (7) ◽  
pp. 856-892 ◽  
Author(s):  
Tingxiang Fan ◽  
Pinxin Long ◽  
Wenxi Liu ◽  
Jia Pan

Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in decentralized scenarios where each robot generates its paths with limited observation of other robots’ states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or agent-level features to plan a local collision-free action, which is not robust and is computationally prohibitive. In addition, the performance of these methods is not comparable with their centralized counterparts in practice. In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent’s steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement-learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy’s robustness and effectiveness. We validate the learned sensor-level collision-avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamic characteristics that are different from the simulated agents, in order to demonstrate the controller’s robustness against the simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. Videos are available at https://sites.google.com/view/hybridmrca .
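A minimal sketch of what a sensor-level policy of this kind can look like, assuming a 1D laser scan, a 2D relative goal, and the current velocity as inputs mapped to a velocity command. The layer sizes and input layout are illustrative assumptions, not the architecture from the article.

```python
import torch
import torch.nn as nn

class SensorLevelPolicy(nn.Module):
    """Maps raw scan + relative goal + current velocity directly to a velocity command."""

    def __init__(self, scan_size=512):
        super().__init__()
        self.scan_encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                    # infer flattened feature size
            feat = self.scan_encoder(torch.zeros(1, 1, scan_size)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat + 4, 128), nn.ReLU(), # +4: relative goal (2) + current velocity (2)
            nn.Linear(128, 2),                   # output: linear and angular velocity
        )

    def forward(self, scan, goal, vel):
        z = self.scan_encoder(scan.unsqueeze(1))
        return self.head(torch.cat([z, goal, vel], dim=1))

policy = SensorLevelPolicy()
cmd = policy(torch.rand(1, 512), torch.rand(1, 2), torch.rand(1, 2))  # -> (1, 2) velocity command
```

In the training framework described above, such a network would be optimized with a policy-gradient method over many robots simultaneously; the sketch only shows the sensor-to-command mapping.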


2020 ◽  
Author(s):  
Josias G. Batista ◽  
Felipe J. S. Vasconcelos ◽  
Kaio M. Ramos ◽  
Darielson A. Souza ◽  
José L. N. Silva

Industrial robots have grown in use over the years, making production systems more and more efficient and creating the need for efficient trajectory generation algorithms that optimize and, if possible, generate collision-free trajectories without interrupting the production process. This work presents the use of Reinforcement Learning (RL), based on the Q-Learning algorithm, for the trajectory generation of a robotic manipulator, together with a comparison of its use with and without constraints on the manipulator kinematics, in order to generate collision-free trajectories. Simulation results are presented regarding the efficiency of the algorithm and its use in trajectory generation; a comparison of the computational cost of using the constraints is also presented.
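To make the Q-Learning component concrete, here is a minimal tabular sketch on a discretized 10x10 workspace with a few collision cells. The grid, rewards, and obstacle cells are illustrative assumptions, and the paper's own state/action encoding for the manipulator kinematics may differ.

```python
import numpy as np

n_states, n_actions = 100, 4             # 10x10 grid; actions: up, down, left, right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
goal, obstacles = 99, {44, 45, 54, 55}   # hypothetical goal cell and collision cells

def step(s, a):
    """Apply action a in cell s; collisions are penalized and block the move."""
    r, c = divmod(s, 10)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    r, c = min(max(r + dr, 0), 9), min(max(c + dc, 0), 9)
    s2 = r * 10 + c
    if s2 in obstacles:
        return s, -10.0, False
    return s2, (100.0 if s2 == goal else -1.0), s2 == goal

rng = np.random.default_rng(0)
for _ in range(500):                      # training episodes
    s, done, steps = 0, False, 0
    while not done and steps < 500:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])  # Q-Learning update
        s, steps = s2, steps + 1

path = [0]                                # greedy rollout of the learned trajectory
while path[-1] != goal and len(path) < 100:
    path.append(step(path[-1], int(np.argmax(Q[path[-1]])))[0])
```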


2020 ◽  
Vol 8 (6) ◽  
pp. 4333-4338

This paper presents a thorough comparative analysis of various reinforcement learning algorithms used by autonomous mobile robots for optimal path finding, and we propose a new algorithm, Iterative SARSA, for the same task. The main objective of the paper is to differentiate between Q-learning and SARSA and to modify the latter. These algorithms use either the on-policy or off-policy methods of reinforcement learning. For the on-policy method we use the SARSA algorithm, and for the off-policy method the Q-learning algorithm. These algorithms also directly affect the robot's ability to find the shortest possible path. Based on the results obtained, we conclude how our algorithm improves on the current standard reinforcement learning algorithms.
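The on-policy/off-policy distinction discussed above comes down to which action value bootstraps the update. A minimal sketch of the two standard update rules follows; the proposed Iterative SARSA modification is not reproduced here.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.95):
    """On-policy: bootstrap with the action a2 actually chosen by the behavior policy in s2."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.95):
    """Off-policy: bootstrap with the greedy action value in s2, regardless of what is executed."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
```

SARSA backs up the value of the action the behavior policy actually takes next, while Q-learning backs up the greedy value; this is the core difference the paper's comparison rests on.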


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiaohuan Liu ◽  
Degan Zhang ◽  
Ting Zhang ◽  
Jie Zhang ◽  
Jiaxu Wang

Purpose: To solve the path planning problem of intelligent driving vehicles, this paper designs a hybrid path planning algorithm based on optimized reinforcement learning (RL) and improved particle swarm optimization (PSO).
Design/methodology/approach: First, the authors optimized the hyper-parameters of RL to make it converge quickly and learn more efficiently. Then the authors designed a pre-set operation for PSO to reduce the calculation of invalid particles. Finally, the authors proposed a correction variable that can be obtained from the cumulative reward of RL; this revises the fitness of the individual optimal particle and the global optimal position of PSO to achieve an efficient path planning result. The authors also designed a selection parameter system to help select the optimal path.
Findings: Simulation analysis and experimental test results proved that the proposed algorithm has advantages in terms of practicability and efficiency. This research also foreshadows the research prospects of RL in path planning, which is the authors' next research direction.
Originality/value: The authors designed a pre-set operation to reduce the participation of invalid particles in the PSO calculation, designed a method to optimize hyper-parameters to improve the learning efficiency of RL, and then used the RL-trained PSO to plan the path. The authors also proposed an optimal path evaluation system. This research also foreshadows the research prospects of RL in path planning, which is the authors' next research direction.
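As background for the PSO half of the hybrid, the sketch below shows a standard particle swarm update loop with a hook where a correction term, for example one derived from an RL cumulative reward as the authors describe, could adjust particle fitness. The objective, gains, and correction are illustrative assumptions, not the paper's design.

```python
import numpy as np

def fitness(x, correction=0.0):
    """Toy objective plus an optional RL-derived correction term (illustrative only)."""
    return np.sum(x ** 2) + correction

rng = np.random.default_rng(0)
n, dim, w, c1, c2 = 20, 2, 0.7, 1.5, 1.5
x = rng.uniform(-5, 5, (n, dim))              # particle positions
v = np.zeros((n, dim))                        # particle velocities
pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
gbest = pbest[np.argmin(pbest_f)]

for _ in range(100):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # standard PSO velocity update
    x = x + v
    f = np.array([fitness(p) for p in x])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]
```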


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Zheng Fang ◽  
Xifeng Liang

Purpose: The results of obstacle avoidance path planning for a manipulator using the artificial potential field (APF) method contain a large number of path nodes, which reduces the efficiency of manipulators. This paper aims to propose a new intelligent obstacle avoidance path planning method for a picking robot to improve the efficiency of manipulators.
Design/methodology/approach: To improve the efficiency of the robot, this paper proposes a new intelligent obstacle avoidance path planning method for a picking robot. In this method, we present a snake-tongue algorithm based on a slope-type potential field and combine it with a genetic algorithm (GA) and reinforcement learning (RL) to reduce the path length and the number of path nodes in the path planning results.
Findings: Simulation experiments were conducted with a tomato string picking manipulator. The results showed that, after the APF method was combined with GA and RL, the path length was reduced from 4.1 to 2.979 m, the number of nodes was reduced from 31 to 3 and the working time of the robot was reduced from 87.35 to 37.12 s.
Originality/value: This paper proposes a new improved APF method and combines it with GA and RL. The experimental results show that the new intelligent obstacle avoidance path planning method proposed in this paper is beneficial for improving the efficiency of the robotic arm.
Graphical abstract: According to principles of bionics, we propose a new path search method, the snake-tongue algorithm, based on a slope-type potential field. At the same time, we use a genetic algorithm to strengthen the path searching ability of the artificial potential field method, so that it can complete path searching in a variety of complex obstacle distributions with shorter search results. Reinforcement learning is used to reduce the number of path nodes, which helps improve the efficiency of the robot's work. The use of the genetic algorithm and reinforcement learning lays the foundation for intelligent control.
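For context on the APF baseline that the method improves, here is a minimal sketch of the classical attractive/repulsive potential field forces. The slope-type field and snake-tongue search from the paper are not reproduced, and the gains and influence radius are illustrative assumptions.

```python
import numpy as np

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Resultant classical APF force at 2D configuration q."""
    force = k_att * (goal - q)                       # attractive term toward the goal
    for obs in obstacles:
        d = np.linalg.norm(q - obs)
        if 1e-6 < d < d0:                            # repulsion acts only inside radius d0
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (q - obs) / d
    return force

q = np.array([0.0, 0.0])
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.0, 2.5])]
for _ in range(200):                                 # simple gradient-descent stepping along the field
    q = q + 0.05 * apf_force(q, goal, obstacles)
```

Each integration step of this kind produces one path node, which is why an unmodified APF planner tends to output many nodes; the paper's GA and RL stages are aimed at shortening the path and pruning those nodes.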


2018 ◽  
Vol 15 (3) ◽  
pp. 172988141877584 ◽  
Author(s):  
Amir Ramezani Dooraki ◽  
Deok Jin Lee

In the near future, robots will be seen in almost every area of our lives, in different shapes and with different objectives such as entertainment, surveillance, rescue, and navigation. In any shape and with any objective, it is necessary for them to be capable of successful exploration. They should be able to explore efficiently and to adapt to changes in their environment. For successful navigation, it is necessary to recognize the difference between similar places in an environment, and having a memory is crucial to achieve this without increasing the capability of the sensors. In this article, an algorithm for autonomous exploration and obstacle avoidance in an unknown environment is proposed. To make the algorithm self-learning, a memory-based reinforcement learning method using a multilayer neural network is used, with the aim of creating an agent that has an efficient exploration and obstacle avoidance policy. Furthermore, this agent can automatically adapt itself to changes in its environment. Finally, to test the capability of our algorithm, we have implemented it on a robot modeled after a real platform, simulated in the robust physics engine of Gazebo.
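One way to realize the memory the abstract argues for is a recurrent value network; the sketch below uses an LSTM hidden state so that visually similar places can be distinguished by their history. This is an assumed realization for illustration, and the observation size, hidden size, and action count are not taken from the article.

```python
import torch
import torch.nn as nn

class MemoryQNet(nn.Module):
    """Recurrent Q-network: the LSTM hidden state acts as the agent's memory."""

    def __init__(self, obs_dim=24, hidden=64, n_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.q = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)        # state carries memory across calls
        return self.q(out[:, -1]), state              # Q-values for the latest observation

net = MemoryQNet()
hidden = None
for t in range(5):                                    # feed range readings one step at a time
    obs = torch.rand(1, 1, 24)                        # hypothetical 24-beam range observation
    q_values, hidden = net(obs, hidden)
    action = int(torch.argmax(q_values, dim=1))
```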


Author(s):  
Shuhuan Wen ◽  
Xueheng Hu ◽  
Zhen Li ◽  
Hak Keung Lam ◽  
Fuchun Sun ◽  
...  

Purpose: This paper aims to propose a novel active SLAM framework to realize obstacle avoidance and autonomous navigation in indoor environments.
Design/methodology/approach: An improved fuzzy optimized Q-Learning (FOQL) algorithm is used to solve the robot's obstacle avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. The localization of the robot is based on the FastSLAM algorithm.
Findings: Simulation results of obstacle avoidance using the traditional Q-learning algorithm, the optimized Q-learning algorithm and the FOQL algorithm are compared. The simulation results show that the improved FOQL algorithm learns faster than the other two algorithms. To verify the simulation results, the FOQL algorithm was implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-Learning obstacle avoidance algorithm is feasible and effective.
Originality/value: An improved fuzzy optimized Q-Learning (FOQL) algorithm is used to solve the robot's obstacle avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. To verify the simulation results, the FOQL algorithm was implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-Learning obstacle avoidance algorithm is feasible and effective.
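As a loose illustration of how fuzzy inference can modulate a Q-Learning update in the general spirit of FOQL (the paper's actual design is not reproduced), the sketch below scales the learning rate by fuzzy memberships of an obstacle-distance input. The membership functions, rates, and the choice to modulate the learning rate are illustrative assumptions.

```python
def mu_near(d):
    """Membership of 'obstacle is near': 1 at d=0, falling to 0 at d=1 (assumed shape)."""
    return max(0.0, min(1.0, 1.0 - d))

def mu_far(d):
    """Membership of 'obstacle is far': 0 at d=0.5, rising to 1 at d=1.5 (assumed shape)."""
    return max(0.0, min(1.0, d - 0.5))

def fuzzy_learning_rate(d):
    """Weighted-average defuzzification: closer obstacles give a larger learning rate."""
    w_near, w_far = mu_near(d), mu_far(d)
    return (w_near * 0.5 + w_far * 0.1) / (w_near + w_far)

def foql_style_update(Q, s, a, r, s2, obstacle_dist, gamma=0.9):
    """Standard Q-Learning backup with a fuzzy-modulated learning rate."""
    alpha = fuzzy_learning_rate(obstacle_dist)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

Q = [[0.0] * 4 for _ in range(100)]       # toy table: 100 states x 4 actions
foql_style_update(Q, s=0, a=1, r=-1.0, s2=10, obstacle_dist=0.4)
```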

