Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios

Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in the decentralized scenarios where each robot generates its paths with limited observation of other robots’ states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or agent-level features to plan a local collision-free action, which is not robust and computationally prohibitive. In addition, the performance of these methods is not comparable with their centralized counterparts in practice. In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent’s steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement-learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy’s robustness and effectiveness. We validate the learned sensor-level collision-3avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamics characteristics that are different from the simulated agents, in order to demonstrate the controller’s robustness against the simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. Videos are available at https://sites.google.com/view/hybridmrca .

Download Full-text

Decision-Making for the Autonomous Navigation of Maritime Autonomous Surface Ships Based on Scene Division and Deep Reinforcement Learning

Sensors ◽

10.3390/s19184055 ◽

2019 ◽

Vol 19 (18) ◽

pp. 4055 ◽

Cited By ~ 9

Author(s):

Zhang ◽

Wang ◽

Liu ◽

Chen

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Collision Avoidance ◽

Autonomous Navigation ◽

Learning Algorithm ◽

Q Learning ◽

Reward Function ◽

International Regulations ◽

Convergence Trend ◽

Decision Making Model

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: the scene division layer and an autonomous navigation decision-making layer. The scene division layer mainly quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm utilizing the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in a quantized sub-scenario to train the navigation strategy. Finally, two sets of verification experiments of the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm could effectively improve the navigation safety and collision avoidance.

Download Full-text

Collision avoidance in multi-robot systems based on multi-layered reinforcement learning

Robotics and Autonomous Systems ◽

10.1016/s0921-8890(99)00035-4 ◽

1999 ◽

Vol 29 (1) ◽

pp. 21-32 ◽

Cited By ~ 11

Author(s):

Yoshikazu Arai ◽

Teruo Fujii ◽

Hajime Asama ◽

Hayato Kaetsu ◽

Isao Endo

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Robot Systems ◽

Multi Robot

Download Full-text

A Distributed Range-Only Collision Avoidance Approach for Low-cost Large-scale Multi-Robot Systems

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros45743.2020.9341539 ◽

2020 ◽

Author(s):

Ruihua Han ◽

Shengduo Chen ◽

Qi Hao

Keyword(s):

Collision Avoidance ◽

Large Scale ◽

Low Cost ◽

Robot Systems ◽

Multi Robot

Download Full-text

Multi-Robot Collision Avoidance with Map-based Deep Reinforcement Learning

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai50040.2020.00088 ◽

2020 ◽

Author(s):

Shunyi Yao ◽

Guangda Chen ◽

Lifan Pan ◽

Jun Ma ◽

Jianmin Ji ◽

...

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Multi Robot

Download Full-text

Robot obstacle avoidance system using deep reinforcement learning

Industrial Robot the international journal of robotics research and application ◽

10.1108/ir-06-2021-0127 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Xiaojun Zhu ◽

Yinghao Liang ◽

Hanxu Sun ◽

Xueqian Wang ◽

Bin Ren

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Obstacle Avoidance ◽

Learning Algorithm ◽

Optimal Path ◽

Environmental Parameters ◽

Working Environment ◽

Content Type ◽

Practical Applications ◽

Human Operators

Purpose Most manufacturing plants choose the easy way of completely separating human operators from robots to prevent accidents, but as a result, it dramatically affects the overall quality and speed that is expected from human–robot collaboration. It is not an easy task to ensure human safety when he/she has entered a robot’s workspace, and the unstructured nature of those working environments makes it even harder. The purpose of this paper is to propose a real-time robot collision avoidance method to alleviate this problem. Design/methodology/approach In this paper, a model is trained to learn the direct control commands from the raw depth images through self-supervised reinforcement learning algorithm. To reduce the effect of sample inefficiency and safety during initial training, a virtual reality platform is used to simulate a natural working environment and generate obstacle avoidance data for training. To ensure a smooth transfer to a real robot, the automatic domain randomization technique is used to generate randomly distributed environmental parameters through the obstacle avoidance simulation of virtual robots in the virtual environment, contributing to better performance in the natural environment. Findings The method has been tested in both simulations with a real UR3 robot for several practical applications. The results of this paper indicate that the proposed approach can effectively make the robot safety-aware and learn how to divert its trajectory to avoid accidents with humans within the workspace. Research limitations/implications The method has been tested in both simulations with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot be aware of safety and learn how to change its trajectory to avoid accidents with persons within the workspace. Originality/value This paper provides a novel collision avoidance framework that allows robots to work alongside human operators in unstructured and complex environments. The method uses end-to-end policy training to directly extract the optimal path from the visual inputs for the scene.

Download Full-text

Robust Collision Avoidance in Multi-Robot Systems - Implementation onto Real Robots -

Distributed Autonomous Robotic Systems 3 ◽

10.1007/978-3-642-72198-4_3 ◽

1998 ◽

pp. 23-33 ◽

Cited By ~ 4

Author(s):

Yoshikazu Arai ◽

Teruo Fujii ◽

Hajime Asama ◽

Hayato Kaetsu ◽

Isao Endo

Keyword(s):

Collision Avoidance ◽

Systems Implementation ◽

Robot Systems ◽

Multi Robot

Download Full-text

Distributed Coordination Architecture for Cooperative Task Planning and Execution of Intelligent Multi-Robot Systems

Handbook of Research on Advanced Intelligent Control Engineering and Automation - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-4666-7248-2.ch015 ◽

2015 ◽

pp. 407-426 ◽

Cited By ~ 4

Author(s):

Gen'ichi Yasuda

Keyword(s):

Large Scale ◽

Discrete Event ◽

Robotic Systems ◽

Task Planning ◽

Distributed Coordination ◽

Robot Systems ◽

Concurrent Processes ◽

Manufacturing Applications ◽

Robot Cooperation ◽

Multi Robot

This chapter provides a practical and intuitive way of cooperative task planning and execution for complex robotic systems using multiple robots in automated manufacturing applications. In large-scale complex robotic systems, because individual robots can autonomously execute their tasks, robotic activities are viewed as discrete event-driven asynchronous, concurrent processes. Further, since robotic activities are hierarchically defined, place/transition Petri nets can be properly used as specification tools on different levels of control abstraction. Net models representing inter-robot cooperation with synchronized interaction are presented to achieve distributed autonomous coordinated activities. An implementation of control software on hierarchical and distributed architecture is presented in an example multi-robot cell, where the higher level controller executes an activity-based global net model of task plan representing cooperative behaviors performed by the robots, and the parallel activities of the associated robots are synchronized without the coordinator through the transmission of requests and the reception of status.

Download Full-text

Multi-robot Target Encirclement Control with Collision Avoidance via Deep Reinforcement Learning

Journal of Intelligent & Robotic Systems ◽

10.1007/s10846-019-01106-x ◽

2019 ◽

Vol 99 (2) ◽

pp. 371-386 ◽

Cited By ~ 3

Author(s):

Junchong Ma ◽

Huimin Lu ◽

Junhao Xiao ◽

Zhiwen Zeng ◽

Zhiqiang Zheng

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Multi Robot

Download Full-text

A New Collision Avoidance Methodology for Multi-Robot-Systems Based on the Robots’ Dynamics

Field and Service Robotics ◽

10.1007/978-1-4471-1273-0_63 ◽

1988 ◽

pp. 419-426 ◽

Cited By ~ 1

Author(s):

E. Freund ◽

J. Roßmann ◽

M. Schluse

Keyword(s):

Collision Avoidance ◽

Robot Systems ◽

Multi Robot

Download Full-text

A Multi-Step Reinforcement Learning Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.3611 ◽

2010 ◽

Vol 44-47 ◽

pp. 3611-3615 ◽

Cited By ~ 1

Author(s):

Zhi Cong Zhang ◽

Kai Shun Hu ◽

Hui Yu Huang ◽

Shuai Li ◽

Shao Yong Zhao

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Large Scale ◽

Learning Algorithm ◽

Machine Learning Method ◽

Learning Method ◽

K Value ◽

Markov Decision ◽

Action Value

Reinforcement learning (RL) is a state or action value based machine learning method which approximately solves large-scale Markov Decision Process (MDP) or Semi-Markov Decision Process (SMDP). A multi-step RL algorithm called Sarsa(,k) is proposed, which is a compromised variation of Sarsa and Sarsa(). It is equivalent to Sarsa if k is 1 and is equivalent to Sarsa() if k is infinite. Sarsa(,k) adjust its performance by setting k value. Two forms of Sarsa(,k), forward view Sarsa(,k) and backward view Sarsa(,k), are constructed and proved equivalent in off-line updating.

Download Full-text