Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach

2019, Vol. 68 (3), pp. 2124-2136
Author(s): Chao Wang, Jian Wang, Yuan Shen, Xudong Zhang

2020, Vol. 39 (7), pp. 856-892
Author(s): Tingxiang Fan, Pinxin Long, Wenxi Liu, Jia Pan

Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in decentralized scenarios, where each robot generates its paths with only limited observation of other robots' states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or agent-level features to plan a local collision-free action, which is neither robust nor computationally efficient. In addition, the performance of these methods is not comparable with their centralized counterparts in practice. In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots simultaneously in rich, complex environments using a policy-gradient-based reinforcement-learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy's robustness and effectiveness. We validate the learned sensor-level collision-avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios, including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots whose shapes and dynamics differ from those of the simulated agents, demonstrating the controller's robustness against the simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation of a single robot working in a dense human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. Videos are available at https://sites.google.com/view/hybridmrca .
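
To make the sensor-level mapping concrete, here is a minimal sketch, assuming a PyTorch implementation with illustrative observation sizes (a few stacked 2D-lidar frames, a relative goal, and the current velocity): the network encodes raw scans with 1-D convolutions and outputs a continuous velocity command, the kind of policy a policy-gradient algorithm would then optimize. Layer sizes and input dimensions are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SensorLevelPolicy(nn.Module):
    """Maps stacked lidar scans + relative goal + current velocity to a velocity command."""
    def __init__(self, n_beams: int = 512, n_frames: int = 3):
        super().__init__()
        # 1-D convolutions over the laser scan, with recent frames as channels
        self.scan_encoder = nn.Sequential(
            nn.Conv1d(n_frames, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.scan_encoder(torch.zeros(1, n_frames, n_beams)).shape[1]
        # fuse scan features with the relative goal (2) and current velocity (2)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 4, 128), nn.ReLU(),
            nn.Linear(128, 2),  # mean of the (linear, angular) velocity command
        )

    def forward(self, scans, goal, vel):
        z = self.scan_encoder(scans)
        return self.head(torch.cat([z, goal, vel], dim=-1))

policy = SensorLevelPolicy()
cmd = policy(torch.zeros(1, 3, 512), torch.zeros(1, 2), torch.zeros(1, 2))  # -> shape (1, 2)
```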


2021, Vol. 57 (2), pp. 50-53
Author(s): Muhammad Mudassir Ejaz, Tong Boon Tang, Cheng-Kai Lu

2019, Vol. 9 (2), pp. 323
Author(s): Junjie Zeng, Long Qin, Yue Hu, Cong Hu, Quanjun Yin

In this paper, we present a hierarchical path planning framework called SG–RL (subgoal graphs–reinforcement learning) to plan rational paths for agents maneuvering in continuous and uncertain environments. By “rational”, we mean paths that are (1) planned efficiently, eliminating first-move lags, and (2) collision-free and smooth, with the agents’ kinematic constraints satisfied. SG–RL works in a two-level manner. At the first level, SG–RL uses a geometric path-planning method, simple subgoal graphs (SSGs), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG–RL uses an RL method, least-squares policy iteration (LSPI), to learn near-optimal motion-planning policies that generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSGs mitigate the sparse-reward and local-minimum problems faced by RL agents, so LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (e.g., unexpected obstacles appear), SG–RL does not need to reconstruct the subgoal graphs or replan the subgoal sequences with SSGs, since LSPI can handle such uncertainties by exploiting its generalization ability. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG–RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and that it can cope with small changes in the environment. We further show that the design of the reward function and the types of training environments are important factors for learning feasible policies.
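
For the second level, the abstract names least-squares policy iteration; the following is a minimal, self-contained numpy sketch of LSPI built on the LSTD-Q update, not the paper's implementation. The feature map `phi`, the action set, and all hyperparameters are placeholders the reader would supply.

```python
import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.95, reg=1e-6):
    """One LSTD-Q evaluation: fit w so that Q(s, a) ~= phi(s, a) @ w under `policy`."""
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))   # action the current policy would take
        A += np.outer(f, f - gamma * f_next)   # accumulate the LSTD-Q system
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, n_features, n_iters=20):
    """Alternate LSTD-Q evaluation with greedy policy improvement."""
    w = np.zeros(n_features)
    for _ in range(n_iters):
        # greedy policy with respect to the current Q-function estimate
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = lstdq(samples, phi, policy, n_features)
    return w
```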


Sensors, 2021, Vol. 21 (4), pp. 1468
Author(s): Razin Bin Issa, Modhumonty Das, Md. Saferi Rahman, Monika Barua, Md. Khalilur Rhaman, ...

Autonomous vehicle navigation in an unknown dynamic environment is crucial for both supervised- and reinforcement-learning-based autonomous maneuvering. The cooperative fusion of these two learning approaches has the potential to be an effective mechanism for tackling indefinite environmental dynamics. Most state-of-the-art autonomous vehicle navigation systems are trained on a specific mapped model with familiar environmental dynamics. This research, by contrast, focuses on the cooperative fusion of supervised and reinforcement learning for autonomous navigation of land vehicles in a dynamic and unknown environment. Faster R-CNN, a supervised learning approach, identifies obstacles in the ambient environment so that the autonomous vehicle can maneuver unhindered, whereas the training policies of Double Deep Q-Learning, a reinforcement learning approach, enable the autonomous agent to learn effective navigation decisions from the dynamic environment. The proposed model is primarily tested in a gaming environment similar to the real world, and it exhibits overall efficiency and effectiveness in the maneuvering of autonomous land vehicles.
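
As an illustration of the reinforcement-learning half of the fusion, here is a minimal PyTorch sketch of the Double Deep Q-Learning target; the state encoding (e.g., features derived from the Faster R-CNN detections), the network sizes, and the hyperparameters are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Small Q-network over a flat state vector (e.g., detection-derived features)."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def double_dqn_loss(online, target, batch, gamma=0.99):
    s, a, r, s_next, done = batch  # tensors: states, actions, rewards, next states, done flags
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the online net selects the next action, the target net evaluates it
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_star).squeeze(1)
        y = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q, y)
```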

