Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder

2019 ◽  
Vol 9 (2) ◽  
pp. 323 ◽  
Author(s):  
Junjie Zeng ◽  
Long Qin ◽  
Yue Hu ◽  
Cong Hu ◽  
Quanjun Yin

In this paper, we present a hierarchical path planning framework called SG–RL (subgoal graphs–reinforcement learning) to plan rational paths for agents maneuvering in continuous and uncertain environments. By “rational”, we mean (1) efficient path planning that eliminates first-move lags; and (2) collision-free, smooth paths that satisfy the agents’ kinematic constraints. SG–RL works in a two-level manner. At the first level, SG–RL uses a geometric path-planning method, i.e., simple subgoal graphs (SSGs), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG–RL uses an RL method, i.e., least-squares policy iteration (LSPI), to learn near-optimal motion-planning policies that can generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSGs overcome the sparse-reward and local-minimum-trap limitations faced by RL agents; thus, LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (e.g., unexpected obstacles appearing), SG–RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSGs, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG–RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and that it can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.
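The two-level structure described above can be sketched in Python. This is a minimal illustration under assumed conventions, not the authors' implementation: the subgoal graph is a hand-built adjacency map searched with Dijkstra's algorithm (standing in for SSG construction), and `local_policy` is a placeholder for the learned LSPI motion controller.

```python
import heapq

def shortest_subgoal_sequence(graph, start, goal):
    """First level: Dijkstra search over a subgoal graph {node: {neighbor: cost}}."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the subgoal sequence by walking predecessors back to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def follow_subgoals(subgoals, local_policy):
    """Second level: a local motion policy drives the agent between adjacent subgoals."""
    trajectory = []
    for a, b in zip(subgoals, subgoals[1:]):
        trajectory.extend(local_policy(a, b))
    return trajectory

# Toy subgoal graph; edge weights are abstract path costs.
graph = {"S": {"A": 1.0, "B": 4.0}, "A": {"C": 1.0},
         "B": {"C": 1.0}, "C": {"G": 1.0}, "G": {}}
sequence = shortest_subgoal_sequence(graph, "S", "G")
```

In SG–RL proper, the second level would be an LSPI policy queried per segment; here any callable mapping a pair of subgoals to a trajectory fragment fits the same interface.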

2019 ◽  
Author(s):  
Momchil S. Tomov ◽  
Eric Schulz ◽  
Samuel J. Gershman

The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multi-task reinforcement learning. We study participants’ behavior in a novel two-step decision-making task with multiple features and changing reward functions. We compare their behavior to two state-of-the-art algorithms for multi-task reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered experiment, our results provide strong evidence for a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.


2020 ◽  
Vol 10 (12) ◽  
pp. 4154 ◽  
Author(s):  
Yongbei Liu ◽  
Naiming Qi ◽  
Weiran Yao ◽  
Jun Zhao ◽  
Song Xu

To maximize the advantages of low cost, high mobility, and high flexibility, aerial recovery technology is important for unmanned aerial vehicle (UAV) swarms. In particular, the “launch–recovery–relaunch” operation mode will greatly improve the efficiency of a UAV swarm. However, large-scale aerial recovery of UAV swarms is difficult to realize because the process involves complex multi-UAV recovery scheduling, path planning, rendezvous, and acquisition problems. In this study, the recovery of a UAV swarm by a mother aircraft is investigated. To solve the problem, a recovery planning framework is proposed that establishes the coupling mechanism between the scheduling and path planning of multi-UAV aerial recovery. A genetic algorithm is employed to realize efficient and precise scheduling. A homotopic path planning approach is proposed to generate paths of an expected length for long-range aerial recovery missions. Simulations in representative scenarios validate the effectiveness of the recovery planning framework and the proposed methods. It can be concluded that the framework achieves high performance in dealing with the aerial recovery problem.


2015 ◽  
Vol 03 (03) ◽  
pp. 221-238 ◽  
Author(s):  
Joakim Haugen ◽  
Lars Imsland

A path planning framework for regional surveillance of a planar advection-diffusion process by aerial mobile sensors is proposed. The goal of the path planning is to produce feasible and collision-free trajectories for a set of aerial mobile sensors that minimize some uncertainty measure of the process under observation. The problem is formulated as a dynamic optimization problem and discretized into a large-scale nonlinear programming (NLP) problem using the Petrov–Galerkin finite element method in space and simultaneous collocation in time. Receding horizon optimization problems are solved in simulations with an advection-dominated ice concentration field; the results illustrate the usefulness of the proposed method.
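The receding-horizon structure of such an approach can be sketched independently of the PDE machinery. The toy below rests on assumed simplifications (a discrete grid uncertainty field, a sensor that zeroes the uncertainty of any cell it visits, and exhaustive search over short action sequences in place of the NLP solver); it shows only the plan, apply-first-action, re-plan loop.

```python
from itertools import product

MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def simulate(pos, field, actions):
    """Total remaining uncertainty after following `actions` from `pos`.
    Visiting a cell zeroes its uncertainty (a crude observation model)."""
    field = dict(field)
    x, y = pos
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
        field[(x, y)] = 0.0
    return sum(field.values())

def receding_horizon(pos, field, horizon, steps):
    """Each step: search all action sequences of length `horizon`,
    apply only the first action of the best one, then re-plan."""
    path, field = [pos], dict(field)
    for _ in range(steps):
        best = min(product(MOVES, repeat=horizon),
                   key=lambda seq: simulate(pos, field, seq))
        dx, dy = MOVES[best[0]]
        pos = (pos[0] + dx, pos[1] + dy)
        field[pos] = 0.0
        path.append(pos)
    return path, field

# Uncertainty concentrated in three cells; the sensor starts at the origin.
field = {(1, 0): 1.0, (2, 0): 1.0, (0, 1): 0.5}
path, remaining = receding_horizon((0, 0), field, horizon=2, steps=2)
```

In the paper's setting, the inner `min` is replaced by solving the collocated NLP over the horizon, and the observation model comes from the discretized advection-diffusion dynamics.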


Author(s):  
Abdelhady M. Naguib ◽  
Shahzad Ali

Background: Many applications of Wireless Sensor Networks (WSNs) require awareness of each sensor node's location, but not every sensor node can be equipped with a GPS receiver for localization, due to cost and energy constraints, especially in large-scale networks. Many localization algorithms have therefore been proposed that enable a sensor node to determine its location by utilizing a small number of special nodes, called anchors, that are equipped with GPS receivers. In recent years, a promising method that significantly reduces the cost is to replace the set of statically deployed GPS anchors with one mobile anchor node equipped with a GPS unit that moves to cover the entire network. Objectives: This paper proposes a novel static path planning mechanism that enables a single anchor node to follow a predefined static path while periodically broadcasting its current location coordinates to nearby sensors. This new path type is called SQUARE_SPIRAL, and it is specifically designed to reduce collinearity during localization. Results: Simulation results show that the SQUARE_SPIRAL mechanism outperforms other static path planning methods with respect to multiple performance metrics. Conclusion: This work includes an extensive comparative study of existing static path planning methods and then presents a comparison of the proposed mechanism with existing solutions through extensive simulations in NS-2.
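The abstract does not specify the SQUARE_SPIRAL geometry, but the general idea of an outward square spiral of broadcast waypoints can be sketched as follows; the leg-length schedule and parameters here are illustrative assumptions, not the paper's definition.

```python
def square_spiral(center, step, legs):
    """Waypoints of an outward square spiral starting at `center`.
    Leg lengths grow as 1, 1, 2, 2, 3, 3, ... multiples of `step`,
    turning left (E, N, W, S) after every leg."""
    x, y = center
    waypoints = [(x, y)]
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S
    run, d = 1, 0
    for _ in range(legs):
        dx, dy = directions[d % 4]
        x += dx * run * step
        y += dy * run * step
        waypoints.append((x, y))
        d += 1
        if d % 2 == 0:  # the run length grows after every two turns
            run += 1
    return waypoints

waypoints = square_spiral((0, 0), step=1, legs=4)
```

The mobile anchor would broadcast its GPS coordinates at each waypoint (or at fixed intervals along each leg); because consecutive broadcast positions turn through 90° rather than lying on one line, such a pattern naturally limits the collinearity that degrades range-based localization.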

