Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder

2019 ◽  
Vol 9 (2) ◽  
pp. 323 ◽  
Author(s):  
Junjie Zeng ◽  
Long Qin ◽  
Yue Hu ◽  
Cong Hu ◽  
Quanjun Yin

In this paper, we present a hierarchical path planning framework called SG–RL (subgoal graphs–reinforcement learning) to plan rational paths for agents maneuvering in continuous and uncertain environments. By “rational”, we mean (1) efficient path planning that eliminates first-move lags; and (2) collision-free, smooth paths that satisfy the agents’ kinematic constraints. SG–RL works in a two-level manner. At the first level, SG–RL uses a geometric path-planning method, i.e., simple subgoal graphs (SSGs), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG–RL uses an RL method, i.e., least-squares policy iteration (LSPI), to learn near-optimal motion-planning policies that can generate kinematically feasible and collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSGs overcome the sparse-reward and local-minimum-trap limitations faced by RL agents; thus, LSPI can be used to generate paths in complex environments. The second advantage is that, when the environment changes slightly (e.g., unexpected obstacles appearing), SG–RL does not need to reconstruct subgoal graphs and replan subgoal sequences using SSGs, since LSPI can deal with uncertainties by exploiting its generalization ability to handle changes in environments. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG–RL works well on large-scale maps with relatively low action-switching frequencies and shorter path lengths, and that it can deal with small changes in environments. We further demonstrate that the design of reward functions and the types of training environments are important factors for learning feasible policies.
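The two-level structure described above can be sketched in Python. This is a minimal illustration under assumed conventions, not the authors' implementation: the subgoal graph is a hand-built adjacency map searched with Dijkstra's algorithm (standing in for SSG construction), and `local_policy` is a placeholder for the learned LSPI motion controller.

```python
import heapq

def shortest_subgoal_sequence(graph, start, goal):
    """First level: Dijkstra search over a subgoal graph {node: {neighbor: cost}}."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the subgoal sequence by walking predecessors back to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def follow_subgoals(subgoals, local_policy):
    """Second level: a local motion policy drives the agent between adjacent subgoals."""
    trajectory = []
    for a, b in zip(subgoals, subgoals[1:]):
        trajectory.extend(local_policy(a, b))
    return trajectory

# Toy subgoal graph; edge weights are abstract path costs.
graph = {"S": {"A": 1.0, "B": 4.0}, "A": {"C": 1.0},
         "B": {"C": 1.0}, "C": {"G": 1.0}, "G": {}}
sequence = shortest_subgoal_sequence(graph, "S", "G")
```

In SG–RL proper, the second level would be an LSPI policy queried per segment; here any callable mapping a pair of subgoals to a trajectory fragment fits the same interface.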

2019 ◽  
Author(s):  
Momchil S. Tomov ◽  
Eric Schulz ◽  
Samuel J. Gershman

The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multi-task reinforcement learning. We study participants’ behavior in a novel two-step decision-making task with multiple features and changing reward functions. We compare their behavior to two state-of-the-art algorithms for multi-task reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered experiment, our results provide strong evidence for a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.


2020 ◽  
Vol 10 (12) ◽  
pp. 4154 ◽  
Author(s):  
Yongbei Liu ◽  
Naiming Qi ◽  
Weiran Yao ◽  
Jun Zhao ◽  
Song Xu

To maximize the advantages of low cost, high mobility, and high flexibility, aerial recovery technology is important for unmanned aerial vehicle (UAV) swarms. In particular, the “launch–recovery–relaunch” operation mode will greatly improve the efficiency of a UAV swarm. However, large-scale aerial recovery of UAV swarms is difficult to realize because the process involves complex multi-UAV recovery scheduling, path planning, rendezvous, and acquisition problems. In this study, the recovery of a UAV swarm by a mother aircraft is investigated. To solve the problem, a recovery planning framework is proposed that establishes the coupling mechanism between the scheduling and path planning of multi-UAV aerial recovery. A genetic algorithm is employed to realize efficient and precise scheduling. A homotopic path planning approach is proposed to generate paths of an expected length for long-range aerial recovery missions. Simulations in representative scenarios validate the effectiveness of the recovery planning framework and the proposed methods. It can be concluded that the framework achieves high performance in dealing with the aerial recovery problem.


2015 ◽  
Vol 03 (03) ◽  
pp. 221-238 ◽  
Author(s):  
Joakim Haugen ◽  
Lars Imsland

A path planning framework for regional surveillance of a planar advection-diffusion process by aerial mobile sensors is proposed. The goal of the path planning is to produce feasible and collision-free trajectories for a set of aerial mobile sensors that minimize some uncertainty measure of the process under observation. The problem is formulated as a dynamic optimization problem and discretized into a large-scale nonlinear programming (NLP) problem using the Petrov–Galerkin finite element method in space and simultaneous collocation in time. Receding horizon optimization problems are solved in simulations with an advection-dominated ice concentration field; the results illustrate the usefulness of the proposed method.
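The receding-horizon structure of such an approach can be sketched independently of the PDE machinery. The toy below rests on assumed simplifications (a discrete grid uncertainty field, a sensor that zeroes the uncertainty of any cell it visits, and exhaustive search over short action sequences in place of the NLP solver); it shows only the plan, apply-first-action, re-plan loop.

```python
from itertools import product

MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def simulate(pos, field, actions):
    """Total remaining uncertainty after following `actions` from `pos`.
    Visiting a cell zeroes its uncertainty (a crude observation model)."""
    field = dict(field)
    x, y = pos
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
        field[(x, y)] = 0.0
    return sum(field.values())

def receding_horizon(pos, field, horizon, steps):
    """Each step: search all action sequences of length `horizon`,
    apply only the first action of the best one, then re-plan."""
    path, field = [pos], dict(field)
    for _ in range(steps):
        best = min(product(MOVES, repeat=horizon),
                   key=lambda seq: simulate(pos, field, seq))
        dx, dy = MOVES[best[0]]
        pos = (pos[0] + dx, pos[1] + dy)
        field[pos] = 0.0
        path.append(pos)
    return path, field

# Uncertainty concentrated in three cells; the sensor starts at the origin.
field = {(1, 0): 1.0, (2, 0): 1.0, (0, 1): 0.5}
path, remaining = receding_horizon((0, 0), field, horizon=2, steps=2)
```

In the paper's setting, the inner `min` is replaced by solving the collocated NLP over the horizon, and the observation model comes from the discretized advection-diffusion dynamics.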


Author(s):  
Abdelhady M. Naguib ◽  
Shahzad Ali

Background: Many applications of Wireless Sensor Networks (WSNs) require awareness of each sensor node's location, but not every sensor node can be equipped with a GPS receiver for localization, due to cost and energy constraints, especially in large-scale networks. Many localization algorithms have therefore been proposed that enable a sensor node to determine its location by utilizing a small number of special nodes, called anchors, that are equipped with GPS receivers. In recent years, a promising method that significantly reduces the cost is to replace the set of statically deployed GPS anchors with one mobile anchor node equipped with a GPS unit that moves to cover the entire network. Objectives: This paper proposes a novel static path planning mechanism that enables a single anchor node to follow a predefined static path while periodically broadcasting its current location coordinates to nearby sensors. This new path type is called SQUARE_SPIRAL, and it is specifically designed to reduce collinearity during localization. Results: Simulation results show that the SQUARE_SPIRAL mechanism outperforms other static path planning methods with respect to multiple performance metrics. Conclusion: This work includes an extensive comparative study of existing static path planning methods and then presents a comparison of the proposed mechanism with existing solutions through extensive simulations in NS-2.
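The abstract does not specify the SQUARE_SPIRAL geometry, but the general idea of an outward square spiral of broadcast waypoints can be sketched as follows; the leg-length schedule and parameters here are illustrative assumptions, not the paper's definition.

```python
def square_spiral(center, step, legs):
    """Waypoints of an outward square spiral starting at `center`.
    Leg lengths grow as 1, 1, 2, 2, 3, 3, ... multiples of `step`,
    turning left (E, N, W, S) after every leg."""
    x, y = center
    waypoints = [(x, y)]
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S
    run, d = 1, 0
    for _ in range(legs):
        dx, dy = directions[d % 4]
        x += dx * run * step
        y += dy * run * step
        waypoints.append((x, y))
        d += 1
        if d % 2 == 0:  # the run length grows after every two turns
            run += 1
    return waypoints

waypoints = square_spiral((0, 0), step=1, legs=4)
```

The mobile anchor would broadcast its GPS coordinates at each waypoint (or at fixed intervals along each leg); because consecutive broadcast positions turn through 90° rather than lying on one line, such a pattern naturally limits the collinearity that degrades range-based localization.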

