A spacecraft attitude manoeuvre planning algorithm based on improved policy gradient reinforcement learning

Journal of Navigation ◽

10.1017/s0373463321000813 ◽

2021 ◽

pp. 1-23

Author(s):

Bing Hua ◽

Shenggang Sun ◽

Yunhua Wu ◽

Zhiming Chen

Keyword(s):

Reinforcement Learning ◽

Convergence Rates ◽

Optimal Path ◽

Computation Time ◽

Spacecraft Attitude ◽

Planning Approach ◽

Adaptive Policy ◽

Policy Gradient ◽

Planning Algorithm ◽

High Orientation

Abstract To solve the problem of spacecraft attitude manoeuvre planning under dynamic multiple mandatory pointing constraints and prohibited pointing constraints, a systematic attitude manoeuvre planning approach is proposed that is based on improved policy gradient reinforcement learning. This paper presents a succinct model of dynamic multiple constraints that is similar to a real situation faced by an in-orbit spacecraft. By introducing return baseline and adaptive policy exploration methods, the proposed method overcomes issues such as large variances and slow convergence rates. Concurrently, the required computation time of the proposed method is markedly reduced. Using the proposed method, the near optimal path of the attitude manoeuvre can be determined, making the method suitable for the control of micro spacecraft. Simulation results demonstrate that the planning results fully satisfy all constraints, including six prohibited pointing constraints and two mandatory pointing constraints. The spacecraft also maintains high orientation accuracy to the Earth and Sun during all attitude manoeuvres.

Download Full-text

A 2D Optimal Path Planning Algorithm for Autonomous Underwater Vehicle Driving in Unknown Underwater Canyons

Journal of Marine Science and Engineering ◽

10.3390/jmse9030252 ◽

2021 ◽

Vol 9 (3) ◽

pp. 252

Author(s):

Yushan Sun ◽

Xiaokun Luo ◽

Xiangrui Ran ◽

Guocheng Zhang

Keyword(s):

Path Planning ◽

Obstacle Avoidance ◽

Autonomous Underwater Vehicles ◽

Optimal Path ◽

Small Scale ◽

Target Point ◽

Safe Driving ◽

Policy Gradient ◽

Planning Algorithm ◽

Path Planning Algorithm

This research aims to solve the safe navigation problem of autonomous underwater vehicles (AUVs) in deep ocean, which is a complex and changeable environment with various mountains. When an AUV reaches the deep sea navigation, it encounters many underwater canyons, and the hard valley walls threaten its safety seriously. To solve the problem on the safe driving of AUV in underwater canyons and address the potential of AUV autonomous obstacle avoidance in uncertain environments, an improved AUV path planning algorithm based on the deep deterministic policy gradient (DDPG) algorithm is proposed in this work. This method refers to an end-to-end path planning algorithm that optimizes the strategy directly. It takes sensor information as input and driving speed and yaw angle as outputs. The path planning algorithm can reach the predetermined target point while avoiding large-scale static obstacles, such as valley walls in the simulated underwater canyon environment, as well as sudden small-scale dynamic obstacles, such as marine life and other vehicles. In addition, this research aims at the multi-objective structure of the obstacle avoidance of path planning, modularized reward function design, and combined artificial potential field method to set continuous rewards. This research also proposes a new algorithm called deep SumTree-deterministic policy gradient algorithm (SumTree-DDPG), which improves the random storage and extraction strategy of DDPG algorithm experience samples. According to the importance of the experience samples, the samples are classified and stored in combination with the SumTree structure, high-quality samples are extracted continuously, and SumTree-DDPG algorithm finally improves the speed of the convergence model. Finally, this research uses Python language to write an underwater canyon simulation environment and builds a deep reinforcement learning simulation platform on a high-performance computer to conduct simulation learning training for AUV. Data simulation verified that the proposed path planning method can guide the under-actuated underwater robot to navigate to the target without colliding with any obstacles. In comparison with the DDPG algorithm, the stability, training’s total reward, and robustness of the improved Sumtree-DDPG algorithm planner in this study are better.

Download Full-text

Reinforcement learning-based radar-evasive path planning: a comparative analysis

The Aeronautical Journal ◽

10.1017/aer.2021.85 ◽

2021 ◽

pp. 1-18

Author(s):

R.U. Hameed ◽

A. Maqsood ◽

A.J. Hashmi ◽

M.T. Saeed ◽

R. Riaz

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Learning Algorithms ◽

Optimal Path ◽

Trust Region ◽

Optimal Path Planning ◽

Optimal Paths ◽

Policy Gradient ◽

Model Aircraft ◽

Tracking Model

Abstract This paper discusses the utilisation of deep reinforcement learning algorithms to obtain optimal paths for an aircraft to avoid or minimise radar detection and tracking. A modular approach is adopted to formulate the problem, including the aircraft kinematics model, aircraft radar cross-section model and radar tracking model. A virtual environment is designed for single and multiple radar cases to obtain optimal paths. The optimal trajectories are generated through deep reinforcement learning in this study. Specifically, three algorithms, namely deep deterministic policy gradient, trust region policy optimisation and proximal policy optimisation, are used to find optimal paths for five test cases. The comparison is carried out based on six performance indicators. The investigation proves the importance of these reinforcement learning algorithms in optimal path planning. The results indicate that the proximal policy optimisation approach performed better for optimal paths in general.

Download Full-text

Review on path planning algorithm for unmanned aerial vehicles

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i2.pp1017-1026 ◽

2021 ◽

Vol 24 (2) ◽

pp. 1017

Author(s):

Nurul Saliha Amani Ibrahim ◽

Faiz Asraf Saparudin

Keyword(s):

Path Planning ◽

Unmanned Aerial Vehicles ◽

Autonomous Vehicles ◽

Autonomous Vehicle ◽

Optimal Path ◽

Computation Time ◽

Planning Problem ◽

Aerial Vehicles ◽

Planning Algorithm ◽

Path Planning Algorithm

The path planning problem has been a crucial topic to be solved in autonomous vehicles. Path planning consists operations to find the route that passes through all of the points of interest in a given area. Several algorithms have been proposed and outlined in the various literature for the path planning of autonomous vehicle especially for unmanned aerial vehicles (UAV). The algorithms are not guaranteed to give full performance in each path planning cases but each one of them has their own specification which makes them suitable in sophisticated situation. This review paper evaluates several possible different path planning approaches of UAVs in terms optimal path, probabilistic completeness and computation time along with their application in specific problems.

Download Full-text

Hybrid Bidirectional Rapidly Exploring Random Tree Path Planning Algorithm with Reinforcement Learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2021.p0121 ◽

2021 ◽

Vol 25 (1) ◽

pp. 121-129

Author(s):

Junkui Wang ◽

◽

Kaoru Hirota ◽

Xiangdong Wu ◽

Yaping Dai ◽

...

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Optimal Path ◽

Random Tree ◽

Local Optimum ◽

Local Optima ◽

Exploration Process ◽

Planning Algorithm ◽

Alternative Approaches ◽

Path Planning Algorithm

The randomness of path generation and slow convergence to the optimal path are two major problems in the current rapidly exploring random tree (RRT) path planning algorithm. Herein, a novel reinforcement-learning-based hybrid bidirectional rapidly exploring random tree (H-BRRT) is presented to solve these problems. To model the random exploration process, a target gravitational strategy is introduced. Reinforcement learning is applied to the improved target gravitational strategy using two operations: random exploration and target gravitational exploration. The algorithm is controlled to switch operations adaptively according to the accumulated performance. It not only improves the search efficiency, but also shortens the generated path after the proposed strategy is applied to a bidirectional rapidly exploring random tree (BRRT). In addition, to solve the problem of the traditional RRT continuously falling into the local optimum, an improved exploration strategy with collision weight is applied to the BRRT. Experimental results implemented in a robot operating system indicate that the proposed H-BRRT significantly outperforms alternative approaches such as the RRT and BRRT. The proposed algorithm enhances the capability of identifying unknown spaces and avoiding local optima.

Download Full-text

Intelligent Path Planning Approach for Autonomous Mobile Robot

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2021.p1423 ◽

2021 ◽

Vol 33 (6) ◽

pp. 1423-1428

Author(s):

Ibrahim M. Al-Adwan ◽

Keyword(s):

Path Planning ◽

Mobile Robot ◽

Urban Environment ◽

Optimal Path ◽

Autonomous Mobile Robot ◽

Current Position ◽

Planning Approach ◽

Planning Algorithm ◽

Path Planning Algorithm ◽

Grid Based

This paper presents a new path planning algorithm for an autonomous mobile robot. It is desired that the robot reaches its goal in a known or partially known environment (e.g., a warehouse or an urban environment) and avoids collisions with walls and other obstacles. To this end, a new, efficient, simple, and flexible path finder strategy for the robot is proposed in this paper. With the proposed strategy, the optimal path from the robot’s current position to the goal position is guaranteed. The environment is represented as a grid-based map, which is then divided into a predefined number of subfields to reduce the number of required computations. This leads to a reduction in the load on the controller and allows a real-time response. To evaluate the flexibility and efficiency of the proposed strategy, several tests were simulated with environments of different sizes and obstacle distributions. The experimental results demonstrate the reliability and efficiency of the proposed algorithm.

Download Full-text

An Optimal Vehicle Speed Planning Algorithm for Regenerative Braking at Traffic Lights Intersections based on Reinforcement Learning

2020 4th CAA International Conference on Vehicular Control and Intelligence (CVCI) ◽

10.1109/cvci51460.2020.9338590 ◽

2020 ◽

Author(s):

Yuchuan Zhang ◽

Hui Xie ◽

Kang Song

Keyword(s):

Reinforcement Learning ◽

Regenerative Braking ◽

Vehicle Speed ◽

Traffic Lights ◽

Planning Algorithm

Download Full-text

Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method

2021 American Control Conference (ACC) ◽

10.23919/acc50511.2021.9482765 ◽

2021 ◽

Author(s):

Sebastien Gros ◽

Mario Zanon

Keyword(s):

Reinforcement Learning ◽

Gradient Method ◽

Policy Gradient

Download Full-text

Strategy Generation Based on Reinforcement Learning with Deep Deterministic Policy Gradient for UCAV

2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV) ◽

10.1109/icarcv50220.2020.9305446 ◽

2020 ◽

Author(s):

Yunhong Ma ◽

Shuyao Bai ◽

Yifei Zhao ◽

Chao Song ◽

Jie Yang

Keyword(s):

Reinforcement Learning ◽

Policy Gradient

Download Full-text

Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Complex & Intelligent Systems ◽

10.1007/s40747-021-00366-1 ◽

2021 ◽

Author(s):

Jie Zhong ◽

Tao Wang ◽

Lianglun Cheng

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Free Path ◽

Inverse Kinematics ◽

Multiple Dimensions ◽

Continuous State ◽

Planning Algorithm ◽

Convergence Performance ◽

Path Planner ◽

Action Spaces

AbstractIn actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for the welding manipulator with obstacles around. However, as a state-of-the-art method, the sampling-based planner only satisfies the probability completeness and its computational complexity is sensitive with state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and is less sensitive with state dimension. In detail, to improve the learning efficiency, we introduce the inverse kinematics module to provide prior knowledge while a gain module is also designed to avoid the local optimal policy, we integrate them into the training algorithm. To evaluate our proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves the convergence performance but also is superior in terms of optimality and robustness of planning compared with most other planning algorithms.

Download Full-text

Formula-E race strategy development using distributed policy gradient reinforcement learning

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.106781 ◽

2021 ◽

Vol 216 ◽

pp. 106781

Author(s):

Xuze Liu ◽

Abbas Fotouhi ◽

Daniel J. Auger

Keyword(s):

Reinforcement Learning ◽

Strategy Development ◽

Policy Gradient

Download Full-text