Comparing Deep Reinforcement Learning Algorithms’ Ability to Safely Navigate Challenging Waters

Reinforcement Learning (RL) controllers have proven effective at tackling the dual objectives of path following and collision avoidance. However, finding which RL algorithm setup optimally trades off these two tasks is not necessarily easy. This work proposes a methodology for exploring this question by analyzing the performance and task-specific behavioral characteristics of a range of RL algorithms applied to path following and collision avoidance for underactuated surface vehicles in environments of increasing complexity. Compared to the other RL algorithms considered, the results show that the Proximal Policy Optimization (PPO) algorithm exhibits superior robustness to changes in the environment complexity and the reward function, and when generalized to environments with a considerable domain gap from the training environment. Whereas the proposed reward function significantly improves the competing algorithms’ ability to solve the training environment, an unexpected consequence of the dimensionality reduction in the sensor suite, combined with the domain gap, is identified as the source of their impaired generalization performance.

Quadrotor Path Following and Reactive Obstacle Avoidance with Deep Reinforcement Learning

Journal of Intelligent & Robotic Systems ◽

10.1007/s10846-021-01491-2 ◽

2021 ◽

Vol 103 (4) ◽

Author(s):

Bartomeu Rubí ◽

Bernardo Morcego ◽

Ramon Pérez

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Low Cost ◽

Path Following ◽

The State ◽

Gradient Algorithm ◽

Avoidance Task ◽

Learning Approaches ◽

Reward Function ◽

Novel Structure

A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another for the obstacle avoidance task. A novel structure is proposed, where the action computed by the obstacle avoidance agent becomes the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method allows the outcomes of the training process to be interpreted, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient algorithm. The path following agent was developed in a previous work. The obstacle avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach for providing the agent with a memory of the previously seen obstacles is developed. A detailed description of the process of defining the state vector, the reward function and the action of this agent is given. The agents are programmed in Python/TensorFlow and are trained and tested in the RotorS/Gazebo platform. Simulation results prove the validity of the proposed approach.
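
The cascaded structure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "becomes the state" means the obstacle avoidance action is appended to the path following agent's state, and the function names, the toy policies, and the 1.0 range threshold are all hypothetical stand-ins for the trained DDPG actors.

```python
from typing import Callable, List

Vector = List[float]
Policy = Callable[[Vector], Vector]


def cascaded_step(oa_policy: Policy, pf_policy: Policy,
                  lidar_state: Vector, path_state: Vector) -> Vector:
    """Run one control step of the two-agent cascade (illustrative only)."""
    # 1) The obstacle avoidance agent acts on its LIDAR-based state.
    oa_action = oa_policy(lidar_state)
    # 2) Its action is appended to the path following agent's state
    #    (assumption: the abstract does not give the exact state layout).
    pf_state = list(path_state) + list(oa_action)
    # 3) The path following agent computes the command sent to the vehicle.
    return pf_policy(pf_state)


def toy_oa_policy(lidar_state: Vector) -> Vector:
    # Hypothetical rule: request a lateral offset of 0.5 when the nearest
    # LIDAR return is closer than 1.0 (arbitrary units).
    return [0.5] if min(lidar_state) < 1.0 else [0.0]


def toy_pf_policy(pf_state: Vector) -> Vector:
    # Hypothetical rule: steer against the cross-track error (first entry),
    # shifted by the offset requested by the other agent (last entry).
    cross_track_error, offset = pf_state[0], pf_state[-1]
    return [-(cross_track_error - offset)]


command = cascaded_step(toy_oa_policy, toy_pf_policy, [2.0, 0.8, 3.0], [0.2])
```

Keeping the two policies separate in this way is what lets each agent be trained and inspected on its own task.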

Taming an Autonomous Surface Vehicle for Path Following and Collision Avoidance Using Deep Reinforcement Learning

IEEE Access ◽

10.1109/access.2020.2976586 ◽

2020 ◽

Vol 8 ◽

pp. 41466-41481 ◽

Cited By ~ 4

Author(s):

Eivind Meyer ◽

Haakon Robinson ◽

Adil Rasheed ◽

Omer San

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Path Following ◽

Autonomous Surface Vehicle

Quadrotor Motion Control Using Deep Reinforcement Learning

Journal of Unmanned Vehicle Systems ◽

10.1139/juvs-2021-0010 ◽

2021 ◽

Author(s):

Zifei Jiang ◽

Alan F. Lynch

Keyword(s):

Reinforcement Learning ◽

Neural Nets ◽

Neural Net ◽

Reward Function ◽

Model Free ◽

Policy Gradient ◽

Aerial Vehicle ◽

Stochastic Controller ◽

Policy Optimization ◽

Gradient Approach

We present a deep neural net-based controller trained by a model-free reinforcement learning (RL) algorithm to achieve hover stabilization for a quadrotor unmanned aerial vehicle (UAV). With RL, two neural nets are trained. One neural net is used as a stochastic controller which gives the distribution of control inputs. The other maps the UAV state to a scalar which estimates the reward of the controller. A proximal policy optimization (PPO) method, which is an actor-critic policy gradient approach, is used to train the neural nets. Simulation results show that the trained controller achieves a comparable level of performance to a manually-tuned PID controller, despite not depending on any model information. The paper considers different choices of reward function and their influence on controller performance.
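
For readers unfamiliar with PPO, the sketch below computes the widely used clipped surrogate objective for a small batch. It assumes the clipped variant of PPO, which the abstract does not specify, and the function name and the default `clip_eps=0.2` are illustrative choices rather than values taken from the paper.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old stochastic policy.
        ratio = math.exp(ln - lo)
        # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# When the policy is unchanged the ratio is 1 and the objective
# reduces to the mean advantage.
unchanged = ppo_clip_objective([-1.0, -2.0], [-1.0, -2.0], [0.5, 1.5])
```

In an actor-critic setup such as the one described, the advantages would typically be derived from the second network's scalar estimate.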

A study of multiple reward function performances for vehicle collision avoidance systems applying the DQN algorithm in reinforcement learning

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1176/1/012033 ◽

2021 ◽

Vol 1176 (1) ◽

pp. 012033

Author(s):

N J Zakaria ◽

M I Shapiai ◽

N Wahid

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Reward Function ◽

Vehicle Collision

Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer

Artificial intelligence for engineering design analysis and manufacturing ◽

10.1017/s0890060420000141 ◽

2020 ◽

Vol 34 (2) ◽

pp. 207-222

Author(s):

Xiongqing Liu ◽

Yan Jin

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Control Strategies ◽

Design Parameters ◽

Complex Tasks ◽

Reward Function ◽

Multi Stage ◽

Ship Steering ◽

Machine Learning Approach ◽

Reward Functions

Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., robots or vehicles) to sense the environment, assess the situation, and select the optimal actions to avoid collision and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. The lack of available ship steering data from human ship masters has made it necessary to acquire collision avoidance knowledge through reinforcement learning (RL). Given that the learned neural network tends to be a black box, it is desirable to have a method for designing an agent's behavior so that the desired knowledge can be captured. Furthermore, RL with complex tasks can be either time consuming or infeasible. A multi-stage learning method is needed in which agents can learn from simple tasks and then transfer their learned knowledge to closely related but more complex tasks. In this paper, we explore ways of designing agent behaviors through tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. The computer simulation-based agent training results have shown that it is important to understand the role of each component in a reward function and the various design parameters in transfer RL. The settings of these parameters are all dependent on the complexity of the tasks and the similarities between them.

Decision-Making for the Autonomous Navigation of Maritime Autonomous Surface Ships Based on Scene Division and Deep Reinforcement Learning

Sensors ◽

10.3390/s19184055 ◽

2019 ◽

Vol 19 (18) ◽

pp. 4055 ◽

Cited By ~ 9

Author(s):

Zhang ◽

Wang ◽

Liu ◽

Chen

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Collision Avoidance ◽

Autonomous Navigation ◽

Learning Algorithm ◽

Q Learning ◽

Reward Function ◽

International Regulations ◽

Convergence Trend ◽

Decision Making Model

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: the scene division layer and an autonomous navigation decision-making layer. The scene division layer mainly quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm utilizing the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in a quantized sub-scenario to train the navigation strategy. Finally, two sets of verification experiments of the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm could effectively improve the navigation safety and collision avoidance.
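
As background for the deep Q-learning component, the following sketch shows the standard one-step temporal-difference target that Q-learning methods regress towards. It is generic textbook Q-learning, not the paper's improved algorithm; the function name and the discount factor default are illustrative assumptions.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              done: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state (e.g. after a collision).
        return reward
    # Bootstrap from the best action value estimated for the next state.
    return reward + gamma * max(next_q_values)


# Hypothetical numbers: reward 1.0, two candidate actions in the next state.
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
```

The Q-network's prediction for the taken action is then pulled towards this target, for instance with a squared-error loss.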

Aero-engine acceleration control using deep reinforcement learning with phase-based reward function

Proceedings of the Institution of Mechanical Engineers Part G Journal of Aerospace Engineering ◽

10.1177/09544100211046225 ◽

2021 ◽

pp. 095441002110462

Author(s):

Qian-Kun Hu ◽

Yong-Ping Zhao

Keyword(s):

Reinforcement Learning ◽

Trust Region ◽

Engine Control ◽

Control Task ◽

Q Learning ◽

Reward Function ◽

Engine Control System ◽

Aero Engine ◽

Markov Decision ◽

Policy Optimization

In this paper, the conventional aero-engine acceleration control task is formulated as a Markov Decision Process (MDP) problem. Then, a novel phase-based reward function is proposed to enhance the performance of deep reinforcement learning (DRL) in solving feedback control tasks. With that reward function, an aero-engine controller based on Trust Region Policy Optimization (TRPO) is developed to improve the aero-engine acceleration performance. Four comparison simulations were conducted to verify the effectiveness of the proposed methods. The simulation results show that the phase-based reward function helps to eliminate the oscillation problem of the aero-engine control system, which is caused by the traditional goal-based reward function when DRL is applied to aero-engine control. The TRPO controller also outperforms deep Q-learning (DQN) and the proportional-integral-derivative (PID) controller in the aero-engine acceleration control task. Compared to the DQN and PID controllers, the acceleration time of the aero-engine is decreased by 0.6 s and 2.58 s, and the aero-engine acceleration performance is improved by 16.8% and 46.4%, respectively.
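
The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that each percentage is the relative reduction in acceleration time with respect to the corresponding baseline, which the abstract does not state explicitly; the function name is hypothetical and the derived times are implied estimates, not values reported by the authors.

```python
from typing import Tuple


def implied_times(delta_s: float, rel_improvement: float) -> Tuple[float, float]:
    """Return (baseline time, improved time) implied by an absolute and a
    relative reduction, assuming rel_improvement = delta_s / baseline."""
    baseline = delta_s / rel_improvement
    return baseline, baseline - delta_s


# Figures quoted in the abstract: 0.6 s / 16.8% (vs. DQN), 2.58 s / 46.4% (vs. PID).
dqn_time, trpo_from_dqn = implied_times(0.6, 0.168)
pid_time, trpo_from_pid = implied_times(2.58, 0.464)
```

Under this assumption both comparisons imply a TRPO acceleration time of roughly 2.97 s to 2.98 s, so the two pairs of figures are mutually consistent.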

Deep Reinforcement Learning Controller for 3D Path Following and Collision Avoidance by Autonomous Underwater Vehicles

Frontiers in Robotics and AI ◽

10.3389/frobt.2020.566037 ◽

2021 ◽

Vol 7 ◽

Author(s):

Simen Theie Havenstrøm ◽

Adil Rasheed ◽

Omer San

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Collision Avoidance ◽

Autonomous Agents ◽

Autonomous Vehicle ◽

Autonomous Underwater Vehicles ◽

A Priori ◽

Path Following ◽

Underwater Vehicles ◽

Stability Of Dynamical Systems

Control theory provides engineers with a multitude of tools to design controllers that manipulate the closed-loop behavior and stability of dynamical systems. These methods rely heavily on insights into the mathematical model governing the physical system. However, in complex systems, such as autonomous underwater vehicles performing the dual objective of path following and collision avoidance, decision making becomes nontrivial. We propose a solution using state-of-the-art Deep Reinforcement Learning (DRL) techniques to develop autonomous agents capable of achieving this hybrid objective without having a priori knowledge about the goal or the environment. Our results demonstrate the viability of DRL in path following and avoiding collisions towards achieving human-level decision making in autonomous vehicle systems within extreme obstacle configurations.

Training a model-free reinforcement learning controller for a 3-degree-of-freedom helicopter under multiple constraints

Measurement and Control ◽

10.1177/0020294019847711 ◽

2019 ◽

Vol 52 (7-8) ◽

pp. 844-854 ◽

Cited By ~ 3

Author(s):

Shengri Xue ◽

Zhan Li ◽

Liu Yang

Keyword(s):

Reinforcement Learning ◽

Attitude Control ◽

Degree Of Freedom ◽

Multiple Constraints ◽

Attitude Control System ◽

Multiple Control ◽

Training Environment ◽

Design Data ◽

Reward Function ◽

Model Free

The purpose of the article is to design data-driven attitude controllers for a 3-degree-of-freedom experimental helicopter under multiple constraints. Controllers were updated by utilizing the reinforcement learning technique. The 3-degree-of-freedom helicopter platform is an approximation to a practical helicopter attitude control system, which includes realistic features such as complicated dynamics, coupling and uncertainties. The method in this paper first describes the training environment, which consists of user-defined constraints and performance expectations by using a reward function module. Then, actor–critic-based controllers were designed for helicopter elevation and pitch axis. Next, the policy gradient method, which is an important branch of the reinforcement learning algorithms, is utilized to train the networks and optimize controllers. Finally, from experimental results acquired by the 3-degree-of-freedom helicopter platform, the advantages of the proposed method are illustrated by satisfying multiple control constraints.

A Novel Ship Collision Avoidance Awareness Approach for Cooperating Ships Using Multi-Agent Deep Reinforcement Learning

Journal of Marine Science and Engineering ◽

10.3390/jmse9101056 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1056

Author(s):

Chen Chen ◽

Feng Ma ◽

Xiaobin Xu ◽

Yuwang Chen ◽

Jin Wang

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Driving Forces ◽

Machine Learning Techniques ◽

Practical Significance ◽

Individual Agent ◽

Reward Function ◽

Learning Techniques ◽

International Regulations ◽

Multi Agent

Ships are special machines with large inertia and relatively weak driving forces. Simulating the manual operation of ships with artificial intelligence (AI) and machine learning techniques is becoming increasingly common, and avoiding collisions in crowded waters may be the most challenging task. This research proposes a cooperative collision avoidance approach for multiple ships using a multi-agent deep reinforcement learning (MADRL) algorithm. Specifically, each ship is modeled as an individual agent, controlled by a Deep Q-Network (DQN) method and described by a dedicated ship motion model. Each agent observes its own state and that of the other ships, as well as the surrounding environment. Then, agents analyze the navigation situation and make motion decisions accordingly. In particular, specific reward function schemas are designed to simulate the degree of cooperation among agents. According to the International Regulations for Preventing Collisions at Sea (COLREGs), three typical simulation scenarios, namely head-on, overtaking and crossing, are established to validate the proposed approach. With sufficient MADRL training, the ship agents were capable of avoiding collisions through cooperation in narrow crowded waters. This method provides new insights for bionic modeling of ship operations, which is of theoretical and practical significance.
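
One simple way to express a "degree of cooperation" in a multi-agent reward is to blend each agent's own reward with the mean reward of the other agents. The sketch below illustrates that idea only; it is not the reward schema used in the paper, and the function name and the `cooperation` weight are assumptions introduced for illustration.

```python
from typing import List, Sequence


def cooperative_rewards(individual_rewards: Sequence[float],
                        cooperation: float = 0.5) -> List[float]:
    """Blend each agent's reward with the mean reward of the other agents.

    cooperation = 0.0 gives fully selfish agents; 1.0 makes each agent
    care only about the others (hypothetical scheme).
    """
    n = len(individual_rewards)
    if n < 2:
        # A single agent has nobody to cooperate with.
        return list(individual_rewards)
    total = sum(individual_rewards)
    blended = []
    for r in individual_rewards:
        others_mean = (total - r) / (n - 1)
        blended.append((1.0 - cooperation) * r + cooperation * others_mean)
    return blended


# Two ships: one rewarded (+1.0), one penalized (-1.0), equal weighting.
shared = cooperative_rewards([1.0, -1.0], cooperation=0.5)
```

Each ship's DQN would then be trained on its blended reward, so varying `cooperation` changes how strongly one ship's penalty discourages the others.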
