Deep Reinforcement Learning for Vectored Thruster Autonomous Underwater Vehicle Control

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-25
Author(s):  
Tao Liu ◽  
Yuli Hu ◽  
Hui Xu

Autonomous underwater vehicles (AUVs) are widely used to accomplish various missions in the complex marine environment; designing a control system for AUVs is particularly difficult due to high nonlinearity, variations in hydrodynamic coefficients, and external forces from ocean currents. In this paper, we propose a controller based on deep reinforcement learning (DRL) in a simulation environment for studying the control performance of a vectored-thruster AUV. RL is an important artificial intelligence method that learns behavior through trial-and-error interactions with the environment, so it does not require an accurate AUV control model, which is very hard to establish. The proposed RL algorithm uses only information measurable by sensors inside the AUV as input, and the outputs of the designed controller are continuous control actions, namely the commands sent to the vectored thruster. Moreover, a reward function is developed for the deep RL controller that considers the different factors which actually affect the accuracy of AUV navigation control. To confirm the algorithm's effectiveness, a series of simulations is carried out in the designed simulation environment, which saves time and improves efficiency. Simulation results prove the feasibility of the deep RL algorithm applied to the AUV control system. Furthermore, our work provides an alternative approach to robot control problems posed by rising technology requirements and complicated application environments.
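The paper's multi-factor reward is not reproduced above; the sketch below is a minimal illustration of such a reward, assuming position error, heading error, thruster effort, and command smoothness as the factors, with purely illustrative names and weights.

```python
import numpy as np

def navigation_reward(pos_error, heading_error, action, prev_action,
                      w_pos=1.0, w_head=0.2, w_effort=0.05, w_smooth=0.05):
    """Hypothetical multi-factor reward for AUV navigation control.
    Factor choice and weights are assumptions, not the paper's design."""
    r_pos = -w_pos * np.linalg.norm(pos_error)       # trajectory-tracking accuracy
    r_head = -w_head * abs(heading_error)            # heading accuracy
    r_effort = -w_effort * np.linalg.norm(action)    # actuator effort
    r_smooth = -w_smooth * np.linalg.norm(np.asarray(action) - np.asarray(prev_action))
    return r_pos + r_head + r_effort + r_smooth
```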

2021 ◽  
Vol 29 (1) ◽  
pp. 97-110
Author(s):  
V.S. Bykova ◽  
A.I. Mashoshin ◽  
I.V. Pashkevich ◽  
...  

Two safe navigation algorithms for autonomous underwater vehicles are described: an algorithm for avoiding point obstacles, which include all moving underwater and surface objects as well as bottom objects of limited size, and an algorithm for bypassing extended obstacles such as bottom elevations, a rough lower ice edge, and garbage patches. These algorithms are developed for the control system of a heavyweight autonomous underwater vehicle.
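One plausible reading of the point-obstacle algorithm is a closest-point-of-approach test along the vehicle's course; the toy check below is a sketch under that assumption, with all names and the criterion itself hypothetical.

```python
import math

def course_is_clear(auv_xy, heading, obstacle_xy, safe_radius):
    """Toy point-obstacle test: the current course is clear if it never
    passes within safe_radius of the obstacle (assumed criterion)."""
    dx = obstacle_xy[0] - auv_xy[0]
    dy = obstacle_xy[1] - auv_xy[1]
    ux, uy = math.cos(heading), math.sin(heading)   # unit course vector
    if dx * ux + dy * uy <= 0:                      # obstacle is astern
        return True
    return abs(dx * uy - dy * ux) > safe_radius     # lateral miss distance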


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3837 ◽  
Author(s):  
Junjie Zeng ◽  
Rusheng Ju ◽  
Long Qin ◽  
Yue Hu ◽  
Quanjun Yin ◽  
...  

In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm which can navigate non-holonomic robots with continuous control in an unknown dynamic environment with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic) for short. As its first component, MK-A3C builds a GRU-based memory neural network to enhance the robot's capability for temporal reasoning. Robots without such memory tend to behave irrationally in the face of incomplete and noisy state estimates in complex environments, whereas robots endowed with memory by MK-A3C can avoid local-minimum traps by estimating the environmental model. Secondly, MK-A3C combines a domain-knowledge-based reward function with a transfer-learning-based training task architecture, which solves the policy non-convergence problem caused by sparse rewards. With these improvements, MK-A3C can efficiently navigate robots in unknown dynamic environments and satisfy kinetic constraints while handling moving obstacles. Simulation experiments show that, compared with existing methods, MK-A3C achieves successful robot navigation in unknown and challenging environments by outputting continuous acceleration commands.
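The GRU-based memory network is the component of MK-A3C that carries state across time steps; a minimal PyTorch sketch is given below, with all layer sizes and the observation/action dimensions assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Sketch of a GRU-based actor-critic network in the spirit of MK-A3C.
    Dimensions and layer choices are illustrative assumptions."""

    def __init__(self, obs_dim=36, hidden_dim=128, act_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, act_dim)  # continuous accelerations
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); the GRU hidden state is the memory.
        z = self.encoder(obs_seq)
        out, hN = self.gru(z, h0)
        return torch.tanh(self.policy_head(out)), self.value_head(out), hN
```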


Author(s):  
K. A. A. Mustafa ◽  
N. Botteghi ◽  
B. Sirmacek ◽  
M. Poel ◽  
S. Stramigioli

We introduce a new autonomous path planning algorithm for mobile robots for reaching target locations in an unknown environment where the robot relies on its on-board sensors. In particular, we describe the design and evaluation of a deep reinforcement learning motion planner with continuous linear and angular velocities, based on the deep deterministic policy gradient (DDPG), that navigates to a desired target location. Additionally, the algorithm is enhanced by using the knowledge of the environment provided by grid-based SLAM with a Rao-Blackwellized particle filter to shape the reward function, in an attempt to improve the convergence rate, escape local optima, and reduce the number of collisions with obstacles. A comparison is made between a reward function shaped using the map provided by the SLAM algorithm and a reward function with no knowledge of the map. Results show that the proposed approach decreases the required learning time, converging in 560 episodes compared to 1450 episodes for the standard RL algorithm, and reduces the number of obstacle collisions, with a success ratio of 83% compared to 56% for the standard RL algorithm. The results are validated in a simulated experiment on a skid-steering mobile robot.
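The map-shaped reward is not given in closed form above; the sketch below illustrates one common way to shape a reward from a SLAM occupancy grid, combining a goal-progress term and an occupancy penalty, with all parameter names and weights being assumptions.

```python
import numpy as np

def shaped_reward(base_reward, pose_xy, goal_xy, grid, resolution, origin_xy,
                  k_goal=1.0, k_occ=0.5):
    """Illustrative SLAM-map reward shaping (not the paper's exact form):
    reward progress toward the goal and penalize occupied space at the
    robot's current cell (grid values in [0, 1], 1 = occupied)."""
    progress = -k_goal * np.linalg.norm(np.asarray(goal_xy) - np.asarray(pose_xy))
    col = int((pose_xy[0] - origin_xy[0]) / resolution)
    row = int((pose_xy[1] - origin_xy[1]) / resolution)
    row = int(np.clip(row, 0, grid.shape[0] - 1))
    col = int(np.clip(col, 0, grid.shape[1] - 1))
    return base_reward + progress - k_occ * grid[row, col]
```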


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5893
Author(s):  
Xin Yu ◽  
Yushan Sun ◽  
Xiangbin Wang ◽  
Guocheng Zhang

This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks, and to overcome difficulties such as multiple constraints and a sparse-reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into control instructions for the AUV. The system is based on the soft actor–critic (SAC) algorithm, which enhances exploration ability and robustness in the AUV environment. We also use generative adversarial imitation learning (GAIL) to assist training, overcoming the difficulty and time cost of learning a policy from scratch in reinforcement learning. A comprehensive external reward function is then designed to help the AUV reach the target point smoothly while keeping distance traveled and time as small as possible. Finally, the end-to-end motion planning algorithm proposed in this research is tested and compared on the Unity simulation platform. Results show that the algorithm makes sound decisions during navigation, yielding a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL can speed up AUV training and minimize training time without affecting the planning performance of the SAC algorithm.
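GAIL assists training by turning a discriminator's output into a surrogate reward for the learner; the PyTorch sketch below shows that coupling in its standard form, with architecture and dimensions assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class GAILDiscriminator(nn.Module):
    """Sketch of the GAIL discriminator that can assist SAC training.
    Architecture, dimensions, and the reward form are common choices,
    assumed here rather than quoted from the paper."""

    def __init__(self, obs_dim=20, act_dim=3, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, act):
        # Logit for "this (state, action) pair came from the expert".
        return self.net(torch.cat([obs, act], dim=-1))

    def imitation_reward(self, obs, act):
        # High when the agent's (state, action) pair looks expert-like.
        d = torch.sigmoid(self.forward(obs, act))
        return -torch.log(1.0 - d + 1e-8)
```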


Author(s):  
Zhuo Wang ◽  
Shiwei Zhang ◽  
Xiaoning Feng ◽  
Yancheng Sui

The environmental adaptability of autonomous underwater vehicles is a persistent problem for path planning. Although reinforcement learning can improve environmental adaptability, multi-behavior coupling slows its convergence, making it difficult for an autonomous underwater vehicle to avoid moving obstacles. This article proposes a multi-behavior critic reinforcement learning algorithm for autonomous underwater vehicle path planning to overcome the oscillating amplitudes and low learning efficiency in the early stages of training that are common in traditional actor–critic algorithms. The multi-behavior critic assesses the actions of the actor from perspectives such as energy saving and security, combining these aspects into a single overall evaluation of the actor. In this article, the policy gradient method is selected for the actor and the value function method for the critic; both are approximated by backpropagation neural networks whose parameters are updated by gradient descent. The simulation results show that the method can optimize learning in the environment and improve learning efficiency, meeting the real-time and adaptability needs of autonomous underwater vehicle dynamic obstacle avoidance.
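The multi-behavior critic merges several per-aspect evaluations (e.g. energy saving, security) into one score for the actor; the sketch below assumes a simple weighted combination, which is only one plausible realization of that idea.

```python
import numpy as np

def multi_behavior_value(values, weights):
    """Merge per-behavior critic values (energy, security, ...) into a
    single evaluation of the actor. Linear weighting is an assumption."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(weights @ values / weights.sum())

# Example: weight security twice as heavily as energy saving.
score = multi_behavior_value([0.8, -0.2, 0.5], [1.0, 2.0, 1.5])
```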


2013 ◽  
Vol 380-384 ◽  
pp. 595-600
Author(s):  
Hai Tian ◽  
Bo Hu ◽  
Can Yu Liu ◽  
Guo Chao Xie ◽  
Hui Min Luo

The research in this paper is derived from Raider, a small autonomous underwater vehicle (AUV) that performed well in the 15th International Autonomous Underwater Vehicle Competition (IAUVC) in San Diego. To improve the performance of an underwater vehicle, the motion control system plays an important role in stable motion, and the overall AUV control system is the main focus. Firstly, starting from the six-degree-of-freedom motion equations, the paper reasonably simplifies the dynamic model; because Raider's speed while searching for the target is very low, the forward speed is approximated as zero and only vertical motion is considered. On this basis, the paper establishes the vertical hydrodynamic model of Raider and obtains the transfer function of vertical motion. Through experiments and Matlab/Simulink simulation, the paper obtains the measured and simulated depth step-response curves and verifies the validity of the vertical hydrodynamic model and its coefficients.
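The identified transfer function of vertical motion is not reproduced above; as a stand-in, the Python sketch below computes the depth step response of a generic second-order transfer function with placeholder coefficients, mirroring the Matlab/Simulink comparison.

```python
import numpy as np
from scipy import signal

# Placeholder second-order depth model (not Raider's identified values):
# G(s) = K * wn^2 / (s^2 + 2*zeta*wn*s + wn^2)
K, zeta, wn = 1.0, 0.7, 0.8
G = signal.TransferFunction([K * wn**2], [1.0, 2 * zeta * wn, wn**2])

# Simulated depth step response, to be compared with the measured curve.
t, depth = signal.step(G, T=np.linspace(0.0, 20.0, 500))
```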


Author(s):  
Uzair Ansari ◽  
Abdulrahman H Bajodah

A novel two-loop structured robust generalized dynamic inversion–based control system is proposed for autonomous underwater vehicles. The outer (position) loop applies proportional–derivative control to the autonomous underwater vehicle's inertial position errors from the desired inertial position trajectories and provides the reference yaw and pitch attitude angle commands to the inner loop. The inner (attitude) loop applies generalized dynamic inversion control to a prescribed asymptotically stable dynamics of the attitude angle errors from their reference values and provides the control surface deflections required to track the desired inertial position trajectories of the vehicle. The dynamic inversion singularity is avoided by augmenting a dynamic scaling factor within the Moore–Penrose generalized inverse in the particular part of the generalized dynamic inversion control law. The null control vector in the auxiliary part of the control law is constructed to be linear in the pitch and yaw angular velocities, and the proportionality gain matrix is designed to guarantee global closed-loop asymptotic stability of the vehicle's angular velocity dynamics. An additional sliding mode control element in the particular part robustifies the closed-loop system against the tracking performance deterioration caused by generalized inversion scaling, guaranteeing semi-global practically stable attitude tracking. A detailed six degrees-of-freedom mathematical model of the Monterey Bay Aquarium Research Institute autonomous underwater vehicle is used to evaluate the control system design, and numerical simulations demonstrate closed-loop performance for various autonomous underwater vehicle maneuvers under both nominal and perturbed model parameters.
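The singularity-avoidance idea of augmenting a scaling factor inside the generalized inverse has a standard damped form, sketched below under the simplifying assumption of a constant scaling factor, whereas the paper drives that factor dynamically.

```python
import numpy as np

def scaled_generalized_inverse(A, mu):
    """Damped Moore-Penrose generalized inverse A^T (A A^T + mu*I)^-1.
    A positive scaling factor mu keeps the inverse bounded near singular
    configurations; here mu is a constant stand-in for the paper's
    dynamically scaled factor."""
    A = np.atleast_2d(A)
    return A.T @ np.linalg.inv(A @ A.T + mu * np.eye(A.shape[0]))
```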


2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Yuchen Fu ◽  
Quan Liu ◽  
Xionghong Ling ◽  
Zhiming Cui

Reinforcement learning (RL) is a kind of interactive learning method whose main characteristics are “trial and error” and “delayed reward.” A hierarchical reinforcement learning method based on action subrewards is proposed to address the “curse of dimensionality,” in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method greatly reduces the state space and chooses actions purposefully and efficiently, so as to optimize the reward function and increase convergence speed. Applied to online learning in the game of Tetris, experimental results show that the new method, which combines a hierarchical reinforcement learning algorithm with action subrewards, evidently increases convergence speed. The “curse of dimensionality” problem is also solved to a certain extent by the hierarchical method. Performance under different parameters is compared and analyzed as well.
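The action-subreward idea scores each action immediately from local features instead of waiting for a sparse terminal reward; the Tetris-flavored sketch below is an illustration, with feature choice and weights assumed rather than taken from the paper.

```python
def action_subreward(lines_cleared, holes_created, height_increase,
                     w_lines=10.0, w_holes=2.0, w_height=1.0):
    """Illustrative per-action subreward for Tetris: score each piece
    placement immediately. Features and weights are assumptions."""
    return (w_lines * lines_cleared
            - w_holes * holes_created
            - w_height * height_increase)
```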


2019 ◽  
Vol 9 (17) ◽  
pp. 3456 ◽  
Author(s):  
Enrico Anderlini ◽  
Gordon G. Parker ◽  
Giles Thomas

To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) will need to autonomously dock onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: one with continuous state and action spaces, deep deterministic policy gradient (DDPG), and one with continuous state but discrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as step changes in the control input signals. The performance of the reinforcement learning strategies was compared with classical and optimal control techniques. The control actions selected by DDPG suffer from chattering effects due to a hyperbolic tangent layer in the actor. Conversely, DQN presents the best compromise between short docking time and low control effort, whilst meeting the docking requirements. Whereas the reinforcement learning algorithms present a very high computational cost at training time, they are five orders of magnitude faster than optimal control at deployment time, thus enabling an on-line implementation. Therefore, reinforcement learning achieves a performance similar to optimal control at a much lower computational cost at deployment, whilst also presenting a more general framework.
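The DQN action set described above consists of discrete step changes in the control inputs; a minimal sketch of that parameterization is given below, with the number of control channels, step sizes, and saturation limits all assumed.

```python
import itertools
import numpy as np

# Each discrete DQN action is a step change applied to the continuous
# control inputs (channel count and step sizes are assumptions).
STEPS = (-1.0, 0.0, 1.0)
ACTIONS = np.array(list(itertools.product(STEPS, repeat=3)))  # 27 actions

def apply_action(u, action_index, u_min=-10.0, u_max=10.0):
    """Apply the selected step change to the control signal, saturated."""
    return np.clip(u + ACTIONS[action_index], u_min, u_max)
```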

