scholarly journals Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle

Author(s):  
Su Yong Kim ◽  
Yeon Geol Hwang ◽  
Sung Woong Moon

The existing underwater vehicle controller design is applied by linearizing the nonlinear dynamics model to a specific motion section. Since the linear controller has unstable control performance in a transient state, various studies have been conducted to overcome this problem. Recently, there have been studies to improve the control performance in the transient state by using reinforcement learning. Reinforcement learning can be largely divided into value-based reinforcement learning and policy-based reinforcement learning. In this paper, we propose the roll controller of underwater vehicle based on Deep Deterministic Policy Gradient(DDPG) that learns the control policy and can show stable control performance in various situations and environments. The performance of the proposed DDPG based roll controller was verified through simulation and compared with the existing PID and DQN with Normalized Advantage Functions based roll controllers.

2020 ◽  
Author(s):  
Gabriel Moraes Barros ◽  
Esther Colombini

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims at addressing this problem by enabling a robot to learn behaviors through trial-and-error. With RL, a Neural Network can be trained as a function approximator to directly map states to actuator commands making any predefined control structure not-needed for training. However, the knowledge required to converge these methods is usually built from scratch. Learning may take a long time, not to mention that RL algorithms need a stated reward function. Sometimes, it is not trivial to define one. Often it is easier for a teacher, human or intelligent agent, do demonstrate the desired behavior or how to accomplish a given task. Humans and other animals have a natural ability to learn skills from observation, often from merely seeing these skills’ effects: without direct knowledge of the underlying actions. The same principle exists in Imitation Learning, a practical approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. In this scenario, this work’s primary objective is to design an agent that can successfully imitate a prior acquired control policy using Imitation Learning. The chosen algorithm is GAIL since we consider that it is the proper algorithm to tackle this problem by utilizing expert (state, action) trajectories. As reference expert trajectories, we implement state-of-the-art on and off-policy methods PPO and SAC. Results show that the learned policies for all three methods can solve the task of low-level control of a quadrotor and that all can account for generalization on the original tasks.


Energies ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4706
Author(s):  
Louis Desportes ◽  
Inbar Fijalkow ◽  
Pierre Andry

We address the control of a hybrid energy storage system composed of a lead battery and hydrogen storage. Powered by photovoltaic panels, it feeds a partially islanded building. We aim to minimize building carbon emissions over a long-term period while ensuring that 35% of the building consumption is powered using energy produced on site. To achieve this long-term goal, we propose to learn a control policy as a function of the building and of the storage state using a Deep Reinforcement Learning approach. We reformulate the problem to reduce the action space dimension to one. This highly improves the proposed approach performance. Given the reformulation, we propose a new algorithm, DDPGαrep, using a Deep Deterministic Policy Gradient (DDPG) to learn the policy. Once learned, the storage control is performed using this policy. Simulations show that the higher the hydrogen storage efficiency, the more effective the learning.


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.


Sign in / Sign up

Export Citation Format

Share Document