Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control

Author(s):  
Gabriel Moraes Barros ◽  
Esther Colombini

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims at addressing this problem by enabling a robot to learn behaviors through trial and error. With RL, a neural network can be trained as a function approximator to map states directly to actuator commands, removing the need for any predefined control structure during training. However, the knowledge these methods require must usually be built from scratch, so learning may take a long time. Moreover, RL algorithms need an explicitly stated reward function, which is not always trivial to define. Often it is easier for a teacher, human or intelligent agent, to demonstrate the desired behavior or how to accomplish a given task. Humans and other animals have a natural ability to learn skills from observation, often merely from seeing these skills’ effects, without direct knowledge of the underlying actions. The same principle underlies Imitation Learning, a practical approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. In this scenario, this work’s primary objective is to design an agent that can successfully imitate a previously acquired control policy using Imitation Learning. The chosen algorithm is GAIL, which we consider well suited to this problem because it exploits expert (state, action) trajectories. To produce the reference expert trajectories, we implement the state-of-the-art on-policy and off-policy methods PPO and SAC. Results show that the learned policies for all three methods can solve the task of low-level control of a quadrotor and that all generalize on the original tasks.
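
As an illustration of the adversarial core of GAIL, the sketch below shows a discriminator over (state, action) pairs and the surrogate reward it hands back to the RL learner. It assumes PyTorch; the network sizes and names are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs: ~1 for expert data, ~0 for policy data."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def discriminator_step(disc, opt, expert_s, expert_a, policy_s, policy_a):
    """One adversarial update: expert pairs labeled 1, policy pairs labeled 0."""
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_s, expert_a)
    policy_logits = disc(policy_s, policy_a)
    loss = bce(expert_logits, torch.ones_like(expert_logits)) + \
           bce(policy_logits, torch.zeros_like(policy_logits))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Surrogate reward handed to the RL learner (e.g., PPO) for policy samples:
    # higher when the discriminator mistakes policy pairs for expert pairs.
    return -torch.log(1.0 - torch.sigmoid(policy_logits) + 1e-8).detach()
```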

Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeatedly running this simulator, together with a reward function that assigns a score to each dispatching decision, generates sample experiences used to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task arises, a well-informed decision can be taken quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
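
For illustration, a minimal deep Q-learning dispatcher of the kind described could look like the sketch below, with one Q-value per candidate destination (shovel or dump) and targets built from the simulator's per-decision reward. It assumes PyTorch; all names and sizes are hypothetical.

```python
import random
import torch
import torch.nn as nn

class DispatchQNet(nn.Module):
    """Maps a feature vector summarizing the mining complex state to one
    Q-value per candidate destination for a free truck."""
    def __init__(self, state_dim, n_destinations, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_destinations),
        )

    def forward(self, state):
        return self.net(state)

def dispatch(qnet, state, epsilon=0.1):
    """Epsilon-greedy choice of the next destination for a free truck."""
    if random.random() < epsilon:
        return random.randrange(qnet.net[-1].out_features)
    with torch.no_grad():
        return int(qnet(state).argmax())

def td_target(qnet_target, reward, next_state, gamma=0.99):
    """One-step target built from the simulator's score for the decision."""
    with torch.no_grad():
        return reward + gamma * qnet_target(next_state).max()
```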


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Alexandra Vedeler ◽  
Narada Warakagoda

The task of obstacle avoidance for maritime vessels, such as Unmanned Surface Vehicles (USVs), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL) through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL) and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in this work is equipped with a radar sensor, and we study the problem of generating a single action parameter, the heading. We apply an IL algorithm known as Generative Adversarial Imitation Learning (GAIL) to develop an end-to-end steering model for a scenario whose goal is the avoidance of an obstacle. The performance of the system was studied for different design choices and compared to that of a system based on pure RL. The IL system produces results indicating that it grasps the concept of the task and is in many ways on par with the RL system. We deem this promising for future use in tasks that are not as easily described by a reward function.
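
A minimal end-to-end steering network of the kind described might look as follows, assuming single-channel radar-like images and one continuous output for the heading; the PyTorch layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class SteeringPolicy(nn.Module):
    """Maps a radar-like image to a single bounded heading command."""
    def __init__(self, img_size=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            feat = self.encoder(torch.zeros(1, 1, img_size, img_size)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Tanh(),  # heading scaled to [-1, 1]
        )

    def forward(self, radar_image):
        return self.head(self.encoder(radar_image))
```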


2020 ◽  
Vol 34 (04) ◽  
pp. 5717-5725
Author(s):  
Craig Sherstan ◽  
Shibhansh Dohare ◽  
James MacGlashan ◽  
Johannes Günther ◽  
Patrick M. Pilarski

Temporal abstraction is a key requirement for agents making decisions over long time horizons, a fundamental challenge in reinforcement learning. There are many reasons why value estimates at multiple timescales might be useful; recent work has shown that value estimates at different timescales can be the basis for creating more advanced discounting functions and for driving representation learning. Further, predictions at many different timescales serve to broaden an agent's model of its environment. One predictive approach of interest within an online learning setting is general value functions (GVFs), which represent models of an agent's world as a collection of predictive questions, each defined by a policy, a signal to be predicted, and a prediction timescale. In this paper we present Γ-nets, a method for generalizing value function estimation over timescale, allowing a given GVF to be trained and queried for arbitrary timescales so as to greatly increase the predictive ability and scalability of a GVF-based model. The key to our approach is to use timescale as one of the value estimator's inputs. As a result, the prediction target for any timescale is available at every timestep and we are free to train on any number of timescales. We first provide two demonstrations by 1) predicting a square wave and 2) predicting sensorimotor signals on a robot arm using a linear function approximator. Next, we empirically evaluate Γ-nets in the deep reinforcement learning setting using policy evaluation on a set of Atari video games. Our results show that Γ-nets can be effective for predicting arbitrary timescales, with only a small cost in accuracy compared to learning estimators for fixed timescales. Γ-nets provide a method for accurately and compactly making predictions at many timescales without requiring a priori knowledge of the task, making them a valuable contribution to ongoing work on model-based planning, representation learning, and lifelong learning algorithms.
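
The central trick, taking the timescale γ as an input to the value estimator so that one network can be trained and queried at any timescale, can be sketched as follows; the architecture is illustrative, not the paper's, and assumes PyTorch with γ supplied as a column tensor.

```python
import torch
import torch.nn as nn

class GammaNet(nn.Module):
    """Value estimator with the discount (timescale) as an extra input,
    so a single network answers the GVF question for any gamma."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),  # +1 for gamma
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, gamma):
        # state: (batch, state_dim); gamma: (batch, 1)
        return self.net(torch.cat([state, gamma], dim=-1))

def td_loss(vnet, s, cumulant, s_next, gammas):
    """Train on many timescales at once: each sampled gamma gets its own
    bootstrapped target c + gamma * V(s', gamma) from the same transition."""
    with torch.no_grad():
        target = cumulant + gammas * vnet(s_next, gammas)
    return ((vnet(s, gammas) - target) ** 2).mean()
```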


Author(s):  
Su Yong Kim ◽  
Yeon Geol Hwang ◽  
Sung Woong Moon

Existing underwater vehicle controllers are designed by linearizing the nonlinear dynamics model around a specific motion regime. Since such linear controllers show unstable control performance in transient states, various studies have been conducted to overcome this problem, and recent work has used reinforcement learning to improve transient-state control performance. Reinforcement learning can be broadly divided into value-based and policy-based methods. In this paper, we propose a roll controller for an underwater vehicle based on the Deep Deterministic Policy Gradient (DDPG), which learns the control policy and shows stable control performance in various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions.
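
A condensed DDPG actor-critic pair for a single bounded roll command could be sketched as follows, assuming PyTorch and a state vector of attitude and rate terms; the network sizes and the soft-update rate τ are illustrative.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: state -> bounded roll command."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Action-value function Q(s, a) for the single control input."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak averaging of target network parameters, as in DDPG."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1 - tau).add_(tau * sp.data)
```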


2021 ◽  
Author(s):  
Peter Wurman ◽  
Samuel Barrett ◽  
Kenta Kawamoto ◽  
James MacGlashan ◽  
Kaushik Subramanian ◽  
...  

Many potential applications of artificial intelligence involve making real-time decisions in physical systems. Automobile racing represents an extreme case of real-time decision making in close proximity to other highly-skilled drivers while near the limits of vehicular control. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the nonlinear control challenges of real race cars while also encapsulating the complex multi-agent interactions. We attack, and solve for the first time, the simulated racing challenge using model-free deep reinforcement learning. We introduce a novel reinforcement learning algorithm and enhance the learning process with mixed scenario training to encourage the agent to incorporate racing tactics into an integrated control policy. In addition, we construct a reward function that enables the agent to adhere to the sport's under-specified racing etiquette rules. We demonstrate the capabilities of our agent, GT Sophy, by winning two of three races against four of the world's best Gran Turismo drivers and being competitive in the overall team score. By showing that these techniques can be successfully used to train championship-level race car drivers, we open up the possibility of their use in other complex dynamical systems and real-world applications.
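
The paper's exact reward is not reproduced here, but a composite racing reward of the general shape described, course progress plus penalties for leaving the track, collisions, and at-fault contact approximating the etiquette rules, might be sketched like this; all terms and weights are hypothetical.

```python
def racing_reward(progress_m, off_course, collision, at_fault_contact,
                  w_progress=1.0, w_off=0.5, w_col=1.0, w_etiquette=2.0):
    """Hypothetical composite reward: reward course progress (meters gained
    along the track) and penalize rule violations. The etiquette term
    penalizes contact judged to be the agent's fault."""
    reward = w_progress * progress_m
    if off_course:
        reward -= w_off
    if collision:
        reward -= w_col
    if at_fault_contact:
        reward -= w_etiquette
    return reward
```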


2018 ◽  
Vol 27 (04) ◽  
pp. 1860005 ◽  
Author(s):  
Konstantinos Tziortziotis ◽  
Nikolaos Tziortziotis ◽  
Kostas Vlachos ◽  
Konstantinos Blekas

This paper investigates the use of reinforcement learning for the navigation of an over-actuated, i.e. more control inputs than degrees of freedom, marine platform in an unknown environment. The proposed approach uses an online least-squares policy iteration scheme for value function approximation in order to estimate the optimal policy, in conjunction with a low-level control system that controls the magnitude of the linear velocity and the orientation of the platform. The primary goal of the proposed scheme is the reduction of the consumed energy. To that end, we propose a variable reward function that depends on the energy consumption of the platform. We evaluate our approach in a complex and realistic simulation environment and report results concerning its performance on estimating optimal navigation policies under different environmental disturbances and GPS position measurement noise. The proposed framework is compared, in terms of energy consumption, to a baseline approach based on virtual potential fields. The results show that the marine platform successfully reaches the target point following a sub-optimal path while maintaining reduced energy consumption.
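
One least-squares policy iteration pass of the kind the scheme builds on can be sketched with LSTD-Q as below, assuming a feature map phi(s, a) and a per-sample reward that already includes the energy penalty; the names are illustrative.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.95, reg=1e-3):
    """One LSTD-Q evaluation pass: solve A w = b for the Q-function weights
    of the current policy.
    samples: list of (s, a, r, s_next) tuples
    phi: feature map returning a 1-D numpy array for (s, a)
    policy: callable giving the current policy's action at a state"""
    k = phi(*samples[0][:2]).shape[0]
    A = reg * np.eye(k)          # regularized to keep the solve well-posed
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))  # on-policy successor features
        A += np.outer(f, f - gamma * f_next)
        b += f * r               # r includes the energy-dependent term
    return np.linalg.solve(A, b)
```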


10.29007/hx4d ◽  
2018 ◽  
Author(s):  
Abhiram Mullapudi ◽  
Branko Kerkez

We investigate the real-time and autonomous operation of a 12 km² urban storm water network, which has been retrofitted with sensors and control valves. Specifically, we evaluate reinforcement learning, a technique rooted in deep learning, as a system-level control methodology. The controller opens and closes valves in the system, which enhances the performance of the storm water network by coordinating the discharges amongst spatially distributed storm water assets (i.e. detention basins and wetlands). A reinforcement learning control algorithm is implemented to control the storm water network across an urban watershed. Results show that control of valves using reinforcement learning has great potential, but extensive research still needs to be conducted to develop a fundamental understanding of control robustness. We specifically discuss the role and importance of the reward function (i.e. the heuristic control objective), which guides the autonomous controller towards achieving the desired watershed-scale response.
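
A watershed-scale reward of the kind discussed might, as a hedged sketch, penalize basin flooding and deviations of the outlet discharge from a target; the thresholds and weights below are hypothetical, not from the study.

```python
def stormwater_reward(depths, flood_levels, outflow, target_outflow,
                      w_flood=10.0, w_flow=1.0):
    """Hypothetical heuristic control objective.
    depths/flood_levels: per-basin water depth and flooding threshold (m)
    outflow/target_outflow: discharge at the watershed outlet (m^3/s)"""
    # Penalize any basin whose depth exceeds its flooding threshold.
    flood_penalty = sum(max(0.0, d - lim)
                        for d, lim in zip(depths, flood_levels))
    # Penalize deviation of the outlet discharge from the desired value.
    flow_penalty = abs(outflow - target_outflow)
    return -(w_flood * flood_penalty + w_flow * flow_penalty)
```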


Author(s):  
Gerald Eaglin ◽  
Joshua Vaughan

While many model-based methods have been proposed for optimal control, it is often difficult to generate model-based optimal controllers for nonlinear systems. One model-free method for solving optimal control problems is reinforcement learning, which iteratively trains an agent to optimize a reward function. However, agents often perform poorly at the beginning of training and require a large number of trials to converge to a successful policy. A method is proposed that incorporates domain knowledge of dynamics and control into controllers trained with reinforcement learning, reducing the training time needed. Simulations are presented to compare the performance of agents utilizing domain knowledge to those that do not. The results show that agents with domain knowledge can accomplish the desired task with less training time than those without.
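
One common way to fold such domain knowledge into an RL controller, shown here only as a hedged sketch since the paper may combine knowledge differently, is to let a classical feedback law supply a baseline command while the learned policy adds a residual correction; the gains are illustrative.

```python
def combined_action(state, policy, kp=2.0, kd=0.5):
    """Residual control: a PD prior from domain knowledge supplies the
    baseline command, and the learned policy only corrects it.
    state: (error, error_rate); policy: callable returning a residual."""
    error, error_rate = state
    baseline = -kp * error - kd * error_rate  # classical feedback prior
    residual = policy(state)                  # learned correction
    return baseline + residual
```

Because the baseline already behaves reasonably, the agent starts training from useful behavior instead of random actions, which is one route to the reduced training time the abstract reports.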


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Yuntian Feng ◽  
Hongjun Zhang ◽  
Wenning Hao ◽  
Gang Chen

We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represents the state in the decision process. By designing the reward function for each step, our proposed method can pass the information from entity extraction to relation extraction and obtain feedback, so that entities and relations are extracted simultaneously. First, we use a bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, an attention-based method represents the sentences that include the target entity pair to generate the initial state of the decision process. Then we use a Tree-LSTM to represent relation mentions and generate the transition state of the decision process. Finally, we employ the Q-learning algorithm to obtain the control policy π for the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method, with a 2.4% increase in recall.
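
The control skeleton of the two-step decision process can be sketched with a standard Q-learning backup applied at both the entity step and the relation step; the paper uses neural state representations, so the tabular-style update below is only illustrative.

```python
def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning backup: move Q(s, a) toward
    r + gamma * max_a' Q(s', a')."""
    best_next = max((Q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Episode skeleton (hypothetical keys): the entity decision produces the
# state for the relation decision, passing entity information forward.
Q = {}
q_update(Q, "s_entity", "tag_PER", 0.5, "s_relation",
         ["rel_ORG-AFF", "rel_PHYS"])
```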


Logistics ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 10
Author(s):  
Rachid Oucheikh ◽  
Tuwe Löfström ◽  
Ernst Ahlberg ◽  
Lars Carlsson

Loading and unloading rolling cargo in roll-on/roll-off ships are important and highly recurrent operations in maritime logistics. In this paper, we apply state-of-the-art deep reinforcement learning algorithms to automate these operations in a complex, realistic environment. The objective is to teach an autonomous tug master to manage rolling cargo and perform loading and unloading operations while avoiding collisions with static and dynamic obstacles along the way. The artificial intelligence agent, representing the tug master, is trained and evaluated in a challenging environment built on Unity3D's ML-Agents learning framework, using proximal policy optimization. The agent is equipped with sensors for obstacle detection and receives real-time feedback from the environment through its reward function, allowing it to dynamically adapt its policies and navigation strategy. The performance evaluation shows that, with appropriate hyperparameters, the agent can successfully learn all required operations, including lane-following, obstacle avoidance, and rolling cargo placement. This study also demonstrates the potential of intelligent autonomous systems to improve the performance and service quality of maritime transport.
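
For orientation, a PPO trainer configuration in the spirit of Unity's ML-Agents toolkit covers hyperparameters like those below, shown as a Python dict rather than the actual YAML file; the values are common defaults, not the ones tuned in the paper.

```python
# Hypothetical PPO hyperparameters for the tug-master agent, mirroring the
# kinds of fields an ML-Agents trainer configuration exposes.
tug_master_ppo = {
    "batch_size": 1024,
    "buffer_size": 10240,
    "learning_rate": 3.0e-4,
    "beta": 5.0e-3,      # entropy regularization strength
    "epsilon": 0.2,      # PPO clipping range
    "lambd": 0.95,       # GAE lambda
    "num_epoch": 3,
    "time_horizon": 64,
    "hidden_units": 128,
    "num_layers": 2,
    "gamma": 0.99,       # discount for the extrinsic reward signal
}
```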

