How to Train Your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning

2021 · Vol 5 (4) · pp. 1-24
Author(s): Siddharth Mysore, Bassel Mabsout, Kate Saenko, Renato Mancuso

We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems, and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the resulting control policies are not always smooth. This lack of smoothness can be a major problem when learning controllers, as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation, because simulators are ultimately imperfect representations of reality—what is known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed smoothness issues in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight-capable, with minimal degradation in controller quality upon transfer. RE+AL agents also learn to outperform a tuned PID controller, with lower tracking error, smoother control, and reduced power consumption. To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world controls problem that is solvable with classical control.
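The smoothness problem described here is commonly tackled by adding regularization terms to the reward that penalize abrupt changes between consecutive commands. Below is a minimal Python sketch of that general idea; the reward weights, state names, and action dimensions are illustrative assumptions, not the RE+AL reward itself.

```python
import numpy as np

def smooth_control_reward(att_error, action, prev_action,
                          w_err=1.0, w_delta=0.1, w_mag=0.01):
    """Illustrative reward for low-level attitude control.

    att_error   : angular-rate tracking error (rad/s), shape (3,)
    action      : current motor/torque command, shape (4,)
    prev_action : command from the previous control step, shape (4,)

    The w_delta term penalizes step-to-step command changes, one common
    way to encourage smooth, low-oscillation control.
    """
    tracking_cost = w_err * np.linalg.norm(att_error)
    # Penalize abrupt changes between consecutive commands (smoothness).
    delta_cost = w_delta * np.linalg.norm(action - prev_action)
    # Mildly penalize large absolute commands (actuator effort).
    effort_cost = w_mag * np.linalg.norm(action)
    return -(tracking_cost + delta_cost + effort_cost)

# Example: a jittery command sequence scores worse than a steady one.
err = np.zeros(3)
steady = smooth_control_reward(err, np.full(4, 0.5), np.full(4, 0.5))
jitter = smooth_control_reward(err, np.full(4, 0.9), np.full(4, 0.1))
print(steady > jitter)  # True
```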

Author(s): Richard Cheng, Gábor Orosz, Richard M. Murray, Joel W. Burdick

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
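The core of such a safety layer is to minimally alter the RL action whenever it would violate the barrier condition. The following sketch illustrates the filtering principle on a one-dimensional single integrator with an analytically solvable constraint; it is a toy example under stated assumptions, not the paper's GP-based RL-CBF synthesis.

```python
def cbf_safety_filter(x, u_rl, x_max=1.0, dt=0.05, gamma=0.5):
    """Minimally adjust an RL action for the 1-D single integrator
    x_{k+1} = x_k + u * dt so the barrier h(x) = x_max - x stays valid.

    Discrete-time CBF condition: h(x_{k+1}) >= (1 - gamma) * h(x_k),
    which here reduces to the upper bound u <= gamma * (x_max - x) / dt.
    """
    u_bound = gamma * (x_max - x) / dt
    # Project the RL action onto the safe set (closest safe action).
    return min(u_rl, u_bound)

# Example: near the boundary, an aggressive RL action gets clipped.
x, u_rl = 0.95, 2.0
u_safe = cbf_safety_filter(x, u_rl)
print(u_safe)             # 0.5, instead of the unsafe 2.0
print(x + u_safe * 0.05)  # next state stays below x_max = 1.0
```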


Sensors · 2018 · Vol 18 (12) · pp. 4331
Author(s): Zhong Ma, Yuejiao Wang, Yidai Yang, Zhuping Wang, Lei Tang, ...

When a satellite performs complex tasks such as discarding a payload or capturing a non-cooperative target, it will encounter sudden changes in its attitude and mass parameters, causing unstable flying and rolling of the satellite. In such circumstances, the changes in movement and mass characteristics are unpredictable. Thus, traditional attitude control methods are unable to stabilize the satellite, since they depend on the mass parameters of the controlled object. In this paper, we propose a reinforcement learning method to re-stabilize the attitude of a satellite under such circumstances. Specifically, we discretize the continuous control torque and build a neural network model that outputs the discretized control torque to control the satellite. A dynamics simulation environment of the satellite is built, and the deep Q-Network (DQN) algorithm is then used to train the neural network in this simulation environment, with the reward based on the stabilization of the satellite. Simulation experiments illustrate that, as training progresses, the neural network model gradually learns to re-stabilize the attitude of the satellite after an unknown disturbance. In contrast, a traditional PD (Proportional-Derivative) controller was unable to re-stabilize the satellite due to its dependence on the mass parameters. The proposed method adopts self-learning to control satellite attitude, shows considerable intelligence and a degree of universality, and has strong application potential for future intelligent control of satellites performing complex space tasks.
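A minimal sketch of the discretization step is shown below: each body-axis torque is restricted to a few levels, and an epsilon-greedy rule selects among the resulting finite action set given Q-values from a network. The torque levels, exploration rate, and placeholder Q-values are illustrative assumptions, not the paper's settings.

```python
import itertools
import random

import numpy as np

# Discretize each body-axis torque into a few levels (N·m); the values
# here are illustrative, not the paper's.
TORQUE_LEVELS = [-0.1, 0.0, 0.1]
ACTIONS = list(itertools.product(TORQUE_LEVELS, repeat=3))  # 27 discrete torques

def select_torque(q_values, epsilon=0.1):
    """Epsilon-greedy selection over the discretized torque set.

    q_values : array of shape (len(ACTIONS),) holding Q(s, a) for the
               current attitude state, e.g. produced by a neural network.
    Returns the chosen 3-axis torque command as a numpy array.
    """
    if random.random() < epsilon:
        idx = random.randrange(len(ACTIONS))   # explore
    else:
        idx = int(np.argmax(q_values))         # exploit
    return np.array(ACTIONS[idx])

# Example with a random stand-in for the Q-network output.
q = np.random.randn(len(ACTIONS))
print(select_torque(q))
```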


2019 · Vol 4 (4) · pp. 4224-4230
Author(s): Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Sergey Levine, Roberto Calandra, ...

Sensors · 2021 · Vol 21 (13) · pp. 4560
Author(s): Chen-Huan Pi, Yi-Wei Dai, Kai-Chun Hu, Stone Cheng

This paper proposes a multipurpose reinforcement-learning-based low-level control structure for multirotor unmanned aerial vehicles (UAVs), constructed using neural networks with model-free training. Other low-level reinforcement learning controllers developed in prior studies have been applicable only to a model-specific, physical-parameter-specific multirotor, and time-consuming retraining is required when switching to a different vehicle. We use a 6-degree-of-freedom dynamic model combined with acceleration-based control from the policy neural network to overcome these problems. The UAV automatically learns maneuvers through an end-to-end neural network mapping fused states to acceleration commands. State estimation is performed using data from on-board sensors and motion capture: the motion capture system provides spatial position information, and a multisensory fusion framework fuses measurements from the onboard inertial measurement units to compensate for the time delay and low update frequency of the capture system. Without requiring expert demonstration, the trained control policy, implemented using an improved algorithm, can be applied to various multirotors with the output directly mapped to actuators. The algorithm's ability to control multirotors in hovering and tracking tasks is evaluated. Through simulation and real-world experiments, we demonstrate flight control with a quadrotor and a hexarotor using the trained policy. With the same policy, we verify that we can stabilize the quadrotor and hexarotor in the air under random initial states.
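One common way to keep such a policy vehicle-agnostic is to have it output a generalized command (collective thrust plus body torques) and let a vehicle-specific allocation matrix map that command to individual motors. The sketch below shows such an allocation for a plus-configuration quadrotor; the geometry, coefficients, and function names are illustrative assumptions rather than the paper's actual mapping.

```python
import numpy as np

def quad_allocation_matrix(arm=0.2, k_drag=0.02):
    """Allocation matrix for a plus-configuration quadrotor.

    Maps per-motor thrusts f = [f1, f2, f3, f4] (motors at +x, +y, -x, -y)
    to the generalized command [total_thrust, tau_x, tau_y, tau_z].
    """
    return np.array([
        [1.0,     1.0,     1.0,     1.0],     # collective thrust
        [0.0,     arm,     0.0,    -arm],     # roll torque
        [-arm,    0.0,     arm,     0.0],     # pitch torque
        [k_drag, -k_drag,  k_drag, -k_drag],  # yaw torque (rotor drag)
    ])

def command_to_motor_thrusts(total_thrust, torques, A):
    """Map a vehicle-agnostic command (collective thrust plus body torques)
    to per-motor thrusts via the vehicle-specific allocation matrix A."""
    wrench = np.concatenate(([total_thrust], torques))
    f = np.linalg.solve(A, wrench)
    return np.clip(f, 0.0, None)  # rotors cannot produce negative thrust

A = quad_allocation_matrix()
print(command_to_motor_thrusts(12.0, np.array([0.1, -0.1, 0.0]), A))
# -> [3.25, 3.25, 2.75, 2.75]
```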


2018 · Vol 107 · pp. 71-86
Author(s): Ignacio Carlucho, Mariano De Paula, Sen Wang, Yvan Petillot, Gerardo G. Acosta

2020 · Vol 276 · pp. 115248
Author(s): Camillo Balerna, Nicolas Lanzetti, Mauro Salazar, Alberto Cerofolini, Christopher Onder

2013 · Vol 1 (4) · pp. 14-18
Author(s): J. Pavithra, T. Kalavathi Devi, R. Mouleeshuwarapprabu, ...

Author(s): Breno A. de Melo Menezes, Nina Herrmann, Herbert Kuchen, Fernando Buarque de Lima Neto

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g., map, fold, and zip) that are later converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket can cope with the development of a parallel implementation of ACO, and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
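To make the skeleton idea concrete, the sketch below decomposes one ACO iteration into map (each ant constructs a tour independently) and fold (reducing the colony's tours to the best one). It is plain sequential Python standing in for Musket's data-parallel patterns, not Musket code; the problem size, pheromone initialization, and parameter values are illustrative assumptions.

```python
import random

# Skeleton-style helpers: 'map' and 'fold' stand in for the data-parallel
# patterns a skeleton framework would compile to GPU code; here they are
# plain sequential Python, purely to show how an ACO iteration decomposes.
def skel_map(fn, xs):
    return [fn(x) for x in xs]

def skel_fold(fn, init, xs):
    acc = init
    for x in xs:
        acc = fn(acc, x)
    return acc

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def construct_tour(n, pheromone, dist, alpha=1.0, beta=2.0):
    """One ant builds a tour, preferring high-pheromone, short edges."""
    tour, unvisited = [0], list(range(1, n))
    while unvisited:
        cur = tour[-1]
        w = [pheromone[cur][j] ** alpha / dist[cur][j] ** beta for j in unvisited]
        nxt = random.choices(unvisited, weights=w)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

n, ants = 6, 16
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = random.uniform(1.0, 10.0)
pheromone = [[1.0] * n for _ in range(n)]

# map: each ant constructs a tour independently (the data-parallel step).
tours = skel_map(lambda _: construct_tour(n, pheromone, dist), range(ants))
# fold: reduce the colony's tours to the best one found this iteration.
best = skel_fold(lambda a, b: a if tour_length(a, dist) <= tour_length(b, dist) else b,
                 tours[0], tours)
print(best, tour_length(best, dist))
```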

