How to Train Your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning

2021 · Vol 5 (4) · pp. 1-24
Author(s): Siddharth Mysore, Bassel Mabsout, Kate Saenko, Renato Mancuso

We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems, and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the resulting control policies are not always smooth. This lack of smoothness can be a major problem when learning controllers, as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation, because simulators are ultimately imperfect representations of reality—what is known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed smoothness issues in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight-capable, with minimal degradation in controller quality upon transfer. RE+AL agents also learn to outperform a tuned PID controller, with lower tracking error, smoother control, and reduced power consumption. To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world controls problem that is solvable with classical control.
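The smoothness problem described here is commonly tackled by adding regularization terms to the reward that penalize abrupt changes between consecutive commands. Below is a minimal Python sketch of that general idea; the reward weights, state names, and action dimensions are illustrative assumptions, not the RE+AL reward itself.

```python
import numpy as np

def smooth_control_reward(att_error, action, prev_action,
                          w_err=1.0, w_delta=0.1, w_mag=0.01):
    """Illustrative reward for low-level attitude control.

    att_error   : angular-rate tracking error (rad/s), shape (3,)
    action      : current motor/torque command, shape (4,)
    prev_action : command from the previous control step, shape (4,)

    The w_delta term penalizes step-to-step command changes, one common
    way to encourage smooth, low-oscillation control.
    """
    tracking_cost = w_err * np.linalg.norm(att_error)
    # Penalize abrupt changes between consecutive commands (smoothness).
    delta_cost = w_delta * np.linalg.norm(action - prev_action)
    # Mildly penalize large absolute commands (actuator effort).
    effort_cost = w_mag * np.linalg.norm(action)
    return -(tracking_cost + delta_cost + effort_cost)

# Example: a jittery command sequence scores worse than a steady one.
err = np.zeros(3)
steady = smooth_control_reward(err, np.full(4, 0.5), np.full(4, 0.5))
jitter = smooth_control_reward(err, np.full(4, 0.9), np.full(4, 0.1))
print(steady > jitter)  # True
```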

Author(s): Richard Cheng, Gábor Orosz, Richard M. Murray, Joel W. Burdick

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
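The core of such a safety layer is to minimally alter the RL action whenever it would violate the barrier condition. The following sketch illustrates the filtering principle on a one-dimensional single integrator with an analytically solvable constraint; it is a toy example under stated assumptions, not the paper's GP-based RL-CBF synthesis.

```python
def cbf_safety_filter(x, u_rl, x_max=1.0, dt=0.05, gamma=0.5):
    """Minimally adjust an RL action for the 1-D single integrator
    x_{k+1} = x_k + u * dt so the barrier h(x) = x_max - x stays valid.

    Discrete-time CBF condition: h(x_{k+1}) >= (1 - gamma) * h(x_k),
    which here reduces to the upper bound u <= gamma * (x_max - x) / dt.
    """
    u_bound = gamma * (x_max - x) / dt
    # Project the RL action onto the safe set (closest safe action).
    return min(u_rl, u_bound)

# Example: near the boundary, an aggressive RL action gets clipped.
x, u_rl = 0.95, 2.0
u_safe = cbf_safety_filter(x, u_rl)
print(u_safe)             # 0.5, instead of the unsafe 2.0
print(x + u_safe * 0.05)  # next state stays below x_max = 1.0
```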


Sensors · 2018 · Vol 18 (12) · pp. 4331
Author(s): Zhong Ma, Yuejiao Wang, Yidai Yang, Zhuping Wang, Lei Tang, ...

When a satellite performs complex tasks such as discarding a payload or capturing a non-cooperative target, it will encounter sudden changes in its attitude and mass parameters, causing unstable flying and rolling of the satellite. In such circumstances, the changes in movement and mass characteristics are unpredictable. Thus, traditional attitude control methods are unable to stabilize the satellite, since they depend on the mass parameters of the controlled object. In this paper, we propose a reinforcement learning method to re-stabilize the attitude of a satellite under such circumstances. Specifically, we discretize the continuous control torque and build a neural network model that outputs the discretized control torque to control the satellite. A dynamics simulation environment of the satellite is built, and the deep Q-Network (DQN) algorithm is then used to train the neural network in this simulation environment, with the reward based on the stabilization of the satellite. Simulation experiments illustrate that, as training progresses, the neural network model gradually learns to re-stabilize the attitude of the satellite after an unknown disturbance. In contrast, a traditional PD (Proportional-Derivative) controller was unable to re-stabilize the satellite due to its dependence on the mass parameters. The proposed method adopts self-learning to control satellite attitude, shows considerable intelligence and a degree of universality, and has strong application potential for future intelligent control of satellites performing complex space tasks.
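A minimal sketch of the discretization step is shown below: each body-axis torque is restricted to a few levels, and an epsilon-greedy rule selects among the resulting finite action set given Q-values from a network. The torque levels, exploration rate, and placeholder Q-values are illustrative assumptions, not the paper's settings.

```python
import itertools
import random

import numpy as np

# Discretize each body-axis torque into a few levels (N·m); the values
# here are illustrative, not the paper's.
TORQUE_LEVELS = [-0.1, 0.0, 0.1]
ACTIONS = list(itertools.product(TORQUE_LEVELS, repeat=3))  # 27 discrete torques

def select_torque(q_values, epsilon=0.1):
    """Epsilon-greedy selection over the discretized torque set.

    q_values : array of shape (len(ACTIONS),) holding Q(s, a) for the
               current attitude state, e.g. produced by a neural network.
    Returns the chosen 3-axis torque command as a numpy array.
    """
    if random.random() < epsilon:
        idx = random.randrange(len(ACTIONS))   # explore
    else:
        idx = int(np.argmax(q_values))         # exploit
    return np.array(ACTIONS[idx])

# Example with a random stand-in for the Q-network output.
q = np.random.randn(len(ACTIONS))
print(select_torque(q))
```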


2019 · Vol 4 (4) · pp. 4224-4230
Author(s): Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Sergey Levine, Roberto Calandra, ...

Sensors · 2021 · Vol 21 (13) · pp. 4560
Author(s): Chen-Huan Pi, Yi-Wei Dai, Kai-Chun Hu, Stone Cheng

This paper proposes a multipurpose reinforcement-learning-based low-level control structure for multirotor unmanned aerial vehicles (UAVs), constructed using neural networks with model-free training. Other low-level reinforcement learning controllers developed in prior studies have been applicable only to a model-specific, physical-parameter-specific multirotor, and time-consuming retraining is required when switching to a different vehicle. We use a 6-degree-of-freedom dynamic model combined with acceleration-based control from the policy neural network to overcome these problems. The UAV automatically learns maneuvers through an end-to-end neural network mapping fused states to acceleration commands. State estimation is performed using data from on-board sensors and motion capture: the motion capture system provides spatial position information, and a multisensory fusion framework fuses measurements from the onboard inertial measurement units to compensate for the time delay and low update frequency of the capture system. Without requiring expert demonstration, the trained control policy, implemented using an improved algorithm, can be applied to various multirotors with the output directly mapped to actuators. The algorithm's ability to control multirotors in hovering and tracking tasks is evaluated. Through simulation and real-world experiments, we demonstrate flight control with a quadrotor and a hexarotor using the trained policy. With the same policy, we verify that we can stabilize the quadrotor and hexarotor in the air under random initial states.
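One common way to keep such a policy vehicle-agnostic is to have it output a generalized command (collective thrust plus body torques) and let a vehicle-specific allocation matrix map that command to individual motors. The sketch below shows such an allocation for a plus-configuration quadrotor; the geometry, coefficients, and function names are illustrative assumptions rather than the paper's actual mapping.

```python
import numpy as np

def quad_allocation_matrix(arm=0.2, k_drag=0.02):
    """Allocation matrix for a plus-configuration quadrotor.

    Maps per-motor thrusts f = [f1, f2, f3, f4] (motors at +x, +y, -x, -y)
    to the generalized command [total_thrust, tau_x, tau_y, tau_z].
    """
    return np.array([
        [1.0,     1.0,     1.0,     1.0],     # collective thrust
        [0.0,     arm,     0.0,    -arm],     # roll torque
        [-arm,    0.0,     arm,     0.0],     # pitch torque
        [k_drag, -k_drag,  k_drag, -k_drag],  # yaw torque (rotor drag)
    ])

def command_to_motor_thrusts(total_thrust, torques, A):
    """Map a vehicle-agnostic command (collective thrust plus body torques)
    to per-motor thrusts via the vehicle-specific allocation matrix A."""
    wrench = np.concatenate(([total_thrust], torques))
    f = np.linalg.solve(A, wrench)
    return np.clip(f, 0.0, None)  # rotors cannot produce negative thrust

A = quad_allocation_matrix()
print(command_to_motor_thrusts(12.0, np.array([0.1, -0.1, 0.0]), A))
# -> [3.25, 3.25, 2.75, 2.75]
```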


2018 · Vol 107 · pp. 71-86
Author(s): Ignacio Carlucho, Mariano De Paula, Sen Wang, Yvan Petillot, Gerardo G. Acosta

2020 · Vol 276 · pp. 115248
Author(s): Camillo Balerna, Nicolas Lanzetti, Mauro Salazar, Alberto Cerofolini, Christopher Onder

2013 · Vol 1 (4) · pp. 14-18
Author(s): J. Pavithra, T. Kalavathi Devi, R. Mouleeshuwarapprabu, ...

Author(s): Breno A. de Melo Menezes, Nina Herrmann, Herbert Kuchen, Fernando Buarque de Lima Neto

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g., map, fold, and zip) that are later converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket can cope with the development of a parallel implementation of ACO, and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
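To make the skeleton idea concrete, the sketch below decomposes one ACO iteration into map (each ant constructs a tour independently) and fold (reducing the colony's tours to the best one). It is plain sequential Python standing in for Musket's data-parallel patterns, not Musket code; the problem size, pheromone initialization, and parameter values are illustrative assumptions.

```python
import random

# Skeleton-style helpers: 'map' and 'fold' stand in for the data-parallel
# patterns a skeleton framework would compile to GPU code; here they are
# plain sequential Python, purely to show how an ACO iteration decomposes.
def skel_map(fn, xs):
    return [fn(x) for x in xs]

def skel_fold(fn, init, xs):
    acc = init
    for x in xs:
        acc = fn(acc, x)
    return acc

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def construct_tour(n, pheromone, dist, alpha=1.0, beta=2.0):
    """One ant builds a tour, preferring high-pheromone, short edges."""
    tour, unvisited = [0], list(range(1, n))
    while unvisited:
        cur = tour[-1]
        w = [pheromone[cur][j] ** alpha / dist[cur][j] ** beta for j in unvisited]
        nxt = random.choices(unvisited, weights=w)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

n, ants = 6, 16
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = random.uniform(1.0, 10.0)
pheromone = [[1.0] * n for _ in range(n)]

# map: each ant constructs a tour independently (the data-parallel step).
tours = skel_map(lambda _: construct_tour(n, pheromone, dist), range(ants))
# fold: reduce the colony's tours to the best one found this iteration.
best = skel_fold(lambda a, b: a if tour_length(a, dist) <= tour_length(b, dist) else b,
                 tours[0], tours)
print(best, tour_length(best, dist))
```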

