Low-Level Control of a Quadrotor With Deep Model-Based Reinforcement Learning

We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the learned control policies are not always smooth. This lack of smoothness can be a major problem when learning controllers, as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation, because simulators are ultimately imperfect representations of reality, a discrepancy known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by resolving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight capable, with minimal degradation in controller quality upon transfer. RE+AL agents also learn to perform better than a tuned PID controller, with lower tracking errors, smoother control, and reduced power consumption.
To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world controls problem that is solvable with classical control.

Model-based development of low-level control strategies for transient operation of solid oxide fuel cell systems

Journal of Power Sources ◽ 10.1016/j.jpowsour.2011.01.023 ◽ 2011 ◽ Vol 196 (21) ◽ pp. 9036-9045 ◽ Cited By ~ 31

Author(s): Marco Sorrentino ◽ Cesare Pianese

Keyword(s): Fuel Cell ◽ Solid Oxide Fuel Cell ◽ Control Strategies ◽ Solid Oxide ◽ Oxide Fuel ◽ Transient Operation ◽ Level Control ◽ Cell Systems ◽ Low Level ◽ Model Based

Model-based low-level control in flexible manufacturing systems

Robotics and Computer-Integrated Manufacturing ◽ 10.1016/0736-5845(88)90013-0 ◽ 1988 ◽ Vol 4 (3-4) ◽ pp. 423-428 ◽ Cited By ~ 8

Author(s): Oded Maimon ◽ Gilead Tadmor

Keyword(s): Flexible Manufacturing ◽ Flexible Manufacturing Systems ◽ Manufacturing Systems ◽ Level Control ◽ Low Level ◽ Model Based

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.6177 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 6941-6948

Author(s): Qi Zhou ◽ HouQiang Li ◽ Jie Wang

Keyword(s): Reinforcement Learning ◽ Performance Improvement ◽ Optimization Method ◽ Asymptotic Performance ◽ Model Based ◽ Model Free ◽ Deep Model ◽ Conservative Policy ◽ Policy Optimization ◽ Novel Model

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve the asymptotic performance using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of the policy to inaccurate models. Experiments show that POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.

Adaptive low-level control of autonomous underwater vehicles using deep reinforcement learning

Robotics and Autonomous Systems ◽ 10.1016/j.robot.2018.05.016 ◽ 2018 ◽ Vol 107 ◽ pp. 71-86 ◽ Cited By ~ 31

Author(s): Ignacio Carlucho ◽ Mariano De Paula ◽ Sen Wang ◽ Yvan Petillot ◽ Gerardo G. Acosta

Keyword(s): Reinforcement Learning ◽ Autonomous Underwater Vehicles ◽ Underwater Vehicles ◽ Level Control ◽ Low Level

Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards

10.32470/ccn.2018.1191-0 ◽ 2018

Author(s): Paul Krueger ◽ Thomas Griffiths

Keyword(s): Reinforcement Learning ◽ Model Based ◽ Model Free

Model-Based and Model-Free Social Cognition

10.31234/osf.io/ue6j2 ◽ 2019

Author(s): Leor M Hackel ◽ Jeffrey Jordan Berg ◽ Björn Lindström ◽ David Amodio

Keyword(s): Reinforcement Learning ◽ Social Cognition ◽ Learning Strategies ◽ Memory Systems ◽ Learning Task ◽ Financial Advisors ◽ Model Based ◽ Model Free ◽ Systems Model ◽ Task Assessment

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.

Faculty Opinions recommendation of States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.4125957.4076054 ◽ 2010

Author(s): Susan Courtney

Keyword(s): Reinforcement Learning ◽ Prediction Error ◽ Model Based ◽ Model Free

Bestärkendes Lernen mittels Offline-Trajektorienplanung basierend auf iterativ approximierten Modellen [Reinforcement learning via offline trajectory planning based on iteratively approximated models]

at - Automatisierungstechnik ◽ 10.1515/auto-2020-0024 ◽ 2020 ◽ Vol 68 (8) ◽ pp. 612-624

Author(s): Max Pritzkoleit ◽ Robert Heedt ◽ Carsten Knoll ◽ Klaus Röbenack

Keyword(s): Reinforcement Learning ◽ Neural Networks ◽ Model Based ◽ Artificial Neural Networks

In this contribution, we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in an offline trajectory planning step to determine an optimal feedback law, which is then applied to the real system. This model-based reinforcement learning (RL) approach is first evaluated in simulation on the swing-up of a single cart-pole system and shows a significant improvement in data efficiency over model-free RL approaches. Furthermore, we present experimental results on a test bench, where the presented algorithm is able to approximate an optimal feedback law for the system sufficiently well within a few trials.

Hybrid deep reinforcement learning based eco-driving for low-level connected and automated vehicles along signalized corridors

Transportation Research Part C Emerging Technologies ◽ 10.1016/j.trc.2021.102980 ◽ 2021 ◽ Vol 124 ◽ pp. 102980

Author(s): Qiangqiang Guo ◽ Ohay Angah ◽ Zhijun Liu ◽ Xuegang (Jeff) Ban

Keyword(s): Reinforcement Learning ◽ Automated Vehicles ◽ Low Level
