General Purpose Low-Level Reinforcement Learning Control for Multi-Axis Rotor Aerial Vehicles

This paper proposes a multipurpose, reinforcement-learning-based low-level control structure for multirotor unmanned aerial vehicles (UAVs), constructed using neural networks with model-free training. Low-level reinforcement learning controllers developed in other studies have been applicable only to a multirotor with a specific model and specific physical parameters, and time-consuming training is required when switching to a different vehicle. To overcome these problems, we combine a 6-degree-of-freedom dynamic model with acceleration-based control from the policy neural network. The UAV automatically learns the maneuver through an end-to-end neural network that maps fused states to acceleration commands. State estimation is performed using data from onboard sensors and motion capture. The motion capture system provides spatial position information, and a multisensor fusion framework fuses measurements from the onboard inertial measurement units to compensate for the time delay and low update frequency of the capture system. Without requiring expert demonstration, the trained control policy, implemented using an improved algorithm, can be applied to various multirotors with the output directly mapped to actuators. The algorithm’s ability to control multirotors in hovering and tracking tasks is evaluated. In simulation and real-world experiments, we demonstrate flight control of a quadrotor and a hexrotor using the trained policy. With the same policy, we verify that we can stabilize the quadrotor and hexrotor in the air from random initial states.
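
The abstract does not say how the policy output is mapped to actuators. As a rough illustration of why a vehicle-independent command can drive both a quadrotor and a hexrotor, the sketch below uses a generic pseudo-inverse control allocation, which is a swapped-in textbook technique and not necessarily the authors' method. It assumes the acceleration command has already been converted to a wrench (total thrust plus three body torques); the function names, the evenly spaced rotor layout, and the values of `arm_length` and `torque_coeff` are all hypothetical.

```python
import numpy as np

def allocation_matrix(n_rotors, arm_length=0.2, torque_coeff=0.02):
    """Map per-rotor thrusts to [total thrust, roll torque, pitch torque, yaw torque].

    Assumes rotors evenly spaced on a circle with alternating spin directions
    (an illustrative layout, not taken from the paper).
    """
    angles = 2 * np.pi * np.arange(n_rotors) / n_rotors
    spin = np.where(np.arange(n_rotors) % 2 == 0, 1.0, -1.0)
    return np.vstack([
        np.ones(n_rotors),               # every rotor adds to total thrust
        arm_length * np.sin(angles),     # roll torque from lateral offset
        -arm_length * np.cos(angles),    # pitch torque from longitudinal offset
        torque_coeff * spin,             # yaw torque from rotor drag
    ])

def rotor_thrusts(wrench, n_rotors):
    """Least-norm rotor thrusts that reproduce the requested wrench."""
    A = allocation_matrix(n_rotors)
    return np.linalg.pinv(A) @ wrench

if __name__ == "__main__":
    wrench = np.array([9.81, 0.1, -0.05, 0.02])  # same command for both vehicles
    for n in (4, 6):
        print(n, rotor_thrusts(wrench, n))
```

Only the allocation matrix changes between vehicles in this sketch; the command fed to it stays the same, which is the property the abstract relies on.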

How to Train Your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning

ACM Transactions on Cyber-Physical Systems ◽

10.1145/3466618 ◽

2021 ◽

Vol 5 (4) ◽

pp. 1-24

Author(s):

Siddharth Mysore ◽

Bassel Mabsout ◽

Kate Saenko ◽

Renato Mancuso

Keyword(s):

Reinforcement Learning ◽

Flight Control ◽

Attitude Control ◽

PID Controller ◽

High Performance ◽

Continuous Control ◽

Level Control ◽

Hardware Failure ◽

Low Level ◽

Reduced Power Consumption

We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the control policies developed are not always smooth. This lack of smoothness can be a major problem when learning controllers as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation due to simulators ultimately being imperfect representations of reality, which is known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight capable and with minimal degradation in controller quality upon transfer. RE+AL agents also learn to perform better than a tuned PID controller, with better tracking errors, smoother control, and reduced power consumption. To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world controls problem that is solvable with classical control.

End-to-End Deep Reinforcement Learning for Image-Based UAV Autonomous Control

Applied Sciences ◽

10.3390/app11188419 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8419

Author(s):

Jiang Zhao ◽

Jiaming Sun ◽

Zhihao Cai ◽

Longhong Wang ◽

Yingxun Wang

Keyword(s):

Reinforcement Learning ◽

Network Architecture ◽

Control Method ◽

Control Policy ◽

Input Image ◽

Autonomous Control ◽

Policy Network ◽

Model Free ◽

Control Command ◽

End To End

To achieve perception-based autonomous control of UAVs, state-of-the-art work often relies on schemes with onboard sensing and computing, which typically consist of several separate modules, each with its own complicated algorithms. Most methods depend on handcrafted designs and prior models with little capacity for adaptation and generalization. Inspired by the research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method that replaces the separate modules of the traditional control pipeline with a single neural network. An image-based reinforcement learning framework is established, depending on the design of the network architecture and the reward function. Training is performed with model-free algorithms developed according to the specific mission, and the control policy network can map the input image directly to the continuous actuator control command. A simulation environment for the scenario of UAV landing was built. The results under different typical cases, including both small and large initial lateral or heading angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.

Intelligent Multi-Microgrid Energy Management Based on Deep Neural Network and Model-Free Reinforcement Learning

IEEE Transactions on Smart Grid ◽

10.1109/tsg.2019.2930299 ◽

2020 ◽

Vol 11 (2) ◽

pp. 1066-1076 ◽

Cited By ~ 17

Author(s):

Yan Du ◽

Fangxing Li

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Energy Management ◽

Deep Neural Network ◽

Model Free ◽

Microgrid Energy Management

Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains

Annals of Operations Research ◽

10.1007/s10479-016-2366-2 ◽

2016 ◽

Vol 258 (1) ◽

pp. 107-131 ◽

Cited By ~ 2

Author(s):

Eiji Mizutani ◽

Stuart Dreyfus

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Recurrent Neural Network ◽

Model Free

Artificial Neural Network-Based Flight Control Using Distributed Sensors on Fixed-Wing Unmanned Aerial Vehicles

AIAA Scitech 2020 Forum ◽

10.2514/6.2020-1485 ◽

2020 ◽

Author(s):

Sergio A. Araujo-Estrada ◽

Shane P. Windsor

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Unmanned Aerial Vehicles ◽

Flight Control ◽

Distributed Sensors ◽

Aerial Vehicles ◽

Artificial Neural

Stochastic Actor-Executor-Critic for Image-to-Image Translation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/382 ◽

2021 ◽

Author(s):

Ziwei Luo ◽

Jing Hu ◽

Xin Wang ◽

Siwei Lyu ◽

Bin Kong ◽

...

Keyword(s):

Reinforcement Learning ◽

Control Policy ◽

High Dimensional ◽

Continuous Control ◽

Continuous Space ◽

Model Free ◽

Recent Success ◽

Image Translation ◽

Continuous State ◽

And Control

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework designed for challenging continuous control problems to develop stochastic policies over high dimensional continuous spaces including image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC) which is an off-policy actor-critic model with an additional executor to generate realistic images. Specifically, the actor focuses on the high-level representation and control policy by a stochastic latent action, as well as explicitly directs the executor to generate low-level actions to manipulate the state. Experiments on several image-to-image translation tasks have demonstrated the effectiveness and robustness of the proposed SAEC when facing high-dimensional continuous space problems.

Control of chaotic systems by deep reinforcement learning

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences ◽

10.1098/rspa.2019.0351 ◽

2019 ◽

Vol 475 (2231) ◽

pp. 20190351 ◽

Cited By ~ 3

Author(s):

M. A. Bucci ◽

O. Semeraro ◽

A. Allauzen ◽

G. Wisniewski ◽

L. Cordier ◽

...

Keyword(s):

Reinforcement Learning ◽

Bluff Body ◽

Initial Conditions ◽

Control Policy ◽

Chaotic Regime ◽

Model Free ◽

Local Measurements ◽

Target States ◽

The One ◽

Learning Principles

Deep reinforcement learning (DRL) is applied to control a nonlinear, chaotic system governed by the one-dimensional Kuramoto–Sivashinsky (KS) equation. DRL uses reinforcement learning principles for the determination of optimal control solutions and deep neural networks for approximating the value function and the control policy. Recent applications have shown that DRL may achieve superhuman performance in complex cognitive tasks. In this work, we show that using restricted localized actuation, partial knowledge of the state based on limited sensor measurements and model-free DRL controllers, it is possible to stabilize the dynamics of the KS system around its unstable fixed solutions, here considered as target states. The robustness of the controllers is tested by considering several trajectories in the phase space emanating from different initial conditions; we show that DRL is always capable of driving and stabilizing the dynamics around target states. The possibility of controlling the KS system in the chaotic regime by using a DRL strategy solely relying on local measurements suggests the extension of the application of RL methods to the control of more complex systems such as drag reduction in bluff-body wakes or the enhancement/diminution of turbulent mixing.
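
For reference, the one-dimensional KS equation is commonly written in the following form on a periodic domain of length L. The forcing term f(x, t), standing in for the localized actuation, is an assumption made here for illustration; the abstract does not specify how the control enters the dynamics.

```latex
\frac{\partial u}{\partial t}
  + u \frac{\partial u}{\partial x}
  + \frac{\partial^{2} u}{\partial x^{2}}
  + \frac{\partial^{4} u}{\partial x^{4}}
  = f(x, t),
\qquad u(x + L, t) = u(x, t)
```

With f = 0 this reduces to the uncontrolled KS equation, whose unstable fixed solutions are the target states mentioned above.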

Deep reinforcement learning for the control of microbial co-cultures in bioreactors

10.1101/457366 ◽

2018 ◽

Cited By ~ 2

Author(s):

Neythen J. Treloar ◽

Alexander J.H. Fedorec ◽

Brian P. Ingalls ◽

Chris P. Barnes

Keyword(s):

Reinforcement Learning ◽

Microbial Communities ◽

Control Policy ◽

Continuous Control ◽

Natural Ecosystems ◽

Model Free ◽

Pure Cultures ◽

Integral Controller ◽

Proportional Integral Controller ◽

Bang Bang Control

Multi-species microbial communities are widespread in natural ecosystems. When employed for biomanufacturing, engineered synthetic communities have shown increased productivity (in comparison with pure cultures) and allow for the reduction of metabolic load by compartmentalising bioprocesses between multiple sub-populations. Despite these benefits, co-cultures are rarely used in practice because control over the constituent species of an assembled community has proven challenging. Here we demonstrate, in silico, the efficacy of an approach from artificial intelligence, reinforcement learning, in the control of co-cultures within continuous bioreactors. We confirm that feedback via reinforcement learning can be used to maintain populations at target levels, and that model-free performance with bang-bang control can outperform a traditional proportional-integral controller with continuous control when faced with infrequent sampling. Further, we demonstrate that a satisfactory control policy can be learned in one twenty-four-hour experiment by running five bioreactors in parallel. Finally, we show that reinforcement learning can directly optimise the output of a co-culture bioprocess. Overall, reinforcement learning is a promising technique for the control of microbial communities.
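
To make the two controller types contrasted above concrete, here is a minimal sketch of a bang-bang rule next to a clamped proportional-integral (PI) controller. It is generic textbook control code, not the paper's learned policy or its bioreactor model. The function and class names, the gains, and the assumption that raising the input raises the measured quantity are all hypothetical.

```python
def bang_bang(measurement, target, u_low=0.0, u_high=1.0):
    """Bang-bang control: the input takes only one of two extreme values.

    Assumes (for illustration) that a higher input raises the measurement.
    """
    return u_high if measurement < target else u_low


class PIController:
    """Proportional-integral control with a continuous, clamped output."""

    def __init__(self, kp, ki, dt, u_low=0.0, u_high=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.u_low, self.u_high = u_low, u_high
        self.integral = 0.0  # accumulated error over time

    def step(self, measurement, target):
        error = target - measurement
        self.integral += error * self.dt
        u = self.kp * error + self.ki * self.integral
        # Clamp to the actuator range
        return min(max(u, self.u_low), self.u_high)


if __name__ == "__main__":
    pi = PIController(kp=0.5, ki=0.1, dt=1.0)
    for y in (0.2, 0.8, 1.3):
        print(y, bang_bang(y, target=1.0), pi.step(y, target=1.0))
```

The bang-bang rule needs only the sign of the error at each sample, whereas the PI output depends on the integral accumulated between samples, which is one intuition for why sampling frequency matters in the comparison reported above.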

Adaptive Quadruped Balance Control for Dynamic Environments Using Maximum-Entropy Reinforcement Learning

Sensors ◽

10.3390/s21175907 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5907

Author(s):

Haoran Sun ◽

Tingting Fu ◽

Yuanhuai Ling ◽

Chaoming He

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Maximum Entropy ◽

Maximum Entropy Method ◽

Balance Control ◽

External Disturbance ◽

Control Policy ◽

Dynamic Environments ◽

Entropy Method ◽

Analytical Models

External disturbance poses the primary threat to robot balance in dynamic environments. This paper provides a learning-based control architecture for quadrupedal self-balancing, which is adaptable to multiple unpredictable scenes of continuous external disturbance. Unlike conventional methods, which construct analytical models that explicitly reason about the balancing process, our work uses reinforcement learning and an artificial neural network to avoid incomprehensible mathematical modeling. The control policy is composed of a neural network and a Tanh Gaussian policy, which implicitly establishes the fuzzy mapping from proprioceptive signals to action commands. During the training process, the maximum-entropy method (soft actor-critic algorithm) is employed to endow the policy with powerful exploration and generalization ability. The trained policy is validated in both simulations and real-world experiments with a customized quadruped robot. The results demonstrate that the policy can be easily transferred to the real world without elaborate configurations. Moreover, although this policy is trained in only one specific vibration condition, it demonstrates robustness under conditions that were never encountered during training.
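
A Tanh Gaussian policy, as commonly used with soft actor-critic, samples from a Gaussian and squashes the sample through tanh so that actions stay in a bounded range. The sketch below shows the standard sampling step and the log-probability with its change-of-variables correction. The mean and log standard deviation would normally come from the policy network; here they are placeholder arrays, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def tanh_gaussian_log_prob(u, mean, log_std):
    """Log-probability of the squashed action tanh(u), given the pre-squash sample u."""
    std = np.exp(log_std)
    # Diagonal Gaussian log-density of u, per dimension
    gauss = -0.5 * (((u - mean) / std) ** 2 + 2 * log_std + np.log(2 * np.pi))
    # log(1 - tanh(u)^2), written in a numerically stable form
    correction = 2 * (np.log(2) - u - np.logaddexp(0.0, -2 * u))
    return np.sum(gauss - correction, axis=-1)

def sample_tanh_gaussian(mean, log_std, rng):
    """Draw a bounded action in (-1, 1) and return it with its log-probability."""
    u = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    return np.tanh(u), tanh_gaussian_log_prob(u, mean, log_std)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean = np.zeros(12)          # placeholder network output, e.g. 12 joint commands
    log_std = np.full(12, -0.5)  # placeholder log standard deviations
    action, log_prob = sample_tanh_gaussian(mean, log_std, rng)
    print(action, log_prob)
```

The log-probability is what the entropy term of a maximum-entropy objective is computed from, so the tanh correction matters for training and not only for bounding the actions.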

Training Champion-level Race Car Drivers Using Deep Reinforcement Learning

10.21203/rs.3.rs-795954/v1 ◽

2021 ◽

Author(s):

Peter Wurman ◽

Samuel Barrett ◽

Kenta Kawamoto ◽

James MacGlashan ◽

Kaushik Subramanian ◽

...

Keyword(s):

Reinforcement Learning ◽

Real Time ◽

Learning Algorithm ◽

Integrated Control ◽

Extreme Case ◽

Control Policy ◽

Reward Function ◽

Race Car ◽

Model Free ◽

Race Car Drivers

Many potential applications of artificial intelligence involve making real-time decisions in physical systems. Automobile racing represents an extreme case of real-time decision making in close proximity to other highly-skilled drivers while near the limits of vehicular control. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the nonlinear control challenges of real race cars while also encapsulating the complex multi-agent interactions. We attack, and solve for the first time, the simulated racing challenge using model-free deep reinforcement learning. We introduce a novel reinforcement learning algorithm and enhance the learning process with mixed scenario training to encourage the agent to incorporate racing tactics into an integrated control policy. In addition, we construct a reward function that enables the agent to adhere to the sport's under-specified racing etiquette rules. We demonstrate the capabilities of our agent, GT Sophy, by winning two of three races against four of the world's best Gran Turismo drivers and being competitive in the overall team score. By showing that these techniques can be successfully used to train championship-level race car drivers, we open up the possibility of their use in other complex dynamical systems and real-world applications.
