A Tour of Reinforcement Learning: The View from Continuous Control

This article surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications. It reviews the general formulation, terminology, and typical experimental implementations of reinforcement learning as well as competing solution paradigms. In order to compare the relative merits of various techniques, it presents a case study of the linear quadratic regulator (LQR) with unknown dynamics, perhaps the simplest and best-studied problem in optimal control. It also describes how merging techniques from learning theory and control can provide nonasymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. The article concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and control might be combined to approach these challenges.
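
As a minimal illustration of the LQR problem that the survey uses as its case study, the sketch below computes a scalar discrete-time LQR gain by iterating the Riccati recursion. The system x[k+1] = a*x[k] + b*u[k], the cost weights q and r, and all numeric values are assumptions chosen for illustration; none of them come from the article. In the unknown-dynamics setting the abstract describes, a and b would first have to be estimated from data.

```python
def scalar_lqr_gain(a, b, q, r, iters=500, tol=1e-12):
    """Scalar discrete-time LQR for x[k+1] = a*x[k] + b*u[k].

    Minimizes sum(q*x^2 + r*u^2) with the feedback law u = -k*x.
    Returns the gain k and the Riccati solution p.
    """
    p = q  # initial guess for the cost-to-go coefficient
    for _ in range(iters):
        # One step of the discrete-time Riccati recursion.
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return k, p


# Hypothetical unstable plant (|a| > 1) with unit cost weights.
k, p = scalar_lqr_gain(a=1.2, b=1.0, q=1.0, r=1.0)
closed_loop = 1.2 - 1.0 * k  # closed-loop pole a - b*k
print(k, p, closed_loop)
```

With these assumed values the closed-loop pole a - b*k lies inside the unit circle, so the feedback stabilizes the otherwise unstable plant.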

Modeling and Control of a Complex Interacting Process

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.403-408.3758 ◽

2011 ◽

Vol 403-408 ◽

pp. 3758-3762

Author(s):

Subhajit Patra ◽

Prabirkumar Saha

Keyword(s):

Storage System ◽

Linear Quadratic Regulator ◽

Model Parameters ◽

Linear Quadratic ◽

Modeling And Control ◽

Dynamic Matrix ◽

Liquid Storage ◽

Efficient Control ◽

And Control

In this paper, two efficient control algorithms are discussed, viz., the Linear Quadratic Regulator (LQR) and the Dynamic Matrix Controller (DMC), and their applicability is demonstrated through a case study of a complex interacting process, viz., a laboratory-based four-tank liquid storage system. The process has a Two Input Two Output (TITO) structure and is available for experimental study. A mathematical model of the process has been developed from first principles. Model parameters have been estimated from experimental results. The performance of the controllers (LQR and DMC) has been compared to that of the industrially more accepted PID controller.

Autopilot design for an aircraft by using Luenberger observer design

Aircraft Engineering and Aerospace Technology ◽

10.1108/aeat-11-2016-0224 ◽

2018 ◽

Vol 90 (5) ◽

pp. 858-868 ◽

Cited By ~ 1

Author(s):

Muhammad Taimoor ◽

Li Aijun ◽

Rooh ul Amin ◽

Hongshi Lu

Keyword(s):

Design Methodology ◽

Research Work ◽

Linear Quadratic Regulator ◽

Observer Design ◽

Linear Quadratic ◽

Content Type ◽

Luenberger Observer ◽

Level Flight ◽

Aerial Vehicle ◽

The Cost

Purpose: The purpose of this paper is to design a linear quadratic regulator (LQR) based Luenberger observer for estimating the unknown states of an aircraft. Design/methodology/approach: In this paper, the LQR-based Luenberger observer is designed for autonomous level flight of an unmanned aerial vehicle (UAV), which has been achieved successfully. Various modes, such as the phugoid and roll modes, are used for controlling the rates of the UAV. The Luenberger observer is used to estimate the unknown states of the system. The roll, yaw and pitch rates are used as inputs to the observer, while the remaining states, such as velocities and angles, are estimated. The main advantage of using the Luenberger observer was to reduce the cost of the system, which has been achieved successfully. The Luenberger observer offers robustness against disturbances and uncertainties while achieving the desired performance. The FlightGear simulator is used to validate the performance of the Luenberger observer-based autopilot. Level flight has been achieved successfully and has been validated using the FlightGear simulator. The validated results are presented in this paper. Microsoft Visual Studio has been used as an interface between MATLAB and the FlightGear simulator. Findings: The proposed LQR-based observer ensures successful estimation of the unknown states of the system as well as successful level flight. The Luenberger observer is used for state estimation, while LQR is used as the controller. Originality/value: In this research work, the unknown states of both the longitudinal and lateral models are estimated, level flight is achieved using those estimated states, and the autopilot is validated using FlightGear, whereas most research work estimates the states of only the longitudinal or only the lateral model.
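
The observer idea in this abstract can be sketched on a toy system. The example below is not the paper's aircraft model: it assumes a hypothetical discrete-time double integrator in which only position is measured and velocity must be estimated, and it picks the observer gain by hand pole placement (both error poles at 0.5) rather than by the LQR-based design the abstract describes. All names and numbers are illustrative assumptions.

```python
def simulate_observer(steps=50, dt=0.1):
    """Luenberger observer for a double integrator with position measurement.

    Plant:    pos+ = pos + dt*vel,  vel+ = vel   (no input in this toy case)
    Observer: x_hat+ = A*x_hat + L*(y - C*x_hat), with C = [1, 0]
    """
    # True state, unknown to the observer.
    pos, vel = 0.0, 1.0
    # The observer starts from a deliberately wrong velocity guess.
    pos_hat, vel_hat = 0.0, 0.0
    # Gain chosen so the error dynamics A - L*C have a double pole at 0.5:
    # trace condition gives l1 = 1, determinant condition gives l2 = 0.25/dt.
    l1, l2 = 1.0, 0.25 / dt
    for _ in range(steps):
        y = pos                   # only the position is measured
        innovation = y - pos_hat  # output estimation error
        # Propagate the estimate and correct it with the innovation.
        pos_hat, vel_hat = (
            pos_hat + dt * vel_hat + l1 * innovation,
            vel_hat + l2 * innovation,
        )
        # Propagate the true plant.
        pos, vel = pos + dt * vel, vel
    return (pos, vel), (pos_hat, vel_hat)


true_state, estimate = simulate_observer()
print(true_state, estimate)
```

Because the error poles sit well inside the unit circle, the velocity estimate converges to the true velocity even though velocity is never measured directly.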

Mathematical Modelling and Control of a Two-Wheeled PUMA-Like Vehicle

Mechanical Engineering Research ◽

10.5539/mer.v6n2p11 ◽

2016 ◽

Vol 6 (2) ◽

pp. 11 ◽

Cited By ~ 1

Author(s):

Khaled M Goher

Keyword(s):

Mathematical Modelling ◽

Sliding Mode ◽

Impact Load ◽

State Space Model ◽

Linear Quadratic Regulator ◽

The Body ◽

Pole Placement ◽

General Motors ◽

Linear Quadratic ◽

And Control

This paper presents mathematical modelling and control of a two-wheeled single-seat vehicle. The design of the vehicle is inspired by the Personal Urban Mobility and Accessibility (PUMA) vehicle developed by General Motors® in collaboration with Segway®. The body of the vehicle is designed to have two main parts. The vehicle is actuated using three motors: a linear motor that actuates the upper part in a sliding mode and two DC motors that drive the vehicle while moving forward/backward and/or manoeuvring. Two-stage proportional-integral-derivative (PID) control schemes are designed and implemented on the system models. The state space model of the vehicle is derived from the linearized equations. Controllers based on the Linear Quadratic Regulator (LQR) and pole placement techniques are developed and implemented. The robustness of the developed LQR and pole placement controllers is further investigated through various experiments using an impact load applied to the vehicle.

Online Reinforcement Learning-Based Control of an Active Suspension System Using the Actor Critic Approach

Applied Sciences ◽

10.3390/app10228060 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8060

Author(s):

Ahmad Fares ◽

Ahmad Bani Younes

Keyword(s):

Reinforcement Learning ◽

Least Squares Method ◽

Linear Quadratic Regulator ◽

Active Suspension ◽

Suspension System ◽

Recursive Least Squares ◽

Linear Quadratic ◽

Reward Function ◽

Road Profile ◽

The One

In this paper, a controller learns to adaptively control an active suspension system using reinforcement learning without prior knowledge of the environment. The Temporal Difference (TD) advantage actor critic algorithm is used with the appropriate reward function. The actor produces the actions, and the critic criticizes the actions taken based on the new state of the system. During the training process, a simple and uniform road profile is used while maintaining constant system parameters. The controller is tested using two road profiles: the first one is similar to the one used during the training, while the other one is bumpy with an extended range. The performance of the controller is compared with the Linear Quadratic Regulator (LQR) and optimum Proportional-Integral-Derivative (PID), and the adaptiveness is tested by estimating some of the system’s parameters using the Recursive Least Squares method (RLS). The results show that the controller outperforms the LQR in terms of the lower overshoot and the PID in terms of reducing the acceleration.
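
The Recursive Least Squares (RLS) step mentioned in this abstract can be sketched in its simplest scalar form. The example assumes a hypothetical one-parameter model y = theta * phi with noise-free data; the suspension model, regressors, and parameters used in the paper are not given in the abstract, so every name and value here is illustrative.

```python
def rls_scalar(samples, theta0=0.0, p0=1000.0, lam=1.0):
    """Scalar recursive least squares for the model y = theta * phi.

    samples: iterable of (phi, y) pairs
    p0:      initial covariance (large value = weak prior on theta0)
    lam:     forgetting factor (1.0 = no forgetting)
    """
    theta, p = theta0, p0
    for phi, y in samples:
        # Gain weighs the new sample against the current covariance.
        gain = p * phi / (lam + phi * p * phi)
        # Correct the estimate by the prediction error.
        theta = theta + gain * (y - phi * theta)
        # Shrink the covariance as information accumulates.
        p = (p - gain * phi * p) / lam
    return theta


# Hypothetical data generated from a true parameter of 3.0.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
theta_hat = rls_scalar(data)
print(theta_hat)
```

A forgetting factor lam below 1.0 discounts old samples, which is the usual choice when the parameter being tracked may drift over time.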

Control Design and Validation for Floating Piston Electro-Pneumatic Gearbox Actuator

Applied Sciences ◽

10.3390/app10103514 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3514 ◽

Cited By ~ 2

Author(s):

Adam Szabo ◽

Tamas Becsi ◽

Peter Gaspar

Keyword(s):

Piecewise Linear ◽

Control Design ◽

Continuous Control ◽

Control Methods ◽

Linear Quadratic ◽

Modeling And Control ◽

Control Frequency ◽

Heavy Duty Vehicle ◽

High Control ◽

And Control

The paper presents the modeling and control design of a floating piston electro-pneumatic gearbox actuator and, moreover, the industrial validation of the controller system. As part of a heavy-duty vehicle, it needs to meet strict and contradictory requirements and units applying the system with different supply pressures in order to operate under various environmental conditions. Because of the high control frequency domain of the real system, post-modern control methods with high computational demands could not be used, as they do not meet real-time requirements at the automotive level. During the modeling phase, the essential simplifications are shown with awareness of the trade-off between calculation speed and numerical accuracy, yielding a multi-state piecewise-linear system. Two LTI control methods are introduced, i.e., a PD and a Linear-Quadratic Regulator (LQR) solution, in which the continuous control signals are transformed into discrete voltage solenoid commands for the valves. The validation of both the model and the control system is performed on a real physical implementation. The results show that both the modeling and the control design are suitable for the control tasks using floating piston cylinders and, moreover, that these methods can be extended to electro-pneumatic cylinders with different layouts.

Acoustic Control Modeling of Conical Bores With Actuating Boundary Conditions

Volume 6B: 18th Biennial Conference on Mechanical Vibration and Noise ◽

10.1115/detc2001/vib-21480 ◽

2001 ◽

Author(s):

Kevin M. Farinholt ◽

Donald J. Leo

Keyword(s):

Boundary Conditions ◽

Driving Forces ◽

State Space Model ◽

Linear Quadratic Regulator ◽

Mode Shapes ◽

Linear Quadratic ◽

Acoustic Control ◽

Average Control ◽

The One ◽

And Control

Abstract: An investigation of the natural frequencies and mode shapes associated with sealed conical bores having actuating boundary conditions is presented. Beginning with the one-dimensional wave equation for spherically expanding waves, modal characteristics are developed as functions of cone geometry and actuator parameters. This paper presents both analytical and experimental comparisons for the purpose of validating model and development techniques. An investigation of the orthogonality and adjointness of the solution is presented. A discussion of incorporating driving forces in the system model for the purpose of coupling control actuators with internal acoustics is also included. Including these driving forces, a state space model of the system is developed for the purpose of applying modern feedback control. This paper concludes with a study on applying Linear Quadratic Regulator techniques to this system, relating tradeoffs between spatially averaged pressure and control voltages. The results of our simulations indicate that pressure reductions of 30% are attainable with average control voltages of 14.4 volts, given an example geometry.

System design for inverted pendulum using LQR control via IoT

International Journal for Simulation and Multidisciplinary Design Optimization ◽

10.1051/smdo/2020007 ◽

2020 ◽

Vol 11 ◽

pp. 12

Author(s):

Dechrit Maneetham ◽

Petrus Sutyasadi

Keyword(s):

Inverted Pendulum ◽

Control Method ◽

Linear Quadratic Regulator ◽

Linear Quadratic ◽

Lqr Control ◽

Lqr Controller ◽

Model Tuning ◽

And Performance ◽

Performance Results ◽

And Control

This research proposes a control method to balance and stabilize an inverted pendulum. A robust control was analyzed and adjusted to the model output with real-time feedback. The feedback was obtained using the state space equation of the feedback controller. Linear quadratic regulator (LQR) model tuning and control was applied to the inverted pendulum using the Internet of Things (IoT). The system's conditions and performance could be monitored and controlled via personal computer (PC) and mobile phone. Finally, the inverted pendulum was successfully controlled using the LQR controller, and the IoT communication that was developed monitors all conditions and performance results. How IoT control helps improve the operation of the inverted pendulum is also discussed.

Augmented-Stability Controller Design for a UAV’s Longitudinal Motion

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.63-64.533 ◽

2011 ◽

Vol 63-64 ◽

pp. 533-536

Author(s):

Xiao Jun Xing ◽

Jian Guo Yan

Keyword(s):

Dynamic Characteristics ◽

Controller Design ◽

Damping Ratio ◽

Linear Quadratic Regulator ◽

Longitudinal Motion ◽

Linear Quadratic ◽

Short Period ◽

Quality Specifications ◽

Simulation Results ◽

And Control

With the purpose of overcoming the defect that unmanned air vehicles (UAVs) are easily disturbed by air current and tend to be unstable, an augmented-stability controller was developed for a certain UAV’s longitudinal motion. According to requirements of short-period damping ratio and control anticipation parameter (CAP) in flight quality specifications of GJB185-86 and C*, linear quadratic regulator (LQR) theory was used in the augmented-stability controller’s design. The simulation results show that the augmented-stability controller not only improves the UAV’s stability and dynamic characteristics but also enhances the UAV’s robustness.

A Strategy To Estimate Unknown Viral Diversity in Mammals

mBio ◽

10.1128/mbio.00598-13 ◽

2013 ◽

Vol 4 (5) ◽

Cited By ~ 193

Author(s):

Simon J. Anthony ◽

Jonathan H. Epstein ◽

Kris A. Murray ◽

Isamara Navarrete-Macias ◽

Carlos M. Zambrana-Torrelio ◽

...

Keyword(s):

Public Health Intervention ◽

Time Frame ◽

Mammalian Host ◽

Viral Diversity ◽

Emerging Zoonoses ◽

Simple Extrapolation ◽

Occurrence Patterns ◽

The Cost ◽

And Control

ABSTRACT The majority of emerging zoonoses originate in wildlife, and many are caused by viruses. However, there are no rigorous estimates of total viral diversity (here termed “virodiversity”) for any wildlife species, despite the utility of this to future surveillance and control of emerging zoonoses. In this case study, we repeatedly sampled a mammalian wildlife host known to harbor emerging zoonotic pathogens (the Indian Flying Fox, Pteropus giganteus) and used PCR with degenerate viral family-level primers to discover and analyze the occurrence patterns of 55 viruses from nine viral families. We then adapted statistical techniques used to estimate biodiversity in vertebrates and plants and estimated the total viral richness of these nine families in P. giganteus to be 58 viruses. Our analyses demonstrate proof-of-concept of a strategy for estimating viral richness and provide the first statistically supported estimate of the number of undiscovered viruses in a mammalian host. We used a simple extrapolation to estimate that there are a minimum of 320,000 mammalian viruses awaiting discovery within these nine families, assuming all species harbor a similar number of viruses, with minimal turnover between host species. We estimate the cost of discovering these viruses to be ~$6.3 billion (or ~$1.4 billion for 85% of the total diversity), which if annualized over a 10-year study time frame would represent a small fraction of the cost of many pandemic zoonoses. IMPORTANCE Recent years have seen a dramatic increase in viral discovery efforts. However, most lack rigorous systematic design, which limits our ability to understand viral diversity and its ecological drivers and reduces their value to public health intervention. Here, we present a new framework for the discovery of novel viruses in wildlife and use it to make the first-ever estimate of the number of viruses that exist in a mammalian host. As pathogens continue to emerge from wildlife, this estimate allows us to put preliminary bounds around the potential size of the total zoonotic pool and facilitates a better understanding of where best to allocate resources for the subsequent discovery of global viral diversity.

The Coupled Orbit-Attitude Dynamics and Control of Electric Sail in Displaced Solar Orbits

International Journal of Aerospace Engineering ◽

10.1155/2017/3812397 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Mingying Huo ◽

He Liao ◽

Yanfang Liu ◽

Naiming Qi

Keyword(s):

Linear Quadratic Regulator ◽

Coupled System ◽

Control Force ◽

Attitude Dynamics ◽

Voltage Distribution ◽

Linear Quadratic ◽

Displaced Orbit ◽

Dynamics And Control ◽

And Control ◽

Small Torque

Displaced solar orbits for spacecraft propelled by electric sails are investigated. Since the propulsive thrust is induced by the sail attitude, the orbital and attitude dynamics of electric-sail-based spacecraft are coupled and must be investigated together. However, the coupled dynamics and control of electric sails have not been discussed in most of the published literature. In this paper, the equilibrium point of the coupled dynamical system in displaced orbit is obtained, and its stability is analyzed through linearization. The results of the stability analysis show that only some of the orbits are marginally stable. For unstable displaced orbits, a linear quadratic regulator is employed to control the coupled attitude-orbit system. Numerical simulations show that the proposed strategy can control the coupled system and that a small torque can stabilize both the attitude and the orbit. In order to generate the control force and torque, the voltage distribution problem is studied in an optimal framework. The numerical results show that the control force and torque of the electric sail can be realized by adjusting the voltage distribution of the charged tethers.
