Differences and similarities between reinforcement learning and the classical optimal control framework

PAMM ◽  
2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Simon Gottschalk ◽  
Michael Burger
Author(s):  
Andrea Pesare ◽  
Michele Palladino ◽  
Maurizio Falcone

Abstract
In this paper, we deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we suppose that the knowledge that an agent has of the current system is represented by a probability distribution $$\pi$$ on the space of matrices. Furthermore, we assume that such a probability measure is suitably updated to take into account the increased experience that the agent obtains while exploring the environment, approximating the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the "average" linear quadratic optimal control problem with respect to a certain $$\pi$$ converges to the optimal control of the linear quadratic problem governed by the actual, underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms, where prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
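The convergence claim can be sketched numerically in a discrete-time surrogate (the paper's setting and update rule differ; the system matrices, the Gaussian noise model for $$\pi$$, and the certainty-equivalent "average" controller below are illustrative assumptions): as the belief over the dynamics matrix concentrates, the gain computed from the averaged model approaches the gain for the true system.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """Value-iterate the discrete-time Riccati equation and return the
    optimal state-feedback gain K (u = -K x)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

K_true = dlqr_gain(A_true, B, Q, R)

# Model the agent's belief pi as matrix samples around A_true whose spread
# shrinks with experience; solve the "average" problem on the sample mean.
for sigma in [0.5, 0.1, 0.01]:
    samples = [A_true + sigma * rng.standard_normal((2, 2)) for _ in range(200)]
    A_mean = np.mean(samples, axis=0)
    K_pi = dlqr_gain(A_mean, B, Q, R)
    print(f"sigma={sigma:5.2f}  ||K_pi - K_true|| = {np.linalg.norm(K_pi - K_true):.4f}")
```

As `sigma` shrinks, the gain gap goes to zero, which is the discrete-time analogue of the convergence statement above.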


2021 ◽  
Author(s):  
Qingfeng Yao ◽  
Jilong Wang ◽  
Donglin Wang ◽  
Shuyu Yang ◽  
Hongyin Zhang ◽  
...  

Author(s):  
Mohamed M. Alhneaish ◽  
Mohamed L. Shaltout ◽  
Sayed M. Metwalli

An economic model predictive control framework is presented in this study for an integrated wind turbine and flywheel energy storage system. The control objective is to smooth the wind power output and mitigate tower fatigue load. The optimal control problem within the model predictive control framework is formulated as a convex optimal control problem with linear dynamics and convex constraints that can be solved globally. The performance of the proposed control algorithm is compared to that of a standard wind turbine controller, and the effect of the proposed control actions on the fatigue loads acting on the tower and blades is studied. Simulation results for various wind scenarios show the ability of the proposed algorithm to achieve the aforementioned objectives, smoothing the output power and mitigating tower fatigue load at the cost of a minimal reduction in the harvested wind energy.
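The convex structure described above can be sketched as a single linear least-squares problem. This is a deliberately simplified, unconstrained toy (the paper's formulation also carries convex inequality constraints and a turbine model): grid power is chosen over a horizon to penalize ramps, while a soft penalty keeps the flywheel energy near a reference. All signals, horizon lengths, and weights below are invented for illustration.

```python
import numpy as np

N, dt = 24, 1.0
rng = np.random.default_rng(1)
p_wind = 2.0 + 0.8 * rng.standard_normal(N)   # assumed wind power forecast
E0, E_ref = 5.0, 5.0                          # initial / reference flywheel energy
lam = 0.5                                     # weight on energy deviation

# Flywheel energy is linear in the decision u (grid power):
#   E_t = E0 + dt * sum_{k<=t} (p_wind_k - u_k)
L = dt * np.tril(np.ones((N, N)))

# Ramp (smoothing) operator: (D u)_t = u_t - u_{t-1}; row 0 anchors the level.
D = np.eye(N) - np.eye(N, k=-1)
d = np.zeros(N)
d[0] = p_wind.mean()

# Stack into one least-squares problem:
#   min_u ||D u - d||^2 + lam * ||E - E_ref||^2,  E - E_ref = (E0 - E_ref + L p) - L u
A = np.vstack([D, np.sqrt(lam) * L])
b = np.concatenate([d, np.sqrt(lam) * ((E0 - E_ref) * np.ones(N) + L @ p_wind)])
u, *_ = np.linalg.lstsq(A, b, rcond=None)

E = E0 + L @ (p_wind - u)
print("output ramp std:", np.std(np.diff(u)), " wind ramp std:", np.std(np.diff(p_wind)))
```

With hard limits on flywheel energy and power, the same problem remains convex and would be handed to a QP solver instead of `lstsq`; the global-optimality property claimed in the abstract comes from exactly this convexity.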


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Xiaoyi Long ◽  
Zheng He ◽  
Zhongyuan Wang

This paper proposes an online solution for the optimal tracking control of robotic systems based on a single critic neural network (NN)-based reinforcement learning (RL) method. To this end, we rewrite the robotic system model in state-space form, which facilitates the synthesis of the optimal tracking control. To maintain the tracking response, a steady-state control is designed, and an adaptive optimal tracking control is then used to ensure that the tracking error converges in an optimal sense. To solve the resulting optimal control problem within the framework of adaptive dynamic programming (ADP), the command trajectory to be tracked and the modified tracking Hamilton-Jacobi-Bellman (HJB) equation are formulated. An online RL algorithm is then developed to solve the HJB equation using a critic NN with an online weight-learning law. Simulation results verify the effectiveness of the proposed method.
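A minimal critic-only sketch of the ADP idea, on a scalar discrete-time LQ surrogate of the tracking problem: a quadratic critic V(e) = w·e² is trained on the Bellman residual while the control is improved greedily from the current critic. The paper's continuous-time HJB formulation, robot dynamics, and NN critic are reduced to this toy; the system parameters and learning rate are assumptions.

```python
import numpy as np

a, b, q, r = 1.1, 0.5, 1.0, 0.2   # assumed scalar error dynamics and cost weights
w = 1.0                           # critic weight: V(e) = w * e^2
lr = 0.1
rng = np.random.default_rng(2)

for _ in range(2000):
    e = rng.uniform(-2, 2)                    # sampled tracking error
    u = -(a * b * w / (r + b * b * w)) * e    # greedy control from current critic
    e_next = a * e + b * u
    target = q * e * e + r * u * u + w * e_next * e_next
    # Semi-gradient step on the normalized Bellman residual V(e) - target
    w -= lr * (w * e * e - target) * e * e / (1 + e ** 4)

# Reference: fixed point of the scalar discrete-time Riccati recursion
p = 1.0
for _ in range(1000):
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
print("critic weight:", w, " Riccati solution:", p)
```

The learned critic weight matches the Riccati solution, so the greedy policy extracted from the critic is the optimal tracking gain; the paper's single-critic NN plays the role of `w` for the full nonlinear robotic system.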

