Overview of Reinforcement Learning and its Application in Control Theory

Author(s): Jan Sikora, Renata Wagnerova
2013, Vol 2013, pp. 1-16
Author(s): Bo Dong, Yuanchun Li

A novel decentralized reinforcement learning robust optimal tracking control theory for time-varying constrained reconfigurable modular robots, based on an action-critic-identifier (ACI) and a state-action value function (Q-function), has been presented to solve the problem of finding a continuous-time nonlinear optimal control policy for strongly coupled, uncertain robotic systems. The dynamics of the time-varying constrained reconfigurable modular robot are described as a synthesis of interconnected subsystems, and a continuous-time state equation and Q-function are designed in this paper. Combining the ACI with an RBF network, the global uncertainty of each subsystem and the HJB (Hamilton-Jacobi-Bellman) equation are estimated: the critic-NN and action-NN approximate the optimal Q-function and the optimal control policy, the identifier estimates the global uncertainty, and the RBF-NN is used to update the weights of the ACI-NN. On this basis, a novel decentralized robust optimal tracking controller is proposed for each subsystem, so that the subsystem tracks the desired trajectory and the tracking error converges to zero in finite time. The stability of the ACI and of the robust optimal tracking controller is established by Lyapunov theory. Finally, comparative simulation examples are presented to illustrate the effectiveness of the proposed ACI and decentralized control theory.
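The identifier and weight updates described above rely on RBF networks that approximate unknown nonlinear terms online. As a rough illustration only, the Python sketch below fits a one-dimensional Gaussian RBF network to a hypothetical "unknown" function with a plain gradient (LMS) weight update. The function `unknown_term`, the centers, the width, and the learning rate are all assumptions chosen for the example; the paper's Lyapunov-based adaptive laws and its coupled multi-subsystem ACI structure are not reproduced here.

```python
import numpy as np


class RBFNetwork:
    """Gaussian radial-basis-function network: y = w^T phi(x)."""

    def __init__(self, centers, width, lr):
        self.centers = np.asarray(centers, dtype=float)  # one center per basis function
        self.width = width
        self.lr = lr
        self.w = np.zeros(len(self.centers))  # output weights, learned online

    def features(self, x):
        # Gaussian activation of each basis function for a scalar input x
        return np.exp(-((x - self.centers) ** 2) / (2.0 * self.width ** 2))

    def predict(self, x):
        return self.w @ self.features(x)

    def update(self, x, target):
        # One gradient step on 0.5 * error**2 with respect to the weights (LMS rule)
        phi = self.features(x)
        error = target - self.w @ phi
        self.w += self.lr * error * phi
        return error


def unknown_term(x):
    # Hypothetical nonlinearity standing in for an unmodeled coupling term
    return 0.5 * np.sin(2.0 * x) + 0.2 * x ** 2


def rms_error(net, xs):
    return np.sqrt(np.mean([(unknown_term(x) - net.predict(x)) ** 2 for x in xs]))


rng = np.random.default_rng(0)
net = RBFNetwork(centers=np.linspace(-2.0, 2.0, 15), width=0.3, lr=0.5)
grid = np.linspace(-2.0, 2.0, 201)

rms_before = rms_error(net, grid)
for x in rng.uniform(-2.0, 2.0, size=20000):  # online samples of the unknown term
    net.update(x, unknown_term(x))
rms_after = rms_error(net, grid)
print(f"RMS error before: {rms_before:.3f}, after: {rms_after:.3f}")
```

In an adaptive controller the true value of the unknown term is not available as a training target; the update is typically driven by an identification or tracking error instead, which is why such weight laws are usually derived from a Lyapunov analysis rather than taken as plain LMS.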


2020
Author(s): Sean Nolan, Matthew Lanier, Andrew Haines, Linus Mockus, Kristopher Ezra, ...

Author(s): Juan Parras, Santiago Zazo

The significant increase in the number of interconnected devices has brought new services and applications, as well as new network vulnerabilities. The growing hardware capabilities of these devices and developments in the artificial intelligence field mean that new and complex attack methods are being developed. This chapter focuses on the backoff attack in a wireless network using CSMA/CA multiple access, and it shows that an intelligent attacker, making use of control theory, can successfully exploit a defense mechanism based on the sequential probability ratio test (SPRT). Moreover, recent developments in deep reinforcement learning allow attackers that do not have full knowledge of the defense mechanism to learn to attack it successfully. Thus, using the backoff attack as an example, this chapter illustrates the possibilities that recent advances in artificial intelligence offer to intelligent attackers, and highlights the importance of research into intelligent defense methods able to cope with such attackers.
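The SPRT-based defense mentioned above can be illustrated with Wald's test on a simplified observation model. In this Python sketch, which is an assumption and not the chapter's exact formulation, a monitor records for each backoff of a station whether it was shorter than half the contention window. An honest station drawing backoffs uniformly from [0, CW - 1] produces a "short" backoff with probability 0.5; the hypothetical attacker model assumes probability 0.9. The contention window `cw = 32`, the error rates `alpha` and `beta`, and the simulated attacker's backoff range are illustrative values.

```python
import math
import random


def sprt_backoff(observations, p0=0.5, p1=0.9, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test on Bernoulli observations.

    Each observation is True if a backoff was shorter than CW/2.
    H0 (honest station): P(short) = p0.  H1 (attacker): P(short) = p1.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)  # LLR at or above this: accept H1
    lower = math.log(beta / (1 - alpha))  # LLR at or below this: accept H0
    llr = 0.0
    for n, short in enumerate(observations, start=1):
        # Add the log-likelihood ratio contributed by this single observation
        llr += math.log(p1 / p0) if short else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "attacker", n
        if llr <= lower:
            return "honest", n
    return "undecided", len(observations)


def observe_backoffs(attacker, n=200, cw=32, seed=1):
    """Simulate n backoffs and map each to a 'shorter than cw/2' flag.

    Honest station: backoff uniform in [0, cw - 1].
    Hypothetical attacker: backoff uniform in [0, cw // 4 - 1].
    """
    rng = random.Random(seed)
    high = cw // 4 - 1 if attacker else cw - 1
    return [rng.randint(0, high) < cw // 2 for _ in range(n)]


print(sprt_backoff(observe_backoffs(attacker=True)))  # ('attacker', 6)
print(sprt_backoff(observe_backoffs(attacker=False)))
```

With these numbers, an attacker that always chooses short backoffs is flagged after six observations: each short backoff adds log(1.8) ≈ 0.59 to the log-likelihood ratio, and the upper threshold is log(19) ≈ 2.94. The chapter's point is that an attacker aware of how such a test works can adapt its behavior to exploit it, which this fixed-behavior sketch does not model.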


2020, Vol 2020 (4), pp. 43-54
Author(s): S.V. Khoroshylov, M.O. Redka

The aim of the article is to approximate the optimal relative control of an underactuated spacecraft using reinforcement learning and to study how various factors influence the quality of such a solution. The study uses methods of theoretical mechanics, control theory, stability theory, machine learning, and computer modeling. The problem of in-plane spacecraft relative control using only control actions applied tangentially to the orbit is considered. This approach makes it possible to reduce the propellant consumption of reactive actuators and to simplify the architecture of the control system. However, in some cases, classical control methods do not yield acceptable results. For this reason, the possibility of solving this problem with reinforcement learning methods has been investigated; these methods allow designers to find near-optimal control algorithms through interaction of the control system with the plant, guided by a reinforcement signal that characterizes the quality of the control actions. The well-known quadratic criterion is used as the reinforcement signal, which makes it possible to take into account both accuracy requirements and control costs. Control actions are sought with the policy iteration algorithm, which is implemented using the actor-critic architecture. Various neural network representations are considered for the actor, which implements the control law, and for the critic, which produces value function estimates. It is shown that the accuracy of the optimal control approximation depends on several features: an appropriate structure of the approximators, the method used to update the neural network parameters, and the parameters of the learning algorithm. The investigated approach makes it possible to solve the considered class of control problems for controllers of different structures. Moreover, it allows the control system to refine its control algorithms during spacecraft operation.
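Policy iteration with a quadratic criterion can be made concrete in the linear case. The NumPy sketch below runs model-based policy iteration (Hewer's algorithm for the discrete-time linear-quadratic regulator): policy evaluation solves a Lyapunov equation for the current gain, and policy improvement computes the greedy gain. Unlike the article, it assumes the plant matrices are known and uses exact matrix solutions instead of neural network actor and critic approximators. The double-integrator plant (a single control channel, loosely standing in for tangential thrust), the time step, the initial gain `K0`, and the weights `Q` and `R` are hypothetical.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K^T R K + (A - B K)^T P (A - B K)."""
    n = A.shape[0]
    Ak = A - B @ K
    M = Q + K.T @ R @ K
    # Column-major vectorization: vec(Ak^T P Ak) = kron(Ak^T, Ak^T) vec(P)
    lhs = np.eye(n * n) - np.kron(Ak.T, Ak.T)
    p = np.linalg.solve(lhs, M.flatten(order="F"))
    return p.reshape((n, n), order="F")


def improve_policy(A, B, R, P):
    """Policy improvement: greedy gain K = (R + B^T P B)^{-1} B^T P A."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    """Alternate evaluation and improvement, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(max_iter):
        P = evaluate_policy(A, B, Q, R, K)
        K_new = improve_policy(A, B, R, P)
        if np.linalg.norm(K_new - K) < tol:
            return K_new, P
        K = K_new
    return K, evaluate_policy(A, B, Q, R, K)


# Hypothetical plant: discretized double integrator with one control input
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2], [dt]])
Q = np.diag([1.0, 0.1])      # weight on position and velocity errors
R = np.array([[0.01]])       # weight on control effort
K0 = np.array([[1.0, 1.5]])  # initial gain; A - B K0 has spectral radius < 1

K, P = policy_iteration(A, B, Q, R, K0)
print("optimal gain K:", K)
```

An actor-critic implementation, roughly speaking, replaces the exact evaluation step with a critic fitted to measured transitions and costs, which is what lets the gains be refined during operation without an exact plant model.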


Author(s): Siddharth Sharma

The prevalence of differential equations as a mathematical technique has refined the fields of control theory and constrained optimization, owing to the newfound ability to accurately model chaotic, unbalanced systems. However, in recent research, systems are increasingly nonlinear and difficult to model with differential equations alone. A newer approach is therefore to use policy iteration and reinforcement learning, techniques built around a sequence of actions and rewards for a controller. Reinforcement learning (RL) can be applied to control theory problems because a controller can apply RL robustly in a dynamic environment such as the cartpole system (an inverted pendulum). This solution successfully avoids the use of PID controllers or other dynamics-optimization methods in favor of a more robust, reward-based control mechanism. This paper applies RL, specifically Q-learning, to the classic cartpole problem, and also discusses the mathematical background and the differential equations used to model this system.
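Q-learning on the cartpole can be sketched in plain Python. The sketch integrates a frictionless form of the classic cart-pole equations of motion (as used in common RL benchmarks) with an Euler step, discretizes the state into coarse bins, and applies the tabular Q-learning update Q(s, a) ← Q(s, a) + α[r + γ max_b Q(s', b) - Q(s, a)]. The physical constants, failure thresholds, bin edges, reward scheme, and hyperparameters are assumptions for illustration and are not taken from the paper.

```python
import math
import random

# Assumed physical constants of the cart-pole benchmark
GRAVITY = 9.8
M_CART = 1.0
M_POLE = 0.1
TOTAL_MASS = M_CART + M_POLE
HALF_LEN = 0.5     # half the pole length (m)
FORCE_MAG = 10.0   # magnitude of the push applied to the cart (N)
DT = 0.02          # Euler integration step (s)


def step(state, action):
    """Advance the cart-pole ODE by one Euler step; action 0 pushes left, 1 pushes right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / TOTAL_MASS
    new_state = (x + DT * x_dot, x_dot + DT * x_acc,
                 theta + DT * theta_dot, theta_dot + DT * theta_acc)
    # Episode fails if the cart leaves the track or the pole tilts past 12 degrees
    failed = abs(new_state[0]) > 2.4 or abs(new_state[2]) > math.radians(12)
    return new_state, failed


def discretize(state):
    """Map the continuous state to a coarse tuple of bin indices (illustrative edges)."""
    _, x_dot, theta, theta_dot = state

    def bucket(v, edges):
        return sum(v > e for e in edges)

    return (bucket(x_dot, (-0.5, 0.5)),
            bucket(theta, (-0.1, -0.03, 0.0, 0.03, 0.1)),
            bucket(theta_dot, (-0.5, 0.0, 0.5)))


def q_learning(episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=500, seed=0):
    rng = random.Random(seed)
    q = {}  # (discrete_state, action) -> estimated return; missing entries count as 0.0
    lengths = []
    for _ in range(episodes):
        state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
        s = discretize(state)
        for t in range(max_steps):
            # Epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda b: q.get((s, b), 0.0))
            state, failed = step(state, a)
            s_next = discretize(state)
            reward = 1.0  # +1 for every step taken
            if failed:
                target = reward  # terminal step: no bootstrapping
            else:
                target = reward + gamma * max(q.get((s_next, b), 0.0) for b in (0, 1))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = s_next
            if failed:
                break
        lengths.append(t + 1)
    return q, lengths


q_table, episode_lengths = q_learning()
print("mean length of last 50 episodes:", sum(episode_lengths[-50:]) / 50)
```

The coarse, hand-picked bins keep the table small; finer bins, a decaying exploration rate, or function approximation are common refinements when the pole has to be balanced reliably.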

