Self-tuning Gains of a Quadrotor using a Simple Model for Policy Gradient Reinforcement Learning

Author(s):  
Jaime Junell ◽  
Tommaso Mannucci ◽  
Ye Zhou ◽  
Erik-Jan Van Kampen

Processes ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 487
Author(s):  
Fumitake Fujii ◽  
Akinori Kaneishi ◽  
Takafumi Nii ◽  
Ryu’ichiro Maenishi ◽  
Soma Tanaka

Proportional–integral–derivative (PID) control remains the primary choice for industrial process control problems. However, owing to the increased complexity and precision requirements of current industrial processes, a conventional PID controller may provide only unsatisfactory performance, or the determination of its gains may become quite difficult. To address these issues, studies have suggested the use of reinforcement learning in combination with PID control laws. The present study extends this idea to the control of a multiple-input multiple-output (MIMO) process that suffers from both physical coupling between inputs and a long input/output lag. We specifically target a thin-film production process as an example of such a MIMO process and propose a self-tuning two-degree-of-freedom PI controller for the film thickness control problem. The self-tuning functionality of the proposed control system is based on the actor-critic reinforcement learning algorithm. We also propose a method to compensate for the input coupling. Numerical simulations are conducted under several likely scenarios to demonstrate the enhanced control performance relative to that of a conventional static-gain PI controller.
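A minimal sketch of the gain-tuning idea is given below. It assumes a hypothetical first-order SISO plant and a Gaussian policy over the two PI gains, updated with a baseline-corrected policy gradient; the paper's actual plant is a MIMO film-thickness process with input coupling, and its actor-critic design is more elaborate.

```python
# Sketch: policy-gradient tuning of PI gains on an assumed first-order plant.
import numpy as np

rng = np.random.default_rng(0)

def run_episode(kp, ki, setpoint=1.0, steps=200, dt=0.1):
    """Simulate a first-order plant under PI control; return negative cost."""
    y, integral, cost = 0.0, 0.0, 0.0
    for _ in range(steps):
        e = setpoint - y
        integral += e * dt
        u = kp * e + ki * integral          # PI control law
        y += dt * (-y + u)                  # assumed plant: tau*dy/dt = -y + u
        cost += e * e * dt
    return -cost                            # reward = negative integrated squared error

theta = np.array([0.5, 0.1])                # mean PI gains (Kp, Ki)
sigma = 0.2                                 # exploration noise on the gains
baseline = None                             # running baseline standing in for a critic

for _ in range(300):
    gains = theta + sigma * rng.standard_normal(2)
    ret = run_episode(*np.maximum(gains, 0.0))
    baseline = ret if baseline is None else 0.9 * baseline + 0.1 * ret
    advantage = ret - baseline
    # Policy-gradient update for a Gaussian policy over the gains
    theta += 1e-3 * advantage * (gains - theta) / sigma**2

print("tuned gains (Kp, Ki):", theta)
```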


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach of using reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance: the agent followed the preceding vehicle safely and smoothly.
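As an illustration of the reward construction described in this abstract, the sketch below defines the reward as the inner product of a reward (feature) vector and a weight vector. The feature choices, names, and weights are assumptions made for illustration; the paper derives the weights from human driving data.

```python
# Sketch: reward = <reward vector, weights>, with hypothetical car-following features.
import numpy as np

def reward_features(gap, rel_speed, ego_accel, desired_gap=25.0):
    """Vectorize the factors weighed in vehicle following into a reward vector."""
    return np.array([
        -abs(gap - desired_gap),   # spacing error to the lead vehicle
        -abs(rel_speed),           # relative speed (closing rate)
        -abs(ego_accel),           # ride comfort (penalize harsh accel/braking)
    ])

# Weight vector: in the paper this is adjusted so the agent's value vector
# approaches the one estimated from human driving data; values here are assumed.
w = np.array([0.5, 0.3, 0.2])

def reward(gap, rel_speed, ego_accel):
    return float(w @ reward_features(gap, rel_speed, ego_accel))

print(reward(gap=30.0, rel_speed=-1.5, ego_accel=0.4))
```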


Author(s):  
Александр Александрович Воевода ◽  
Дмитрий Олегович Романников

The synthesis of controllers for multichannel systems is a relevant and difficult problem. One possible synthesis approach is the use of neural networks: a neural controller is either trained on precomputed data or used to tune the parameters of a PID controller starting from a stable initial state of the closed-loop system. We propose to use neural networks to control a two-channel plant, with training performed from an unstable (arbitrary) initial state using reinforcement learning. A structure for the controller neural network and for the closed-loop system is proposed in which the setpoint is supplied as an input to the controller network. The synthesis of automatic control systems is hard, especially for multichannel plants. For approaches based on reinforcement learning, there is an additional issue: supporting a range of setpoint values. We propose a method for synthesizing automatic control systems with neural networks, together with a reinforcement learning procedure, that trains the controller network to regulate over a predefined range of setpoints. The main steps of the method are: 1) form the neural network input from the plant state and the system setpoint; 2) simulate the system with a set of setpoints randomly drawn from the desired range; 3) perform one learning step using the Deterministic Policy Gradient method. The originality of the proposed method is that, in contrast to existing approaches that use a neural network to synthesize a controller, it allows the controller to be trained from an unstable initial state of the closed-loop system and over a range of setpoints. The method was applied to the problem of stabilizing the outputs of a two-channel plant, where both outputs must be stabilized and the first must track the input setpoint.
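The sketch below mirrors the three listed steps at the level of loop structure only: the network input is formed from the plant state and the setpoint, each episode draws a random setpoint from the desired range, and the point where the Deterministic Policy Gradient update would occur is marked with a comment. The plant, dimensions, and linear "policy network" are assumptions, not the paper's system.

```python
# Sketch: training-loop structure for setpoint-conditioned neural control (assumed plant).
import numpy as np

rng = np.random.default_rng(1)

n_state, n_ref, n_action = 2, 2, 2
W = 0.01 * rng.standard_normal((n_action, n_state + n_ref))  # toy linear policy weights

def policy(state, setpoint):
    # Step 1: the network input is the plant state concatenated with the setpoint.
    return W @ np.concatenate([state, setpoint])

A = np.array([[1.1, 0.2], [0.0, 1.05]])     # assumed unstable two-channel plant
B = np.eye(2)

for _ in range(100):
    # Step 2: simulate with a setpoint drawn at random from the desired range.
    setpoint = rng.uniform(-1.0, 1.0, size=n_ref)
    x = rng.uniform(-1.0, 1.0, size=n_state)  # arbitrary (possibly unstable) start
    for _ in range(50):
        u = policy(x, setpoint)
        x = A @ x + B @ u
        # Step 3: one Deterministic Policy Gradient update of W would go here,
        # using a critic's action gradient; omitted to keep the sketch short.

print("final state of last episode:", x)
```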


Author(s):  
Óscar Pérez-Gil ◽  
Rafael Barea ◽  
Elena López-Guillén ◽  
Luis M. Bergasa ◽  
Carlos Gómez-Huélamo ◽  
...  

Nowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is one of them. This paper proposes the use of algorithms based on Deep Learning (DL) in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented in order to compare their results. The aim of this work is to obtain, by applying a DRL algorithm, a trained model able to send control commands to the vehicle so that it navigates properly and efficiently along a determined route. In addition, for each of the algorithms, several agents are presented as a solution, each of which uses different data sources to derive the vehicle control commands. For this purpose, the open-source simulator CARLA is used, providing the system with the ability to perform a multitude of tests without any risk in a hyper-realistic urban simulation environment, something that is unthinkable in the real world. The results obtained show that both DQN and DDPG reach the goal, but DDPG achieves better performance, producing trajectories very similar to those of a classic controller such as LQR. In both cases, the RMSE is lower than 0.1 m when following trajectories 180-700 m in length. Finally, conclusions and future work are discussed.
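To make the comparison concrete, the sketch below contrasts the action interfaces of the two algorithms: a DQN head scoring a discretized set of driving commands versus a DDPG actor emitting continuous steering and throttle values. The state dimension, layer sizes, and action sets are assumptions; the paper's agents consume different sensor inputs and are trained in CARLA.

```python
# Sketch: discrete DQN action head vs. continuous DDPG actor for driving commands.
import torch
import torch.nn as nn

STATE_DIM = 32                       # assumed size of the processed observation

class DQNHead(nn.Module):
    """DQN: Q-values over a discretized set of (steer, throttle) commands."""
    def __init__(self, n_discrete_actions=9):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, n_discrete_actions))

    def forward(self, s):
        return self.net(s)           # greedy command = argmax over Q-values

class DDPGActor(nn.Module):
    """DDPG: deterministic continuous commands, steer in [-1, 1], throttle in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2))

    def forward(self, s):
        steer, throttle = torch.tanh(self.net(s)).unbind(-1)
        return torch.stack([steer, (throttle + 1) / 2], dim=-1)

s = torch.randn(1, STATE_DIM)
print(DQNHead()(s).argmax(dim=-1), DDPGActor()(s))
```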

