Inferring Cost Functions Using Reward Parameter Search and Policy Gradient Reinforcement Learning

This paper proposes a new approach of using reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We refer to the ideal of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and weights. Driving data of human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy with the nearest value vector to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.

Download Full-text

The synthesis method of regulators for multichannel systems using neural networks

Вычислительные технологии ◽

10.25743/ict.2020.25.3.012 ◽

2020 ◽

pp. 111-118

Author(s):

Александр Александрович Воевода ◽

Дмитрий Олегович Романников

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Automatic Control ◽

Control Systems ◽

Synthesis Method ◽

Neural Net ◽

Initial State ◽

Set Point ◽

Automatic Control Systems ◽

Policy Gradient

Синтез регуляторов для многоканальных систем - актуальная и сложная задача. Одним из возможных способов синтеза является применение нейронных сетей. Нейронный регулятор либо обучают на предварительно рассчитанных данных, либо используют для настройки параметров ПИД-регулятора из начального устойчивого положения замкнутой системы. Предложено использовать нейронные сети для регулирования двухканального объекта, при этом обучение будет выполняться из неустойчивого (произвольного) начального положения с применением методов обучения нейронных сетей с подкреплением. Предложена структура нейронной сети и замкнутой системы, в которой уставка задается при помощи входного параметра нейронной сети регулятора The problem for synthesis of automatic control systems is hard, especially for multichannel objects. One of the approaches is the use of neural networks. For the approaches that are based on the use of reinforcement learning, there is an additional issue - supporting of range of values for the set points. The method of synthesis of automatic control systems using neural networks and the process of its learning with reinforcement learning that allows neural networks learning for supporting regulation is proposed in the predefined range of set points. The main steps of the method are 1) to form a neural net input as a state of the object and system set point; 2) to perform modelling of the system with a set of randomly generated set points from the desired range; 3) to perform a one-step of the learning using the Deterministic Policy Gradient method. The originality of the proposed method is that, in contrast to existing methods of using a neural network to synthesize a controller, the proposed method allows training a controller from an unstable initial state in a closed system and set of a range of set points. The method was applied to the problem of stabilizing the outputs of a two-channel object, for which stabilization both outputs and the first near the input set point is required

Download Full-text

Enhanced Delta-tolling: Traffic Optimization via Policy Gradient Reinforcement Learning

2018 21st International Conference on Intelligent Transportation Systems (ITSC) ◽

10.1109/itsc.2018.8569737 ◽

2018 ◽

Cited By ~ 3

Author(s):

Hamid Mirzaei ◽

Guni Sharon ◽

Stephen Boyles ◽

Tony Givargis ◽

Peter Stone

Keyword(s):

Reinforcement Learning ◽

Policy Gradient ◽

Traffic Optimization

Download Full-text

Self-tuning Gains of a Quadrotor using a Simple Model for Policy Gradient Reinforcement Learning

AIAA Guidance, Navigation, and Control Conference ◽

10.2514/6.2016-1387 ◽

2016 ◽

Cited By ~ 5

Author(s):

Jaime Junell ◽

Tommaso Mannucci ◽

Ye Zhou ◽

Erik-Jan Van Kampen

Keyword(s):

Reinforcement Learning ◽

Simple Model ◽

Policy Gradient ◽

Self Tuning

Download Full-text

Deep reinforcement learning based control for Autonomous Vehicles in CARLA

Multimedia Tools and Applications ◽

10.1007/s11042-021-11437-3 ◽

2022 ◽

Author(s):

Óscar Pérez-Gil ◽

Rafael Barea ◽

Elena López-Guillén ◽

Luis M. Bergasa ◽

Carlos Gómez-Huélamo ◽

...

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicles ◽

Autonomous Vehicle ◽

Vehicle Control ◽

Data Sources ◽

Simulation Environment ◽

Urban Simulation ◽

Policy Gradient ◽

Almost All ◽

Control Layer

AbstractNowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is one more of them. This paper proposes the using of algorithms based on Deep Learning (DL) in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented in order to compare results between them. The aim of this work is to obtain a trained model, applying a DRL algorithm, able of sending control commands to the vehicle to navigate properly and efficiently following a determined route. In addition, for each of the algorithms, several agents are presented as a solution, so that each of these agents uses different data sources to achieve the vehicle control commands. For this purpose, an open-source simulator such as CARLA is used, providing to the system with the ability to perform a multitude of tests without any risk into an hyper-realistic urban simulation environment, something that is unthinkable in the real world. The results obtained show that both DQN and DDPG reach the goal, but DDPG obtains a better performance. DDPG perfoms trajectories very similar to classic controller as LQR. In both cases RMSE is lower than 0.1m following trajectories with a range 180-700m. To conclude, some conclusions and future works are commented.

Download Full-text