Autonomous Driving in Roundabout Maneuvers Using Reinforcement Learning with Q-Learning

Laura García Cuenca; Enrique Puertas; Javier Fernandez Andrés; Nourdine Aliane

doi:10.3390/electronics8121536

Autonomous Driving in Roundabout Maneuvers Using Reinforcement Learning with Q-Learning

Electronics ◽

10.3390/electronics8121536 ◽

2019 ◽

Vol 8 (12) ◽

pp. 1536 ◽

Cited By ~ 4

Author(s):

Laura García Cuenca ◽

Enrique Puertas ◽

Javier Fernandez Andrés ◽

Nourdine Aliane

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicles ◽

Learning Algorithm ◽

Autonomous Vehicle ◽

Autonomous Driving ◽

Simulation Environment ◽

Q Learning ◽

Driving Scenario

Navigating roundabouts is a complex driving scenario for both manual and autonomous vehicles. This paper proposes an approach based on the use of the Q-learning algorithm to train an autonomous vehicle agent to learn how to appropriately navigate roundabouts. The proposed learning algorithm is implemented using the CARLA simulation environment. Several simulations are performed to train the algorithm in two scenarios: navigating a roundabout with and without surrounding traffic. The results illustrate that the Q-learning-algorithm-based vehicle agent is able to learn smooth and efficient driving to perform maneuvers within roundabouts.

Download Full-text

Autonomous Drifting Using Reinforcement Learning

Periodica Polytechnica Transportation Engineering ◽

10.3311/pptr.18581 ◽

2021 ◽

Author(s):

László Orgován ◽

Tamás Bécsi ◽

Szilárd Aradi

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicles ◽

Learning Algorithm ◽

Autonomous Vehicle ◽

Autonomous Driving ◽

The Road ◽

Model Free ◽

On The Road ◽

Self Driving Cars ◽

Planning Problems

Autonomous vehicles or self-driving cars are prevalent nowadays, many vehicle manufacturers, and other tech companies are trying to develop autonomous vehicles. One major goal of the self-driving algorithms is to perform manoeuvres safely, even when some anomaly arises. To solve these kinds of complex issues, Artificial Intelligence and Machine Learning methods are used. One of these motion planning problems is when the tires lose their grip on the road, an autonomous vehicle should handle this situation. Thus the paper provides an Autonomous Drifting algorithm using Reinforcement Learning. The algorithm is based on a model-free learning algorithm, Twin Delayed Deep Deterministic Policy Gradients (TD3). The model is trained on six different tracks in a simulator, which is developed specifically for autonomous driving systems; namely CARLA.

Download Full-text

Deep reinforcement learning based control for Autonomous Vehicles in CARLA

Multimedia Tools and Applications ◽

10.1007/s11042-021-11437-3 ◽

2022 ◽

Author(s):

Óscar Pérez-Gil ◽

Rafael Barea ◽

Elena López-Guillén ◽

Luis M. Bergasa ◽

Carlos Gómez-Huélamo ◽

...

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicles ◽

Autonomous Vehicle ◽

Vehicle Control ◽

Data Sources ◽

Simulation Environment ◽

Urban Simulation ◽

Policy Gradient ◽

Almost All ◽

Control Layer

AbstractNowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is one more of them. This paper proposes the using of algorithms based on Deep Learning (DL) in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented in order to compare results between them. The aim of this work is to obtain a trained model, applying a DRL algorithm, able of sending control commands to the vehicle to navigate properly and efficiently following a determined route. In addition, for each of the algorithms, several agents are presented as a solution, so that each of these agents uses different data sources to achieve the vehicle control commands. For this purpose, an open-source simulator such as CARLA is used, providing to the system with the ability to perform a multitude of tests without any risk into an hyper-realistic urban simulation environment, something that is unthinkable in the real world. The results obtained show that both DQN and DDPG reach the goal, but DDPG obtains a better performance. DDPG perfoms trajectories very similar to classic controller as LQR. In both cases RMSE is lower than 0.1m following trajectories with a range 180-700m. To conclude, some conclusions and future works are commented.

Download Full-text

Safety-aware Adversarial Inverse Reinforcement Learning (S-AIRL) for Highway Autonomous Driving

Journal of Autonomous Vehicles and Systems ◽

10.1115/1.4053427 ◽

2022 ◽

pp. 1-14

Author(s):

Fangjian Li ◽

John R Wagner ◽

Yue Wang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Risky Behaviors ◽

Autonomous Driving ◽

Inverse Reinforcement Learning ◽

Safety Issues ◽

Reward Function ◽

Sampling Process ◽

Safety Awareness ◽

Driving Scenario

Abstract Inverse reinforcement learning (IRL) has been successfully applied in many robotics and autonomous driving studies without the need for hand-tuning a reward function. However, it suffers from safety issues. Compared to the reinforcement learning (RL) algorithms, IRL is even more vulnerable to unsafe situations as it can only infer the importance of safety based on expert demonstrations. In this paper, we propose a safety-aware adversarial inverse reinforcement learning algorithm (S-AIRL). First, the control barrier function (CBF) is used to guide the training of a safety critic, which leverages the knowledge of system dynamics in the sampling process without training an additional guiding policy. The trained safety critic is then integrated into the discriminator to help discern the generated data and expert demonstrations from the standpoint of safety. Finally, to further improve the safety awareness, a regulator is introduced in the loss function of the discriminator training to prevent the recovered reward function from assigning high rewards to the risky behaviors. We tested our S-AIRL in the highway autonomous driving scenario. Comparing to the original AIRL algorithm, with the same level of imitation learning (IL) performance, the proposed S-AIRL can reduce the collision rate by 32.6%.

Download Full-text

Autonomous Vehicle Fuel Economy Optimization with Deep Reinforcement Learning

Electronics ◽

10.3390/electronics9111911 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1911

Author(s):

Hyunkun Kim ◽

Hyeongoo Pyeon ◽

Jong Sool Park ◽

Jin Young Hwang ◽

Sejoon Lim

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reinforcement Learning ◽

Velocity Profile ◽

Fuel Economy ◽

Autonomous Vehicles ◽

Learning Algorithm ◽

Autonomous Vehicle ◽

Simulation Program ◽

On The Road

The ever-increasing number of vehicles on the road puts pressure on car manufacturers to make their car fuel-efficient. With autonomous vehicles, we can find new strategies to optimize fuels. We propose a reinforcement learning algorithm that trains deep neural networks to generate a fuel-efficient velocity profile for autonomous vehicles given road altitude information for the planned trip. Using a highly accurate industry-accepted fuel economy simulation program, we train our deep neural network model. We developed a technique for adapting the heterogeneous simulation program on top of an open-source deep learning framework, and reduced dimension of the problem output with suitable parameterization to train the neural network much faster. The learned model combined with reinforcement learning-based strategy generation effectively generated the velocity profile so that autonomous vehicles can follow to control itself in a fuel efficient way. We evaluate our algorithm’s performance using the fuel economy simulation program for various altitude profiles. We also demonstrate that our method can teach neural networks to generate useful strategies to increase fuel economy even on unseen roads. Our method improved fuel economy by 8% compared to a simple grid search approach.

Download Full-text

A METHOD INTEGRATING SIMULATION AND REINFORCEMENT LEARNING FOR OPERATION SCHEDULING IN CONTAINER TERMINALS

Transport ◽

10.3846/16484142.2011.638022 ◽

2012 ◽

Vol 26 (4) ◽

pp. 383-393 ◽

Cited By ~ 2

Author(s):

Qingcheng Zeng ◽

Zhongzhen Yang ◽

Xiangpei Hu

Keyword(s):

Reinforcement Learning ◽

Numerical Experiments ◽

Learning Algorithm ◽

Optimal Scheduling ◽

Container Terminals ◽

Scheduling Problem ◽

Simulation Environment ◽

Q Learning ◽

Scheduling Scheme ◽

Operation Scheduling

The objective of operation scheduling in container terminals is to determine a schedule that minimizes time for loading or unloading a given set of containers. This paper presents a method integrating reinforcement learning and simulation to optimize operation scheduling in container terminals. The introduced method uses a simulation model to construct the system environment while the Q-learning algorithm (reinforcement learning algorithm) is applied to learn optimal dispatching rules for different equipment (e.g. yard cranes, yard trailers). The optimal scheduling scheme is obtained by the interaction of the Q-learning algorithm and simulation environment. To evaluate the effectiveness of the proposed method, a lower bound is calculated considering the characteristics of the scheduling problem in container terminals. Finally, numerical experiments are provided to illustrate the validity of the proposed method.

Download Full-text

Aircraft Maintenance Check Scheduling Using Reinforcement Learning

Aerospace ◽

10.3390/aerospace8040113 ◽

2021 ◽

Vol 8 (4) ◽

pp. 113

Author(s):

Pedro Andrade ◽

Catarina Silva ◽

Bernardete Ribeiro ◽

Bruno F. Santos

Keyword(s):

Reinforcement Learning ◽

Time Horizon ◽

Learning Algorithm ◽

Initial Conditions ◽

Q Learning ◽

Scheduling Policy ◽

Real Scenario ◽

Maintenance Plan ◽

Small Disturbances

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to, schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan that is generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach and airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.

Download Full-text

An Efficiency Enhancing Methodology for Multiple Autonomous Vehicles in an Urban Network Adopting Deep Reinforcement Learning

Applied Sciences ◽

10.3390/app11041514 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1514 ◽

Cited By ~ 2

Author(s):

Quang-Duy Tran ◽

Sang-Hoon Bae

Keyword(s):

Reinforcement Learning ◽

Traffic Congestion ◽

Autonomous Vehicles ◽

Penetration Rate ◽

Autonomous Vehicle ◽

Effective Means ◽

Urban Network ◽

Learning Agents ◽

Policy Optimization ◽

The Impact

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of the autonomous vehicle. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we show an advanced deep reinforcement learning that investigates how the leading autonomous vehicles affect the urban network under a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. Firstly, we feed a set of hyperparameters into our deep reinforcement learning agents. Secondly, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated using entire manual vehicle and leading manual vehicle experiments. Finally, the proximal policy optimization with a clipped objective is compared to the proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameter. We demonstrate that full automation traffic increased the average speed 1.27 times greater compared with the entire manual vehicle experiment. Our proposed method becomes significantly more effective at a higher autonomous vehicle penetration rate. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.

Download Full-text

Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

Entropy ◽

10.3390/e23060737 ◽

2021 ◽

Vol 23 (6) ◽

pp. 737

Author(s):

Fengjie Sun ◽

Xianchang Wang ◽

Rui Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Optimal Policy ◽

Feasible Solution ◽

Learning Algorithm ◽

Plant Protection ◽

Agricultural Plant ◽

Q Learning ◽

Aerial Vehicle ◽

Optimal Action

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in the agricultural plant protection such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in states according to the policy. In an unknown environment, the method of formulating rules for UAVs to help them choose actions is not applicable, and it is a feasible solution to obtain the optimal policy through reinforcement learning. However, experiments show that the existing reinforcement learning algorithms cannot get the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that there has a greater probability for UAV choosing the optimal action according to the policy learned by the algorithm we proposed than the classic Q-learning algorithm in the agricultural plant protection environment. This proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the algorithm we proposed can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.

Download Full-text

Barrier Lyapunov Function-Based Safe Reinforcement Learning Algorithm for Autonomous Vehicles with System Uncertainty

10.23919/iccas52745.2021.9649902 ◽

2021 ◽

Author(s):

Yuxiang Zhang ◽

Xiaoling Liang ◽

Shuzhi Sam Ge ◽

Bingzhao Gao ◽

Tong Heng Lee

Keyword(s):

Reinforcement Learning ◽

Lyapunov Function ◽

Autonomous Vehicles ◽

Learning Algorithm ◽

System Uncertainty ◽

Barrier Lyapunov Function ◽

Reinforcement Learning Algorithm

Download Full-text

Decision making at unsignalized inner city intersections using discrete events systems

tm - Technisches Messen ◽

10.1515/teme-2021-0140 ◽

2022 ◽

Vol 0 (0) ◽

Author(s):

Hannes Weinreuter ◽

Balázs Szigeti ◽

Nadine-Rebecca Strelau ◽

Barbara Deml ◽

Michael Heizmann

Keyword(s):

Inner City ◽

Road Safety ◽

Autonomous Vehicles ◽

Discrete Event ◽

Autonomous Driving ◽

Simulation Environment ◽

Discrete Events ◽

Real World Data ◽

Event System ◽

Discrete Events Systems

Abstract Autonomous driving is a promising technology to, among many aspects, improve road safety. There are however several scenarios that are challenging for autonomous vehicles. One of these are unsignalized junctions. There exist scenarios in which there is no clear regulation as to is allowed to drive first. Instead, communication and cooperation are necessary to solve such scenarios. This is especially challenging when interacting with human drivers. In this work we focus on unsignalized T-intersections. For that scenario we propose a discrete event system (DES) that is able to solve the cooperation with human drivers at a T-intersection with limited visibility and no direct communication. The algorithm is validated in a simulation environment, and the parameters for the algorithm are based on an analysis of typical human behavior at intersections using real-world data.

Download Full-text