Safe Driving of Autonomous Vehicles Through Improved Deep Reinforcement Learning

2021 ◽  
Author(s):  
Abhishek Gupta

In this thesis, we propose an environment perception framework for autonomous driving using deep reinforcement learning (DRL) that exhibits learning in autonomous vehicles under complex interactions with the environment, without being explicitly trained on driving datasets. Unlike existing techniques, our proposed technique takes the learning loss into account under deterministic as well as stochastic policy gradients. We apply DRL to object detection and safe navigation while enhancing a self-driving vehicle’s ability to discern meaningful information from surrounding data. For efficient environmental perception and object detection, various Q-learning based methods have been proposed in the literature. Unlike those works, this thesis proposes a collaborative deterministic and stochastic policy gradient approach based on DRL. Our technique combines a variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC) to adequately train a self-driving vehicle. In this work, we focus on uninterrupted and reasonably safe autonomous driving without colliding with an obstacle or steering off the track. We propose a collaborative framework that utilizes the best features of VAE, DDPG, and SAC and models autonomous driving as a partly stochastic and partly deterministic policy gradient problem in a continuous action space and continuous state space. To ensure that the vehicle traverses the road over a considerable period of time, we employ a reward-penalty based system in which a large negative penalty is associated with an unfavourable action and a comparatively smaller positive reward is awarded for favourable actions. We also examine the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.
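
As a concrete illustration of the asymmetric reward-penalty scheme described in the abstract, the sketch below assigns a large negative penalty to collisions or leaving the track and a comparatively small positive reward for forward progress. The magnitudes and the helper signature are illustrative assumptions, not values taken from the thesis.

```python
# Minimal sketch of the reward-penalty scheme; all magnitudes are
# illustrative assumptions, not values from the thesis.
COLLISION_PENALTY = -10.0   # large penalty for the most unfavourable action
OFF_TRACK_PENALTY = -5.0    # steering off the track is also heavily penalized
PROGRESS_REWARD = 0.1       # comparatively small reward for favourable actions

def step_reward(collided: bool, off_track: bool, distance_gain: float) -> float:
    """Per-step reward for the driving agent."""
    if collided:
        return COLLISION_PENALTY
    if off_track:
        return OFF_TRACK_PENALTY
    # reward forward progress along the track
    return PROGRESS_REWARD * max(distance_gain, 0.0)

# cumulative reward of the kind examined over the learning process
episode_return = sum(step_reward(False, False, d) for d in [1.0, 0.8, 1.2])
```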


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 5991 ◽  
Author(s):  
Abhishek Gupta ◽  
Ahmed Shaharyar Khwaja ◽  
Alagan Anpalagan ◽  
Ling Guan ◽  
Bala Venkatesh

In this paper, we propose an environment perception framework for autonomous driving using state representation learning (SRL). Unlike existing Q-learning based methods for efficient environment perception and object detection, our proposed method takes the learning loss into account under deterministic as well as stochastic policy gradients. Through a combination of variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC), we focus on uninterrupted and reasonably safe autonomous driving without steering off the track for a considerable driving distance. Our proposed technique exhibits learning in autonomous vehicles under complex interactions with the environment, without being explicitly trained on driving datasets. To ensure the effectiveness of the scheme over a sustained period of time, we employ a reward-penalty based system where a negative reward is associated with an unfavourable action and a positive reward is awarded for favourable actions. The results obtained through simulations on the Donkey simulator show the effectiveness of our proposed method by examining the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.
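
A minimal sketch of the state-representation step, assuming PyTorch, an illustrative 3x64x64 camera frame, and a 32-dimensional latent (all layer sizes are assumptions): the VAE encoder compresses each frame into a latent vector that the DDPG/SAC policy consumes in place of raw pixels.

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Compress a 3x64x64 camera frame into a latent driving state."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 14 * 14, latent_dim)      # for 3x64x64 input
        self.fc_logvar = nn.Linear(64 * 14 * 14, latent_dim)

    def forward(self, frame: torch.Tensor):
        h = self.conv(frame)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

# the latent state z, not the raw image, is what the DDPG/SAC actor sees
encoder = VAEEncoder()
z, mu, logvar = encoder(torch.randn(1, 3, 64, 64))
```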


Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1061
Author(s):  
Yanliang Jin ◽  
Qianhong Liu ◽  
Liquan Shen ◽  
Leiji Zhu

Research on autonomous driving based on deep reinforcement learning algorithms is a research hotspot. Traditional autonomous driving requires human involvement, and autonomous driving algorithms based on supervised learning must be trained in advance using human experience. To deal with autonomous driving problems, this paper proposes an improved end-to-end deep deterministic policy gradient (DDPG) algorithm based on the convolutional block attention mechanism, called the multi-input attention prioritized deep deterministic policy gradient (MAPDDPG) algorithm. The actor network and the critic network of the model share the same symmetric structure. Meanwhile, the attention mechanism is introduced to help the vehicle focus on useful environmental information. The experiments are conducted in The Open Racing Car Simulator (TORCS), and the results of five experiment runs on the test tracks are averaged to obtain the final result. Compared with the state-of-the-art algorithm, the maximum reward increases from 62,207 to 116,347, and the average speed increases from 135 km/h to 193 km/h, while the number of successful episodes in which the vehicle completes a lap increases from 96 to 147. The variance of the distance from the vehicle to the center of the road is also compared: the variance of DDPG is 0.6 m, while that of MAPDDPG is only 0.2 m. These results indicate that the proposed MAPDDPG achieves excellent performance.
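
The "convolutional block attention mechanism" named above is commonly realized as CBAM-style channel plus spatial attention. The following PyTorch sketch shows that general structure under assumed layer sizes; it illustrates the idea of letting the encoder weight useful road regions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ConvBlockAttention(nn.Module):
    """CBAM-style channel + spatial attention (sizes are illustrative)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # channel attention from average- and max-pooled descriptors
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention highlights informative road regions
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

attn = ConvBlockAttention(64)
features = attn(torch.randn(1, 64, 32, 32))  # same shape out, re-weighted
```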


Author(s):  
László Orgován ◽  
Tamás Bécsi ◽  
Szilárd Aradi

Autonomous vehicles, or self-driving cars, are prevalent nowadays; many vehicle manufacturers and other tech companies are trying to develop them. One major goal of self-driving algorithms is to perform manoeuvres safely, even when some anomaly arises. Artificial Intelligence and Machine Learning methods are used to solve these kinds of complex issues. One such motion planning problem arises when the tires lose their grip on the road; an autonomous vehicle should handle this situation. Thus, this paper provides an autonomous drifting algorithm using Reinforcement Learning. The algorithm is based on a model-free learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3). The model is trained on six different tracks in CARLA, a simulator developed specifically for autonomous driving systems.
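
For reference, the core of TD3 that distinguishes it from plain DDPG is the target computation: twin critics take the minimum Q-estimate to curb overestimation, and the target action is smoothed with clipped noise. A minimal sketch in PyTorch follows; the target networks are assumed to exist elsewhere, and the default constants are the commonly used ones, not necessarily the paper's.

```python
import torch

def td3_target(reward, next_state, done,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Compute the TD3 bootstrap target for the critic update."""
    with torch.no_grad():
        next_action = actor_target(next_state)
        # target policy smoothing: add clipped noise to the target action
        noise = (torch.randn_like(next_action) * noise_std
                 ).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # clipped double Q-learning: take the smaller of the twin estimates
        q_next = torch.min(critic1_target(next_state, next_action),
                           critic2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_next
```

The third TD3 ingredient, delayed policy updates, simply means the actor and target networks are updated once every few critic updates.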


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 928
Author(s):  
Man Kiat Wong ◽  
Tee Connie ◽  
Michael Kah Ong Goh ◽  
Li Pei Wong ◽  
Pin Shen Teh ◽  
...  

Background: Autonomous vehicles are important in smart transportation. Although exciting progress has been made, it remains challenging to design a safety mechanism for autonomous vehicles given the uncertainties and obstacles that occur dynamically on the road. Collision detection and avoidance are indispensable for a reliable decision-making module in autonomous driving. Methods: This study presents a robust approach for forward collision warning using vision data for autonomous vehicles on Malaysian public roads. The proposed architecture combines environment perception and lane localization to define a safe driving region for the ego vehicle. If potential risks are detected in the safe driving region, a warning is triggered. This early warning is important to help avoid rear-end collisions. In addition, an adaptive lane localization method that considers the geometrical structure of the road is presented to deal with different road types. Results: The study recorded an mAP@0.5 of 0.14, an mAP@0.95 of 0.06979, and a recall of 0.6356 (mAP: mean average precision). Conclusions: Experimental results have validated the effectiveness of the proposed approach under different lighting and environmental conditions.
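
The warning logic described above reduces to a geometric test: does any detected object fall inside the lane-bounded safe driving region? The sketch below shows one plausible form of that rule, assuming a trapezoidal region in image coordinates and using the bottom-centre of each detection box as its road contact point; the region shape and coordinates are illustrative assumptions.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            if px < (x2 - x1) * (py - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def forward_collision_warning(detections, safe_region):
    """detections: list of (x1, y1, x2, y2) boxes in image coordinates."""
    for x1, y1, x2, y2 in detections:
        # bottom-centre of the box approximates the object's road contact point
        if point_in_polygon((x1 + x2) / 2, y2, safe_region):
            return True
    return False

# safe region bounded by the localized lanes (a trapezoid in image space)
region = [(200, 720), (1080, 720), (760, 400), (520, 400)]
print(forward_collision_warning([(600, 500, 700, 620)], region))  # True
```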


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5626
Author(s):  
Jie Chen ◽  
Tao Wu ◽  
Meiping Shi ◽  
Wei Jiang

Autonomous driving with artificial intelligence technology has been viewed as promising for autonomous vehicles hitting the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) for realizing end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expertise. This paper proposes a human-in-the-loop DRL algorithm for learning personalized autonomous driving behavior in a progressive learning manner. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework, which is called PORF-DDPG in this paper. PORF consists of two parts: the first is a pre-defined typical reward function on the system state; the second is modeled as a Deep Neural Network (DNN) that represents the driving adjustment intention of the human observer, which is the main contribution of this paper. The DNN-based reward model is progressively learned using front-view images as the input, via active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic constrained scenarios where dangerous collision events might occur frequently with classic DRL algorithms. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability.
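
The two-part reward structure can be sketched directly: a fixed hand-designed term on the system state plus a learned DNN term on the front-view image. The network shape below and the composition by simple addition are illustrative assumptions under PyTorch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class IntentionRewardNet(nn.Module):
    """DNN term of PORF: maps a front-view image to a scalar reward
    adjustment that reflects the human observer's intention."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, front_view: torch.Tensor) -> torch.Tensor:
        return self.net(front_view)

def porf_reward(state_reward, front_view, intention_net):
    # total reward = pre-defined state reward + learned human-intention term
    return state_reward + intention_net(front_view).squeeze(-1)

net = IntentionRewardNet()
r = porf_reward(torch.tensor([0.5]), torch.randn(1, 3, 64, 64), net)
```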


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252754
Author(s):  
Nesma M. Ashraf ◽  
Reham R. Mostafa ◽  
Rasha H. Sakr ◽  
M. Z. Rashad

Deep Reinforcement Learning (DRL) enables agents to make decisions based on a well-designed reward function that suits a particular environment, without any prior knowledge of that environment. The choice of hyperparameters has a great impact on the overall learning process and on learning time. Hyperparameters should be accurately estimated while training DRL algorithms, which is one of the key challenges that we attempt to address. This paper employs a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), to optimize the hyperparameters of the Deep Deterministic Policy Gradient (DDPG) algorithm and achieve the optimum control strategy in an autonomous driving control problem. DDPG is capable of handling complex environments that contain continuous action spaces. To evaluate the proposed algorithm, The Open Racing Car Simulator (TORCS), a realistic autonomous driving simulation environment, was chosen due to its ease of design and implementation. Using TORCS, the DDPG agent with optimized hyperparameters was compared with a DDPG agent with reference hyperparameters. The experimental results showed that optimizing the DDPG's hyperparameters maximizes the total rewards across testing episodes while maintaining a stable driving policy.
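
A simplified WOA loop for this use case is sketched below. The hyperparameter list, bounds, and placeholder fitness function are illustrative assumptions (in the paper, fitness would be the total reward of a DDPG training run in TORCS), and the per-dimension treatment of the control vector A is a simplification of the standard scalar formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# search space: each row is (lower, upper) for one assumed DDPG hyperparameter
bounds = np.array([[1e-5, 1e-2],    # actor learning rate
                   [1e-5, 1e-2],    # critic learning rate
                   [1e-3, 1e-1],    # soft target-update rate tau
                   [0.90, 0.999]])  # discount factor gamma

def evaluate(params):
    # placeholder fitness: replace with the total reward of a DDPG run
    return -np.sum((params - bounds.mean(axis=1)) ** 2)

n_whales, n_iters, dim = 10, 30, len(bounds)
whales = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_whales, dim))
fitness = np.array([evaluate(w) for w in whales])
best, best_f = whales[fitness.argmax()].copy(), fitness.max()

for t in range(n_iters):
    a = 2.0 * (1 - t / n_iters)          # control parameter decays 2 -> 0
    for i in range(n_whales):
        A = 2 * a * rng.random(dim) - a
        C = 2 * rng.random(dim)
        if rng.random() < 0.5:           # encircling / exploratory search
            leader = best if np.all(np.abs(A) < 1) else whales[rng.integers(n_whales)]
            whales[i] = leader - A * np.abs(C * leader - whales[i])
        else:                            # spiral update around the best whale
            l = rng.uniform(-1, 1)
            whales[i] = np.abs(best - whales[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
        whales[i] = np.clip(whales[i], bounds[:, 0], bounds[:, 1])
        f = evaluate(whales[i])
        if f > best_f:
            best_f, best = f, whales[i].copy()

print("best hyperparameters:", best)
```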


2021 ◽  
Vol 11 (8) ◽  
pp. 3531
Author(s):  
Hesham M. Eraqi ◽  
Karim Soliman ◽  
Dalia Said ◽  
Omar R. Elezaby ◽  
Mohamed N. Moustafa ◽  
...  

Extensive research efforts have been devoted to identifying and improving roadway features that impact safety. Maintaining roadway safety features relies on costly manual operations of regular road surveying and data analysis. This paper introduces an automatic roadway safety features detection approach, which harnesses the potential of artificial intelligence (AI) computer vision to make the process more efficient and less costly. Given a front-facing camera and a global positioning system (GPS) sensor, the proposed system automatically evaluates ten roadway safety features. The system is composed of an oriented (or rotated) object detection model, which solves an orientation encoding discontinuity problem to improve detection accuracy, and a rule-based roadway safety evaluation module. To train and validate the proposed model, a fully annotated dataset for roadway safety features extraction was collected covering 473 km of roads. The baseline results of the proposed method are encouraging when compared to state-of-the-art models. Different oriented object detection strategies are presented and discussed, and the developed model improves the mean average precision (mAP) by 16.9% compared with the literature. The average prediction accuracy for roadway safety features is 84.39%, ranging between 63.12% and 91.11%. The introduced model can pervasively enable/disable autonomous driving (AD) based on the safety features of the road, and empower connected vehicles (CV) to send and receive estimated safety features, alerting drivers about black spots or relatively less safe segments or roads.
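
The orientation encoding discontinuity mentioned above arises because angles wrap around: raw-angle regression targets for nearly identical boxes at +179° and -179° differ by almost 360°. One common remedy, shown below for illustration (the paper's exact encoding may differ), is to regress (sin θ, cos θ) instead of θ, so that nearby orientations always map to nearby targets.

```python
import numpy as np

def encode_angle(theta_rad: float) -> np.ndarray:
    """Continuous orientation encoding: regress (sin, cos) instead of the angle."""
    return np.array([np.sin(theta_rad), np.cos(theta_rad)])

def decode_angle(vec: np.ndarray) -> float:
    """Recover the angle from the (sin, cos) pair."""
    return float(np.arctan2(vec[0], vec[1]))

a, b = np.deg2rad(179.0), np.deg2rad(-179.0)
# raw-angle targets differ by ~358 degrees; encoded targets nearly coincide
print(np.linalg.norm(encode_angle(a) - encode_angle(b)))  # ~0.035
```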


2021 ◽  
Vol 11 (13) ◽  
pp. 6016
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

For autonomous vehicles, it is critical to be aware of the driving environment to avoid collisions and drive safely. The recent evolution of convolutional neural networks has contributed significantly to accelerating the development of object detection techniques that enable autonomous vehicles to handle rapid changes in various driving environments. However, collisions in an autonomous driving environment can still occur due to undetected obstacles and various perception problems, particularly occlusion. Thus, we propose a robust object detection algorithm for environments in which objects are truncated or occluded, employing RGB images and light detection and ranging (LiDAR) bird’s eye view (BEV) representations. This structure combines independent detection results obtained in parallel through “you only look once” networks using an RGB image and a height map converted from the BEV representation of LiDAR’s point cloud data (PCD). The region proposal of an object is determined via non-maximum suppression, which suppresses the bounding boxes of adjacent regions. A performance evaluation of the proposed scheme was performed using the KITTI vision benchmark suite dataset. The results demonstrate that the detection accuracy when PCD BEV representations are integrated is superior to that obtained when only an RGB camera is used. In addition, detection accuracy is significantly enhanced even when target objects are partially occluded in the front view, demonstrating that the proposed algorithm outperforms the conventional RGB-based model.
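
The fusion step described above can be illustrated with a small sketch: detections from the RGB branch and the LiDAR BEV height-map branch are pooled into one list, and non-maximum suppression keeps the highest-confidence box in each overlapping cluster. The detection format, common coordinate frame, and IoU threshold are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_detections(rgb_dets, bev_dets, iou_thresh=0.5):
    """Each detection: (x1, y1, x2, y2, score) in a common frame."""
    dets = sorted(rgb_dets + bev_dets, key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        # keep a box only if it does not overlap an already-kept box
        if all(iou(d[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(d)
    return kept

fused = fuse_detections([(100, 80, 200, 180, 0.9)],
                        [(105, 85, 205, 185, 0.7), (300, 50, 380, 120, 0.8)])
print(fused)  # both distinct objects kept; the cross-sensor duplicate suppressed
```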


Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We draw on the idea of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.
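
The inner-product reward design reduces to r(s, a) = w · φ(s, a), where φ vectorizes the factors weighed in vehicle following and w is the weight vector tuned toward the human driver's values. The feature choices and weights below are illustrative assumptions, not the paper's.

```python
import numpy as np

def reward_features(gap_error, relative_speed, accel):
    """Vectorize the factors to weigh in vehicle following (assumed set)."""
    return np.array([
        -abs(gap_error),        # deviation from the desired following distance
        -abs(relative_speed),   # speed mismatch with the preceding vehicle
        -abs(accel),            # penalize harsh acceleration for smoothness
    ])

# weight vector, adjusted until the learned values approach the human driver's
weights = np.array([0.5, 0.3, 0.2])

def reward(gap_error, relative_speed, accel):
    return float(weights @ reward_features(gap_error, relative_speed, accel))

print(reward(gap_error=2.0, relative_speed=0.5, accel=0.3))  # -1.21
```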

