Deep Reinforcement Learning Based Left-Turn Connected and Automated Vehicle Control at Signalized Intersection in Vehicle-to-Infrastructure Environment

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 77 ◽  
Author(s):  
Juan Chen ◽  
Zhengxuan Xue ◽  
Daiqian Fan

To reduce the vehicle delay caused by stops at signalized intersections, this paper designs a micro-control method for a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG) algorithm. The method covers the whole process of a left-turn vehicle approaching, entering, and leaving a signalized intersection. In addition, to address the low sampling efficiency and the critic-network overestimation of the DDPG algorithm, a positive and negative reward experience replay buffer (PNRERB) sampling mechanism and a multi-critic network structure are adopted. Finally, the effectiveness of a conventional signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) is verified at left-turn saturation degrees of 0.2, 0.5, and 0.7 at a signalized intersection within a VISSIM simulation environment. The results show that, compared with the traditional signal control method, the proposed deep reinforcement learning methods achieve benefits of 5% to 94% in the number of stops, 1% to 99% in stop time, and −17% to 93% in delay.
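
As an illustration only, the following Python sketch shows one plausible reading of the two modifications named above: a replay buffer split by reward sign so minibatches mix positive and negative experiences, and a multi-critic target that averages several critic estimates to damp overestimation. The class and function names, the 50/50 sampling split, and the mean aggregation are our assumptions, not the paper's specification.

```python
import random
from collections import deque

import numpy as np

class PNRERBuffer:
    """Hypothetical positive/negative reward replay buffer: transitions
    are split by reward sign, and each minibatch draws from both pools
    so rare positive experiences are not drowned out."""

    def __init__(self, capacity=100_000, pos_fraction=0.5):
        self.pos = deque(maxlen=capacity)   # transitions with reward >= 0
        self.neg = deque(maxlen=capacity)   # transitions with reward < 0
        self.pos_fraction = pos_fraction    # share of each batch from pos

    def add(self, state, action, reward, next_state, done):
        pool = self.pos if reward >= 0 else self.neg
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        n_pos = min(int(batch_size * self.pos_fraction), len(self.pos))
        n_neg = min(batch_size - n_pos, len(self.neg))
        batch = random.sample(self.pos, n_pos) + random.sample(self.neg, n_neg)
        random.shuffle(batch)
        return batch

def multi_critic_target(critics, next_state, next_action):
    """Average several critic estimates to damp the overestimation bias
    of a single DDPG critic (the paper's exact aggregation may differ)."""
    q_values = [q(next_state, next_action) for q in critics]
    return np.mean(q_values, axis=0)
```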

2020 ◽  
Vol 53 (2) ◽  
pp. 8118-8123
Author(s):  
Teawon Han ◽  
Subramanya Nageshrao ◽  
Dimitar P. Filev ◽  
Ümit Özgüner

Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4291 ◽  
Author(s):  
Qiang Wu ◽  
Jianqing Wu ◽  
Jun Shen ◽  
Binbin Yong ◽  
Qingguo Zhou

With smart city infrastructures growing, the Internet of Things (IoT) has been widely used in intelligent transportation systems (ITS). Traditional adaptive traffic signal control based on reinforcement learning (RL) has expanded from a single intersection to multiple intersections. In this paper, we propose a multi-agent auto communication (MAAC) algorithm, an innovative adaptive global traffic light control method based on multi-agent reinforcement learning (MARL) and an auto communication protocol in an edge computing architecture. The MAAC algorithm combines a multi-agent auto communication protocol with MARL, allowing each agent to communicate its learned strategies to the others in order to achieve global optimization in traffic signal control. In addition, we present a practicable edge computing architecture for industrial deployment on the IoT, considering the limits of network transmission bandwidth. We demonstrate that our algorithm outperforms other methods by over 17% in experiments in a real-traffic simulation environment.
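
The abstract does not spell out the communication protocol, but a minimal sketch of the general loop it describes, where each agent broadcasts a learned message and conditions its action on its neighbors' messages, might look as follows; the encoder, `policy_fn`, and all names are hypothetical.

```python
import numpy as np

class IntersectionAgent:
    """Hypothetical MAAC-style agent: it broadcasts a message vector and
    acts on its local state plus its neighbors' latest messages."""

    def __init__(self, agent_id, neighbors, msg_dim=8):
        self.agent_id = agent_id
        self.neighbors = neighbors        # ids of adjacent intersections
        self.message = np.zeros(msg_dim)  # last broadcast message

    def encode_message(self, local_state):
        # Placeholder encoder; MAAC learns this mapping end to end.
        self.message = np.tanh(local_state[: len(self.message)])

    def act(self, local_state, inbox, policy_fn):
        # Policy input = local observation plus neighbors' messages.
        joint_obs = np.concatenate([local_state] + inbox)
        return policy_fn(joint_obs)

def control_step(agents, states, policy_fn):
    """One global step: every agent broadcasts, then picks a signal phase."""
    for a in agents.values():
        a.encode_message(states[a.agent_id])
    return {a.agent_id: a.act(states[a.agent_id],
                              [agents[n].message for n in a.neighbors],
                              policy_fn)
            for a in agents.values()}
```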


Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 1058 ◽  
Author(s):  
Chuanxiang Ren ◽  
Jinbo Wang ◽  
Lingqiao Qin ◽  
Shen Li ◽  
Yang Cheng

Setting up an exclusive left-turn lane and a corresponding signal phase improves intersection safety and efficiency, but it decreases intersection capacity when there are few or no left-turn movements. This is especially costly during rush hours, because the left-turn lane space and signal phase duration go underused. Taking advantage of vehicle-to-infrastructure (V2I) communication, a novel intersection signal control model is proposed that uses variable lane-direction arrow markings to turn the left-turn lane into a controllable shared lane for left-turn and through movements. The new intersection signal control model and its control strategy are presented and simulated using field data. Compared with two other intersection control models and their control strategies, the new model is validated to improve intersection capacity during rush hours. In addition, variable lane lines and a corresponding control method are designed and combined with the left-turn waiting area to overcome the shortcomings of the proposed signal control model and control strategy.
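
As a hedged illustration of the controllable shared-lane idea, the rule below switches the variable arrow marking based on the share of left-turn demand observed via V2I; the threshold, names, and the rule itself are our assumptions, since the paper's actual strategy is more elaborate.

```python
def set_shared_lane_mode(left_queue, through_queue, threshold=0.3):
    """Hypothetical decision rule: when left-turn demand is a small share
    of total demand, the variable arrow marking releases the left-turn
    lane to shared left/through use; otherwise it stays left-only."""
    total = left_queue + through_queue
    if total == 0:
        return "LEFT_ONLY"
    left_share = left_queue / total
    return "LEFT_ONLY" if left_share >= threshold else "SHARED_LEFT_THROUGH"
```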


2020 ◽  
Vol 12 (22) ◽  
pp. 3789
Author(s):  
Bo Li ◽  
Zhigang Gan ◽  
Daqing Chen ◽  
Dyachenko Sergey Aleksandrovich

This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), for the control of an unmanned aerial vehicle (UAV), allowing the UAV to quickly track a target whose motion is uncertain. The approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We use a multi-task experience replay buffer to provide data for multi-task learning of the DRL algorithm, and we combine it with meta-learning to develop a multi-task reinforcement learning update method that ensures the generalization capability of the learned policy. Compared with the state-of-the-art deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3) algorithms, experimental results show that Meta-TD3 achieves a substantial improvement in both convergence value and convergence rate. In a UAV target-tracking problem, Meta-TD3 requires only a few training steps to let the UAV adapt quickly to a new target movement mode while maintaining better tracking effectiveness.
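
A minimal sketch of how such a multi-task meta-update might be organized follows, written in a Reptile style for concreteness; the accessors `get_params`, `set_params`, and `td3_update`, the step counts, and the parameter averaging are assumptions rather than the paper's exact rule.

```python
import copy

def meta_td3_update(agent, task_buffers, inner_steps=10, meta_lr=0.1):
    """Reptile-style meta-update sketch: adapt a copy of a standard TD3
    agent on each task's replay buffer, then move the meta-parameters
    toward the average of the adapted parameters. Parameters are assumed
    to be lists of numpy arrays."""
    meta_params = agent.get_params()
    adapted = []
    for buffer in task_buffers:              # one replay buffer per task
        task_agent = copy.deepcopy(agent)    # adapt a fresh copy
        for _ in range(inner_steps):
            task_agent.td3_update(buffer.sample(256))
        adapted.append(task_agent.get_params())
    # Outer step: interpolate toward the mean of the adapted parameters.
    new_params = [m + meta_lr * (sum(p[i] for p in adapted) / len(adapted) - m)
                  for i, m in enumerate(meta_params)]
    agent.set_params(new_params)
```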


Symmetry ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 1352 ◽  
Author(s):  
Kim ◽  
Park

In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can exhibit efficient exploration behavior. A random ε-greedy policy can exploit additional replay buffers in environments with sparse, binary rewards, such as the real-time online detection of network security events, where the task is to verify whether the network is “normal or anomalous.” Prior studies have shown that prioritized replay memory, which weights transitions by temporal-difference error, provides superior theoretical results. However, other implementations have shown that in certain environments prioritized replay memory is not superior to the randomly sampled buffers of a random ε-greedy policy. Moreover, hindsight experience replay, which uses additional buffers corresponding to each different goal, poses a key challenge that inspires our objective. We therefore exploit multiple random ε-greedy buffers to enhance exploration and approach near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of our method for off-policy learning through an experimental comparison of a DQN and a deep deterministic policy gradient on both discrete-action and continuous-control tasks in completely symmetric environments.
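
A minimal sketch of the multiple random ε-greedy buffers idea, assuming one buffer per exploration rate and uniform mixing across buffers at training time (the authors' exact sampling scheme is not given in the abstract):

```python
import random
from collections import deque

class MultiEpsilonBuffers:
    """Hypothetical multi-buffer store: each buffer holds transitions
    gathered under a different epsilon, and training batches mix
    experience from all of them to diversify exploration."""

    def __init__(self, epsilons=(0.05, 0.2, 0.5), capacity=50_000):
        self.epsilons = epsilons
        self.buffers = {eps: deque(maxlen=capacity) for eps in epsilons}

    def add(self, eps, transition):
        # Route the transition to the buffer of the policy that produced it.
        self.buffers[eps].append(transition)

    def sample(self, batch_size):
        per_buffer = max(1, batch_size // len(self.buffers))
        batch = []
        for buf in self.buffers.values():
            if buf:
                batch.extend(random.sample(buf, min(per_buffer, len(buf))))
        random.shuffle(batch)
        return batch
```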


2021 ◽  
Vol 13 (3) ◽  
pp. 1135
Author(s):  
Biao Yin ◽  
Monica Menendez ◽  
Kaidi Yang

Connected and automated vehicle (CAV) technology makes it possible to track and control the movement of vehicles, thus providing enormous potential to improve intersection operations. In this paper, we study the traffic signal control problem at an isolated intersection in a CAV environment, considering mixed traffic including various types of vehicles and pedestrians. Both the vehicle delay and the pedestrian delay are incorporated into the model formulation. This introduces some additional complexity, as any benefit to pedestrians comes at the expense of higher delays for the vehicles. Thus, some valid questions we answer in this paper are as follows: Under which circumstances could we give priority to pedestrians without over-penalizing the vehicles at the intersection? How important are the connectivity and autonomy associated with CAV technology in this context? What type of signal control algorithm could minimize person delay, accounting for both vehicles and pedestrians? How could it be solved efficiently? To address these questions, we present a model that jointly optimizes the signal control (i.e., the vehicle departure sequence), the automated vehicle trajectories, and the treatment of pedestrian crossings. In each decision step, the weighted sum of the vehicle delay and the pedestrian delay (i.e., the total person delay) is minimized by the joint optimization on the basis of the predicted departure sequences of vehicles and pedestrians. Moreover, a near-optimal solution of the integrated problem is obtained with an ant colony system algorithm, which is computationally very efficient. Simulations are conducted for different demand scenarios and different CAV penetration rates, and the performance of the proposed algorithm in terms of average person delay is investigated. The simulation results show that the proposed algorithm has the potential to reduce delay compared to an actuated signal control method. Moreover, in comparison to a CAV-based signal control that does not account for pedestrian delay, the joint optimization proposed here achieves improvements in the low- and moderate-vehicle-demand scenarios.
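
The weighted person-delay objective described above can be sketched directly; the occupancy inputs, the pedestrian weight, and the function shape are illustrative assumptions rather than the paper's formulation.

```python
def person_delay(vehicle_delays, vehicle_occupancies, pedestrian_delays,
                 ped_weight=1.0):
    """Weighted person-delay objective sketched from the abstract: each
    vehicle's delay counts once per occupant, and each pedestrian's delay
    is scaled by ped_weight. The joint optimization would minimize this
    sum over candidate departure sequences."""
    veh = sum(d * occ for d, occ in zip(vehicle_delays, vehicle_occupancies))
    ped = ped_weight * sum(pedestrian_delays)
    return veh + ped
```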


2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult undertaking. In this article, we propose a general and model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of hindsight experience replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the twin delayed deep deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to reduce oscillation in the output actions while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method effectively solves the sparse-reward problem and achieves a high learning speed.
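
A hedged sketch of the modified actor objective, in PyTorch: the critic's value of the actor's action is maximized while an added action loss penalizes large outputs to suppress vibration. The squared-magnitude penalty and the coefficient are our assumptions about the form of the action loss.

```python
import torch

def actor_loss_with_action_penalty(critic, actor, states, action_coef=1e-2):
    """Actor objective sketch: -Q(s, pi(s)) plus a penalty on the action
    magnitude so the output action varies smoothly across steps."""
    actions = actor(states)                      # actor network forward pass
    q_value = critic(states, actions)            # critic's value estimate
    action_loss = (actions ** 2).mean()          # discourages large outputs
    return -q_value.mean() + action_coef * action_loss
```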

