Deep Reinforcement Learning Based Left-Turn Connected and Automated Vehicle Control at Signalized Intersection in Vehicle-to-Infrastructure Environment

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 77 ◽  
Author(s):  
Juan Chen ◽  
Zhengxuan Xue ◽  
Daiqian Fan

To reduce the vehicle delay caused by stops at signalized intersections, this paper designs a micro-control method for a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG) algorithm. The method covers the whole process of a left-turn vehicle approaching, entering, and leaving a signalized intersection. In addition, to address the low sampling efficiency and the critic-network overestimation of the DDPG algorithm, a positive and negative reward experience replay buffer (PNRERB) sampling mechanism and a multi-critic network structure are adopted. Finally, the effectiveness of a conventional signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) is verified at left-turn saturation degrees of 0.2, 0.5, and 0.7 at a signalized intersection within a VISSIM simulation environment. The results show that, compared with the traditional signal control method, the proposed deep reinforcement learning methods achieve benefits of 5% to 94% in the number of stops, 1% to 99% in stop time, and −17% to 93% in delay.
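
As an illustration only, the following Python sketch shows one plausible reading of the two modifications named above: a replay buffer split by reward sign so minibatches mix positive and negative experiences, and a multi-critic target that averages several critic estimates to damp overestimation. The class and function names, the 50/50 sampling split, and the mean aggregation are our assumptions, not the paper's specification.

```python
import random
from collections import deque

import numpy as np

class PNRERBuffer:
    """Hypothetical positive/negative reward replay buffer: transitions
    are split by reward sign, and each minibatch draws from both pools
    so rare positive experiences are not drowned out."""

    def __init__(self, capacity=100_000, pos_fraction=0.5):
        self.pos = deque(maxlen=capacity)   # transitions with reward >= 0
        self.neg = deque(maxlen=capacity)   # transitions with reward < 0
        self.pos_fraction = pos_fraction    # share of each batch from pos

    def add(self, state, action, reward, next_state, done):
        pool = self.pos if reward >= 0 else self.neg
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        n_pos = min(int(batch_size * self.pos_fraction), len(self.pos))
        n_neg = min(batch_size - n_pos, len(self.neg))
        batch = random.sample(self.pos, n_pos) + random.sample(self.neg, n_neg)
        random.shuffle(batch)
        return batch

def multi_critic_target(critics, next_state, next_action):
    """Average several critic estimates to damp the overestimation bias
    of a single DDPG critic (the paper's exact aggregation may differ)."""
    q_values = [q(next_state, next_action) for q in critics]
    return np.mean(q_values, axis=0)
```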

2020 ◽  
Vol 53 (2) ◽  
pp. 8118-8123
Author(s):  
Teawon Han ◽  
Subramanya Nageshrao ◽  
Dimitar P. Filev ◽  
Ümit Özgüner

Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4291 ◽  
Author(s):  
Qiang Wu ◽  
Jianqing Wu ◽  
Jun Shen ◽  
Binbin Yong ◽  
Qingguo Zhou

With smart city infrastructures growing, the Internet of Things (IoT) has been widely used in intelligent transportation systems (ITS). Traditional adaptive traffic signal control based on reinforcement learning (RL) has expanded from a single intersection to multiple intersections. In this paper, we propose a multi-agent auto communication (MAAC) algorithm, an innovative adaptive global traffic light control method based on multi-agent reinforcement learning (MARL) and an auto communication protocol in an edge computing architecture. The MAAC algorithm combines a multi-agent auto communication protocol with MARL, allowing each agent to communicate its learned strategies to the others in order to achieve global optimization in traffic signal control. In addition, we present a practicable edge computing architecture for industrial deployment on the IoT, considering the limits of network transmission bandwidth. We demonstrate that our algorithm outperforms other methods by over 17% in experiments in a real-traffic simulation environment.
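
The abstract does not spell out the communication protocol, but a minimal sketch of the general loop it describes, where each agent broadcasts a learned message and conditions its action on its neighbors' messages, might look as follows; the encoder, `policy_fn`, and all names are hypothetical.

```python
import numpy as np

class IntersectionAgent:
    """Hypothetical MAAC-style agent: it broadcasts a message vector and
    acts on its local state plus its neighbors' latest messages."""

    def __init__(self, agent_id, neighbors, msg_dim=8):
        self.agent_id = agent_id
        self.neighbors = neighbors        # ids of adjacent intersections
        self.message = np.zeros(msg_dim)  # last broadcast message

    def encode_message(self, local_state):
        # Placeholder encoder; MAAC learns this mapping end to end.
        self.message = np.tanh(local_state[: len(self.message)])

    def act(self, local_state, inbox, policy_fn):
        # Policy input = local observation plus neighbors' messages.
        joint_obs = np.concatenate([local_state] + inbox)
        return policy_fn(joint_obs)

def control_step(agents, states, policy_fn):
    """One global step: every agent broadcasts, then picks a signal phase."""
    for a in agents.values():
        a.encode_message(states[a.agent_id])
    return {a.agent_id: a.act(states[a.agent_id],
                              [agents[n].message for n in a.neighbors],
                              policy_fn)
            for a in agents.values()}
```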


Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 1058 ◽  
Author(s):  
Chuanxiang Ren ◽  
Jinbo Wang ◽  
Lingqiao Qin ◽  
Shen Li ◽  
Yang Cheng

Setting up an exclusive left-turn lane and a corresponding signal phase improves intersection safety and efficiency, but it decreases intersection capacity when there are few or no left-turn movements. This is especially costly during rush hours, because the left-turn lane space and signal phase duration go underused. Taking advantage of vehicle-to-infrastructure (V2I) communication, a novel intersection signal control model is proposed that uses variable lane-direction arrow markings to turn the left-turn lane into a controllable shared lane for left-turn and through movements. The new intersection signal control model and its control strategy are presented and simulated using field data. Compared with two other intersection control models and their control strategies, the new model is validated to improve intersection capacity during rush hours. In addition, variable lane lines and a corresponding control method are designed and combined with the left-turn waiting area to overcome the shortcomings of the proposed signal control model and control strategy.
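
As a hedged illustration of the controllable shared-lane idea, the rule below switches the variable arrow marking based on the share of left-turn demand observed via V2I; the threshold, names, and the rule itself are our assumptions, since the paper's actual strategy is more elaborate.

```python
def set_shared_lane_mode(left_queue, through_queue, threshold=0.3):
    """Hypothetical decision rule: when left-turn demand is a small share
    of total demand, the variable arrow marking releases the left-turn
    lane to shared left/through use; otherwise it stays left-only."""
    total = left_queue + through_queue
    if total == 0:
        return "LEFT_ONLY"
    left_share = left_queue / total
    return "LEFT_ONLY" if left_share >= threshold else "SHARED_LEFT_THROUGH"
```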


2020 ◽  
Vol 12 (22) ◽  
pp. 3789
Author(s):  
Bo Li ◽  
Zhigang Gan ◽  
Daqing Chen ◽  
Dyachenko Sergey Aleksandrovich

This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), for the control of an unmanned aerial vehicle (UAV), allowing the UAV to quickly track a target whose motion is uncertain. The approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We use a multi-task experience replay buffer to provide data for multi-task learning of the DRL algorithm, and we combine it with meta-learning to develop a multi-task reinforcement learning update method that ensures the generalization capability of the learned policy. Compared with the state-of-the-art deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3) algorithms, experimental results show that Meta-TD3 achieves a substantial improvement in both convergence value and convergence rate. In a UAV target-tracking problem, Meta-TD3 requires only a few training steps to let the UAV adapt quickly to a new target movement mode while maintaining better tracking effectiveness.
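
A minimal sketch of how such a multi-task meta-update might be organized follows, written in a Reptile style for concreteness; the accessors `get_params`, `set_params`, and `td3_update`, the step counts, and the parameter averaging are assumptions rather than the paper's exact rule.

```python
import copy

def meta_td3_update(agent, task_buffers, inner_steps=10, meta_lr=0.1):
    """Reptile-style meta-update sketch: adapt a copy of a standard TD3
    agent on each task's replay buffer, then move the meta-parameters
    toward the average of the adapted parameters. Parameters are assumed
    to be lists of numpy arrays."""
    meta_params = agent.get_params()
    adapted = []
    for buffer in task_buffers:              # one replay buffer per task
        task_agent = copy.deepcopy(agent)    # adapt a fresh copy
        for _ in range(inner_steps):
            task_agent.td3_update(buffer.sample(256))
        adapted.append(task_agent.get_params())
    # Outer step: interpolate toward the mean of the adapted parameters.
    new_params = [m + meta_lr * (sum(p[i] for p in adapted) / len(adapted) - m)
                  for i, m in enumerate(meta_params)]
    agent.set_params(new_params)
```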


Symmetry ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 1352 ◽  
Author(s):  
Kim ◽  
Park

In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can exhibit efficient exploration behavior. A random ε-greedy policy can exploit additional replay buffers in environments with sparse, binary rewards, such as the real-time online detection of network security events, where the task is to verify whether the network is “normal or anomalous.” Prior studies have shown that prioritized replay memory, which weights transitions by temporal-difference error, provides superior theoretical results. However, other implementations have shown that in certain environments prioritized replay memory is not superior to the randomly sampled buffers of a random ε-greedy policy. Moreover, hindsight experience replay, which uses additional buffers corresponding to each different goal, poses a key challenge that inspires our objective. We therefore exploit multiple random ε-greedy buffers to enhance exploration and approach near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of our method for off-policy learning through an experimental comparison of a DQN and a deep deterministic policy gradient on both discrete-action and continuous-control tasks in completely symmetric environments.
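
A minimal sketch of the multiple random ε-greedy buffers idea, assuming one buffer per exploration rate and uniform mixing across buffers at training time (the authors' exact sampling scheme is not given in the abstract):

```python
import random
from collections import deque

class MultiEpsilonBuffers:
    """Hypothetical multi-buffer store: each buffer holds transitions
    gathered under a different epsilon, and training batches mix
    experience from all of them to diversify exploration."""

    def __init__(self, epsilons=(0.05, 0.2, 0.5), capacity=50_000):
        self.epsilons = epsilons
        self.buffers = {eps: deque(maxlen=capacity) for eps in epsilons}

    def add(self, eps, transition):
        # Route the transition to the buffer of the policy that produced it.
        self.buffers[eps].append(transition)

    def sample(self, batch_size):
        per_buffer = max(1, batch_size // len(self.buffers))
        batch = []
        for buf in self.buffers.values():
            if buf:
                batch.extend(random.sample(buf, min(per_buffer, len(buf))))
        random.shuffle(batch)
        return batch
```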


2021 ◽  
Vol 13 (3) ◽  
pp. 1135
Author(s):  
Biao Yin ◽  
Monica Menendez ◽  
Kaidi Yang

Connected and automated vehicle (CAV) technology makes it possible to track and control the movement of vehicles, thus providing enormous potential to improve intersection operations. In this paper, we study the traffic signal control problem at an isolated intersection in a CAV environment, considering mixed traffic including various types of vehicles and pedestrians. Both the vehicle delay and the pedestrian delay are incorporated into the model formulation. This introduces some additional complexity, as any benefit to pedestrians comes at the expense of higher delays for the vehicles. Thus, some valid questions we answer in this paper are as follows: Under which circumstances could we give priority to pedestrians without over-penalizing the vehicles at the intersection? How important are the connectivity and autonomy associated with CAV technology in this context? What type of signal control algorithm could minimize person delay, accounting for both vehicles and pedestrians? How could it be solved efficiently? To address these questions, we present a model that jointly optimizes the signal control (i.e., the vehicle departure sequence), the automated vehicle trajectories, and the treatment of pedestrian crossings. In each decision step, the weighted sum of the vehicle delay and the pedestrian delay (i.e., the total person delay) is minimized by the joint optimization on the basis of the predicted departure sequences of vehicles and pedestrians. Moreover, a near-optimal solution of the integrated problem is obtained with an ant colony system algorithm, which is computationally very efficient. Simulations are conducted for different demand scenarios and different CAV penetration rates, and the performance of the proposed algorithm in terms of average person delay is investigated. The simulation results show that the proposed algorithm has the potential to reduce delay compared to an actuated signal control method. Moreover, in comparison to a CAV-based signal control that does not account for pedestrian delay, the joint optimization proposed here achieves improvements in the low- and moderate-vehicle-demand scenarios.
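
The weighted person-delay objective described above can be sketched directly; the occupancy inputs, the pedestrian weight, and the function shape are illustrative assumptions rather than the paper's formulation.

```python
def person_delay(vehicle_delays, vehicle_occupancies, pedestrian_delays,
                 ped_weight=1.0):
    """Weighted person-delay objective sketched from the abstract: each
    vehicle's delay counts once per occupant, and each pedestrian's delay
    is scaled by ped_weight. The joint optimization would minimize this
    sum over candidate departure sequences."""
    veh = sum(d * occ for d, occ in zip(vehicle_delays, vehicle_occupancies))
    ped = ped_weight * sum(pedestrian_delays)
    return veh + ped
```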


2020 ◽  
Vol 17 (1) ◽  
pp. 172988141989834
Author(s):  
Guoyu Zuo ◽  
Qishen Zhao ◽  
Jiahao Lu ◽  
Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is a difficult undertaking. In this article, we propose a general and model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of hindsight experience replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the twin delayed deep deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to reduce oscillation in the output actions while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. Results show that our method effectively solves the sparse-reward problem and achieves a high learning speed.
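
A hedged sketch of the modified actor objective, in PyTorch: the critic's value of the actor's action is maximized while an added action loss penalizes large outputs to suppress vibration. The squared-magnitude penalty and the coefficient are our assumptions about the form of the action loss.

```python
import torch

def actor_loss_with_action_penalty(critic, actor, states, action_coef=1e-2):
    """Actor objective sketch: -Q(s, pi(s)) plus a penalty on the action
    magnitude so the output action varies smoothly across steps."""
    actions = actor(states)                      # actor network forward pass
    q_value = critic(states, actions)            # critic's value estimate
    action_loss = (actions ** 2).mean()          # discourages large outputs
    return -q_value.mean() + action_coef * action_loss
```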

