Data-Driven Online Energy Scheduling of a Microgrid Based on Deep Reinforcement Learning

The proliferation of distributed renewable energy resources (RESs) poses major challenges to the operation of microgrids due to uncertainty. Traditional online scheduling approaches relying on accurate forecasts become difficult to implement due to the increase of uncertain RESs. Although several data-driven methods have been proposed recently to overcome the challenge, they generally suffer from a scalability issue due to the limited ability to optimize high-dimensional continuous control variables. To address these issues, we propose a data-driven online scheduling method for microgrid energy optimization based on continuous-control deep reinforcement learning (DRL). We formulate the online scheduling problem as a Markov decision process (MDP). The objective is to minimize the operating cost of the microgrid considering the uncertainty of RESs generation, load demand, and electricity prices. To learn the optimal scheduling strategy, a Gated Recurrent Unit (GRU)-based network is designed to extract temporal features of uncertainty and generate the optimal scheduling decisions in an end-to-end manner. To optimize the policy with high-dimensional and continuous actions, proximal policy optimization (PPO) is employed to train the neural network-based policy in a data-driven fashion. The proposed method does not require any forecasting information on the uncertainty or a prior knowledge of the physical model of the microgrid. Simulation results using realistic power system data of California Independent System Operator (CAISO) demonstrate the effectiveness of the proposed method.

Download Full-text

Proximal Policy Optimization Through a Deep Reinforcement Learning Framework for Multiple Autonomous Vehicles at a Non-Signalized Intersection

Applied Sciences ◽

10.3390/app10165722 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5722 ◽

Cited By ~ 1

Author(s):

Duy Quang Tran ◽

Sang-Hoon Bae

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicles ◽

Autonomous Vehicle ◽

Signalized Intersection ◽

Continuous Control ◽

Simulation Performance ◽

Perceptron Algorithm ◽

Learning Framework ◽

Positive Effects ◽

Policy Optimization

Advanced deep reinforcement learning shows promise as an approach to addressing continuous control tasks, especially in mixed-autonomy traffic. In this study, we present a deep reinforcement-learning-based model that considers the effectiveness of leading autonomous vehicles in mixed-autonomy traffic at a non-signalized intersection. This model integrates the Flow framework, the simulation of urban mobility simulator, and a reinforcement learning library. We also propose a set of proximal policy optimization hyperparameters to obtain reliable simulation performance. First, the leading autonomous vehicles at the non-signalized intersection are considered with varying autonomous vehicle penetration rates that range from 10% to 100% in 10% increments. Second, the proximal policy optimization hyperparameters are input into the multiple perceptron algorithm for the leading autonomous vehicle experiment. Finally, the superiority of the proposed model is evaluated using all human-driven vehicle and leading human-driven vehicle experiments. We demonstrate that full-autonomy traffic can improve the average speed and delay time by 1.38 times and 2.55 times, respectively, compared with all human-driven vehicle experiments. Our proposed method generates more positive effects when the autonomous vehicle penetration rate increases. Additionally, the leading autonomous vehicle experiment can be used to dissipate the stop-and-go waves at a non-signalized intersection.

Download Full-text

A Deep Reinforcement Learning Based Car Following Model for Electric Vehicle

智能城市应用 ◽

10.33142/sca.v2i5.813 ◽

2019 ◽

Vol 2 (5) ◽

Author(s):

Yuankai Wu ◽

Huachun Tan ◽

Jiankun Peng ◽

Bin Ran

Keyword(s):

Reinforcement Learning ◽

Electric Vehicle ◽

Electricity Consumption ◽

Trust Region ◽

Research Area ◽

Data Driven ◽

Car Following ◽

Car Following Model ◽

On The Road ◽

Policy Optimization

Car following (CF) models are an appealing research area because they fundamentally describe longitudinal interactions of vehicles on the road, and contribute significantly to an understanding of traffic flow. There is an emerging trend to use data-driven method to build CF models. One challenge to the data-driven CF models is their capability to achieve optimal longitudinal driven behavior because a lot of bad driving behaviors will be learnt from human drivers by the supervised learning manner. In this study, by utilizing the deep reinforcement learning (DRL) techniques trust region policy optimization (TRPO), a DRL based CF model for electric vehicle (EV) is built. The proposed CF model can learn optimal driving behavior by itself in simulation. The experiments on following standard driving cycle show that the DRL model outperforms the traditional CF model in terms of electricity consumption.

Download Full-text

Stochastic Actor-Executor-Critic for Image-to-Image Translation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/382 ◽

2021 ◽

Author(s):

Ziwei Luo ◽

Jing Hu ◽

Xin Wang ◽

Siwei Lyu ◽

Bin Kong ◽

...

Keyword(s):

Reinforcement Learning ◽

Control Policy ◽

High Dimensional ◽

Continuous Control ◽

Continuous Space ◽

Model Free ◽

Recent Success ◽

Image Translation ◽

Continuous State ◽

And Control

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework designed for challenging continuous control problems to develop stochastic policies over high dimensional continuous spaces including image representation, generation, and control simultaneously. Central to this method is the Stochastic Actor-Executor-Critic (SAEC) which is an off-policy actor-critic model with an additional executor to generate realistic images. Specifically, the actor focuses on the high-level representation and control policy by a stochastic latent action, as well as explicitly directs the executor to generate low-level actions to manipulate the state. Experiments on several image-to-image translation tasks have demonstrated the effectiveness and robustness of the proposed SAEC when facing high-dimensional continuous space problems.

Download Full-text

Accelerating the training of deep reinforcement learning in autonomous driving

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i3.pp649-656 ◽

2021 ◽

Vol 10 (3) ◽

pp. 649

Author(s):

Emmanuel Ifeanyi Iroegbu ◽

Devaraj Madhavi

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicle ◽

Autonomous Driving ◽

High Dimensional ◽

Training Time ◽

Learning Agent ◽

Policy Gradient ◽

Low Dimensional ◽

Policy Optimization

Deep reinforcement learning has been successful in solving common autonomous driving tasks such as lane-keeping by simply using pixel data from the front view camera as input. However, raw pixel data contains a very high-dimensional observation that affects the learning quality of the agent due to the complexity imposed by a 'realistic' urban environment. Ergo, we investigate how compressing the raw pixel data from high-dimensional state to low-dimensional latent space offline using a variational autoencoder can significantly improve the training of a deep reinforcement learning agent. We evaluated our method on a simulated autonomous vehicle in car learning to act and compared our results with many baselines including deep deterministic policy gradient, proximal policy optimization, and soft actorcritic. The result shows that the method greatly accelerates the training time and there was a remarkable improvement in the quality of the deep reinforcement learning agent.

Download Full-text

Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target

Complex & Intelligent Systems ◽

10.1007/s40747-021-00577-6 ◽

2021 ◽

Author(s):

Weifan Li ◽

Yuanheng Zhu ◽

Dongbin Zhao

Keyword(s):

Reinforcement Learning ◽

Learning Task ◽

Imitation Learning ◽

Continuous Control ◽

Missile Guidance ◽

Maneuvering Target ◽

Detection Delay ◽

The Neural Network ◽

The Mean ◽

Policy Optimization

AbstractIn missile guidance, pursuit performance is seriously degraded due to the uncertainty and randomness in target maneuverability, detection delay, and environmental noise. In many methods, accurately estimating the acceleration of the target or the time-to-go is needed to intercept the maneuvering target, which is hard in an environment with uncertainty. In this paper, we propose an assisted deep reinforcement learning (ARL) algorithm to optimize the neural network-based missile guidance controller for head-on interception. Based on the relative velocity, distance, and angle, ARL can control the missile to intercept the maneuvering target and achieve large terminal intercept angle. To reduce the influence of environmental uncertainty, ARL predicts the target’s acceleration as an auxiliary supervised task. The supervised learning task improves the ability of the agent to extract information from observations. To exploit the agent’s good trajectories, ARL presents the Gaussian self-imitation learning to make the mean of action distribution approach the agent’s good actions. Compared with vanilla self-imitation learning, Gaussian self-imitation learning improves the exploration in continuous control. Simulation results validate that ARL outperforms traditional methods and proximal policy optimization algorithm with higher hit rate and larger terminal intercept angle in the simulation environment with noise, delay, and maneuverable target.

Download Full-text

Policy Optimization with Second-Order Advantage Information

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/699 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jiajin Li ◽

Baoxiang Wang ◽

Shengyu Zhang

Keyword(s):

Empirical Studies ◽

Second Order ◽

High Dimensional ◽

Continuous Control ◽

Unified Framework ◽

Performance Improvements ◽

Factorization Structure ◽

Policy Gradient ◽

Policy Optimization ◽

And Control

Policy optimization on high-dimensional continuous control tasks exhibits its difficulty caused by the large variance of the policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure among the action space based on the second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide \& deep architecture. Empirical studies show that our proposed approach demonstrates the performance improvements on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.

Download Full-text

Average-Reward Reinforcement Learning with Trust Region Methods

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/385 ◽

2021 ◽

Author(s):

Xiaoteng Ma ◽

Xiaohang Tang ◽

Li Xia ◽

Jun Yang ◽

Qianchuan Zhao

Keyword(s):

Reinforcement Learning ◽

Trust Region ◽

Continuous Control ◽

Performance Bound ◽

Average Value ◽

Long Run ◽

Average Criterion ◽

Region Theory ◽

Policy Optimization ◽

Discounted Criterion

Most of reinforcement learning algorithms optimize the discounted criterion which is beneficial to accelerate the convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks such as financial related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. Firstly, we develop a unified trust region theory with discounted and average criteria. With the average criterion, a novel performance bound within the trust region is derived with the Perturbation Analysis (PA) theory. Secondly, we propose a practical algorithm named Average Policy Optimization (APO), which improves the value estimation with a novel technique named Average Value Constraint. To the best of our knowledge, our work is the first one to study the trust region approach with the average criterion and it complements the framework of reinforcement learning beyond the discounted criterion. Finally, experiments are conducted in the continuous control environment MuJoCo. In most tasks, APO performs better than the discounted PPO, which demonstrates the effectiveness of our approach.

Download Full-text

Reinforcement Learning With High-Dimensional, Continuous Actions

10.21236/ada280844 ◽

1993 ◽

Cited By ~ 23

Author(s):

III. Baird ◽

Klopf Leemon C. ◽

A. H.

Keyword(s):

Reinforcement Learning ◽

High Dimensional ◽

Continuous Actions

Download Full-text

A monotonic policy optimization algorithm for high-dimensional continuous control problem in 3D MuJoCo

Multimedia Tools and Applications ◽

10.1007/s11042-018-6098-y ◽

2018 ◽

Vol 78 (20) ◽

pp. 28665-28680

Author(s):

Qunyong Yuan ◽

Nanfeng Xiao

Keyword(s):

Control Problem ◽

Optimization Algorithm ◽

High Dimensional ◽

Continuous Control ◽

Policy Optimization

Download Full-text

Vision-based reinforcement learning for lane-tracking control

ACTA IMEKO ◽

10.21014/acta_imeko.v10i3.1020 ◽

2021 ◽

Vol 10 (3) ◽

pp. 7

Author(s):

András Kalapos ◽

Csaba Gór ◽

Róbert Moni ◽

István Harmati

Keyword(s):

Reinforcement Learning ◽

Performance Metrics ◽

Small Scale ◽

Convolutional Network ◽

Lane Tracking ◽

Real Performance ◽

Continuous Actions ◽

The Right ◽

Policy Optimization

<p class="Abstract">The present study focused on vision-based end-to-end reinforcement learning in relation to<strong> </strong>vehicle control problems such as lane following and collision avoidance. The controller policy presented in this paper is able to control a small-scale robot to follow the right-hand lane of a real two-lane road, although its training has only been carried out in a simulation. This model, realised by a simple, convolutional network, relies on images of a forward-facing monocular camera and generates continuous actions that directly control the vehicle. To train this policy, proximal policy optimization was used, and to achieve the generalisation capability required for real performance, domain randomisation was used. A thorough analysis of the trained policy was conducted by measuring multiple performance metrics and comparing these to baselines that rely on other methods. To assess the quality of the simulation-to-reality transfer learning process and the performance of the controller in the real world, simple metrics were measured on a real track and compared with results from a matching simulation. Further analysis was carried out by visualising salient object maps.</p>

Download Full-text