Optimal Operation of a Microgrid with Hydrogen Storage Based on Deep Reinforcement Learning

Zhenshan Zhu; Zhimin Weng; Hailin Zheng

doi:10.3390/electronics11020196

Optimal Operation of a Microgrid with Hydrogen Storage Based on Deep Reinforcement Learning

Electronics ◽

10.3390/electronics11020196 ◽

2022 ◽

Vol 11 (2) ◽

pp. 196

Author(s):

Zhenshan Zhu ◽

Zhimin Weng ◽

Hailin Zheng

Keyword(s):

Hydrogen Storage ◽

Interpolation Method ◽

Linear Interpolation ◽

Optimal Operation ◽

Gradient Algorithm ◽

Sequential Decision ◽

Generalization Capability ◽

Characteristic Model ◽

Decision Making Problem ◽

Policy Gradient

Microgrid with hydrogen storage is an effective way to integrate renewable energy and reduce carbon emissions. This paper proposes an optimal operation method for a microgrid with hydrogen storage. The electrolyzer efficiency characteristic model is established based on the linear interpolation method. The optimal operation model of microgrid is incorporated with the electrolyzer efficiency characteristic model. The sequential decision-making problem of the optimal operation of microgrid is solved by a deep deterministic policy gradient algorithm. Simulation results show that the proposed method can reduce about 5% of the operation cost of the microgrid compared with traditional algorithms and has a certain generalization capability.

Download Full-text

Selective network discovery via deep reinforcement learning on embedded spaces

Applied Network Science ◽

10.1007/s41109-021-00365-8 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter Morales ◽

Rajmonda Sulo Caceres ◽

Tina Eliassi-Rad

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Sequential Decision ◽

Network Discovery ◽

Learning Tasks ◽

Partially Observed ◽

Decision Making Problem ◽

Resource Collection ◽

Improved Performance ◽

Discovery Algorithms

AbstractComplex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.

Download Full-text

Heuristic Gait Learning of Quadruped Robot Based on Deep Deterministic Policy Gradient Algorithm

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9326973 ◽

2020 ◽

Author(s):

Mingchao Wang ◽

Xiaogang Ruan ◽

Xiaoqing Zhu

Keyword(s):

Quadruped Robot ◽

Gradient Algorithm ◽

Policy Gradient ◽

Gait Learning

Download Full-text

Safe option-critic: learning safety in the option-critic architecture

The Knowledge Engineering Review ◽

10.1017/s0269888921000035 ◽

2021 ◽

Vol 36 ◽

Author(s):

Arushi Jain ◽

Khimya Khetarpal ◽

Doina Precup

Keyword(s):

Model Uncertainty ◽

Gradient Algorithm ◽

Intrinsic Variability ◽

Expected Return ◽

Practical Applications ◽

Hierarchical Reinforcement Learning ◽

Continuous State ◽

End Conditions ◽

Policy Gradient ◽

High Uncertainty

Abstract Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent’s decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour as safe that avoids regions of state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviours of the proposed approach in a tabular grid world, continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pacman, Amidar, and Q*Bert. Our approach achieves a reduction in the variance of return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.

Download Full-text

An experimental study on optimal spark timing control for improved performance of a flex fuel vehicle engine

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407019869773 ◽

2019 ◽

Vol 234 (5) ◽

pp. 1294-1303

Author(s):

Junsang Yoo ◽

Taeyong Lee ◽

Pyungsik Go ◽

Yongseok Cho ◽

Kwangsoon Choi ◽

...

Keyword(s):

Empirical Equation ◽

Interpolation Method ◽

Linear Interpolation ◽

Combustion Characteristics ◽

Combustion Stability ◽

Timing Control ◽

Vehicle Engine ◽

Spark Timing ◽

Linear Interpolation Method ◽

Blended Fuels

In the American continent, the most frequently used alternative fuel is ethanol. Especially in Brazil, various blends of gasoline–ethanol fuels are widely spread. The vehicle using blended fuel is called flexible fuel vehicle. Because of several selections for the blending ratios in gas stations, the fuel properties may vary after refueling depending on a driver’s selection. Also, the combustion characteristics of the flexible fuel vehicle engine may change. In order to respond to the flexible fuel vehicle market in Brazil, a study on blended fuels is performed. The main purpose of this study is to enhance performance of the flexible fuel vehicle engine to target Brazilian market. Therefore, we investigated combustion characteristics and optimal spark timings of the blended fuels with various blending ratios to improve the performance of the flexible fuel vehicle engine. As a tool for prediction of the optimal spark timing for the 1.6L flexible fuel vehicle engine, the empirical equation was suggested. The validity of the equation was investigated by comparing the predicted optimal spark timings with the stock spark timings through engine tests. When the stock spark timings of E0 and E100 were optimal, the empirical equation predicted the actual optimal spark timings for blended fuels with a good accuracy. In all conditions, by optimizing spark timing control, performance was improved. Especially, torque improvements of E30 and E50 fuels were 5.4% and 1.8%, respectively, without affecting combustion stability. From these results, it was concluded that the linear interpolation method is not suitable for flexible fuel vehicle engine control. Instead of linear interpolation method, optimal spark timing which reflects specific octane numbers of gasoline–ethanol blended fuels should be applied to maximize performance of the flexible fuel vehicle engine. The results of this study are expected to save the effort required for engine calibration when developing new flexible fuel vehicle engines and to be used as a basic strategy to improve the performance of other flexible fuel vehicle engines.

Download Full-text

Improved linear interpolation method for the estimation of snow-covered area from optical data

Remote Sensing of Environment ◽

10.1016/s0034-4257(02)00025-1 ◽

2002 ◽

Vol 82 (1) ◽

pp. 64-78 ◽

Cited By ~ 44

Author(s):

Sari Metsämäki ◽

Jenni Vepsäläinen ◽

Jouni Pulliainen ◽

Yrjö Sucksdorff

Keyword(s):

Interpolation Method ◽

Linear Interpolation ◽

Optical Data ◽

Snow Covered Area ◽

Covered Area ◽

Linear Interpolation Method

Download Full-text

Preceding vehicle following algorithm with human driving characteristics

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407020981546 ◽

2021 ◽

pp. 095440702098154

Author(s):

Feng Pan ◽

Hong Bao

Keyword(s):

Reinforcement Learning ◽

Weight Vector ◽

Gradient Algorithm ◽

Inner Product ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Human Driver ◽

Policy Gradient ◽

Preceding Vehicle ◽

Action Spaces

This paper proposes a new approach of using reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We refer to the ideal of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and weights. Driving data of human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy with the nearest value vector to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.

Download Full-text

Study on Lidar Data Interpolation Method Based on GA-BP

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.588-589.1312 ◽

2012 ◽

Vol 588-589 ◽

pp. 1312-1315

Author(s):

Yi Kun Zhang ◽

Ming Hui Zhang ◽

Xin Hong Hei ◽

Deng Xin Hua ◽

Hao Chen

Keyword(s):

Neural Network ◽

Bp Neural Network ◽

Interpolation Method ◽

Linear Interpolation ◽

Lidar Data ◽

Data Interpolation ◽

Interpolation Model ◽

Genetic Method ◽

Interpolation Accuracy ◽

Linear Interpolation Method

Aiming at building a Lidar data interpolation model, this paper designs and implements a GA-BP interpolation method. The proposed method uses genetic method to optimize BP neural network, which greatly improves the calculation accuracy and convergence rate of BP neural network. Experimental results show that the proposed method has a higher interpolation accuracy compared with BP neural network as well as linear interpolation method.

Download Full-text

Comparison of two gap-filling techniques for nitrous oxide fluxes from agricultural soil

Canadian Journal of Soil Science ◽

10.1139/cjss-2018-0041 ◽

2019 ◽

Vol 99 (1) ◽

pp. 12-24 ◽

Cited By ~ 5

Author(s):

Rezvan Taki ◽

Claudia Wagner-Riddle ◽

Gary Parkin ◽

Rob Gordon ◽

Andrew VanderZaag

Keyword(s):

Missing Values ◽

Interpolation Method ◽

Weather Conditions ◽

Linear Interpolation ◽

Bias Error ◽

Research Station ◽

Ann Model ◽

Gap Filling ◽

Adverse Weather ◽

The Impact

Micrometeorological methods are ideally suited for continuous measurements of N2O fluxes, but gaps in the time series occur due to low-turbulence conditions, power failures, and adverse weather conditions. Two gap-filling methods including linear interpolation and artificial neural networks (ANN) were utilized to reconstruct missing N2O flux data from a corn–soybean–wheat rotation and evaluate the impact on annual N2O emissions from 2001 to 2006 at the Elora Research Station, ON, Canada. The single-year ANN method is recommended because this method captured flux variability better than the linear interpolation method (average R2 of 0.41 vs. 0.34). Annual N2O emission and annual bias resulting from linear and single-year ANN were compatible with each other when there were few and short gaps (i.e., percentage of missing values <30%). However, with longer gaps (>20 d), the bias error in annual fluxes varied between 0.082 and 0.344 kg N2O-N ha−1 for linear and 0.069 and 0.109 kg N2O-N ha−1 for single-year ANN. Hence, the single-year ANN with lower annual bias and stable approach over various years is recommended, if the appropriate driving inputs (i.e., soil temperature, soil water content, precipitation, N mineral content, and snow depth) needed for the ANN model are available.

Download Full-text

Seismic tomography of the Gulf of Corinth: a comparison of methods

Annals of Geophysics ◽

10.4401/ag-3931 ◽

1997 ◽

Vol 40 (1) ◽

Author(s):

E. Le Meur ◽

J. Virieux ◽

P. Podvin

Keyword(s):

Linear System ◽

Travel Time ◽

Velocity Structure ◽

Interpolation Method ◽

Spline Interpolation ◽

Linear Interpolation ◽

Regular Grid ◽

Partial Derivatives ◽

Simultaneous Inversion ◽

Gulf Of Corinth

At a local scale, travel-time tomography requires a simultaneous inversion of earthquake positions and velocity structure. We applied a joint iterative inversion scheme where medium parameters and hypocenter parameters were inverted simultaneously. At each step of the inversion, rays between hypocenters and stations were traced, new partial derivatives of travel-time were estimated and scaling between parameters was performed as well. The large sparse linear system modified by the scaling was solved by the LSQR method at each iteration. We compared performances of two different forward techniques. Our first approach was a fast ray tracing based on a paraxial method to solve the two-point boundary value problem. The rays connect sources and stations in a velocity structure described by a 3D B-spline interpolation over a regular grid. The second approach is the finite-difference solution of the eikonal equation with a 3D linear interpolation over a regular grid. The partial derivatives are estimated differently depending on the interpolation method. The reconstructed images are sensitive to the spatial variation of the partial derivatives shown by synthetic examples. We aldo found that a scaling between velocity and hypocenter parameters involved in the linear system to be solved is important in recovering accurate amplitudes of anomalies. This scaling was estimated to be five through synthetic examples with the real configuration of stations and sources. We also found it necessary to scale Pand S velocities in order to recover better amplitudes of S velocity anomaly. The crustal velocity structure of a 50X50X20 km domain near Patras in the Gulf of Corinth (Greece) was recovered using microearthquake data. These data were recorded during a field experiment in 1991 where a dense network of 60 digital stations was deployed. These microearthquakes were widely distributed under the Gulf of Corinth and enabled us to perform a reliable tomography of first arrival P and S travel-times. The obtained images of this seismically active zone show a south/north asymmetry in agreement with the tectonic context. The transition to high velocity lies between 6 km and 9 km indicating a very thin crust related to the active extension regime.At a local scale, travel-time tomography requires a simultaneous inversion of earthquake positions and velocity structure. We applied a joint iterative inversion scheme where medium parameters and hypocenter parameters were inverted simultaneously. At each step of the inversion, rays between hypocenters and stations were traced, new partial derivatives of travel-time were estimated and scaling between parameters was performed as well. The large sparse linear system modified by the scaling was solved by the LSQR method at each iteration. We compared performances of two different forward techniques. Our first approach was a fast ray tracing based on a paraxial method to solve the two-point boundary value problem. The rays connect sources and stations in a velocity structure described by a 3D B-spline interpolation over a regular grid. The second approach is the finite-difference solution of the eikonal equation with a 3D linear interpolation over a regular grid. The partial derivatives are estimated differently depending on the interpolation method. The reconstructed images are sensitive to the spatial variation of the partial derivatives shown by synthetic examples. We aldo found that a scaling between velocity and hypocenter parameters involved in the linear system to be solved is important in recovering accurate amplitudes of anomalies. This scaling was estimated to be five through synthetic examples with the real configuration of stations and sources. We also found it necessary to scale Pand S velocities in order to recover better amplitudes of S velocity anomaly. The crustal velocity structure of a 50X50X20 km domain near Patras in the Gulf of Corinth (Greece) was recovered using microearthquake data. These data were recorded during a field experiment in 1991 where a dense network of 60 digital stations was deployed. These microearthquakes were widely distributed under the Gulf of Corinth and enabled us to perform a reliable tomography of first arrival P and S travel-times. The obtained images of this seismically active zone show a south/north asymmetry in agreement with the tectonic context. The transition to high velocity lies between 6 km and 9 km indicating a very thin crust related to the active extension regime.

Download Full-text

PROCESSING JUMP POINT OF LIDAR DETECTION DATA AND INVERSING THE AEROSOL EXTINCTION COEFFICIENT

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w9-227-2019 ◽

2019 ◽

Vol XLII-3/W9 ◽

pp. 227-232

Author(s):

H. L. Zhang ◽

H. Zhao ◽

Y. P. Liu ◽

X. K. Wang ◽

C. Shu

Keyword(s):

Atmospheric Aerosols ◽

Extinction Coefficient ◽

Interpolation Method ◽

Linear Interpolation ◽

Mie Scattering ◽

Inversion Method ◽

Aerosol Extinction ◽

Detection Range ◽

Long Time ◽

Aerosol Extinction Coefficient

Abstract. For a long time, the research of the optical properties of atmospheric aerosols has aroused a wide concern in the field of atmospheric and environmental. Many scholars commonly use the Klett method to invert the lidar return signal of Mie scattering. However, there are always some negative values in the detection data of lidar, which have no actual meaning，and which are jump points. The jump points are also called wild value points and abnormal points. The jump points are refered to the detecting points that are significantly different from the surrounding detection points, and which are not consistent with the actual situation. As a result, when the far end point is selected as the boundary value, the inversion error is too large to successfully invert the extinction coefficient profile. These negative points are jump points, which must be removed in the inversion process. In order to solve the problem, a method of processing jump points of detection data of lidar and the inversion method of aerosol extinction coefficient is proposed in this paper. In this method, when there are few jump points, the linear interpolation method is used to process the jump points. When the number of continuous jump points is large, the function fitting method is used to process the jump points. The feasibility and reliability of this method are verified by using actual lidar data. The results show that the extinction coefficient profile can be successfully inverted when different remote boundary values are chosen. The extinction coefficient profile inverted by this method is more continuous and smoother. The effective detection range of lidar is greatly increased using this method. The extinction coefficient profile is more realistic. The extinction coefficient profile inverted by this method is more favorable to further analysis of the properties of atmospheric aerosol. Therefore, this method has great practical application and popularization value.

Download Full-text