experience replay
Recently Published Documents


TOTAL DOCUMENTS

174
(FIVE YEARS 142)

H-INDEX

11
(FIVE YEARS 3)

2022 ◽  
Author(s):  
Yufan Zhang ◽  
Honglin Wen ◽  
Qiuwei Wu ◽  
Qian Ai

Prediction intervals (PIs) offer an effective tool for quantifying the uncertainty of loads in distribution systems. Traditional central PIs cannot adapt well to skewed distributions, and their offline training scheme is vulnerable to unforeseen changes in future load patterns. Therefore, we propose an optimal PI estimation approach that operates online and adapts to different data distributions by adaptively determining symmetric or asymmetric probability proportion pairs for the quantiles that form the PIs’ bounds. It relies on the online learning ability of reinforcement learning (RL) to integrate two online tasks, i.e., the adaptive selection of probability proportion pairs and quantile prediction, both of which are modeled by neural networks. As such, the quality of the quantile-formed PIs guides the selection of optimal probability proportion pairs, forming a closed loop that improves PI quality. Furthermore, to improve the learning efficiency of the quantile forecasts, a prioritized experience replay (PER) strategy is proposed for the online quantile regression process. Case studies on both load and net load demonstrate that the proposed method adapts to the data distribution better than the online central PIs method. Compared with offline-trained methods, it obtains PIs of better quality and is more robust against concept drift.
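The abstract does not spell out how the PER strategy is keyed for online quantile regression; the following is a minimal sketch, assuming a proportional prioritization scheme driven by the pinball (quantile) loss. The buffer capacity, `alpha` exponent, and priority rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for probability level tau."""
    diff = y_true - y_pred
    return np.maximum(tau * diff, (tau - 1.0) * diff)

class PrioritizedReplay:
    """Proportional prioritization: samples with larger pinball loss are replayed more often."""
    def __init__(self, capacity=1000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, sample, loss):
        # Evict the oldest sample once the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(sample)
        self.priorities.append((loss + 1e-6) ** self.alpha)

    def sample(self, batch_size=32):
        # Sampling probability is proportional to the stored priority.
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.data), size=batch_size, replace=True, p=probs)
        return [self.data[i] for i in idx]
```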


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xinmin Li ◽  
Jiahui Li ◽  
Dandan Liu

Unmanned aerial vehicle (UAV) technology, with its flexible deployment, has enabled the development of Internet of Things (IoT) applications. However, it is difficult to guarantee the freshness of delivered information with energy-limited UAVs. Thus, we study trajectory design in a multi-UAV communication system, in which a massive number of ground devices send their individual information to mobile UAV base stations under information-freshness requirements. First, an energy-efficiency (EE) maximization problem is formulated under residual-energy, safety-distance, and age of information (AoI) constraints. The problem is difficult to solve, however, because of its nonconvex objective function and the unknown dynamic environment. Second, a trajectory design based on the deep Q-network (DQN) method is proposed, in which the state space accounts for energy efficiency, residual energy, and AoI, and the reward function is tied to EE performance. Furthermore, to reduce the dependency between the neural network's training samples, experience replay with random mini-batch sampling is adopted. Finally, we validate the system performance of the proposed scheme. Simulation results show that the proposed scheme achieves better EE performance than the benchmark scheme.
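The abstract describes a standard DQN-style update with a replay buffer and random mini-batch sampling to decorrelate training data. The sketch below shows that mechanism in generic form; the state dimension, discrete action set, network shape, and hyperparameters are placeholders, not the paper's actual setup.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small Q-network mapping a (hypothetical) UAV state to Q values over discrete flight actions."""
    def __init__(self, state_dim=6, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

buffer = deque(maxlen=10_000)            # experience replay unit
q_net, q_target = QNet(), QNet()
q_target.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(gamma=0.99, batch_size=64):
    if len(buffer) < batch_size:
        return
    # Random mini-batch sampling decorrelates consecutive transitions.
    batch = random.sample(list(buffer), batch_size)
    s, a, r, s2, done = zip(*batch)
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_net(s).gather(1, a).squeeze(1), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```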


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Peter Gunnarson ◽  
Ioannis Mandralis ◽  
Guido Novati ◽  
Petros Koumoutsakos ◽  
John O. Dabiri

Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques. Here, we apply a recently introduced Reinforcement Learning algorithm to discover time-efficient navigation policies to steer a fixed-speed swimmer through unsteady two-dimensional flow fields. The algorithm entails inputting environmental cues into a deep neural network that determines the swimmer’s actions, and deploying Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the sensed environmental cue. Surprisingly, a velocity-sensing approach significantly outperformed a bio-mimetic vorticity-sensing approach, achieving a near-100% success rate in reaching the target locations while approaching the time efficiency of optimal navigation trajectories.
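Remember and Forget Experience Replay (ReF-ER), as referenced here, classifies replayed transitions by how far the current policy has drifted from the policy that generated them, and lets only "near-policy" samples drive the gradient while penalizing drift away from the buffer. The snippet below is a minimal sketch of that filtering rule under an assumed cut-off `c_max` and penalty weight `beta`; it is not the authors' implementation.

```python
import torch

def refer_mask_and_penalty(logp_current, logp_behavior, c_max=1.5, beta=1.0):
    """Return a mask keeping only near-policy samples, plus a drift penalty.

    logp_current:  log pi(a|s) under the current policy
    logp_behavior: log mu(a|s) under the policy that stored the sample
    """
    rho = torch.exp(logp_current - logp_behavior)        # importance weight
    near_policy = (rho > 1.0 / c_max) & (rho < c_max)    # "remember" these samples
    # Far-policy samples are "forgotten" for the gradient, but all samples
    # contribute a penalty (a Monte Carlo KL estimate) pulling the policy
    # back toward the data stored in the replay buffer.
    penalty = beta * (logp_behavior - logp_current).mean()
    return near_policy.float(), penalty
```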


2021 ◽  
Vol 9 (11) ◽  
pp. 1267
Author(s):  
Zhengwei Zhu ◽  
Can Hu ◽  
Chenyang Zhu ◽  
Yanping Zhu ◽  
Yu Sheng

Unmanned Surface Vehicles (USVs) have broad application prospects, and autonomous path planning, as a crucial enabling technology, has become a hot research direction in the USV field. This paper proposes an Improved Dueling Deep Double-Q Network Based on Prioritized Experience Replay (IPD3QN) to address the slow and unstable convergence of traditional Deep Q Network (DQN) algorithms in autonomous path planning for USVs. First, a double deep Q-network is used to decouple the selection and evaluation of the action in the target Q value, eliminating overestimation. The prioritized experience replay method is adopted to extract experience samples from the experience replay unit, increase the utilization rate of informative samples, and accelerate the training of the neural network. Then, the neural network is further improved by introducing a dueling network structure. Finally, the soft update method is used to improve the stability of the algorithm, and a dynamic ϵ-greedy method is used to find the optimal strategy. Experiments are first conducted on the OpenAI Gym test platform to pre-validate the algorithm on two classical control problems, CartPole and MountainCar, and the impact of algorithm hyperparameters on model performance is analyzed in detail. The algorithm is then validated in a maze environment. Comparative analysis of the simulation experiments shows that IPD3QN achieves a significant improvement in learning performance regarding convergence speed and convergence stability compared with DQN, D3QN, PD2QN, PDQN, and PD3QN. Moreover, with the IPD3QN algorithm, the USV can plan an optimal path according to the actual navigation environment.
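For readers unfamiliar with the two core ingredients of IPD3QN, the sketch below shows a dueling network head and the double-Q target computation in isolation; layer sizes are illustrative assumptions, and the prioritized sampling, soft target update, and dynamic ϵ-greedy parts described above are omitted for brevity.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a))."""
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # state value V(s)
        self.advantage = nn.Linear(128, n_actions)   # advantages A(s, a)

    def forward(self, s):
        h = self.feature(s)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(q_online, q_target, r, s2, done, gamma=0.99):
    # Double Q-learning: the online network selects the next action,
    # the target network evaluates it, removing the maximization bias.
    with torch.no_grad():
        a_star = q_online(s2).argmax(dim=1, keepdim=True)
        q_next = q_target(s2).gather(1, a_star).squeeze(1)
        return r + gamma * (1.0 - done) * q_next
```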


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Peng Li ◽  
Xiangcheng Ding ◽  
Hongfang Sun ◽  
Shiquan Zhao ◽  
Ricardo Cajo

To address the low success rate and slow learning speed of the DDPG algorithm in mobile-robot path planning in dynamic environments, an improved DDPG algorithm is designed. In this article, the RAdam algorithm replaces the neural network optimizer in DDPG and is combined with a curiosity mechanism to improve the success rate and convergence speed. On top of the improved algorithm, prioritized experience replay is added and transfer learning is introduced to improve training. A dynamic simulation environment is established with the ROS (Robot Operating System) and the Gazebo simulator, and the improved DDPG algorithm is compared with the original DDPG algorithm. For the mobile robot's dynamic path planning task, the simulation results show that, compared with the original DDPG algorithm, the improved algorithm converges 21% faster and its success rate rises to 90%. It performs well on dynamic path planning for mobile robots with a continuous action space.
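The abstract combines three changes to DDPG: the RAdam optimizer, a curiosity mechanism, and prioritized replay. The fragment below sketches the first two generically; the state/action dimensions, network sizes, the forward-model form of the curiosity bonus, and the scale `eta` are assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

# Placeholder actor/critic for a continuous-action robot (24-dim state, 2-dim action assumed).
actor = nn.Sequential(nn.Linear(24, 256), nn.ReLU(), nn.Linear(256, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(24 + 2, 256), nn.ReLU(), nn.Linear(256, 1))

# RAdam (available in recent PyTorch releases) replaces the usual Adam optimizer.
actor_opt = torch.optim.RAdam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.RAdam(critic.parameters(), lr=1e-3)

# Curiosity bonus: a learned forward model predicts the next state; its
# prediction error is added to the environment reward to encourage exploration.
forward_model = nn.Sequential(nn.Linear(24 + 2, 256), nn.ReLU(), nn.Linear(256, 24))

def shaped_reward(r_env, s, a, s_next, eta=0.1):
    pred = forward_model(torch.cat([s, a], dim=-1))
    curiosity = (pred - s_next).pow(2).mean(dim=-1)
    return r_env + eta * curiosity.detach()
```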


MethodsX ◽  
2021 ◽  
pp. 101571
Author(s):  
Mikkel Leite Arnø ◽  
John-Morten Godhavn ◽  
Ole Morten Aamo

2021 ◽  
Author(s):  
Dogan C. Cicek ◽  
Enes Duran ◽  
Baturay Saglam ◽  
Furkan B. Mutlu ◽  
Suleyman S. Kozat

2021 ◽  
Author(s):  
Kian Ahrabian ◽  
Yishi Xu ◽  
Yingxue Zhang ◽  
Jiapeng Wu ◽  
Yuening Wang ◽  
...  
