experience replay
Recently Published Documents


TOTAL DOCUMENTS

174
(FIVE YEARS 142)

H-INDEX

11
(FIVE YEARS 3)

2022 ◽  
Author(s):  
Yufan Zhang ◽  
Honglin Wen ◽  
Qiuwei Wu ◽  
Qian Ai

Prediction intervals (PIs) offer an effective tool for quantifying the uncertainty of loads in distribution systems. Traditional central PIs cannot adapt well to skewed distributions, and their offline training scheme is vulnerable to unforeseen changes in future load patterns. Therefore, we propose an optimal PI estimation approach that operates online and adapts to different data distributions by adaptively determining symmetric or asymmetric probability proportion pairs for the quantiles that form the PIs’ bounds. It relies on the online learning ability of reinforcement learning (RL) to integrate two online tasks, i.e., the adaptive selection of probability proportion pairs and quantile prediction, both of which are modeled by neural networks. As such, the quality of the quantile-formed PIs guides the selection of optimal probability proportion pairs, forming a closed loop that improves PI quality. Furthermore, to improve the learning efficiency of the quantile forecasts, a prioritized experience replay (PER) strategy is proposed for the online quantile regression process. Case studies on both load and net load demonstrate that the proposed method adapts to the data distribution better than the online central PIs method. Compared with offline-trained methods, it obtains PIs of better quality and is more robust against concept drift.
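The abstract does not spell out how the PER strategy is keyed for online quantile regression; the following is a minimal sketch, assuming a proportional prioritization scheme driven by the pinball (quantile) loss. The buffer capacity, `alpha` exponent, and priority rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for probability level tau."""
    diff = y_true - y_pred
    return np.maximum(tau * diff, (tau - 1.0) * diff)

class PrioritizedReplay:
    """Proportional prioritization: samples with larger pinball loss are replayed more often."""
    def __init__(self, capacity=1000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, sample, loss):
        # Evict the oldest sample once the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(sample)
        self.priorities.append((loss + 1e-6) ** self.alpha)

    def sample(self, batch_size=32):
        # Sampling probability is proportional to the stored priority.
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.data), size=batch_size, replace=True, p=probs)
        return [self.data[i] for i in idx]
```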


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xinmin Li ◽  
Jiahui Li ◽  
Dandan Liu

Unmanned aerial vehicle (UAV) technology, with its flexible deployment, has enabled the development of Internet of Things (IoT) applications. However, it is difficult to guarantee the freshness of delivered information with energy-limited UAVs. Thus, we study trajectory design in a multi-UAV communication system, in which a massive number of ground devices send their individual information to mobile UAV base stations under information-freshness requirements. First, an energy-efficiency (EE) maximization problem is formulated under residual-energy, safety-distance, and age of information (AoI) constraints. The problem is difficult to solve, however, because of its nonconvex objective function and the unknown dynamic environment. Second, a trajectory design based on the deep Q-network (DQN) method is proposed, in which the state space accounts for energy efficiency, residual energy, and AoI, and the reward function is tied to EE performance. Furthermore, to reduce the dependency between the neural network's training samples, experience replay with random mini-batch sampling is adopted. Finally, we validate the system performance of the proposed scheme. Simulation results show that the proposed scheme achieves better EE performance than the benchmark scheme.
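The abstract describes a standard DQN-style update with a replay buffer and random mini-batch sampling to decorrelate training data. The sketch below shows that mechanism in generic form; the state dimension, discrete action set, network shape, and hyperparameters are placeholders, not the paper's actual setup.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small Q-network mapping a (hypothetical) UAV state to Q values over discrete flight actions."""
    def __init__(self, state_dim=6, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

buffer = deque(maxlen=10_000)            # experience replay unit
q_net, q_target = QNet(), QNet()
q_target.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(gamma=0.99, batch_size=64):
    if len(buffer) < batch_size:
        return
    # Random mini-batch sampling decorrelates consecutive transitions.
    batch = random.sample(list(buffer), batch_size)
    s, a, r, s2, done = zip(*batch)
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_net(s).gather(1, a).squeeze(1), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```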


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Peter Gunnarson ◽  
Ioannis Mandralis ◽  
Guido Novati ◽  
Petros Koumoutsakos ◽  
John O. Dabiri

Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques. Here, we apply a recently introduced Reinforcement Learning algorithm to discover time-efficient navigation policies to steer a fixed-speed swimmer through unsteady two-dimensional flow fields. The algorithm entails inputting environmental cues into a deep neural network that determines the swimmer’s actions, and deploying Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the sensed environmental cue. Surprisingly, a velocity-sensing approach significantly outperformed a bio-mimetic vorticity-sensing approach, achieving a near-100% success rate in reaching the target locations while approaching the time efficiency of optimal navigation trajectories.
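Remember and Forget Experience Replay (ReF-ER), as referenced here, classifies replayed transitions by how far the current policy has drifted from the policy that generated them, and lets only "near-policy" samples drive the gradient while penalizing drift away from the buffer. The snippet below is a minimal sketch of that filtering rule under an assumed cut-off `c_max` and penalty weight `beta`; it is not the authors' implementation.

```python
import torch

def refer_mask_and_penalty(logp_current, logp_behavior, c_max=1.5, beta=1.0):
    """Return a mask keeping only near-policy samples, plus a drift penalty.

    logp_current:  log pi(a|s) under the current policy
    logp_behavior: log mu(a|s) under the policy that stored the sample
    """
    rho = torch.exp(logp_current - logp_behavior)        # importance weight
    near_policy = (rho > 1.0 / c_max) & (rho < c_max)    # "remember" these samples
    # Far-policy samples are "forgotten" for the gradient, but all samples
    # contribute a penalty (a Monte Carlo KL estimate) pulling the policy
    # back toward the data stored in the replay buffer.
    penalty = beta * (logp_behavior - logp_current).mean()
    return near_policy.float(), penalty
```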


2021 ◽  
Vol 9 (11) ◽  
pp. 1267
Author(s):  
Zhengwei Zhu ◽  
Can Hu ◽  
Chenyang Zhu ◽  
Yanping Zhu ◽  
Yu Sheng

Unmanned Surface Vehicles (USVs) have broad application prospects, and autonomous path planning, as a crucial enabling technology, has become a hot research direction in the USV field. This paper proposes an Improved Dueling Deep Double-Q Network Based on Prioritized Experience Replay (IPD3QN) to address the slow and unstable convergence of traditional Deep Q Network (DQN) algorithms in autonomous path planning for USVs. First, a double deep Q-network is used to decouple the selection and evaluation of the action in the target Q value, eliminating overestimation. The prioritized experience replay method is adopted to extract experience samples from the experience replay unit, increase the utilization rate of informative samples, and accelerate the training of the neural network. Then, the neural network is further improved by introducing a dueling network structure. Finally, the soft update method is used to improve the stability of the algorithm, and a dynamic ϵ-greedy method is used to find the optimal strategy. Experiments are first conducted on the OpenAI Gym test platform to pre-validate the algorithm on two classical control problems, CartPole and MountainCar, and the impact of algorithm hyperparameters on model performance is analyzed in detail. The algorithm is then validated in a maze environment. Comparative analysis of the simulation experiments shows that IPD3QN achieves a significant improvement in learning performance regarding convergence speed and convergence stability compared with DQN, D3QN, PD2QN, PDQN, and PD3QN. Moreover, with the IPD3QN algorithm, the USV can plan an optimal path according to the actual navigation environment.
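For readers unfamiliar with the two core ingredients of IPD3QN, the sketch below shows a dueling network head and the double-Q target computation in isolation; layer sizes are illustrative assumptions, and the prioritized sampling, soft target update, and dynamic ϵ-greedy parts described above are omitted for brevity.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a))."""
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # state value V(s)
        self.advantage = nn.Linear(128, n_actions)   # advantages A(s, a)

    def forward(self, s):
        h = self.feature(s)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(q_online, q_target, r, s2, done, gamma=0.99):
    # Double Q-learning: the online network selects the next action,
    # the target network evaluates it, removing the maximization bias.
    with torch.no_grad():
        a_star = q_online(s2).argmax(dim=1, keepdim=True)
        q_next = q_target(s2).gather(1, a_star).squeeze(1)
        return r + gamma * (1.0 - done) * q_next
```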


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Peng Li ◽  
Xiangcheng Ding ◽  
Hongfang Sun ◽  
Shiquan Zhao ◽  
Ricardo Cajo

To address the low success rate and slow learning speed of the DDPG algorithm in mobile-robot path planning in dynamic environments, an improved DDPG algorithm is designed. In this article, the RAdam algorithm replaces the neural network optimizer in DDPG and is combined with a curiosity mechanism to improve the success rate and convergence speed. On top of the improved algorithm, prioritized experience replay is added and transfer learning is introduced to improve training. A dynamic simulation environment is established with the ROS (Robot Operating System) and the Gazebo simulator, and the improved DDPG algorithm is compared with the original DDPG algorithm. For the mobile robot's dynamic path planning task, the simulation results show that, compared with the original DDPG algorithm, the improved algorithm converges 21% faster and its success rate rises to 90%. It performs well on dynamic path planning for mobile robots with a continuous action space.
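The abstract combines three changes to DDPG: the RAdam optimizer, a curiosity mechanism, and prioritized replay. The fragment below sketches the first two generically; the state/action dimensions, network sizes, the forward-model form of the curiosity bonus, and the scale `eta` are assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

# Placeholder actor/critic for a continuous-action robot (24-dim state, 2-dim action assumed).
actor = nn.Sequential(nn.Linear(24, 256), nn.ReLU(), nn.Linear(256, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(24 + 2, 256), nn.ReLU(), nn.Linear(256, 1))

# RAdam (available in recent PyTorch releases) replaces the usual Adam optimizer.
actor_opt = torch.optim.RAdam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.RAdam(critic.parameters(), lr=1e-3)

# Curiosity bonus: a learned forward model predicts the next state; its
# prediction error is added to the environment reward to encourage exploration.
forward_model = nn.Sequential(nn.Linear(24 + 2, 256), nn.ReLU(), nn.Linear(256, 24))

def shaped_reward(r_env, s, a, s_next, eta=0.1):
    pred = forward_model(torch.cat([s, a], dim=-1))
    curiosity = (pred - s_next).pow(2).mean(dim=-1)
    return r_env + eta * curiosity.detach()
```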


MethodsX ◽  
2021 ◽  
pp. 101571
Author(s):  
Mikkel Leite Arnø ◽  
John-Morten Godhavn ◽  
Ole Morten Aamo

2021 ◽  
Author(s):  
Dogan C. Cicek ◽  
Enes Duran ◽  
Baturay Saglam ◽  
Furkan B. Mutlu ◽  
Suleyman S. Kozat

2021 ◽  
Author(s):  
Kian Ahrabian ◽  
Yishi Xu ◽  
Yingxue Zhang ◽  
Jiapeng Wu ◽  
Yuening Wang ◽  
...  
