Self-Adaptive Priority Correction for Prioritized Experience Replay

2020 ◽  
Vol 10 (19) ◽  
pp. 6925 ◽  
Author(s):  
Hongjie Zhang ◽  
Cheng Qu ◽  
Jindou Zhang ◽  
Jing Li

Deep Reinforcement Learning (DRL) is a promising approach for general artificial intelligence. However, most DRL methods suffer from data inefficiency. To alleviate this problem, DeepMind proposed Prioritized Experience Replay (PER). Although PER improves data utilization, the priorities of most samples in its Experience Memory (EM) become out of date, because only the priorities of a small part of the data are refreshed as the Q-network parameters are updated. Consequently, the difference between the stored and the real priority distributions gradually grows, which introduces bias into the gradients of Deep Q-Learning (DQL) and pushes the DQL update in a non-ideal direction. In this work, we propose a novel self-adaptive priority correction algorithm named Importance-PER (Imp-PER) to fix this update deviation. Specifically, we predict the sum of the real Temporal-Difference errors (TD-errors) of all data in the EM. Sampled data are corrected by an importance weight, which is estimated from the predicted sum and the real TD-error calculated by the latest agent. To control the unbounded importance weight, we use truncated importance sampling with a self-adaptive truncation threshold. Experiments on various Atari 2600 games with Double Deep Q-Network and on MuJoCo with Deep Deterministic Policy Gradient demonstrate that Imp-PER improves data utilization and final policy quality on both discrete-state and continuous-state tasks without increasing the computational cost.
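A minimal sketch of the correction idea described in the abstract, assuming a simple ratio-based importance weight and a batch-quantile truncation threshold; the exact estimator of the summed TD-error and the adaptive threshold rule are not specified there, so all names and formulas below are illustrative assumptions.

```python
import numpy as np

def imp_per_weights(batch_stored_priority, batch_real_td_error,
                    stored_priority_sum, predicted_td_sum,
                    trunc_quantile=0.95):
    """Hedged sketch of an Imp-PER-style priority correction.

    batch_stored_priority: stale priorities of the sampled transitions
    batch_real_td_error:   |TD-error| recomputed with the latest network
    stored_priority_sum:   sum of stored priorities over the whole memory
    predicted_td_sum:      predicted sum of real TD-errors over the whole memory
    """
    p_stored = np.asarray(batch_stored_priority) / stored_priority_sum  # stale sampling prob.
    p_real = np.asarray(batch_real_td_error) / predicted_td_sum         # up-to-date prob.

    # Importance weight corrects for sampling from the stale distribution.
    w = p_real / np.maximum(p_stored, 1e-8)

    # Truncated importance sampling: clip with a self-adaptive threshold,
    # here assumed to be a quantile of the weights within the minibatch.
    c = np.quantile(w, trunc_quantile)
    return np.minimum(w, c)
```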

Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 991
Author(s):  
Yuta Nakahara ◽  
Toshiyasu Matsushima

In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model of the target data. In lossless image compression, however, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of the stochastic generative model is left implicit. In such studies, it is difficult to discuss the gap between the expected code length and the entropy of the stochastic generative model. We resolve this difficulty for a class of images that exhibit non-stationarity among segments. In this paper, we propose a novel stochastic generative model of images by redefining the stochastic generative model left implicit in a previous coding procedure. Our model is based on a quadtree, so it effectively represents variable-block-size segmentation of images. We then construct the Bayes code optimal for the proposed stochastic generative model. This requires summing over all possible quadtrees weighted by their posterior probabilities, whose computational cost in general grows exponentially with the image size. However, we introduce an efficient algorithm that calculates it in time polynomial in the image size without loss of optimality. As a result, the derived algorithm achieves a better average coding rate than JBIG.
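A hedged sketch of the polynomial-time idea: the Bayes-mixture probability of a square block is computed recursively by weighting "code this block as one segment" against "split it into four sub-blocks", which implicitly sums over all quadtrees while visiting each block only once. The leaf model (a KT estimator) and the split prior g below are placeholders, not the authors' exact choices.

```python
import numpy as np

def leaf_prob(block):
    """Placeholder leaf model: KT (Krichevsky-Trofimov) probability of a binary block.
    The KT probability depends only on the counts of ones and zeros."""
    ones = int(block.sum())
    zeros = block.size - ones
    p, n0, n1 = 1.0, 0, 0
    for bit in [1] * ones + [0] * zeros:
        p *= ((n1 if bit else n0) + 0.5) / (n0 + n1 + 1.0)
        n1 += bit
        n0 += 1 - bit
    return p

def quadtree_mixture_prob(block, g=0.5):
    """Bayes mixture over all quadtree segmentations of a square binary block
    (side length assumed to be a power of two).

    g is an assumed prior probability of splitting a node. Each block is visited
    once, so the cost is polynomial in the image size rather than exponential.
    In practice the recursion would be done in the log domain to avoid underflow.
    """
    n = block.shape[0]
    if n == 1:
        return leaf_prob(block)
    h = n // 2
    children = (block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:])
    split = np.prod([quadtree_mixture_prob(c, g) for c in children])
    return (1.0 - g) * leaf_prob(block) + g * split
```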


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Chaohai Kang ◽  
Chuiting Rong ◽  
Weijian Ren ◽  
Fengcai Huo ◽  
Pengyun Liu

2021 ◽  
Vol 28 (2) ◽  
pp. 163-182
Author(s):  
José L. Simancas-García ◽  
Kemel George-González

Shannon’s sampling theorem is one of the most important results of modern signal theory. It describes the reconstruction of any band-limited signal from its samples, taken at a sufficient rate. On the other hand, although less well known, there is the discrete sampling theorem, proved by Cooley while he was working on an algorithm to speed up the computation of the discrete Fourier transform. Cooley showed that a sampled signal can be resampled by selecting a smaller number of samples, which reduces the computational cost; the original sampled signal can then be reconstructed by a reverse process. In principle, the two theorems are unrelated. However, in this paper we show that, in the context of Non-Standard Mathematical Analysis (NSA) and the hyperreal number system R, the two theorems are equivalent; the difference between them becomes a matter of scale. With the scale changes that the hyperreal number system allows, discrete variables and functions become continuous, and Shannon’s sampling theorem emerges from the discrete sampling theorem.
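A small numerical illustration of the discrete sampling theorem mentioned above (this is the classical DFT statement, not the non-standard-analysis argument of the paper): a length-N signal whose DFT occupies only M bins can be decimated to M samples and reconstructed exactly from their DFT.

```python
import numpy as np

N, M = 64, 8              # original length and number of occupied DFT bins (M divides N)
L = N // M                # decimation factor

# Build a (complex) test signal band-limited to the first M DFT bins.
X = np.zeros(N, dtype=complex)
X[:M] = np.random.randn(M) + 1j * np.random.randn(M)
x = np.fft.ifft(X)

# Discrete sampling theorem: keep only every L-th sample ...
y = x[::L]                # M samples

# ... and reconstruct: the length-M DFT of y equals (M/N) * X on the occupied bins.
X_rec = np.zeros(N, dtype=complex)
X_rec[:M] = np.fft.fft(y) * (N / M)
x_rec = np.fft.ifft(X_rec)

print(np.allclose(x, x_rec))   # True: the decimated signal determines the original
```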


Author(s):  
Zhuobin Zheng ◽  
Chun Yuan ◽  
Xinrui Zhu ◽  
Zhihui Lin ◽  
Yangyang Cheng ◽  
...  

Learning related tasks in various domains and transferring the exploited knowledge to new situations is a significant challenge in Reinforcement Learning (RL). However, most RL algorithms are data inefficient and fail to generalize in complex environments, limiting their adaptability and applicability in multi-task scenarios. In this paper, we propose Self-Supervised Mixture-of-Experts (SUM), an effective algorithm driven by predictive uncertainty estimation for multi-task RL. SUM utilizes a multi-head agent with shared parameters as experts to learn a series of related tasks simultaneously by Deep Deterministic Policy Gradient (DDPG). Each expert is extended by predictive uncertainty estimation on known and unknown states to enhance the Q-value evaluation capacity against overfitting and to improve overall generalization. This enables the agent to capture and diffuse common knowledge across different tasks, improving sample efficiency in each task and the effectiveness of expert scheduling across multiple tasks. Instead of the task-specific design of common MoEs, a self-supervised gating network is adopted to determine a suitable expert to handle each interaction from unseen environments; it is calibrated entirely by the uncertainty feedback from the experts, without explicit supervision. To alleviate imbalanced expert utilization, the crux of MoE, optimization is accomplished via decayed-masked experience replay, which encourages both diversification and specialization of experts during different periods. We demonstrate that our approach learns faster and achieves better performance through efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym MuJoCo multi-task environments.
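A toy sketch of uncertainty-driven expert selection in the spirit of the gating idea above: the gate routes each state to an expert, and the expert reporting the lowest predictive uncertainty provides the self-supervised label for the gate. The linear gate, the uncertainty estimator, and the update rule are simplified placeholders, not the SUM architecture.

```python
import numpy as np

class UncertaintyGate:
    """Toy self-supervised gate calibrated by expert uncertainty feedback."""

    def __init__(self, n_experts, state_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.w = 0.01 * rng.standard_normal((n_experts, state_dim))  # linear gate (placeholder)
        self.lr = lr

    def select(self, state):
        # Route the interaction to the expert the gate currently prefers.
        return int(np.argmax(self.w @ state))

    def update(self, state, expert_uncertainties):
        # Self-supervised label: the expert with the lowest predictive uncertainty.
        target = int(np.argmin(expert_uncertainties))
        logits = self.w @ state
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad = probs
        grad[target] -= 1.0
        # One softmax-cross-entropy gradient step toward the uncertainty-based target.
        self.w -= self.lr * np.outer(grad, state)
```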


Author(s):  
T. Tachi ◽  
Y. Wang ◽  
R. Abe ◽  
T. Kato ◽  
N. Maebashi ◽  
...  

Abstract. Mobile mapping technology is an effective method of collecting geospatial data with high point density and accuracy. It is mainly used for asset inventory and map generation, as well as road maintenance (detecting road cracks and ruts, and measuring flatness). The equipment of earlier mobile mapping systems (MMS) is large and is usually hard-mounted onto a dedicated vehicle. Cost-effectiveness and flexibility of MMS were not regarded as important until the Leica Pegasus series, a much smaller system with integrated and configurable components, came out. In this paper, we show how we realize a versatile MMS with a Pegasus II mounted on a remodelled Japanese light vehicle (small size, engine capacity of less than 660 cc). Besides the Pegasus II and a data-processing PC, we equip this system with a small crane to move the sensor onto different platforms, an electric cart to survey narrow roads or pedestrian walkways, and a boat attachment so that the sensor can be fixed on a boat. Thus, one Pegasus II can collect data from various platforms. This paper also discusses the precision and accuracy of the Pegasus II working on these platforms. When mounted on the light vehicle, we verified its accuracy against ground control points (GCPs) and evaluated its accuracy for road maintenance (detecting road cracks and ruts, and measuring flatness). When mounted on the electric cart, we verified its accuracy against GCPs on a pedestrian road and generated a road hazard map as an example of data utilization. When mounted on a boat, we verified its accuracy against GCPs on a dam slope and created a slope shading map of a landslide area as an example of data utilization. It turns out that the Pegasus II can achieve the required surveying grade on all platforms.
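A hedged sketch of the kind of accuracy check described above (comparing measured point-cloud coordinates against matched ground control points); the point matching, coordinate conventions, and reported statistics are assumptions, not the authors' exact evaluation.

```python
import numpy as np

def gcp_accuracy(measured_xyz, gcp_xyz):
    """RMSE and maximum error of MMS-measured points against matched GCPs.

    measured_xyz, gcp_xyz: (N, 3) arrays of matched points in the same CRS, in metres.
    """
    d = np.asarray(measured_xyz, dtype=float) - np.asarray(gcp_xyz, dtype=float)
    horizontal = np.linalg.norm(d[:, :2], axis=1)   # planimetric residuals
    vertical = np.abs(d[:, 2])                      # height residuals
    return {
        "rmse_horizontal": float(np.sqrt(np.mean(horizontal ** 2))),
        "rmse_vertical": float(np.sqrt(np.mean(vertical ** 2))),
        "max_3d": float(np.linalg.norm(d, axis=1).max()),
    }
```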


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 453
Author(s):  
Kyosuke Suzuki ◽  
Tomoki Inoue ◽  
Takayuki Nagata ◽  
Miku Kasai ◽  
Taku Nonomura ◽  
...  

We propose a markerless image alignment method for pressure-sensitive paint (PSP) measurement data, replacing the time-consuming conventional alignment method in which black markers are placed on the model and detected manually. In the proposed method, feature points are obtained by a boundary detection step in which the PSP boundary is traced with the Moore-Neighbor tracing algorithm. The performance of the proposed method is compared with the conventional method based on black markers, the difference-of-Gaussian (DoG) detector, and the Hessian corner detector. The proposed method and the DoG detector give equivalent results, whereas image alignment using the black markers or the Hessian corner detector performs slightly worse than the DoG detector and the proposed method. The computational cost of the proposed method is half that of the DoG method. The proposed method is therefore promising for image alignment in PSP applications from the viewpoint of both alignment precision and computational cost.
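A simplified sketch of the boundary-based feature idea: extract the boundary pixels of the painted region and estimate a rigid alignment from corresponding boundary points. It substitutes a plain 4-neighbour boundary mask and a least-squares (Kabsch/Procrustes-style) fit for the paper's Moore-Neighbor tracing and full alignment pipeline.

```python
import numpy as np

def boundary_points(mask):
    """Foreground pixels with at least one background 4-neighbour
    (a simplified stand-in for Moore-Neighbor boundary tracing)."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    ys, xs = np.nonzero(m & ~interior)
    return np.stack([xs, ys], axis=1).astype(float)

def estimate_rigid(src, dst):
    """Least-squares rotation + translation mapping src points onto dst
    (assumes the two boundary point sets are already in correspondence)."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    t = dc - R @ sc
    return R, t
```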


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 77 ◽  
Author(s):  
Juan Chen ◽  
Zhengxuan Xue ◽  
Daiqian Fan

In order to solve the problem of vehicle delay caused by stops at signalized intersections, a micro-control method for a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG) is designed in this paper. The micro-control covers the whole process of a left-turning vehicle approaching, entering, and leaving a signalized intersection. In addition, to address the low sampling efficiency and the overestimation of the critic network in the DDPG algorithm, a positive-and-negative-reward experience replay buffer sampling mechanism and a multi-critic network structure are adopted. Finally, the effectiveness of the signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) is verified at saturation degrees of 0.2, 0.5, and 0.7 for left-turning vehicles at a signalized intersection within a VISSIM simulation environment. The results show that, compared with the traditional signal control method, the proposed deep reinforcement learning method obtains benefits in the number of stops ranging from 5% to 94%, in stop time from 1% to 99%, and in delay from −17% to 93%.
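A hedged sketch of the two DDPG modifications named above: a replay buffer split by reward sign with balanced sampling, and a multi-critic target that averages several critics' next-state Q-values to soften overestimation. Buffer capacities, the positive/negative mixing ratio, and the aggregation rule are assumptions, not the paper's exact settings.

```python
import random
from collections import deque

class PNRERBuffer:
    """Positive/negative-reward experience replay: keep transitions in two
    buffers by reward sign and draw a balanced minibatch from both."""

    def __init__(self, capacity=100_000, pos_fraction=0.5):
        self.pos = deque(maxlen=capacity)
        self.neg = deque(maxlen=capacity)
        self.pos_fraction = pos_fraction

    def add(self, state, action, reward, next_state, done):
        buf = self.pos if reward > 0 else self.neg
        buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        n_pos = min(int(batch_size * self.pos_fraction), len(self.pos))
        n_neg = min(batch_size - n_pos, len(self.neg))
        return random.sample(self.pos, n_pos) + random.sample(self.neg, n_neg)

def multi_critic_target(rewards, next_q_values, dones, gamma=0.99):
    """Average the target critics' next-state Q-values (assumed aggregation)
    before forming the usual one-step TD target."""
    q_next = sum(next_q_values) / len(next_q_values)
    return rewards + gamma * (1.0 - dones) * q_next
```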


2014 ◽  
Vol 622-623 ◽  
pp. 382-389 ◽  
Author(s):  
Antonio Fiorentino ◽  
G.C. Feriti ◽  
Elisabetta Ceretti ◽  
C. Giardini ◽ 
C.M.G. Bort ◽  
...  

The problem of obtaining sound parts by Incremental Sheet Forming is still a relevant issue, despite the numerous efforts spent on improving the toolpath planning of the deforming punch to compensate for the dimensional and geometrical part errors related to springback and punch movement. Usually, the toolpath generation strategy varies the toolpath itself in order to obtain the desired final part with reduced geometrical errors. In the present paper, a correction algorithm is used to iteratively correct the part geometry on the basis of the measured parts and of the error, defined as the difference between the actual and the nominal part geometries. In practice, the part geometry is used to generate a first trial toolpath, and the form error distribution of the resulting part is used to modify the nominal part geometry and then generate a new, improved toolpath. This procedure is iterated until the error distribution falls below a specified value corresponding to the desired part tolerance. The correction algorithm was implemented in software and applied to the results of FEM simulations. In particular, within a few iterations it was possible to reduce the geometrical error to less than 0.4 mm in the Incremental Sheet Forming of an asymmetric aluminium (Al) part, an accuracy good enough for both prototyping and production processes.
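A minimal sketch of the iterative correction loop described above; the error-mirroring update, the gain factor, and the placeholder callable standing in for the FEM simulation (or the physical forming trial) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def correct_geometry(nominal, form_part, tolerance=0.4, gain=1.0, max_iter=10):
    """Iteratively adjust the target geometry until the formed part is within tolerance.

    nominal:   array of nominal surface heights (e.g. a depth map sampled on a grid), in mm
    form_part: callable that forms/simulates a part from a target geometry and returns
               the measured geometry on the same grid (placeholder for the FEM run)
    """
    target = np.array(nominal, dtype=float)
    for i in range(max_iter):
        measured = form_part(target)
        error = measured - nominal                 # actual minus nominal geometry
        max_err = float(np.abs(error).max())
        print(f"iteration {i}: max form error = {max_err:.3f} mm")
        if max_err < tolerance:                    # desired part tolerance reached
            break
        target -= gain * error                     # shift the target opposite to the error
    return target
```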

