Self-Adaptive Priority Correction for Prioritized Experience Replay

2020 ◽  
Vol 10 (19) ◽  
pp. 6925 ◽  
Author(s):  
Hongjie Zhang ◽  
Cheng Qu ◽  
Jindou Zhang ◽  
Jing Li

Deep Reinforcement Learning (DRL) is a promising approach for general artificial intelligence. However, most DRL methods suffer from data inefficiency. To alleviate this problem, DeepMind proposed Prioritized Experience Replay (PER). Although PER improves data utilization, the priorities of most samples in its Experience Memory (EM) become out of date, because only the priorities of a small part of the data are refreshed as the Q-network parameters are updated. Consequently, the difference between the stored and the real priority distributions gradually grows, which introduces bias into the gradients of Deep Q-Learning (DQL) and pushes the DQL update in a non-ideal direction. In this work, we propose a novel self-adaptive priority correction algorithm named Importance-PER (Imp-PER) to fix this update deviation. Specifically, we predict the sum of the real Temporal-Difference errors (TD-errors) of all data in the EM. Sampled data are corrected by an importance weight, which is estimated from the predicted sum and the real TD-error calculated by the latest agent. To control the unbounded importance weight, we use truncated importance sampling with a self-adaptive truncation threshold. Experiments on various Atari 2600 games with Double Deep Q-Network and on MuJoCo with Deep Deterministic Policy Gradient demonstrate that Imp-PER improves data utilization and final policy quality on both discrete-state and continuous-state tasks without increasing the computational cost.
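A minimal sketch of the correction idea described in the abstract, assuming a simple ratio-based importance weight and a batch-quantile truncation threshold; the exact estimator of the summed TD-error and the adaptive threshold rule are not specified there, so all names and formulas below are illustrative assumptions.

```python
import numpy as np

def imp_per_weights(batch_stored_priority, batch_real_td_error,
                    stored_priority_sum, predicted_td_sum,
                    trunc_quantile=0.95):
    """Hedged sketch of an Imp-PER-style priority correction.

    batch_stored_priority: stale priorities of the sampled transitions
    batch_real_td_error:   |TD-error| recomputed with the latest network
    stored_priority_sum:   sum of stored priorities over the whole memory
    predicted_td_sum:      predicted sum of real TD-errors over the whole memory
    """
    p_stored = np.asarray(batch_stored_priority) / stored_priority_sum  # stale sampling prob.
    p_real = np.asarray(batch_real_td_error) / predicted_td_sum         # up-to-date prob.

    # Importance weight corrects for sampling from the stale distribution.
    w = p_real / np.maximum(p_stored, 1e-8)

    # Truncated importance sampling: clip with a self-adaptive threshold,
    # here assumed to be a quantile of the weights within the minibatch.
    c = np.quantile(w, trunc_quantile)
    return np.minimum(w, c)
```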

Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 991
Author(s):  
Yuta Nakahara ◽  
Toshiyasu Matsushima

In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model of the target data. In lossless image compression, however, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of the stochastic generative model is left implicit. In such studies, it is difficult to discuss the gap between the expected code length and the entropy of the stochastic generative model. We resolve this difficulty for a class of images that exhibit non-stationarity among segments. In this paper, we propose a novel stochastic generative model of images by redefining the stochastic generative model left implicit in a previous coding procedure. Our model is based on a quadtree, so it effectively represents variable-block-size segmentation of images. We then construct the Bayes code optimal for the proposed stochastic generative model. This requires summing over all possible quadtrees weighted by their posterior probabilities, whose computational cost in general grows exponentially with the image size. However, we introduce an efficient algorithm that calculates it in time polynomial in the image size without loss of optimality. As a result, the derived algorithm achieves a better average coding rate than JBIG.
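A hedged sketch of the polynomial-time idea: the Bayes-mixture probability of a square block is computed recursively by weighting "code this block as one segment" against "split it into four sub-blocks", which implicitly sums over all quadtrees while visiting each block only once. The leaf model (a KT estimator) and the split prior g below are placeholders, not the authors' exact choices.

```python
import numpy as np

def leaf_prob(block):
    """Placeholder leaf model: KT (Krichevsky-Trofimov) probability of a binary block.
    The KT probability depends only on the counts of ones and zeros."""
    ones = int(block.sum())
    zeros = block.size - ones
    p, n0, n1 = 1.0, 0, 0
    for bit in [1] * ones + [0] * zeros:
        p *= ((n1 if bit else n0) + 0.5) / (n0 + n1 + 1.0)
        n1 += bit
        n0 += 1 - bit
    return p

def quadtree_mixture_prob(block, g=0.5):
    """Bayes mixture over all quadtree segmentations of a square binary block
    (side length assumed to be a power of two).

    g is an assumed prior probability of splitting a node. Each block is visited
    once, so the cost is polynomial in the image size rather than exponential.
    In practice the recursion would be done in the log domain to avoid underflow.
    """
    n = block.shape[0]
    if n == 1:
        return leaf_prob(block)
    h = n // 2
    children = (block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:])
    split = np.prod([quadtree_mixture_prob(c, g) for c in children])
    return (1.0 - g) * leaf_prob(block) + g * split
```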


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Chaohai Kang ◽  
Chuiting Rong ◽  
Weijian Ren ◽  
Fengcai Huo ◽  
Pengyun Liu

2021 ◽  
Vol 28 (2) ◽  
pp. 163-182
Author(s):  
José L. Simancas-García ◽  
Kemel George-González

Shannon’s sampling theorem is one of the most important results of modern signal theory. It describes the reconstruction of any band-limited signal from its samples, taken at a sufficient rate. On the other hand, although less well known, there is the discrete sampling theorem, proved by Cooley while he was working on an algorithm to speed up the computation of the discrete Fourier transform. Cooley showed that a sampled signal can be resampled by selecting a smaller number of samples, which reduces the computational cost; the original sampled signal can then be reconstructed by a reverse process. In principle, the two theorems are unrelated. However, in this paper we show that, in the context of Non-Standard Mathematical Analysis (NSA) and the hyperreal number system R, the two theorems are equivalent; the difference between them becomes a matter of scale. With the scale changes that the hyperreal number system allows, discrete variables and functions become continuous, and Shannon’s sampling theorem emerges from the discrete sampling theorem.
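A small numerical illustration of the discrete sampling theorem mentioned above (this is the classical DFT statement, not the non-standard-analysis argument of the paper): a length-N signal whose DFT occupies only M bins can be decimated to M samples and reconstructed exactly from their DFT.

```python
import numpy as np

N, M = 64, 8              # original length and number of occupied DFT bins (M divides N)
L = N // M                # decimation factor

# Build a (complex) test signal band-limited to the first M DFT bins.
X = np.zeros(N, dtype=complex)
X[:M] = np.random.randn(M) + 1j * np.random.randn(M)
x = np.fft.ifft(X)

# Discrete sampling theorem: keep only every L-th sample ...
y = x[::L]                # M samples

# ... and reconstruct: the length-M DFT of y equals (M/N) * X on the occupied bins.
X_rec = np.zeros(N, dtype=complex)
X_rec[:M] = np.fft.fft(y) * (N / M)
x_rec = np.fft.ifft(X_rec)

print(np.allclose(x, x_rec))   # True: the decimated signal determines the original
```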


Author(s):  
Zhuobin Zheng ◽  
Chun Yuan ◽  
Xinrui Zhu ◽  
Zhihui Lin ◽  
Yangyang Cheng ◽  
...  

Learning related tasks in various domains and transferring the exploited knowledge to new situations is a significant challenge in Reinforcement Learning (RL). However, most RL algorithms are data inefficient and fail to generalize in complex environments, limiting their adaptability and applicability in multi-task scenarios. In this paper, we propose Self-Supervised Mixture-of-Experts (SUM), an effective algorithm driven by predictive uncertainty estimation for multi-task RL. SUM utilizes a multi-head agent with shared parameters as experts to learn a series of related tasks simultaneously by Deep Deterministic Policy Gradient (DDPG). Each expert is extended by predictive uncertainty estimation on known and unknown states to enhance the Q-value evaluation capacity against overfitting and to improve overall generalization. This enables the agent to capture and diffuse common knowledge across different tasks, improving sample efficiency in each task and the effectiveness of expert scheduling across multiple tasks. Instead of the task-specific design of common MoEs, a self-supervised gating network is adopted to determine a suitable expert to handle each interaction from unseen environments; it is calibrated entirely by the uncertainty feedback from the experts, without explicit supervision. To alleviate imbalanced expert utilization, the crux of MoE, optimization is accomplished via decayed-masked experience replay, which encourages both diversification and specialization of experts during different periods. We demonstrate that our approach learns faster and achieves better performance through efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym MuJoCo multi-task environments.
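A toy sketch of uncertainty-driven expert selection in the spirit of the gating idea above: the gate routes each state to an expert, and the expert reporting the lowest predictive uncertainty provides the self-supervised label for the gate. The linear gate, the uncertainty estimator, and the update rule are simplified placeholders, not the SUM architecture.

```python
import numpy as np

class UncertaintyGate:
    """Toy self-supervised gate calibrated by expert uncertainty feedback."""

    def __init__(self, n_experts, state_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.w = 0.01 * rng.standard_normal((n_experts, state_dim))  # linear gate (placeholder)
        self.lr = lr

    def select(self, state):
        # Route the interaction to the expert the gate currently prefers.
        return int(np.argmax(self.w @ state))

    def update(self, state, expert_uncertainties):
        # Self-supervised label: the expert with the lowest predictive uncertainty.
        target = int(np.argmin(expert_uncertainties))
        logits = self.w @ state
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad = probs
        grad[target] -= 1.0
        # One softmax-cross-entropy gradient step toward the uncertainty-based target.
        self.w -= self.lr * np.outer(grad, state)
```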


Author(s):  
T. Tachi ◽  
Y. Wang ◽  
R. Abe ◽  
T. Kato ◽  
N. Maebashi ◽  
...  

Abstract. Mobile mapping technology is an effective method of collecting geospatial data with high point density and accuracy. It is mainly used for asset inventory and map generation, as well as road maintenance (detecting road cracks and ruts, and measuring flatness). The equipment of earlier mobile mapping systems (MMS) is large and is usually hard-mounted onto a dedicated vehicle. Cost-effectiveness and flexibility of MMS were not regarded as important until the Leica Pegasus series, a much smaller system with integrated and configurable components, came out. In this paper, we show how we realize a versatile MMS with a Pegasus II mounted on a remodelled Japanese light vehicle (small size, engine capacity of less than 660 cc). Besides the Pegasus II and a data-processing PC, we equip this system with a small crane to move the sensor onto different platforms, an electric cart to survey narrow roads or pedestrian walkways, and a boat attachment so that the sensor can be fixed on a boat. Thus, one Pegasus II can collect data from various platforms. This paper also discusses the precision and accuracy of the Pegasus II working on these platforms. When mounted on the light vehicle, we verified its accuracy against ground control points (GCPs) and evaluated its accuracy for road maintenance (detecting road cracks and ruts, and measuring flatness). When mounted on the electric cart, we verified its accuracy against GCPs on a pedestrian road and generated a road hazard map as an example of data utilization. When mounted on a boat, we verified its accuracy against GCPs on a dam slope and created a slope shading map of a landslide area as an example of data utilization. It turns out that the Pegasus II can achieve the required surveying grade on all platforms.
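A hedged sketch of the kind of accuracy check described above (comparing measured point-cloud coordinates against matched ground control points); the point matching, coordinate conventions, and reported statistics are assumptions, not the authors' exact evaluation.

```python
import numpy as np

def gcp_accuracy(measured_xyz, gcp_xyz):
    """RMSE and maximum error of MMS-measured points against matched GCPs.

    measured_xyz, gcp_xyz: (N, 3) arrays of matched points in the same CRS, in metres.
    """
    d = np.asarray(measured_xyz, dtype=float) - np.asarray(gcp_xyz, dtype=float)
    horizontal = np.linalg.norm(d[:, :2], axis=1)   # planimetric residuals
    vertical = np.abs(d[:, 2])                      # height residuals
    return {
        "rmse_horizontal": float(np.sqrt(np.mean(horizontal ** 2))),
        "rmse_vertical": float(np.sqrt(np.mean(vertical ** 2))),
        "max_3d": float(np.linalg.norm(d, axis=1).max()),
    }
```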


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 453
Author(s):  
Kyosuke Suzuki ◽  
Tomoki Inoue ◽  
Takayuki Nagata ◽  
Miku Kasai ◽  
Taku Nonomura ◽  
...  

We propose a markerless image alignment method for pressure-sensitive paint (PSP) measurement data, replacing the time-consuming conventional alignment method in which black markers are placed on the model and detected manually. In the proposed method, feature points are obtained by a boundary detection step in which the PSP boundary is traced with the Moore-Neighbor tracing algorithm. The performance of the proposed method is compared with the conventional method based on black markers, the difference-of-Gaussian (DoG) detector, and the Hessian corner detector. The proposed method and the DoG detector give equivalent results, whereas image alignment using the black markers or the Hessian corner detector performs slightly worse than the DoG detector and the proposed method. The computational cost of the proposed method is half that of the DoG method. The proposed method is therefore promising for image alignment in PSP applications from the viewpoint of both alignment precision and computational cost.
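A simplified sketch of the boundary-based feature idea: extract the boundary pixels of the painted region and estimate a rigid alignment from corresponding boundary points. It substitutes a plain 4-neighbour boundary mask and a least-squares (Kabsch/Procrustes-style) fit for the paper's Moore-Neighbor tracing and full alignment pipeline.

```python
import numpy as np

def boundary_points(mask):
    """Foreground pixels with at least one background 4-neighbour
    (a simplified stand-in for Moore-Neighbor boundary tracing)."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    ys, xs = np.nonzero(m & ~interior)
    return np.stack([xs, ys], axis=1).astype(float)

def estimate_rigid(src, dst):
    """Least-squares rotation + translation mapping src points onto dst
    (assumes the two boundary point sets are already in correspondence)."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    t = dc - R @ sc
    return R, t
```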


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 77 ◽  
Author(s):  
Juan Chen ◽  
Zhengxuan Xue ◽  
Daiqian Fan

In order to solve the problem of vehicle delay caused by stops at signalized intersections, a micro-control method for a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG) is designed in this paper. The micro-control covers the whole process of a left-turning vehicle approaching, entering, and leaving a signalized intersection. In addition, to address the low sampling efficiency and the overestimation of the critic network in the DDPG algorithm, a positive-and-negative-reward experience replay buffer sampling mechanism and a multi-critic network structure are adopted. Finally, the effectiveness of the signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) is verified at saturation degrees of 0.2, 0.5, and 0.7 for left-turning vehicles at a signalized intersection within a VISSIM simulation environment. The results show that, compared with the traditional signal control method, the proposed deep reinforcement learning method obtains benefits in the number of stops ranging from 5% to 94%, in stop time from 1% to 99%, and in delay from −17% to 93%.
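A hedged sketch of the two DDPG modifications named above: a replay buffer split by reward sign with balanced sampling, and a multi-critic target that averages several critics' next-state Q-values to soften overestimation. Buffer capacities, the positive/negative mixing ratio, and the aggregation rule are assumptions, not the paper's exact settings.

```python
import random
from collections import deque

class PNRERBuffer:
    """Positive/negative-reward experience replay: keep transitions in two
    buffers by reward sign and draw a balanced minibatch from both."""

    def __init__(self, capacity=100_000, pos_fraction=0.5):
        self.pos = deque(maxlen=capacity)
        self.neg = deque(maxlen=capacity)
        self.pos_fraction = pos_fraction

    def add(self, state, action, reward, next_state, done):
        buf = self.pos if reward > 0 else self.neg
        buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        n_pos = min(int(batch_size * self.pos_fraction), len(self.pos))
        n_neg = min(batch_size - n_pos, len(self.neg))
        return random.sample(self.pos, n_pos) + random.sample(self.neg, n_neg)

def multi_critic_target(rewards, next_q_values, dones, gamma=0.99):
    """Average the target critics' next-state Q-values (assumed aggregation)
    before forming the usual one-step TD target."""
    q_next = sum(next_q_values) / len(next_q_values)
    return rewards + gamma * (1.0 - dones) * q_next
```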


2014 ◽  
Vol 622-623 ◽  
pp. 382-389 ◽  
Author(s):  
Antonio Fiorentino ◽  
G.C. Feriti ◽  
Elisabetta Ceretti ◽  
C. Giardini ◽ 
C.M.G. Bort ◽  
...  

The problem of obtaining sound parts by Incremental Sheet Forming is still a relevant issue, despite the numerous efforts spent on improving the toolpath planning of the deforming punch to compensate for the dimensional and geometrical part errors related to springback and punch movement. Usually, the toolpath generation strategy varies the toolpath itself in order to obtain the desired final part with reduced geometrical errors. In the present paper, a correction algorithm is used to iteratively correct the part geometry on the basis of the measured parts and of the error, defined as the difference between the actual and the nominal part geometries. In practice, the part geometry is used to generate a first trial toolpath, and the form error distribution of the resulting part is used to modify the nominal part geometry and then generate a new, improved toolpath. This procedure is iterated until the error distribution falls below a specified value corresponding to the desired part tolerance. The correction algorithm was implemented in software and applied to the results of FEM simulations. In particular, within a few iterations it was possible to reduce the geometrical error to less than 0.4 mm in the Incremental Sheet Forming of an asymmetric aluminium (Al) part, an accuracy good enough for both prototyping and production processes.
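A minimal sketch of the iterative correction loop described above; the error-mirroring update, the gain factor, and the placeholder callable standing in for the FEM simulation (or the physical forming trial) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def correct_geometry(nominal, form_part, tolerance=0.4, gain=1.0, max_iter=10):
    """Iteratively adjust the target geometry until the formed part is within tolerance.

    nominal:   array of nominal surface heights (e.g. a depth map sampled on a grid), in mm
    form_part: callable that forms/simulates a part from a target geometry and returns
               the measured geometry on the same grid (placeholder for the FEM run)
    """
    target = np.array(nominal, dtype=float)
    for i in range(max_iter):
        measured = form_part(target)
        error = measured - nominal                 # actual minus nominal geometry
        max_err = float(np.abs(error).max())
        print(f"iteration {i}: max form error = {max_err:.3f} mm")
        if max_err < tolerance:                    # desired part tolerance reached
            break
        target -= gain * error                     # shift the target opposite to the error
    return target
```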

