Independence-aware Advantage Estimation

Author(s):  
Pushi Zhang ◽  
Li Zhao ◽  
Guoqing Liu ◽  
Jiang Bian ◽  
Minlie Huang ◽  
...  

Most existing advantage function estimation methods in reinforcement learning suffer from high variance, which scales unfavorably with the time horizon. To address this challenge, we propose to identify the independence property between the current action and future states in an environment, which can then be leveraged to effectively reduce the variance of advantage estimation. In particular, the identified independence property can be naturally utilized to construct a novel importance sampling advantage estimator with close-to-zero variance even when the Monte-Carlo return signal has large variance. To further remove the risk of high variance introduced by the new estimator, we combine it with the existing Monte-Carlo estimator via a reward decomposition model learned by minimizing the estimation variance. Experiments demonstrate that our method achieves higher sample efficiency than existing advantage estimation methods in complex environments.
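
As a rough illustration of the variance-minimizing combination described above, the sketch below mixes two hypothetical unbiased estimates of the same advantage value using the weight that minimizes the variance of the mixture; the estimators, noise levels, and weighting scheme are simplified placeholders, not the paper's reward decomposition model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-sample estimates of the same advantage value:
est_a = 1.0 + 0.1 * rng.standard_normal(10_000)  # low-variance estimator (e.g. importance sampling)
est_b = 1.0 + 1.0 * rng.standard_normal(10_000)  # high-variance Monte-Carlo estimator

var_a, var_b = est_a.var(), est_b.var()
cov_ab = np.cov(est_a, est_b)[0, 1]

# Weight on est_a that minimizes Var[w * est_a + (1 - w) * est_b].
w = (var_b - cov_ab) / (var_a + var_b - 2.0 * cov_ab)
combined = w * est_a + (1.0 - w) * est_b
print(w, combined.var())  # the combined variance is at most the smaller of the two
```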

2021 ◽  
Author(s):  
Josiah P. Hanna ◽  
Scott Niekum ◽  
Peter Stone

In reinforcement learning, importance sampling is a widely used method for evaluating an expectation under the data distribution of one policy when the data has in fact been generated by a different policy. Importance sampling requires computing the likelihood ratio between the action probabilities of a target policy and those of the data-producing behavior policy. In this article, we study importance sampling where the behavior policy action probabilities are replaced by their maximum likelihood estimates under the observed data. We show that this general technique reduces variance due to sampling error in Monte-Carlo-style estimators. We introduce two novel estimators that use this technique to estimate expected values that arise in the RL literature. We find that these estimators reduce the variance of Monte-Carlo sampling methods, leading to faster learning for policy gradient algorithms and more accurate off-policy policy evaluation. We also provide theoretical analysis showing that our new estimators are consistent and have asymptotically lower variance than Monte-Carlo estimators.
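
A minimal sketch of the idea, assuming a single-state bandit with discrete actions: ordinary importance sampling weights each reward by the ratio of target to true behavior probabilities, while the estimated variant replaces the behavior probabilities with their maximum likelihood estimates (the empirical action frequencies). The policies and rewards below are placeholders, not from the article.

```python
import numpy as np

rng = np.random.default_rng(1)
behavior = np.array([0.7, 0.3])          # true behavior policy
target = np.array([0.2, 0.8])            # target policy to evaluate
reward_of_action = np.array([1.0, 2.0])

actions = rng.choice(2, size=500, p=behavior)
rewards = reward_of_action[actions]

# Ordinary importance sampling: ratio uses the true behavior probabilities.
ois = np.mean(target[actions] / behavior[actions] * rewards)

# Estimated-behavior variant: ratio uses the empirical action frequencies.
behavior_mle = np.bincount(actions, minlength=2) / len(actions)
eis = np.mean(target[actions] / behavior_mle[actions] * rewards)

print(ois, eis)  # both estimate E_target[reward] = 0.2 * 1.0 + 0.8 * 2.0 = 1.8
```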


2021 ◽  
Vol 11 (9) ◽  
pp. 3871
Author(s):  
Jérôme Morio ◽  
Baptiste Levasseur ◽  
Sylvain Bertrand

This paper addresses the estimation of accurate extreme ground impact footprints and probabilistic maps due to a total loss of control of fixed-wing unmanned aerial vehicles after a main engine failure. We focus on the ground impact footprints that contain 95%, 99%, and 99.9% of the drone impacts. These regions are defined here as density minimum-volume sets and may be estimated by Monte Carlo methods. As Monte Carlo approaches lead to an underestimation of extreme ground impact footprints, we instead consider multiple importance sampling to evaluate them. We then perform a reliability-oriented sensitivity analysis to identify the uncertain parameters with the greatest influence on the ground impact position. We present the results of these estimations for a realistic drone flight scenario.
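
To make the sampling strategy concrete, here is a generic multiple importance sampling sketch (balance heuristic) for a toy tail probability; the drone impact model, proposal distributions, and threshold are placeholders, not the paper's setup.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
threshold = 4.0                 # rare event: standard normal X exceeding 4
means = [3.0, 5.0]              # two shifted Gaussian proposals
n_per = 2_000

samples = np.concatenate([rng.normal(m, 1.0, n_per) for m in means])
target = norm_pdf(samples, 0.0, 1.0)
mixture = np.mean([norm_pdf(samples, m, 1.0) for m in means], axis=0)  # balance heuristic

weights = target / mixture
estimate = np.mean(weights * (samples > threshold))
print(estimate)                 # exact tail probability is about 3.17e-05
```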


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks over a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due dates. In doing so, the number of checks is reduced and fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with that of a previous study, which presented a Dynamic Programming (DP)-based approach, and with airline estimations for the same period. The results show a reduction in the number of scheduled checks, which indicates the potential of RL for solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model on these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
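
As a stand-in for the Deep Q-learning algorithm mentioned above, the following minimal tabular Q-learning loop shows the underlying update rule; the states, actions, transition function, and reward are toy placeholders rather than the paper's fleet-maintenance encoding.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(state, action):
    # Hypothetical environment: random transition and a toy penalty.
    next_state = int(rng.integers(n_states))
    reward = -abs(action - 1)
    return next_state, reward

state = 0
for _ in range(1_000):
    # Epsilon-greedy action selection.
    action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[state].argmax())
    next_state, reward = step(state, action)
    # One-step temporal-difference (Q-learning) update.
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])
    state = next_state
```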


2011 ◽  
Vol 88-89 ◽  
pp. 554-558 ◽  
Author(s):  
Bin Wang

An improved importance sampling method with layered simulation optimization is presented in this paper. By solving for the components' optimum biasing factors in sequence, according to their degree of importance to system reliability, the presented technique further accelerates the convergence of the Monte-Carlo simulation. The idea is to transform the optimization of the components' multivariate distribution in the power system into a sequence of step-wise optimizations, each based on importance sampling with its own optimum biasing factors. In practice, the components are grouped into layers according to their importance to system reliability before the Monte-Carlo simulation (the earlier the layer, the more important the components), and the optimum biasing factors of the components in the current layer are searched while the importance sampling is carried out, until the required accuracy is reached. The validity of the presented method is verified using the IEEE-RTS79 test system.
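
To make the biased-sampling idea concrete, the sketch below estimates a toy system failure probability by drawing component outages under inflated ("biased") probabilities and reweighting each sample by the likelihood ratio; the component count, probabilities, biasing factors, and failure criterion are invented for illustration and are unrelated to IEEE-RTS79 or the layered search itself.

```python
import numpy as np

rng = np.random.default_rng(4)
p_fail = np.full(20, 0.01)   # true component outage probabilities
p_bias = np.full(20, 0.10)   # inflated (biased) sampling probabilities
n_samples = 20_000

outages = rng.random((n_samples, 20)) < p_bias      # sample under the biased distribution
# Per-sample likelihood ratio: product over components of true/biased probabilities.
ratio = np.prod(np.where(outages, p_fail / p_bias,
                         (1.0 - p_fail) / (1.0 - p_bias)), axis=1)
system_fail = outages.sum(axis=1) >= 3              # toy criterion: three or more outages

print(np.mean(ratio * system_fail))                 # unbiased estimate of the failure probability
```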


2015 ◽  
Vol 25 (3) ◽  
pp. 471-482 ◽  
Author(s):  
Bartłomiej Śnieżyński

In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied to strategy generation even if rewards are delayed. We compare the efficiency of the proposed model with that of reinforcement learning using the farmer-pest domain and configurations of varying complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which makes it possible to track the learning process.
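
A minimal sketch of classification-based strategy learning, assuming placeholder features and labels rather than the farmer-pest domain: the agent fits a decision tree to recorded state-action examples and then queries it to choose an action; a shallow tree also keeps the learned knowledge readable for human analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
states = rng.random((200, 4))                              # recorded state features
best_actions = (states[:, 0] > states[:, 1]).astype(int)   # actions that proved successful

# Fit the classifier that plays the role of the agent's strategy.
policy = DecisionTreeClassifier(max_depth=3).fit(states, best_actions)

new_state = rng.random((1, 4))
action = policy.predict(new_state)[0]                       # action chosen for execution
print(action)
```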

