An Incremental Fast Policy Search Using a Single Sample Path

An incremental off-policy search in a model-free Markov decision process using a single sample path

Machine Learning ◽

10.1007/s10994-018-5697-1 ◽

2018 ◽

Vol 107 (6) ◽

pp. 969-1011

Author(s):

Ajin George Joseph ◽

Shalabh Bhatnagar

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Sample Path ◽

Single Sample ◽

Policy Search ◽

Model Free ◽

Markov Decision


Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Machine Learning ◽

10.1007/s10994-007-5038-2 ◽

2007 ◽

Vol 71 (1) ◽

pp. 89-129 ◽

Cited By ~ 58

Author(s):

András Antos ◽

Csaba Szepesvári ◽

Rémi Munos

Keyword(s):

Sample Path ◽

Policy Iteration ◽

Single Sample ◽

Optimal Policies ◽

Residual Minimization


Recursive Approaches for Single Sample Path Based Markov Reward Processes

Asian Journal of Control ◽

10.1111/j.1934-6093.2001.tb00038.x ◽

2008 ◽

Vol 3 (1) ◽

pp. 21-26 ◽

Cited By ~ 4

Author(s):

Hai-Tao Fang ◽

Han-Fu Chen ◽

Xi-Ren Cao

Keyword(s):

Sample Path ◽

Single Sample ◽

Markov Reward


A single sample path-based performance sensitivity formula for Markov chains

IEEE Transactions on Automatic Control ◽

10.1109/9.545747 ◽

1996 ◽

Vol 41 (12) ◽

pp. 1814-1817 ◽

Cited By ~ 21

Author(s):

Xi-Ren Cao ◽

Xue-Ming Yuan ◽

Li Qiu

Keyword(s):

Markov Chains ◽

Sample Path ◽

Single Sample ◽

Performance Sensitivity


The Maclaurin series for performance functions of Markov chains

Advances in Applied Probability ◽

10.1239/aap/1035228123 ◽

1998 ◽

Vol 30 (3) ◽

pp. 676-692 ◽

Cited By ~ 19

Author(s):

Xi-Ren Cao

Keyword(s):

Markov Chains ◽

Performance Measures ◽

Sample Path ◽

Single Sample ◽

Maclaurin Series ◽

Transition Matrices ◽

Engineering Problems ◽

Realization Factors ◽

Derivatives Of ◽

Steady State Performance

We derive formulas for the first- and higher-order derivatives of the steady state performance measures for changes in transition matrices of irreducible and aperiodic Markov chains. Using these formulas, we obtain a Maclaurin series for the performance measures of such Markov chains. The convergence range of the Maclaurin series can be determined. We show that the derivatives and the coefficients of the Maclaurin series can be easily estimated by analysing a single sample path of the Markov chain. Algorithms for estimating these quantities are provided. Markov chains consisting of transient states and multiple chains are also studied. The results can be easily extended to Markov processes. The derivation of the results is closely related to some fundamental concepts, such as group inverse, potentials, and realization factors in perturbation analysis. Simulation results are provided to illustrate the accuracy of the single sample path based estimation. Possible applications to engineering problems are discussed.
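The abstract above does not reproduce the formulas, so the following is only a minimal sketch of the first-order case under stated assumptions. It assumes the perturbation has the form P(δ) = P + δQ with Q = P' − P, and uses the standard perturbation-analysis identity dη/dδ = πQg at δ = 0, where π is the stationary distribution, η = πf is the steady state performance, and g is a potential vector solving the Poisson equation. The function names, the example matrices, and the reward vector `f` are all hypothetical and chosen purely for illustration; the quantities are computed from the known transition matrix rather than estimated from a sample path.

```python
# Sketch: first-order sensitivity of steady state performance of a Markov chain.
# Assumption (not taken from the listing): d(eta)/d(delta) = pi * Q * g at delta = 0,
# with Q = P_alt - P and g a potential of the chain under P.

def mat_vec(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def stationary(P, iters=2000):
    """Stationary distribution by power iteration (irreducible, aperiodic P)."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = vec_mat(pi, P)
    return pi

def performance(P, f):
    """Steady state performance eta = pi . f."""
    return sum(p * x for p, x in zip(stationary(P), f))

def potential(P, f, eta, iters=2000):
    """Potential g, from iterating the Poisson equation g = f - eta*e + P g."""
    g = [0.0] * len(f)
    for _ in range(iters):
        Pg = mat_vec(P, g)
        g = [fi - eta + pg for fi, pg in zip(f, Pg)]
    return g

def derivative(P, P_alt, f):
    """Directional derivative of eta along Q = P_alt - P, evaluated at P."""
    pi = stationary(P)
    eta = sum(p * x for p, x in zip(pi, f))
    g = potential(P, f, eta)
    Q = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(P_alt, P)]
    return sum(p * x for p, x in zip(pi, mat_vec(Q, g)))

def finite_difference(P, P_alt, f, h=1e-4):
    """Central finite difference of eta along the same direction, for comparison."""
    def shifted(s):
        return [[b + s * (a - b) for a, b in zip(ra, rb)] for ra, rb in zip(P_alt, P)]
    return (performance(shifted(h), f) - performance(shifted(-h), f)) / (2 * h)

# Hypothetical 3-state example.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
P_alt = [[0.2, 0.8, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
f = [1.0, 2.0, 4.0]

print(derivative(P, P_alt, f), finite_difference(P, P_alt, f))
```

The two printed values should agree closely, which is a quick sanity check on the identity. A single sample path estimator would replace `stationary` and `potential` with averages taken along one simulated trajectory.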


Single sample path based optimization of Markov systems: examples and algorithms

Proceedings of the 36th IEEE Conference on Decision and Control ◽

10.1109/cdc.1997.650711 ◽

2002 ◽

Author(s):

Xi-Ren Cao

Keyword(s):

Sample Path ◽

Single Sample ◽

Markov Systems


Convergence of perturbation analysis estimates for discontinuous sample functions: a general approach

Advances in Applied Probability ◽

10.2307/1427270 ◽

1988 ◽

Vol 20 (1) ◽

pp. 59-78 ◽

Cited By ~ 3

Author(s):

Reuven Y. Rubinstein ◽

Ferenc Szidarovszky

Keyword(s):

Monte Carlo ◽

Rate Of Convergence ◽

Performance Measures ◽

Dynamic Systems ◽

Perturbation Analysis ◽

Sample Path ◽

Single Sample ◽

Parameter Vector ◽

Discrete Events ◽

Convergence Conditions

Generalized perturbation analysis (PA) estimates are introduced for studying the sensitivity of performance measures of discrete event dynamic systems with discontinuous sample functions. Their convergence conditions and rate of convergence are given. It is shown that PA estimates based on a single sample path always converge faster to the unknown sensitivity parameter (or vector of parameters) than their crude Monte Carlo counterparts.
