An Incremental Fast Policy Search Using a Single Sample Path

Author(s): Ajin George Joseph, Shalabh Bhatnagar
2008, Vol 3 (1), pp. 21-26
Author(s): Hai-Tao Fang, Han-Fu Chen, Xi-Ren Cao

1996, Vol 41 (12), pp. 1814-1817
Author(s): Xi-Ren Cao, Xue-Ming Yuan, Li Qiu

1998, Vol 30 (3), pp. 676-692
Author(s): Xi-Ren Cao

We derive formulas for the first- and higher-order derivatives of the steady state performance measures for changes in transition matrices of irreducible and aperiodic Markov chains. Using these formulas, we obtain a Maclaurin series for the performance measures of such Markov chains. The convergence range of the Maclaurin series can be determined. We show that the derivatives and the coefficients of the Maclaurin series can be easily estimated by analysing a single sample path of the Markov chain. Algorithms for estimating these quantities are provided. Markov chains consisting of transient states and multiple chains are also studied. The results can be easily extended to Markov processes. The derivation of the results is closely related to some fundamental concepts, such as group inverse, potentials, and realization factors in perturbation analysis. Simulation results are provided to illustrate the accuracy of the single sample path based estimation. Possible applications to engineering problems are discussed.
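
As a rough illustration of the first-order derivative mentioned in the abstract, the sketch below evaluates the standard potential-based sensitivity formula dη/dδ = πQg for a perturbed chain P + δQ, where π is the stationary distribution of P, Q = P' − P, and the potential vector g solves (I − P)g = f − ηe. This is a minimal sketch under stated assumptions, not the paper's algorithm: it computes the quantities exactly from a known transition matrix instead of estimating them from a single sample path, and the helper names (`stationary`, `potentials`, `performance_derivative`), the two-state chain, and the iteration counts are all hypothetical choices made for the example.

```python
def stationary(P, iters=500):
    """Stationary distribution by power iteration.

    Assumes P is irreducible and aperiodic and mixes quickly enough
    that `iters` steps suffice (an assumption of this sketch).
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def potentials(P, f, pi, terms=500):
    """Potentials g = sum_{k>=0} (P^k f - eta * e), truncated after `terms` terms.

    The untruncated sum satisfies (I - P) g = f - eta * e.
    """
    n = len(P)
    eta = sum(pi[i] * f[i] for i in range(n))
    v = list(f)          # holds P^k f
    g = [0.0] * n
    for _ in range(terms):
        g = [g[i] + v[i] - eta for i in range(n)]
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return g


def performance_derivative(P, P_new, f):
    """Derivative of eta = pi f along P + delta * (P_new - P) at delta = 0."""
    n = len(P)
    pi = stationary(P)
    g = potentials(P, f, pi)
    Q = [[P_new[i][j] - P[i][j] for j in range(n)] for i in range(n)]
    return sum(pi[i] * Q[i][j] * g[j] for i in range(n) for j in range(n))


# Hypothetical two-state example: reward 1 in state 0, reward 0 in state 1.
P = [[0.5, 0.5], [0.2, 0.8]]
P_new = [[0.9, 0.1], [0.6, 0.4]]
f = [1.0, 0.0]
print(round(performance_derivative(P, P_new, f), 4))  # 0.5714, i.e. 4/7
```

For this two-state chain the answer can be checked by hand: η(δ) = (0.2 + 0.4δ)/0.7, so the derivative is 4/7. A single-sample-path estimator would replace the exact π and g with averages collected along one simulated trajectory.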


1988, Vol 20 (1), pp. 59-78
Author(s): Reuven Y. Rubinstein, Ferenc Szidarovszky

Generalized perturbation analysis (PA) estimates are introduced for studying the sensitivity of performance measures of discrete event dynamic systems with discontinuous sample functions. Their convergence conditions and rates of convergence are given. It is shown that PA estimates based on a single sample path always converge faster to the unknown sensitivity parameter (or vector of parameters) than their crude Monte Carlo counterparts.

