Learning control of finite Markov chains with an explicit trade-off between estimation and control

1988 ◽  
Vol 18 (5) ◽  
pp. 677-684 ◽  
Author(s):  
M. Sato ◽  
K. Abe ◽  
H. Takeda
Author(s):  
A.S. Poznyak ◽  
Kaddour Najim ◽  
E. Gomez-Ramirez

1974 ◽  
Vol 6 (1) ◽  
pp. 40-60 ◽  
Author(s):  
P. Mandl

We consider a finite controlled Markov chain whose description depends on an unknown parameter a, and investigate the following control policy. An optimal stationary control is associated with each value of a. The parameter a is estimated recursively from the trajectory by the minimum contrast method, and the optimal stationary control corresponding to the current estimate is used. We present asymptotic properties of the estimate and of the criterion function. They follow from the law of large numbers and from the central limit theorem for controlled Markov chains, derived with the aid of martingales.
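The policy described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state, two-action chain, the `transition` and `reward` functions, the finite candidate set `CANDIDATES`, and the use of the negative log-likelihood as the contrast function. For simplicity the estimate is recomputed from transition counts at each step rather than updated by a recursive formula.

```python
import math
import random
from itertools import product

STATES = (0, 1)
ACTIONS = (0, 1)
CANDIDATES = (0.2, 0.5, 0.8)  # hypothetical finite set of parameter values


def transition(a, x, u):
    """Hypothetical model: next-state distribution (P[y=0], P[y=1]).
    It ignores the current state x, purely to keep the toy example small."""
    p1 = 0.1 + 0.8 * (a if u == 0 else 1.0 - a)
    return (1.0 - p1, p1)


def reward(x, u):
    """Hypothetical reward: being in state 1 pays 1, action 1 costs 0.1."""
    return x - 0.1 * u


def average_reward(a, policy):
    """Long-run average reward of a stationary policy, via power iteration."""
    mu = [1.0 / len(STATES)] * len(STATES)
    for _ in range(200):
        mu = [sum(mu[x] * transition(a, x, policy[x])[y] for x in STATES)
              for y in STATES]
    return sum(mu[x] * reward(x, policy[x]) for x in STATES)


def optimal_policy(a):
    """Best deterministic stationary policy for parameter a, by enumeration."""
    return max(product(ACTIONS, repeat=len(STATES)),
               key=lambda policy: average_reward(a, policy))


def contrast(a, counts):
    """Assumed contrast: negative log-likelihood of the observed transitions."""
    return -sum(n * math.log(transition(a, x, u)[y])
                for (x, u, y), n in counts.items())


def estimate(counts):
    """Minimum contrast estimate over the candidate set."""
    return min(CANDIDATES, key=lambda a: contrast(a, counts))


def simulate(true_a, steps, seed=0):
    """Run the estimate-then-control loop and return the final estimate."""
    rng = random.Random(seed)
    policies = {a: optimal_policy(a) for a in CANDIDATES}
    counts = {}  # (state, action, next_state) -> number of occurrences
    x = 0
    for _ in range(steps):
        a_hat = estimate(counts)        # current estimate of the parameter
        u = policies[a_hat][x]          # optimal control for that estimate
        y = 1 if rng.random() < transition(true_a, x, u)[1] else 0
        counts[(x, u, y)] = counts.get((x, u, y), 0) + 1
        x = y
    return estimate(counts)
```

Calling `simulate(0.8, 2000)` runs the loop for 2000 steps against a chain whose true parameter is 0.8 and returns the final estimate. In this toy model every action distinguishes the candidate values, so the estimate can settle on the true parameter regardless of which control is applied; the paper's asymptotic results concern the general case and are not reproduced here.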

