Sample-Path Optimal Stationary Policies in Stable Markov Decision Chains with the Average Reward Criterion
This paper concerns discrete-time Markov decision chains with a denumerable state space and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under every policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
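For orientation, the objects named in the abstract can be written out in a common textbook form. This is only a sketch under assumed notation, not the paper's own statement: the symbols S (state space), A(x) (action set at state x), r (reward function), p_{xy}(a) (transition law), g (optimal average reward), h (relative value function), and f (stationary policy) are introduced here for illustration, and the precise "strong sense" of optimality is the one defined in the paper itself.

```latex
% Average reward optimality equation (assumed notation):
\[
  g + h(x) \;=\; \sup_{a \in A(x)} \Big[\, r(x,a) + \sum_{y \in S} p_{xy}(a)\, h(y) \Big],
  \qquad x \in S .
\]
% A stationary policy f is obtained "in the standard way" by choosing,
% at each state x, an action f(x) that attains the supremum:
\[
  g + h(x) \;=\; r\big(x, f(x)\big) + \sum_{y \in S} p_{xy}\big(f(x)\big)\, h(y),
  \qquad x \in S .
\]
% Sample-path criterion: the average of the rewards actually observed
% along a trajectory (X_t, A_t), rather than its expectation:
\[
  \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t), \qquad n = 1, 2, \dots
\]
% One usual formalization of sample-path optimality (an assumption here):
% under f the averages converge to g almost surely, while under any
% policy their limit superior does not exceed g almost surely.
```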
2015, Vol 52 (02), pp. 419-440