A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes
The policy iteration algorithm for average reward Markov decision processes with general state space
1997 ◽ Vol 42 (12) ◽ pp. 1663-1680
2016 ◽ Vol 133 (10) ◽ pp. 28-33
2003 ◽ Vol 17 (2) ◽ pp. 213-234
1992 ◽ Vol 24 (1-2) ◽ pp. 147-155