Semi-Markov decision processes with polynomial reward
Keyword(s): Long Run
A semi-Markov decision process with a denumerable multidimensional state space is considered. At any given state, only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is assumed only to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that, under an ergodicity assumption, there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.
2007, pp. 263-277