Average Reward Criterion

Author(s):  
O. Hernández-Lerma

2015 ◽  
Vol 52 (2) ◽  
pp. 419-440 ◽  
Author(s):  
Rolando Cavazos-Cadena ◽  
Raúl Montes-De-Oca ◽  
Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under every policy, then a stationary policy obtained in the standard way from the optimality equation is sample-path average optimal in a strong sense.
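The "standard way" referred to here is to select, in each state, an action attaining the maximum in the average reward optimality equation g + h(x) = max_a [ r(x, a) + Σ_y p(y | x, a) h(y) ]. As a rough illustration only, the sketch below solves this equation by relative value iteration on a finite truncation of the state space; the paper itself works on a denumerable space under a Lyapunov condition, and the arrays `P`, `r` and the function name are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=100_000):
    """Solve the average-reward optimality equation
        g + h(x) = max_a [ r(x,a) + sum_y P(y|x,a) h(y) ]
    by relative value iteration on a finite model.

    Assumptions (for this sketch): unichain, aperiodic dynamics,
    P of shape (S, A, S), r of shape (S, A).
    Returns (g, h, policy): the optimal gain, a relative value
    function, and a stationary policy attaining the maximum.
    """
    S, A, _ = P.shape
    h = np.zeros(S)
    g = 0.0
    for _ in range(max_iter):
        Q = r + P @ h               # Q[x, a] = r(x,a) + E[h(next state)]
        Th = Q.max(axis=1)          # one step of the optimality operator
        g = Th[0]                   # normalize at a reference state
        h_new = Th - g
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=1)       # maximizing action in each state
    return g, h, policy
```

On such a truncated model the returned policy picks a maximizing action in every state; the paper's contribution is that, under the Lyapunov condition, a policy obtained this way is average optimal along almost every sample path, not merely in expectation.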


1999 ◽  
Vol 30 (7-8) ◽  
pp. 7-20
Author(s):  
M. Kurano ◽  
M. Yasuda ◽  
J.-I. Nakagami ◽  
Y. Yoshida

1996 ◽  
Vol 28 (4) ◽  
pp. 1123-1144
Author(s):  
K. D. Glazebrook

A single machine is available to process a collection of jobs J, each of which evolves stochastically under processing. Jobs incur costs while awaiting the machine at a state-dependent rate, and processing must respect a set of precedence constraints Γ. Index policies are optimal in a variety of scenarios. The indices concerned are characterised as values of restart problems with the average reward criterion, a characterisation that yields a range of efficient approaches to their computation. Index-based suboptimality bounds are derived for general processing policies; these bounds enable sensitivity analyses and the evaluation of scheduling heuristics.
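To make the restart characterisation concrete: in one common formulation (in the spirit of restart-in-state characterisations of such indices), the index of a state x0 is the optimal long-run average reward of a two-action problem in which, at every state, one may either continue processing or restart from x0. The sketch below computes that value by relative value iteration on a finite model; the arrays `P`, `r`, the restart reward convention, and the function name are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def restart_index(P, r, x0, tol=1e-9, max_iter=100_000):
    """Index of state x0 as the optimal average reward of the
    associated restart problem: in every state, either continue
    (reward r, transitions P) or restart from x0.

    Assumptions (for this sketch): finite unichain model,
    P of shape (S, S) under processing, r of shape (S,);
    the restart action earns r[x0] and moves as from x0.
    """
    S = P.shape[0]
    h = np.zeros(S)
    g = 0.0
    for _ in range(max_iter):
        cont = r + P @ h            # value of continuing in each state
        rest = r[x0] + P[x0] @ h    # value of restarting from x0 (scalar)
        Th = np.maximum(cont, rest) # best of the two actions
        g = Th[x0]                  # normalize at the restart state
        h_new = Th - g
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return g                        # the index of x0
```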


1982 ◽  
Vol 19 (2) ◽  
pp. 301-309 ◽  
Author(s):  
Zvi Rosberg

A semi-Markov decision process with a denumerable multidimensional state space is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.
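For a fixed stationary policy in a semi-Markov model, the long-run average reward is the ratio of the expected reward per transition to the expected holding time per transition, taken under the stationary distribution of the embedded chain. A minimal sketch of this renewal-reward computation on a finite, unichain truncation follows; `P`, `r`, `tau` and the function name are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def smdp_average_reward(P, r, tau):
    """Long-run average reward of a stationary policy in an SMDP,
    via the renewal-reward ratio g = (pi @ r) / (pi @ tau).

    Assumptions (for this sketch): finite unichain truncation;
    P: (S, S) embedded transition matrix under the policy,
    r: (S,) expected one-transition rewards,
    tau: (S,) expected holding times (all positive).
    """
    S = P.shape[0]
    # Stationary distribution pi: solve pi P = pi with sum(pi) = 1,
    # posed as a least-squares system.
    A = np.vstack([P.T - np.eye(S), np.ones(S)])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return (pi @ r) / (pi @ tau)
```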

