Optimality conditions for a Markov decision chain with unbounded costs

1980 ◽  
Vol 17 (4) ◽ 
pp. 996-1003 ◽ 
Author(s):  
D. R. Robinson

It is known that when costs are unbounded, satisfaction of the appropriate dynamic programming ‘optimality’ equation by a policy is not sufficient to guarantee its average optimality. A ‘lowest-order potential’ condition is introduced which, along with the dynamic programming equation, is sufficient to establish the optimality of the policy. It is also shown that, under fairly general conditions, if the lowest-order potential condition is not satisfied, there exists a non-memoryless policy with smaller average cost than the policy satisfying the dynamic programming equation.
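
For orientation, the average-cost optimality equation in question takes the following standard form (the notation below is assumed for illustration, not quoted from the paper):

```latex
% Standard average-cost optimality equation for a Markov decision chain
% (notation assumed: g is the optimal average cost, h(i) the relative value
% of state i, c(i,a) the one-stage cost, p_{ij}(a) transition probabilities):
\[
  g + h(i) \;=\; \min_{a \in A(i)} \Big\{ c(i,a) + \sum_{j} p_{ij}(a)\, h(j) \Big\},
  \qquad i \in S .
\]
```

With bounded costs, a stationary policy attaining the minimum in every state is average optimal; the abstract's point is that with unbounded costs attaining the minimum alone can fail to deliver optimality, which motivates the additional lowest-order potential condition.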


2011 ◽  
Vol 2011 ◽  
pp. 1-11
Author(s):  
Epaminondas G. Kyriakidis

We introduce a continuous-time Markov decision process for the optimal control of a simple symmetrical immigration-emigration process through the introduction of total catastrophes. It is proved that a particular control-limit policy is average cost optimal within the class of all stationary policies, by verifying that the relative values of this policy solve the corresponding optimality equation.
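
The control-limit structure is easy to picture: reset the population to zero (a total catastrophe) exactly when it exceeds a threshold. The sketch below simulates such a policy under entirely hypothetical rates and costs; the paper itself establishes optimality analytically via the optimality equation, not by simulation.

```python
# Minimal sketch of a control-limit policy for an immigration-emigration
# (birth-death) process controlled by total catastrophes. All names, rates,
# and costs here are hypothetical illustrations, not the paper's model.
import random

def control_limit_policy(state: int, limit: int) -> str:
    """Trigger a total catastrophe (instant reset to 0) if and only if
    the population exceeds the control limit."""
    return "catastrophe" if state > limit else "continue"

def estimate_average_cost(limit, arrival_rate=1.0, departure_rate=1.0,
                          holding_cost=1.0, catastrophe_cost=10.0,
                          horizon=200_000, seed=0):
    """Crude Monte Carlo estimate of the long-run average cost of a
    control-limit policy, using a uniformized discrete-time approximation
    of the continuous-time chain."""
    rng = random.Random(seed)
    p_arrival = arrival_rate / (arrival_rate + departure_rate)
    state, total = 0, 0.0
    for _ in range(horizon):
        total += holding_cost * state          # holding cost per period
        if control_limit_policy(state, limit) == "catastrophe":
            total += catastrophe_cost
            state = 0
        elif rng.random() < p_arrival:
            state += 1                         # immigration
        elif state > 0:
            state -= 1                         # emigration
    return total / horizon

# Example: compare a few control limits (purely illustrative).
if __name__ == "__main__":
    for limit in (2, 5, 10):
        print(limit, estimate_average_cost(limit))
```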


1978 ◽  
Vol 15 (2) ◽  
pp. 356-373 ◽  
Author(s):  
A. Federgruen ◽  
H. C. Tijms

This paper is concerned with the optimality equation for the average costs in a denumerable state semi-Markov decision model. It will be shown that under each of a number of recurrence conditions on the transition probability matrices associated with the stationary policies, the optimality equation has a bounded solution. This solution yields a stationary policy which is optimal under a strong version of the average cost optimality criterion. Besides establishing the existence of a bounded solution to the optimality equation, we will show that both the value-iteration method and the policy-iteration method can be used to compute such a solution. For the latter method we will prove that the average costs and the relative cost functions of the generated policies converge to a solution of the optimality equation.
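
As a concrete illustration of the value-iteration route to a bounded solution, here is a minimal relative value iteration for the finite-state, discrete-time special case. The paper's setting is denumerable-state and semi-Markov, so this is only a sketch, and convergence requires the usual unichain/aperiodicity assumptions:

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=100_000):
    """Relative value iteration for the average-cost optimality equation
        g + h(i) = min_a [ c(i,a) + sum_j P[a][i,j] h(j) ].
    Finite-state, discrete-time special case, offered only as an
    illustration of the method named in the abstract.
    P: list of (n x n) transition matrices, one per action.
    c: (n x num_actions) array of one-stage costs."""
    n, num_actions = c.shape
    h = np.zeros(n)
    for _ in range(max_iter):
        # One dynamic-programming backup for every state-action pair.
        Q = np.stack([c[:, a] + P[a] @ h for a in range(num_actions)], axis=1)
        T = Q.min(axis=1)
        g = T[0]                  # normalize at a reference state
        h_new = T - g
        if np.max(np.abs(h_new - h)) < tol:
            # average cost, relative values, greedy stationary policy
            return g, h_new, Q.argmin(axis=1)
        h = h_new
    raise RuntimeError("value iteration did not converge")
```

The pair (g, h) returned on convergence is a bounded solution of the optimality equation, and the greedy policy with respect to it is the stationary policy the existence results refer to.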


1993 ◽  
Vol 7 (1) ◽  
pp. 47-67 ◽  
Author(s):  
Linn I. Sennott

We consider a Markov decision chain with countable state space, finite action sets, and nonnegative costs. Conditions for the average cost optimality inequality to be an equality are derived. This extends the work of Cavazos-Cadena [8]. It is shown that an optimal stationary policy must satisfy the optimality equation at all positive recurrent states. Structural results on the chain induced by an optimal stationary policy are derived. The results are employed in two examples to prove that any optimal stationary policy must be of critical number form.
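
For reference, the inequality and equality being compared have the following standard forms (notation assumed here, not quoted from the paper). Under nonnegative costs, general conditions deliver the inequality, and the abstract gives conditions under which it tightens to the equality, in particular at the positive recurrent states of an optimal stationary policy:

```latex
% Average cost optimality inequality (ACOI) vs. equality (ACOE),
% in standard notation (assumed for illustration):
\[
  g + h(i) \;\ge\; \min_{a \in A(i)} \Big\{ c(i,a) + \sum_{j} p_{ij}(a)\, h(j) \Big\}
  \quad \text{(ACOI)},
\]
\[
  g + h(i) \;=\; \min_{a \in A(i)} \Big\{ c(i,a) + \sum_{j} p_{ij}(a)\, h(j) \Big\}
  \quad \text{(ACOE)}.
\]
```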

