Solutions of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria

2008 ◽  
Vol 70 (3) ◽  
pp. 541-566 ◽  
Author(s):  
Rolando Cavazos-Cadena


2011 ◽  
Vol 2011 ◽  
pp. 1-11
Author(s):  
Epaminondas G. Kyriakidis

We introduce a continuous-time Markov decision process for the optimal control of a simple symmetrical immigration-emigration process through the introduction of total catastrophes. It is proved that a particular control-limit policy is average cost optimal within the class of all stationary policies, by verifying that the relative values of this policy solve the corresponding optimality equation.
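The verification step turns on the average cost optimality equation. In a generic continuous-time form (the notation here is assumed for illustration, not quoted from the paper), with transition rates $q(y \mid x, a)$, cost rate $c(x,a)$, optimal average cost $g$, and relative values $h$, it reads:

```latex
g \;=\; \min_{a \in A(x)} \Big[\, c(x,a) \;+\; \sum_{y \neq x} q(y \mid x, a)\,\big(h(y) - h(x)\big) \Big],
\qquad x \in S .
```

A stationary policy whose relative values satisfy this equation at every state is average cost optimal within the class of stationary policies, which is the structure of the verification argument above.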


1993 ◽  
Vol 7 (1) ◽  
pp. 47-67 ◽  
Author(s):  
Linn I. Sennott

We consider a Markov decision chain with countable state space, finite action sets, and nonnegative costs. Conditions for the average cost optimality inequality to be an equality are derived. This extends work of Cavazos-Cadena [8]. It is shown that an optimal stationary policy must satisfy the optimality equation at all positive recurrent states. Structural results on the chain induced by an optimal stationary policy are derived. The results are employed in two examples to prove that any optimal stationary policy must be of critical number form.
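In this setting the average cost optimality inequality has the following generic form (notation assumed for illustration, not quoted from the paper): with one-stage cost $c(x,a)$, transition probabilities $p(y \mid x, a)$, average cost $g$, and relative value function $h$,

```latex
g + h(x) \;\ge\; \min_{a \in A(x)} \Big[\, c(x,a) \;+\; \sum_{y} p(y \mid x, a)\, h(y) \Big],
\qquad x \in S .
```

The conditions derived in the paper force this inequality to hold with equality; in particular, an optimal stationary policy must attain equality at every state that is positive recurrent under it.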


2018 ◽  
Vol 50 (01) ◽  
pp. 204-230 ◽  
Author(s):  
Rolando Cavazos-Cadena ◽  
Daniel Hernández-Hernández

This work concerns Markov decision chains on a finite state space. The decision-maker has a constant and nonnull risk sensitivity coefficient, and the performance of a control policy is measured by two different indices, namely, the discounted and average criteria. Motivated by well-known results for the risk-neutral case, the problem of approximating the optimal risk-sensitive average cost in terms of the optimal risk-sensitive discounted value functions is addressed. Under suitable communication assumptions, it is shown that, as the discount factor increases to 1, appropriate normalizations of the optimal discounted value functions converge to the optimal average cost, and to the functional part of the solution of the risk-sensitive average cost optimality equation.
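For intuition on the risk-sensitive average criterion, the following sketch (illustrative only; the function name and the two-state chain are assumptions, not taken from the paper) evaluates a fixed stationary policy on a finite irreducible chain. With risk coefficient λ > 0, the risk-sensitive average cost of the policy equals (1/λ) log of the spectral radius of the multiplicative kernel M(x, y) = exp(λ c(x)) p(y | x), and the associated Perron eigenvector yields the relative value function solving the multiplicative (risk-sensitive) Poisson equation.

```python
import numpy as np

def risk_sensitive_average_cost(P, c, lam, iters=500):
    """Power iteration on the multiplicative kernel M[x, y] = exp(lam*c[x]) * P[x, y].

    Returns (g, h) where g is the risk-sensitive average cost and h the
    relative value function, normalized so that h[0] = 0. At a fixed point,
        exp(lam * (g + h[x])) = exp(lam * c[x]) * sum_y P[x, y] * exp(lam * h[y]),
    the risk-sensitive analogue of the Poisson equation.
    """
    M = np.exp(lam * c)[:, None] * P       # multiplicative kernel
    v = np.ones(len(c))                    # positive starting vector
    rho = 1.0
    for _ in range(iters):
        w = M @ v
        rho = np.linalg.norm(w)            # converges to the spectral radius
        v = w / rho                        # converges to the Perron eigenvector
    g = np.log(rho) / lam                  # risk-sensitive average cost
    h = np.log(v / v[0]) / lam             # relative values, h[0] = 0
    return g, h

# Hypothetical two-state chain under a fixed policy, with state costs 0 and 1.
P = np.array([[0.5, 0.5],
              [0.3, 0.7]])
c = np.array([0.0, 1.0])
g, h = risk_sensitive_average_cost(P, c, lam=0.5)
```

For λ > 0 the value g exceeds the risk-neutral average cost of the same policy (here the stationary distribution of P is (0.375, 0.625), so the risk-neutral cost is 0.625), reflecting the decision-maker's aversion to cost variability.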


1980 ◽  
Vol 17 (04) ◽  
pp. 996-1003
Author(s):  
D. R. Robinson

It is known that when costs are unbounded, satisfaction of the appropriate dynamic programming ‘optimality’ equation by a policy is not sufficient to guarantee its average optimality. A ‘lowest-order potential’ condition is introduced which, together with the dynamic programming equation, is sufficient to establish the optimality of the policy. It is also shown that, under fairly general conditions, if the lowest-order potential condition is not satisfied, there exists a non-memoryless policy with smaller average cost than the policy satisfying the dynamic programming equation.

