Approximation of average cost optimal policies for general Markov decision processes with unbounded costs

Evgueni Gordienko; Raúl Montes-De-Oca; Adolfo Minjárez-Sosa

doi:10.1007/bf01193864

Optimal policies for constrained average-cost Markov decision processes

Top ◽

10.1007/s11750-009-0110-7 ◽

2009 ◽

Vol 19 (1) ◽

pp. 107-120 ◽

Cited By ~ 4

Author(s):

Juan González-Hernández ◽

César E. Villarreal

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision

On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

Annals of Operations Research ◽

10.1007/bf02283610 ◽

1991 ◽

Vol 29 (1) ◽

pp. 439-469 ◽

Cited By ~ 45

Author(s):

Emmanuel Fernández-Gaucherand ◽

Aristotle Arapostathis ◽

Steven I. Marcus

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Optimality Equation ◽

Optimal Policies ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable ◽

Average Cost Optimality Equation ◽

Cost Optimality

New discount and average optimality conditions for continuous-time Markov decision processes

Advances in Applied Probability ◽

10.1017/s000186780000447x ◽

2010 ◽

Vol 42 (4) ◽

pp. 953-985 ◽

Cited By ~ 2

Author(s):

Xianping Guo ◽

Liuer Ye

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Average Cost ◽

Nonnegative Solution ◽

Decision Processes ◽

Stationary Policy ◽

Discounted Cost ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Bounded Below

This paper deals with continuous-time Markov decision processes in Polish spaces, under the discounted and average cost criteria. All underlying Markov processes are determined by given transition rates which are allowed to be unbounded, and the costs are assumed to be bounded below. By introducing an occupation measure of a randomized Markov policy and analyzing properties of occupation measures, we first show that the family of all randomized stationary policies is ‘sufficient’ within the class of all randomized Markov policies. Then, under the semicontinuity and compactness conditions, we prove the existence of a discounted cost optimal stationary policy by providing a value iteration technique. Moreover, by developing a new average cost, minimum nonnegative solution method, we prove the existence of an average cost optimal stationary policy under some reasonably mild conditions. Finally, we use some examples to illustrate applications of our results. Except that the costs are assumed to be bounded below, the conditions for the existence of discounted cost (or average cost) optimal policies are much weaker than those in the previous literature, and the minimum nonnegative solution approach is new.

Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model

Advances in Applied Probability ◽

10.2307/1426437 ◽

1983 ◽

Vol 15 (2) ◽

pp. 274-303 ◽

Cited By ~ 28

Author(s):

Arie Hordijk ◽

Frank A. Van Der Duyn Schouten

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Time Parameter ◽

Queueing Model ◽

Replacement Model ◽

Optimal Policies ◽

Markov Decision

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point’ of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.

Constrained Optimization for Average Cost Continuous-Time Markov Decision Processes

IEEE Transactions on Automatic Control ◽

10.1109/tac.2007.899040 ◽

2007 ◽

Vol 52 (6) ◽

pp. 1139-1143 ◽

Cited By ~ 20

Author(s):

Xianping Guo

Keyword(s):

Constrained Optimization ◽

Markov Decision Processes ◽

Continuous Time ◽

Average Cost ◽

Decision Processes ◽

Markov Decision

New discount and average optimality conditions for continuous-time Markov decision processes

Advances in Applied Probability ◽

10.1239/aap/1293113146 ◽

2010 ◽

Vol 42 (4) ◽

pp. 953-985 ◽

Cited By ~ 9

Author(s):

Xianping Guo ◽

Liuer Ye

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Average Cost ◽

Nonnegative Solution ◽

Decision Processes ◽

Stationary Policy ◽

Discounted Cost ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Bounded Below

Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes

Discrete Event Dynamic Systems ◽

10.1007/s10626-006-0003-y ◽

2007 ◽

Vol 17 (1) ◽

pp. 23-52 ◽

Cited By ~ 9

Author(s):

Mohammed Shahid Abdulla ◽

Shalabh Bhatnagar

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Markov Decision

Conditions for the uniqueness of optimal policies of discounted Markov decision processes

Mathematical Methods of Operations Research ◽

10.1007/s001860400372 ◽

2004 ◽

Vol 60 (3) ◽

pp. 415-436 ◽

Cited By ~ 12

Author(s):

Daniel Cruz-Suárez ◽

Raúl Montes-de-Oca ◽

Francisco Salem-Silva

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision

Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space

Mathematical Methods of Operations Research ◽

10.1007/s001860200256 ◽

2003 ◽

Vol 57 (2) ◽

pp. 263-285 ◽

Cited By ~ 10

Author(s):

Rolando Cavazos-Cadena

Keyword(s):

Markov Decision Processes ◽

Average Cost ◽

Decision Processes ◽

Optimality Equation ◽

Risk Sensitive ◽

Finite State ◽

Markov Decision ◽

Average Cost Optimality Equation ◽

Cost Optimality ◽

Finite State Space

The Determination of Approximately Optimal Policies in Markov Decision Processes by the Use of Bounds

Journal of the Operational Research Society ◽

10.2307/2581490 ◽

1982 ◽

Vol 33 (3) ◽

pp. 253

Author(s):

D. J. White

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Optimal Policies ◽

Markov Decision
