Sample-Path Optimal Stationary Policies in Stable Markov Decision Chains with the Average Reward Criterion

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated to ℓ² is finite under any policy then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.

Journal of Applied Probability ◽

10.1017/s0021900200012559 ◽

2015 ◽

Vol 52 (02) ◽

pp. 419-440 ◽

Cited By ~ 1

Author(s):

Rolando Cavazos-Cadena ◽

Raúl Montes-De-Oca ◽

Karel Sladký

Keyword(s):

Sample Path ◽

Point Of View ◽

Average Reward ◽

Stationary Policy ◽

Optimality Equation ◽

Markov Decision ◽

Average Reward Criterion ◽

Compact Action Sets ◽

Path Point ◽

Reward Criterion

A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion

Journal of Optimization Theory and Applications ◽

10.1007/s10957-013-0474-6 ◽

2013 ◽

Vol 163 (2) ◽

pp. 674-684 ◽

Cited By ~ 3

Author(s):

Rolando Cavazos-Cadena ◽

Raúl Montes-de-Oca ◽

Karel Sladký

Keyword(s):

Sample Path ◽

Average Reward ◽

Markov Decision ◽

Average Reward Criterion ◽

Reward Criterion

Continuous time Markov decision programming with average reward criterion and unbounded reward rate

Acta Mathematicae Applicatae Sinica English Series ◽

10.1007/bf02080199 ◽

1991 ◽

Vol 7 (1) ◽

pp. 6-16 ◽

Cited By ~ 7

Author(s):

Shaohui Zheng

Keyword(s):

Continuous Time ◽

Average Reward ◽

Reward Rate ◽

Markov Decision ◽

Average Reward Criterion ◽

Reward Criterion

Semi-Markov decision processes with polynomial reward

Journal of Applied Probability ◽

10.2307/3213482 ◽

1982 ◽

Vol 19 (2) ◽

pp. 301-309 ◽

Cited By ~ 6

Author(s):

Zvi Rosberg

Keyword(s):

Transition Period ◽

Queueing Network ◽

Decision Processes ◽

Average Reward ◽

Network Scheduling ◽

Long Run ◽

Markov Decision ◽

Average Reward Criterion ◽

Long Run Average Reward ◽

Reward Criterion

A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.

Estimation and control in finite Markov decision processes with the average reward criterion

Applicationes Mathematicae ◽

10.4064/am31-2-1 ◽

2004 ◽

Vol 31 (2) ◽

pp. 127-154

Author(s):

Rolando Cavazos-Cadena ◽

Raúl Montes-de-Oca

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Markov Decision ◽

Average Reward Criterion ◽

Estimation And Control ◽

And Control ◽

Reward Criterion

Bounded Parameter Markov Decision Processes with Average Reward Criterion

Learning Theory - Lecture Notes in Computer Science ◽

10.1007/978-3-540-72927-3_20 ◽

2007 ◽

pp. 263-277 ◽

Cited By ~ 9

Author(s):

Ambuj Tewari ◽

Peter L. Bartlett

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Markov Decision ◽

Average Reward Criterion ◽

Reward Criterion

Reversible Markov Decision Processes with an Average-Reward Criterion

SIAM Journal on Control and Optimization ◽

10.1137/110844957 ◽

2013 ◽

Vol 51 (1) ◽

pp. 402-418

Author(s):

Randy Cogill ◽

Cheng Peng

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Markov Decision ◽

Average Reward Criterion ◽

Reward Criterion

The optimality equation and ε-optimal strategies in Markov games with average reward criterion

Mathematical Methods of Operations Research ◽

10.1007/s001860200230 ◽

2003 ◽

Vol 56 (3) ◽

pp. 451-471

Author(s):

Heinz-Uwe Küenle ◽

Ronald Schurath

Keyword(s):

Optimal Strategies ◽

Average Reward ◽

Optimality Equation ◽

Markov Games ◽

Average Reward Criterion ◽

Reward Criterion

VECTOR-VALUED MARKOV DECISION PROCESSES WITH AVERAGE REWARD CRITERION: THE MULTICHAIN CASE

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964800144092 ◽

2000 ◽

Vol 14 (4) ◽

pp. 533-548

Author(s):

Kazuyoshi Wakuta

Keyword(s):

Decision Process ◽

Decision Processes ◽

Iteration Algorithm ◽

Average Reward ◽

Markov Decision ◽

Policy Iteration Algorithm ◽

Average Reward Criterion ◽

Systems Of Linear Inequalities ◽

Vector Valued ◽

Reward Criterion

We study the multichain case of a vector-valued Markov decision process with average reward criterion. We characterize optimal deterministic stationary policies via systems of linear inequalities and discuss a policy iteration algorithm for finding all optimal deterministic stationary policies.

Denumerable controlled Markov chains with average reward criterion: sample path optimality

Proceedings of 1994 33rd IEEE Conference on Decision and Control ◽

10.1109/cdc.1994.411028 ◽

2002 ◽

Author(s):

R. Cavazos-Cadena ◽

E. Fernandez-Gaucherand

Keyword(s):

Markov Chains ◽

Sample Path ◽

Average Reward ◽

Controlled Markov Chains ◽

Average Reward Criterion ◽

Denumerable Controlled Markov Chains ◽

Reward Criterion
