Note on discounted continuous-time Markov decision processes with a lower bounding function

2017, Vol. 54 (4), pp. 1071-1088
Author(s): Xin Guo, Alexey Piunovskiy, Yi Zhang

Abstract: We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problem out of a more general class of policies. Both constrained and unconstrained problems are considered. Our investigations are based on the continuous-time version of the Veinott transformation. This technique has not been widely employed in the previous literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.
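For orientation, the discounted criterion and the lower-bounding (drift) condition described here are commonly written as follows; the notation is illustrative and need not match the paper's exact assumptions.

```latex
% Discounted cost of policy \pi from state x, with discount rate \alpha > 0:
V^{\pi}(x) = \mathbb{E}_x^{\pi}\!\left[\int_0^{\infty}
             e^{-\alpha t}\, c(\xi_t, a_t)\, \mathrm{d}t\right],
% with the negative part of the cost rate bounded by the drift function w:
c^{-}(x, a) \le M\, w(x) \quad \text{for all } (x, a),\ \text{some } M \ge 0,
% while the positive part c^{+}(x, a) may be arbitrarily unbounded.
```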

1987, Vol. 24 (3), pp. 644-656
Author(s): Frederick J. Beutler, Keith W. Ross

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDPs, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
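To make the basic construction concrete, here is a minimal sketch in Python (NumPy) of standard uniformization for a continuous-time chain under a fixed simple policy; the function name and the two-state generator are illustrative, not from the paper.

```python
import numpy as np

def uniformize(Q, rate=None):
    """Uniformize a CTMC generator Q into a DTMC transition matrix P.

    Standard construction: P = I + Q / rate, with rate >= max_i |Q[i, i]|.
    Diagonal entries of P are the 'virtual jumps' (self-transitions) that
    the abstract warns about when randomized stationary policies re-draw
    an action at each uniformized epoch.
    """
    Q = np.asarray(Q, dtype=float)
    if rate is None:
        rate = np.max(-np.diag(Q))  # smallest admissible uniformization rate
    P = np.eye(Q.shape[0]) + Q / rate
    assert np.all(P >= -1e-12) and np.allclose(P.sum(axis=1), 1.0)
    return P, rate

# Two-state example: leave state 0 at rate 2, leave state 1 at rate 1.
Q = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])
P, rate = uniformize(Q)   # P = [[0.0, 1.0], [0.5, 0.5]], rate = 2.0
```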


2014, Vol. 51 (4), pp. 954-970
Author(s): Yi Zhang

This paper considers the average optimality for a continuous-time Markov decision process in Borel state and action spaces, and with an arbitrarily unbounded nonnegative cost rate. The existence of a deterministic stationary optimal policy is proved under conditions that allow the following: the controlled process can be explosive, the transition rates are weakly continuous, and the multifunction defining the admissible action spaces can be neither compact-valued nor upper semicontinuous.
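For reference, the long-run average criterion in question is typically defined as below; the notation is generic, not copied from the paper.

```latex
% Long-run average cost of policy \pi from initial state x:
J(\pi, x) = \limsup_{T \to \infty} \frac{1}{T}\,
            \mathbb{E}_x^{\pi}\!\left[\int_0^{T} c(\xi_t, a_t)\,\mathrm{d}t\right],
\qquad J^{*}(x) = \inf_{\pi} J(\pi, x),
% and a deterministic stationary policy f is average optimal if
% J(f, x) = J^{*}(x) for all x.
```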


1983, Vol. 20 (2), pp. 368-379
Author(s): Lam Yeh, L. C. Thomas

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.
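The monotone structure can be probed numerically. The sketch below is a rough illustration rather than the paper's model: it runs value iteration for a uniformized, truncated M/M/1 admission-control problem with discounted costs. All rates, costs, and the truncation level are assumptions made for the example, and a monotone optimal policy shows up as a threshold (admit below it, reject above).

```python
import numpy as np

# Illustrative parameters (not from the paper).
lam, mu = 1.0, 1.2       # arrival and service rates
alpha = 0.1              # discount rate
h, R = 1.0, 5.0          # holding cost rate, per-customer rejection cost
N = 50                   # queue truncation level

Lam = lam + mu           # uniformization constant (total event rate)
states = np.arange(N + 1)
up = np.minimum(states + 1, N)    # queue length after an admission
down = np.maximum(states - 1, 0)  # queue length after a service completion

V = np.zeros(N + 1)
for _ in range(100_000):          # value iteration on the uniformized chain
    V_new = (h * states
             + lam * np.minimum(V[up], V + R)  # admit vs. reject an arrival
             + mu * V[down]) / (alpha + Lam)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

admit = V[up] <= V + R            # optimal action at each queue length
# Monotonicity: once rejection becomes optimal, it stays optimal.
print(admit.astype(int))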


2002, Vol. 43 (4), pp. 541-557
Author(s): Xianping Guo, Weiping Zhu

Abstract: In this paper we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and cost rates under the average criterion. We present a set of conditions and prove the existence of both average-cost optimal stationary policies and a solution of the average optimality equation under these conditions. The results in this paper are applied to an admission-control queueing model and to controlled birth-and-death processes.
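The average optimality equation mentioned in the abstract has the following standard form for a denumerable-state CTMDP; here g is the optimal average cost, h a relative value (bias) function, and the notation is illustrative.

```latex
% Average optimality equation on the denumerable state space S:
g = \min_{a \in A(i)} \Big\{ c(i, a)
      + \sum_{j \in S} q(j \mid i, a)\, h(j) \Big\},
\qquad i \in S,
% where q(j | i, a) are the transition rates and c(i, a) the cost rates.
```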

