Uniformization for semi-Markov decision processes under stationary policies

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.

Journal of Applied Probability ◽

10.2307/3214096 ◽

1987 ◽

Vol 24 (3) ◽

pp. 644-656 ◽

Cited By ~ 16

Author(s):

Frederick J. Beutler ◽

Keith W. Ross

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Process ◽

Decision Processes ◽

Stationary Processes ◽

Original Process ◽

Markov Decision ◽

Time Optimal ◽

Randomized Policies ◽

Average Rewards

Note on discounted continuous-time Markov decision processes with a lower bounding function

Journal of Applied Probability ◽

10.1017/jpr.2017.53 ◽

2017 ◽

Vol 54 (4) ◽

pp. 1071-1088

Author(s):

Xin Guo ◽

Alexey Piunovskiy ◽

Yi Zhang

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Process ◽

Decision Processes ◽

Positive Part ◽

Negative Part ◽

Cost Rate ◽

Lower Bounding ◽

Markov Decision ◽

Unconstrained Problems

We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problems out of the more general class. Both constrained and unconstrained problems are considered. Our investigations are based on the continuous-time version of the Veinott transformation. This technique has not been widely employed in the previous literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.

Adaptive control of M/M/1 queues—continuous-time Markov decision process approach

Journal of Applied Probability ◽

10.1017/s0021900200023512 ◽

1983 ◽

Vol 20 (2) ◽

pp. 368-379

Author(s):

Lam Yeh ◽

L. C. Thomas

Keyword(s):

Adaptive Control ◽

Markov Decision Process ◽

Markov Decision Processes ◽

Optimal Policy ◽

Continuous Time ◽

Decision Process ◽

Process Approach ◽

Decision Processes ◽

Markov Decision ◽

Discounted Costs

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.

Adaptive control of M/M/1 queues—continuous-time Markov decision process approach

Journal of Applied Probability ◽

10.2307/3213809 ◽

1983 ◽

Vol 20 (2) ◽

pp. 368-379 ◽

Cited By ~ 6

Author(s):

Lam Yeh ◽

L. C. Thomas

Keyword(s):

Adaptive Control ◽

Markov Decision Process ◽

Markov Decision Processes ◽

Optimal Policy ◽

Continuous Time ◽

Decision Process ◽

Process Approach ◽

Decision Processes ◽

Markov Decision ◽

Discounted Costs

A Moreau-Yosida regularization for Markov decision processes

Proyecciones (Antofagasta) ◽

10.22199/issn.0717-6279-2021-01-0008 ◽

2020 ◽

Vol 40 (1) ◽

pp. 117-137

Author(s):

R. Israel Ortega-Gutiérrez ◽

H. Cruz-Suárez

Keyword(s):

Markov Decision Process ◽

Markov Decision Processes ◽

Optimal Policy ◽

Decision Process ◽

Value Function ◽

Decision Processes ◽

Original Process ◽

Optimal Value ◽

Markov Decision ◽

Yosida Regularization

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that conforms to the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as differentiability of its optimal value function, strict convexity of the value function, and uniqueness of the optimal policy; moreover, the optimal value function and the optimal policy of both processes are the same. To complement the theory presented, an example is provided.

Denumerable state continuous time Markov decision processes with unbounded cost and transition rates under average criterion

The ANZIAM Journal ◽

10.1017/s144618110001213x ◽

2002 ◽

Vol 43 (4) ◽

pp. 541-557 ◽

Cited By ~ 10

Author(s):

Xianping Guo ◽

Weiping Zhu

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

Transition Rates ◽

Birth And Death Processes ◽

Optimality Equation ◽

Average Criterion ◽

Markov Decision ◽

Unbounded Cost ◽

Queue Model

In this paper, we consider denumerable state continuous time Markov decision processes with (possibly unbounded) transition and cost rates under the average criterion. We present a set of conditions and prove the existence of both average cost optimal stationary policies and a solution of the average optimality equation under these conditions. The results in this paper are applied to an admission control queue model and controlled birth and death processes.
