Note on discounted continuous-time Markov decision processes with a lower bounding function

2017, Vol. 54 (4), pp. 1071-1088
Author(s): Xin Guo, Alexey Piunovskiy, Yi Zhang

Abstract: We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problem out of a more general class of policies. Both constrained and unconstrained problems are considered. Our investigations are based on the continuous-time version of the Veinott transformation. This technique has not been widely employed in the previous literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.
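For orientation, the discounted criterion and the lower-bounding (drift) condition described here are commonly written as follows; the notation is illustrative and need not match the paper's exact assumptions.

```latex
% Discounted cost of policy \pi from state x, with discount rate \alpha > 0:
V^{\pi}(x) = \mathbb{E}_x^{\pi}\!\left[\int_0^{\infty}
             e^{-\alpha t}\, c(\xi_t, a_t)\, \mathrm{d}t\right],
% with the negative part of the cost rate bounded by the drift function w:
c^{-}(x, a) \le M\, w(x) \quad \text{for all } (x, a),\ \text{some } M \ge 0,
% while the positive part c^{+}(x, a) may be arbitrarily unbounded.
```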

1987, Vol. 24 (3), pp. 644-656
Author(s): Frederick J. Beutler, Keith W. Ross

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDPs, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
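To make the basic construction concrete, here is a minimal sketch in Python (NumPy) of standard uniformization for a continuous-time chain under a fixed simple policy; the function name and the two-state generator are illustrative, not from the paper.

```python
import numpy as np

def uniformize(Q, rate=None):
    """Uniformize a CTMC generator Q into a DTMC transition matrix P.

    Standard construction: P = I + Q / rate, with rate >= max_i |Q[i, i]|.
    Diagonal entries of P are the 'virtual jumps' (self-transitions) that
    the abstract warns about when randomized stationary policies re-draw
    an action at each uniformized epoch.
    """
    Q = np.asarray(Q, dtype=float)
    if rate is None:
        rate = np.max(-np.diag(Q))  # smallest admissible uniformization rate
    P = np.eye(Q.shape[0]) + Q / rate
    assert np.all(P >= -1e-12) and np.allclose(P.sum(axis=1), 1.0)
    return P, rate

# Two-state example: leave state 0 at rate 2, leave state 1 at rate 1.
Q = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])
P, rate = uniformize(Q)   # P = [[0.0, 1.0], [0.5, 0.5]], rate = 2.0
```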


2014, Vol. 51 (4), pp. 954-970
Author(s): Yi Zhang

This paper considers the average optimality for a continuous-time Markov decision process in Borel state and action spaces, and with an arbitrarily unbounded nonnegative cost rate. The existence of a deterministic stationary optimal policy is proved under conditions that allow the following: the controlled process can be explosive, the transition rates are weakly continuous, and the multifunction defining the admissible action spaces can be neither compact-valued nor upper semicontinuous.
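For reference, the long-run average criterion in question is typically defined as below; the notation is generic, not copied from the paper.

```latex
% Long-run average cost of policy \pi from initial state x:
J(\pi, x) = \limsup_{T \to \infty} \frac{1}{T}\,
            \mathbb{E}_x^{\pi}\!\left[\int_0^{T} c(\xi_t, a_t)\,\mathrm{d}t\right],
\qquad J^{*}(x) = \inf_{\pi} J(\pi, x),
% and a deterministic stationary policy f is average optimal if
% J(f, x) = J^{*}(x) for all x.
```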


1983, Vol. 20 (2), pp. 368-379
Author(s): Lam Yeh, L. C. Thomas

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.
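The monotone structure can be probed numerically. The sketch below is a rough illustration rather than the paper's model: it runs value iteration for a uniformized, truncated M/M/1 admission-control problem with discounted costs. All rates, costs, and the truncation level are assumptions made for the example, and a monotone optimal policy shows up as a threshold (admit below it, reject above).

```python
import numpy as np

# Illustrative parameters (not from the paper).
lam, mu = 1.0, 1.2       # arrival and service rates
alpha = 0.1              # discount rate
h, R = 1.0, 5.0          # holding cost rate, per-customer rejection cost
N = 50                   # queue truncation level

Lam = lam + mu           # uniformization constant (total event rate)
states = np.arange(N + 1)
up = np.minimum(states + 1, N)    # queue length after an admission
down = np.maximum(states - 1, 0)  # queue length after a service completion

V = np.zeros(N + 1)
for _ in range(100_000):          # value iteration on the uniformized chain
    V_new = (h * states
             + lam * np.minimum(V[up], V + R)  # admit vs. reject an arrival
             + mu * V[down]) / (alpha + Lam)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

admit = V[up] <= V + R            # optimal action at each queue length
# Monotonicity: once rejection becomes optimal, it stays optimal.
print(admit.astype(int))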


2002, Vol. 43 (4), pp. 541-557
Author(s): Xianping Guo, Weiping Zhu

Abstract: In this paper we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and cost rates under the average criterion. We present a set of conditions and prove the existence of both average-cost optimal stationary policies and a solution of the average optimality equation under these conditions. The results in this paper are applied to an admission-control queueing model and to controlled birth-and-death processes.
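The average optimality equation mentioned in the abstract has the following standard form for a denumerable-state CTMDP; here g is the optimal average cost, h a relative value (bias) function, and the notation is illustrative.

```latex
% Average optimality equation on the denumerable state space S:
g = \min_{a \in A(i)} \Big\{ c(i, a)
      + \sum_{j \in S} q(j \mid i, a)\, h(j) \Big\},
\qquad i \in S,
% where q(j | i, a) are the transition rates and c(i, a) the cost rates.
```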

