Adaptive control of M/M/1 queues—continuous-time Markov decision process approach

1983 ◽  
Vol 20 (2) ◽  
pp. 368-379 ◽  
Author(s):  
Lam Yeh ◽  
L. C. Thomas

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.
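
The monotone structure asserted here can be checked numerically. The sketch below is not the paper's construction: it applies uniformization and plain value iteration to a hypothetical truncated M/M/1 service-rate control problem with discounted costs, and the resulting policy is expected to be monotone (non-decreasing service rate in the queue length).

```python
import numpy as np

# Hypothetical parameters for illustration only.
LAMBDA = 0.6                                  # arrival rate
MU_SET = [0.4, 0.8, 1.2]                      # selectable service rates (actions)
HOLD = 1.0                                    # holding cost rate per customer
SERVE_COST = {0.4: 0.2, 0.8: 1.0, 1.2: 2.5}   # cost rate of using each service rate
ALPHA = 0.1                                   # continuous-time discount rate
N = 60                                        # truncation of the queue length
GAMMA = LAMBDA + max(MU_SET)                  # uniformization constant

def q_value(x, mu, V):
    """Discounted one-step value of using service rate mu in state x."""
    mu_eff = mu if x > 0 else 0.0             # no service completions in an empty queue
    cost = HOLD * x + SERVE_COST[mu]
    nxt = (LAMBDA * V[min(x + 1, N)]
           + mu_eff * V[max(x - 1, 0)]
           + (GAMMA - LAMBDA - mu_eff) * V[x]) / GAMMA
    return (cost + GAMMA * nxt) / (ALPHA + GAMMA)

V = np.zeros(N + 1)
for _ in range(3000):                         # value iteration
    V = np.array([min(q_value(x, mu, V) for mu in MU_SET) for x in range(N + 1)])

policy = [min(MU_SET, key=lambda mu: q_value(x, mu, V)) for x in range(N + 1)]
print(policy[:20])    # the chosen service rate should be non-decreasing in x
```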


2020 ◽  
Vol 40 (1) ◽  
pp. 117-137
Author(s):  
R. Israel Ortega-Gutiérrez ◽  
H. Cruz-Suárez

This paper addresses a class of sequential optimization problems known as Markov decision processes. These kinds of processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that conforms to the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as differentiability and strict convexity of its optimal value function and uniqueness of the optimal policy; moreover, the optimal value function and the optimal policy of the two processes coincide. To complement the theory presented, an example is provided.
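
For intuition, the Moreau-Yosida regularization replaces a stage cost c with its envelope c_lam(x) = min_y { c(y) + |x - y|^2 / (2*lam) }, which is differentiable even when c is not. A minimal numerical sketch with the hypothetical cost c(x) = |x| (not taken from the paper):

```python
import numpy as np

# Hypothetical stand-in cost c(x) = |x|; the envelope is evaluated on a grid by
# brute-force minimisation.
def moreau_yosida(c, lam, grid):
    """c_lam(x) = min over y of  c(y) + (x - y)**2 / (2*lam),  evaluated on `grid`."""
    X, Y = grid[:, None], grid[None, :]
    return np.min(c(Y) + (X - Y) ** 2 / (2.0 * lam), axis=1)

grid = np.linspace(-2.0, 2.0, 801)
c_reg = moreau_yosida(np.abs, lam=0.25, grid=grid)

# The envelope lies below |x| and is differentiable everywhere (a Huber-type
# function: quadratic near 0, linear minus lam/2 away from 0).
print(float(c_reg[grid.size // 2]), float(c_reg[-1]))   # ~0.0 at x = 0, ~1.875 at x = 2
```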


1987 ◽  
Vol 24 (3) ◽  
pp. 644-656 ◽  
Author(s):  
Frederick J. Beutler ◽  
Keith W. Ross

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDPs, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
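
The "valid for simple policies" part can be seen in a few lines: under a fixed simple policy, the uniformized chain P = I + Q/Gamma has the same stationary distribution as the continuous-time chain with generator Q, hence the same long-run average reward. A sketch with a hypothetical three-state generator and reward rates:

```python
import numpy as np

# Generator induced by a fixed simple policy, and reward rates (hypothetical).
Q = np.array([[-0.5,  0.5,  0.0],
              [ 0.3, -0.7,  0.4],
              [ 0.0,  0.6, -0.6]])
r = np.array([1.0, 0.2, 3.0])             # reward rates per unit time
GAMMA = 1.5 * np.max(-np.diag(Q))         # uniformization constant >= all exit rates
P = np.eye(3) + Q / GAMMA                 # uniformized discrete-time transition matrix

def normalized_null(A):
    """Probability vector pi with pi @ A = 0."""
    M = np.vstack([A.T, np.ones(A.shape[0])])
    b = np.zeros(A.shape[0] + 1); b[-1] = 1.0
    return np.linalg.lstsq(M, b, rcond=None)[0]

pi_ct = normalized_null(Q)                # stationary law of the continuous-time chain
pi_dt = normalized_null(P - np.eye(3))    # stationary law of the uniformized chain
print(pi_ct @ r, pi_dt @ r)               # identical long-run average rewards
```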


1983 ◽  
Vol 15 (2) ◽  
pp. 274-303 ◽  
Author(s):  
Arie Hordijk ◽  
Frank A. Van Der Duyn Schouten

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a 'limit point' of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.
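
The vanishing-discount (Abelian-theorem) route can be illustrated on a toy example: compute discounted optimal policies for discount factors approaching 1 and take a limit point as the candidate average optimal policy. The sketch below uses a hypothetical two-state, two-action MDP and exact policy iteration; it conveys only the numerical flavour of the argument, not the paper's conditions.

```python
import numpy as np

P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0 (hypothetical)
     1: np.array([[0.5, 0.5], [0.6, 0.4]])}   # transitions under action 1
c = {0: np.array([2.0, 1.0]),                 # one-stage costs of action 0
     1: np.array([1.5, 3.0])}                 # one-stage costs of action 1

def discounted_optimal_policy(beta):
    """Exact policy iteration for the beta-discounted cost criterion."""
    policy = np.array([0, 0])
    while True:
        Ppi = np.array([P[policy[s]][s] for s in (0, 1)])
        cpi = np.array([c[policy[s]][s] for s in (0, 1)])
        V = np.linalg.solve(np.eye(2) - beta * Ppi, cpi)          # policy evaluation
        Q = np.array([c[a] + beta * P[a] @ V for a in (0, 1)])    # policy improvement
        new_policy = Q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            return tuple(int(a) for a in policy)
        policy = new_policy

for beta in (0.9, 0.99, 0.999, 0.9999):
    print(beta, discounted_optimal_policy(beta))   # the policy stabilises as beta -> 1
```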


2013 ◽  
Vol 45 (2) ◽  
pp. 490-519 ◽  
Author(s):  
Xianping Guo ◽  
Mantas Vykertas ◽  
Yi Zhang

In this paper we study absorbing continuous-time Markov decision processes in Polish state spaces with unbounded transition and cost rates, and history-dependent policies. The performance measure is the expected total undiscounted costs. For the unconstrained problem, we show the existence of a deterministic stationary optimal policy, whereas, for the constrained problems with N constraints, we show the existence of a mixed stationary optimal policy, where the mixture is over no more than N+1 deterministic stationary policies. Furthermore, the strong duality result is obtained for the associated linear programs.
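
The linear-programming machinery behind such constrained results can be sketched in a simpler, discrete-time discounted setting (the paper itself treats absorbing continuous-time processes with total undiscounted costs). The occupation-measure LP below uses hypothetical data, one constraint, and scipy.optimize.linprog; a basic optimal solution typically randomises in at most one state, mirroring the mixture over a small number of deterministic stationary policies.

```python
import numpy as np
from scipy.optimize import linprog

S, A = 2, 2
beta = 0.9
P = np.zeros((S, A, S))                    # hypothetical transition kernel P[s, a, s']
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.3, 0.7]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.1, 0.9]
c = np.array([[1.0, 0.2], [2.0, 0.5]])     # cost to minimise, c[s, a]
d = np.array([[0.0, 1.0], [0.0, 1.5]])     # constraint cost, d[s, a]
kappa = 4.0                                # budget on the expected discounted d-cost
mu = np.array([0.5, 0.5])                  # initial distribution

# Balance constraints over occupation measures z[s, a] >= 0:
#   sum_a z[s', a] - beta * sum_{s,a} P[s, a, s'] * z[s, a] = mu[s']
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - beta * P[s, a, sp]

res = linprog(c.ravel(), A_ub=[d.ravel()], b_ub=[kappa],
              A_eq=A_eq, b_eq=mu, bounds=[(0, None)] * (S * A))
z = res.x.reshape(S, A)
policy = z / z.sum(axis=1, keepdims=True)  # stationary policy recovered from z
print(policy)                              # typically randomised in at most one state
```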


2017 ◽  
Vol 54 (4) ◽  
pp. 1071-1088
Author(s):  
Xin Guo ◽  
Alexey Piunovskiy ◽  
Yi Zhang

We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problems out of the more general class of policies. Both constrained and unconstrained problems are considered. Our investigations are based on the continuous-time version of the Veinott transformation. This technique has not been widely employed in the previous literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.
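
As a rough illustration of the drift-function language (not the paper's exact assumptions or its Veinott transformation), the sketch below checks a standard drift inequality (Qw)(x) <= rho*w(x) + b on a truncated birth-death generator with unbounded death rates, for the hypothetical choice w(x) = x + 1.

```python
import numpy as np

N = 200
w = np.arange(N + 1) + 1.0                 # hypothetical drift function w(x) = x + 1

def Qw(x):
    """(Q w)(x) for a birth-death chain with birth rate 2 and death rate x."""
    birth, death = 2.0, float(x)
    up = w[x + 1] if x < N else w[x]
    down = w[x - 1] if x > 0 else w[x]
    return birth * (up - w[x]) + death * (down - w[x])

rho, b = 0.0, 2.0
print(all(Qw(x) <= rho * w[x] + b for x in range(N)))   # True: (Qw)(x) = 2 - x <= 2
```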


1982 ◽  
Vol 19 (4) ◽  
pp. 794-802 ◽  
Author(s):  
Matthew J. Sobel

Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
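
The moment recursions can be illustrated under a fixed stationary policy. The sketch below (hypothetical two-state chain; in the spirit of, not identical to, the paper's formulae) solves linear systems for the first and second moments of the discounted present value and reports the variance.

```python
import numpy as np

# Mean and variance of V = sum_t beta^t r(X_t) under a fixed policy, from
#   v = r + beta P v                          (first moment)
#   s = r^2 + 2 beta r*(P v) + beta^2 P s     (second moment)
# so that Var = s - v^2.  Chain and rewards are hypothetical.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])            # transition matrix under the fixed policy
r = np.array([1.0, -2.0])             # single-stage rewards
beta = 0.9

I = np.eye(2)
v = np.linalg.solve(I - beta * P, r)                               # E[V | X_0 = i]
s = np.linalg.solve(I - beta**2 * P, r**2 + 2 * beta * r * (P @ v))  # E[V^2 | X_0 = i]
print(v, s - v**2)                    # mean and variance, state by state
```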


1998 ◽  
Vol 12 (2) ◽  
pp. 177-187 ◽  
Author(s):  
Kazuyoshi Wakuta

We consider a discounted cost Markov decision process with a constraint. Relating this to a vector-valued Markov decision process, we prove that there exists a constrained optimal randomized semistationary policy if there exists at least one policy satisfying a constraint. Moreover, we present an algorithm by which we can find the constrained optimal randomized semistationary policy, or we can discover that there exist no policies satisfying a given constraint.
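
A randomized semistationary policy can be pictured as a single coin flip at time 0 followed forever by one of finitely many deterministic stationary policies. The toy sketch below (hypothetical, pre-computed cost pairs) mixes two such policies so that the constraint is met exactly; the mixture is feasible in expectation and cheaper than the feasible deterministic policy alone, which is why randomization of this limited kind suffices.

```python
# Hypothetical (expected discounted cost, constraint-cost) pairs of two
# deterministic stationary policies, assumed obtained by policy evaluation:
f1 = {"cost": 10.0, "constraint": 2.0}     # feasible but expensive
f2 = {"cost": 6.0,  "constraint": 8.0}     # cheaper but violates the budget
budget = 4.0

# Semistationary mixture: follow f1 with probability p and f2 with probability
# 1 - p, chosen so the expected constraint-cost meets the budget exactly.
p = (f2["constraint"] - budget) / (f2["constraint"] - f1["constraint"])
mixture_cost = p * f1["cost"] + (1 - p) * f2["cost"]
print(p, mixture_cost)    # p = 2/3; the feasible mixture costs less than f1 alone
```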


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and it reasonably captures the difficulty of a regular decision process.
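
Since the transition and reward functions here are given by finite transducers, a minimal example of a "regular" history-dependent reward may help: a small automaton pays reward 1 whenever the observation history ends in a chosen pattern, and carrying the automaton state alongside the environment state restores the Markov property. The alphabet and pattern below are hypothetical.

```python
# DFA over {'a', 'b'}; its state is the length of the matched prefix of "ab".
TRANS = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 1, (2, 'b'): 0,
}

def regular_reward(history):
    """Reward of the last step as a function of the whole observation history."""
    q = 0
    for obs in history:
        q = TRANS[(q, obs)]
    return 1.0 if q == 2 else 0.0

print(regular_reward(list("babab")))   # 1.0: the history ends with "ab"
print(regular_reward(list("baba")))    # 0.0
```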


2021 ◽  
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regime for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
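
The quantile objective itself (not the paper's dynamic programming procedure) can be illustrated by brute force on a toy two-period problem: enumerate the exact reward distribution of each open-loop policy and compare the expectation-optimal choice with the 0.10-quantile-optimal one. All actions and rewards below are hypothetical.

```python
import itertools

# Each period: "safe" pays 1 surely, "risky" pays 3 with probability 0.5, else 0.
ACTIONS = {"safe": [(1.0, 1)], "risky": [(0.5, 3), (0.5, 0)]}

def reward_distribution(policy):
    """Exact distribution of the two-period cumulative reward."""
    dist = {}
    for outcomes in itertools.product(*(ACTIONS[a] for a in policy)):
        prob, total = 1.0, 0
        for p, rew in outcomes:
            prob *= p
            total += rew
        dist[total] = dist.get(total, 0.0) + prob
    return dist

def quantile(dist, tau):
    """Smallest value x with P(R <= x) >= tau."""
    acc = 0.0
    for x in sorted(dist):
        acc += dist[x]
        if acc >= tau:
            return x

policies = list(itertools.product(ACTIONS, repeat=2))
best_mean = max(policies, key=lambda pi: sum(x * p for x, p in reward_distribution(pi).items()))
best_q10 = max(policies, key=lambda pi: quantile(reward_distribution(pi), 0.10))
print(best_mean, best_q10)   # the 0.10-quantile criterion prefers the safe policy
```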

