Bias Optimality in Controlled Queueing Systems

1998 · Vol. 35 (1) · pp. 136-150
Author(s): Moshe Haviv, Martin L. Puterman

This paper studies an admission control M/M/1 queueing system. It shows that the only gain (average) optimal stationary policies whose gain and bias satisfy the optimality equation are of control limit type, that there are at most two such policies and that, if there are two, their control limits are consecutive. Conditions are provided which ensure the existence of two gain optimal control limit policies, and they are illustrated with an example. The main result is that bias optimality distinguishes between these two gain optimal policies: the larger of the two control limits is the unique bias optimal stationary policy, and consequently it is also Blackwell optimal. This result is established by appealing to the third optimality equation of the Markov decision process, together with some observations concerning the structure of solutions of the second optimality equation.
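
As a rough numerical illustration of this setting (not taken from the paper), the following sketch computes the gain of each control limit policy in a small M/M/1 admission control model; the arrival rate, service rate, reward R, and linear holding cost are arbitrary assumptions chosen only to make the example concrete.

```python
import numpy as np

# Illustrative sketch with assumed parameters (not from the paper).
# A control limit policy with limit L admits an arriving customer
# if and only if fewer than L customers are present.
lam, mu = 0.8, 1.0   # arrival and service rates (assumptions)
R, h = 5.0, 1.0      # admission reward and linear holding cost rate (assumptions)

def gain(L):
    """Long-run average reward of the control limit policy with limit L."""
    # Under limit L the queue length is a birth-death chain on {0, ..., L}
    # with birth rate lam in states 0..L-1 and death rate mu in states 1..L.
    rho = lam / mu
    pi = np.array([rho ** s for s in range(L + 1)])
    pi /= pi.sum()                    # stationary distribution
    admit_rate = lam * (1.0 - pi[L])  # admissions occur only while s < L
    holding = float(pi @ (h * np.arange(L + 1)))
    return R * admit_rate - holding

for L in range(1, 11):
    print(f"control limit {L:2d}: gain = {gain(L):.4f}")
```

At parameter values for which two consecutive control limits attain the same maximal gain, the paper's result says that the larger of the two is the unique bias optimal, and hence Blackwell optimal, stationary policy.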


1989 · Vol. 3 (2) · pp. 247-272
Author(s): Linn I. Sennott

Semi-Markov decision processes underlie the control of many queueing systems. In this paper, we deal with infinite state semi-Markov decision processes with nonnegative, unbounded costs and finite action sets. Axioms for the existence of an expected average cost optimal stationary policy are presented. These conditions generalize the work in Sennott [22] for Markov decision processes. Verifiable conditions for the axioms to hold are obtained. The theory is applied to control of the M/G/1 queue with variable service parameter, with on-off server, and with batch processing, and to control of the G/M/m queue with variable arrival parameter and customer rejection. It is also applied to a time-sharing network of queues with a single server and, finally, to optimal routing of Poisson arrivals to parallel exponential servers. The final section extends the existence result to compact action spaces.
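
For orientation, the average cost optimality equation for a semi-Markov decision process is commonly written as below, where τ(s,a) denotes the expected sojourn time in state s under action a; this is the generic form, not necessarily the paper's notation.

```latex
% Generic semi-Markov average cost optimality equation (standard notation):
%   g = optimal average cost per unit time, h = relative cost function,
%   tau(s,a) = expected sojourn time in state s under action a.
h(s) \;=\; \min_{a \in A(s)} \Big[\, c(s,a) \;-\; g\,\tau(s,a)
        \;+\; \sum_{j \in S} p(j \mid s,a)\, h(j) \,\Big], \qquad s \in S .
```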


1993 · Vol. 7 (1) · pp. 47-67
Author(s): Linn I. Sennott

We consider a Markov decision chain with countable state space, finite action sets, and nonnegative costs. Conditions for the average cost optimality inequality to be an equality are derived. This extends work of Cavazos-Cadena [8]. It is shown that an optimal stationary policy must satisfy the optimality equation at all positive recurrent states. Structural results on the chain induced by an optimal stationary policy are derived. The results are employed in two examples to prove that any optimal stationary policy must be of critical number form.
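
In the usual notation for average cost Markov decision chains with nonnegative costs, the optimality inequality and the optimality equation discussed here take the following schematic forms (the precise hypotheses are those of the paper):

```latex
% Average cost optimality inequality (ACOI):
g + h(s) \;\ge\; \min_{a \in A(s)} \Big[\, c(s,a)
        + \sum_{j \in S} p(j \mid s,a)\, h(j) \,\Big], \qquad s \in S,
% Average cost optimality equation (ACOE): the same relation with equality,
g + h(s) \;=\; \min_{a \in A(s)} \Big[\, c(s,a)
        + \sum_{j \in S} p(j \mid s,a)\, h(j) \,\Big], \qquad s \in S .
```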


2001 · Vol. 15 (4) · pp. 557-564
Author(s): Rolando Cavazos-Cadena, Raúl Montes-de-Oca

This article concerns Markov decision chains with finite state and action spaces, in which a control policy is evaluated via the expected total-reward criterion associated with a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result that is usually obtained via a limiting argument based on the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.
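
Schematically, the expected total-reward criterion and the associated optimality equation read as follows (generic notation, not necessarily the authors'):

```latex
% Expected total reward of a policy \pi starting from state s (nonnegative rewards):
V^{\pi}(s) \;=\; \mathbb{E}^{\pi}_{s}\Big[\sum_{t=0}^{\infty} r(X_t, A_t)\Big],
\qquad V^{*}(s) \;=\; \sup_{\pi} V^{\pi}(s).
% When V^{*} is finite, the classical result gives a stationary policy attaining
% the maximum in the optimality equation
V^{*}(s) \;=\; \max_{a \in A(s)} \Big[\, r(s,a)
        + \sum_{j \in S} p(j \mid s,a)\, V^{*}(j) \,\Big], \qquad s \in S .
```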


2015 · Vol. 52 (2) · pp. 419-440
Author(s): Rolando Cavazos-Cadena, Raúl Montes-de-Oca, Karel Sladký

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
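
A common way to phrase the sample-path average optimality of a stationary policy f* with optimal gain g* is the following schematic statement (generic notation; the paper's precise formulation involves the Lyapunov function ℓ):

```latex
% Schematic statement of strong sample-path average optimality of f^{*}:
\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t) \;\le\; g^{*}
    \quad \text{a.s. under every policy, and}
\lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r\big(X_t, f^{*}(X_t)\big) \;=\; g^{*}
    \quad \text{a.s. under } f^{*}.
```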


2010 · Vol. 42 (4) · pp. 953-985
Author(s): Xianping Guo, Liuer Ye

This paper deals with continuous-time Markov decision processes in Polish spaces, under the discounted and average cost criteria. All underlying Markov processes are determined by given transition rates which are allowed to be unbounded, and the costs are assumed to be bounded below. By introducing an occupation measure of a randomized Markov policy and analyzing properties of occupation measures, we first show that the family of all randomized stationary policies is ‘sufficient’ within the class of all randomized Markov policies. Then, under semicontinuity and compactness conditions, we prove the existence of a discounted cost optimal stationary policy by providing a value iteration technique. Moreover, by developing a new minimum nonnegative solution method for the average cost criterion, we prove the existence of an average cost optimal stationary policy under some reasonably mild conditions. Finally, we use some examples to illustrate applications of our results. Apart from the requirement that the costs be bounded below, the conditions for the existence of discounted cost (or average cost) optimal policies are much weaker than those in the previous literature, and the minimum nonnegative solution approach is new.
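
For context, the discounted occupation measure of a randomized Markov policy π, which underlies the ‘sufficiency’ argument described above, is commonly defined along the following lines (schematic, with discount rate α > 0 and initial distribution ν):

```latex
% Discounted occupation measure of a randomized Markov policy \pi (generic form):
\eta_{\pi}(B \times C) \;=\; \int_{0}^{\infty} e^{-\alpha t}\,
    \mathbb{P}^{\pi}_{\nu}\big(x_t \in B,\; a_t \in C\big)\, dt,
% so that the expected discounted cost is an integral against \eta_{\pi}:
\mathbb{E}^{\pi}_{\nu}\Big[\int_{0}^{\infty} e^{-\alpha t} c(x_t, a_t)\, dt\Big]
    \;=\; \int_{S \times A} c(x,a)\; \eta_{\pi}(dx, da).
```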


2000 · Vol. 37 (1) · pp. 300-305
Author(s): Mark E. Lewis, Martin L. Puterman

The use of bias optimality to distinguish among gain optimal policies was recently studied by Haviv and Puterman [1] and extended in Lewis et al. [2]. In [1], upon arrival to an M/M/1 queue, each customer offers the gatekeeper a reward R. If the customer is accepted, the gatekeeper immediately receives the reward but is charged a holding cost, c(s), depending on the number of customers in the system. The gatekeeper, whose objective is to ‘maximize’ rewards, must decide whether to admit the customer; an accepted customer joins the queue and awaits service. Haviv and Puterman [1] showed that there can be only two Markovian, stationary, deterministic gain optimal policies and that only the policy which uses the larger control limit is bias optimal. This demonstrated the usefulness of bias optimality for distinguishing between gain optimal policies. In the same paper, they conjectured that if the gatekeeper receives the reward upon completion of a job rather than upon entry, the bias optimal policy uses the smaller control limit. This note confirms that conjecture.
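
As a rough numerical companion to this note (with arbitrary assumed parameters, and with no attempt to hit the knife-edge case in which two consecutive control limits are exactly gain optimal), the sketch below uniformizes the controlled M/M/1 chain and solves the Poisson equation to compare gain and bias when the reward is collected on entry versus on completion.

```python
import numpy as np

# Illustrative sketch with assumed parameters (not from the papers cited above).
lam, mu = 0.8, 1.0      # arrival and service rates (assumptions)
R, h_cost = 5.0, 1.0    # reward and linear holding cost rate (assumptions)
Lam = lam + mu          # uniformization constant

def gain_and_bias(L, reward_on_completion=False):
    """Gain (per unit time) and bias vector of the control limit policy L."""
    n = L + 1                               # states 0..L
    P = np.zeros((n, n))
    r = np.zeros(n)
    for s in range(n):
        if s < L:
            P[s, s + 1] += lam / Lam        # arrival admitted
        else:
            P[s, s] += lam / Lam            # arrival rejected
        if s > 0:
            P[s, s - 1] += mu / Lam         # service completion
        else:
            P[s, s] += mu / Lam             # fictitious transition (empty system)
        lump = (mu / Lam) * R * (s > 0) if reward_on_completion \
               else (lam / Lam) * R * (s < L)
        r[s] = lump - h_cost * s / Lam      # expected reward per uniformized step
    # Solve the Poisson equation g*1 + h = r + P h with h[0] = 0;
    # unknowns are x = (g, h[1], ..., h[L]).
    A = np.zeros((n, n))
    A[:, 0] = 1.0
    A[:, 1:] = (np.eye(n) - P)[:, 1:]
    x = np.linalg.solve(A, r)
    g, h = x[0], np.concatenate(([0.0], x[1:]))
    rho = lam / mu
    pi = np.array([rho ** s for s in range(n)])
    pi /= pi.sum()                          # stationary distribution
    bias = h - pi @ h                       # normalize so that pi @ bias = 0
    return g * Lam, bias

for L in (3, 4):
    for on_completion in (False, True):
        g, b = gain_and_bias(L, on_completion)
        tag = "completion" if on_completion else "entry"
        print(f"L={L}, reward on {tag}: gain={g:.4f}, bias(0)={b[0]:.4f}")
```

For a fixed control limit the gain is the same under either reward timing, since admissions and completions balance in the long run; only the bias changes, which is why bias optimality can select different control limits in the two models.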


2006 · Vol. 43 (2) · pp. 318-334
Author(s): Xianping Guo, Quanxin Zhu

In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we establish the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are given in terms of the primitive data of the model. Moreover, our approach differs slightly from the well-known ‘optimality inequality approach’ widely used in Markov decision processes. Finally, we illustrate our results in two examples.
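
Schematically, a pair of average optimality inequalities of opposing directions can be written as follows (this is a generic sketch in the spirit of the abstract; the paper's exact measurability and integrability hypotheses, and its exact form of the inequalities, are omitted):

```latex
% Two average optimality inequalities of opposing directions (schematic):
g + h_1(x) \;\ge\; \inf_{a \in A(x)} \Big[\, c(x,a)
        + \int_{S} h_1(y)\, Q(dy \mid x, a) \,\Big], \qquad x \in S,
g + h_2(x) \;\le\; \inf_{a \in A(x)} \Big[\, c(x,a)
        + \int_{S} h_2(y)\, Q(dy \mid x, a) \,\Big], \qquad x \in S,
% whose joint solvability identifies g as the optimal average cost and yields
% an average optimal deterministic stationary policy.
```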


1978 · Vol. 15 (2) · pp. 356-373
Author(s): A. Federgruen, H. C. Tijms

This paper is concerned with the optimality equation for the average costs in a denumerable state semi-Markov decision model. It is shown that, under each of a number of recurrence conditions on the transition probability matrices associated with the stationary policies, the optimality equation has a bounded solution. This solution indeed yields a stationary policy which is optimal for a strong version of the average cost optimality criterion. In addition to establishing the existence of a bounded solution to the optimality equation, we show that both the value-iteration method and the policy-iteration method can be used to determine such a solution. For the latter method, we prove that the average costs and the relative cost functions of the policies generated converge to a solution of the optimality equation.
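
The sketch below runs relative value iteration on a small, made-up finite-state discrete-time Markov decision model with costs. It is intended only to illustrate the kind of iterative computation of an average cost g and relative cost function h discussed above, not the paper's semi-Markov setting or its specific algorithms.

```python
import numpy as np

# Relative value iteration for the average cost criterion on an assumed
# three-state, two-action MDP (illustrative data only).
# P[a][s, j] = transition probability, c[a][s] = one-step cost under action a.
P = [np.array([[0.7, 0.3, 0.0],
               [0.2, 0.5, 0.3],
               [0.0, 0.4, 0.6]]),
     np.array([[0.3, 0.7, 0.0],
               [0.1, 0.2, 0.7],
               [0.5, 0.0, 0.5]])]
c = [np.array([1.0, 2.0, 4.0]),
     np.array([2.0, 1.5, 3.0])]

def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=100_000):
    """Return (approximate average cost g, relative cost function h, greedy policy)."""
    n = P[0].shape[0]
    v = np.zeros(n)
    for _ in range(max_iter):
        # One-step lookahead for every action, then minimize over actions.
        q = np.array([c[a] + P[a] @ v for a in range(len(P))])
        w = q.min(axis=0)
        g = w[ref_state]        # current estimate of the average cost
        v_new = w - g           # keep the relative costs anchored at ref_state
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return g, v, q.argmin(axis=0)

g, h, policy = relative_value_iteration(P, c)
print("approximate average cost g:", round(float(g), 6))
print("relative cost function h:", np.round(h, 6))
print("greedy stationary policy:", policy)
```

Policy iteration would instead alternate between evaluating the current stationary policy (solving for its average cost and relative cost function) and improving it greedily; the abstract above notes that, in the semi-Markov setting of the paper, the quantities generated in this way converge to a solution of the optimality equation.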

