Computing Optimal Policies for Controlled Tandem Queueing Systems

1987 ◽  
Vol 35 (1) ◽  
pp. 121-126 ◽  
Author(s):  
Katsuhisa Ohno ◽  
Kuniyoshi Ichiki

1997 ◽  
Vol 29 (1) ◽  
pp. 114-137
Author(s):  
Linn I. Sennott

This paper studies the expected average cost control problem for discrete-time Markov decision processes with denumerably infinite state spaces. A sequence of finite state space truncations is defined such that the average costs and average optimal policies in the sequence converge to the optimal average cost and an optimal policy in the original process. The theory is illustrated with several examples from the control of discrete-time queueing systems. Numerical results are discussed.
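The truncation scheme described in this abstract can be sketched numerically. The snippet below runs relative value iteration on a truncated discrete-time single-server queue with a two-speed service control; the model, costs, and all parameter values are illustrative assumptions, not the paper's examples. As the truncation level N grows, the computed average cost settles down, mirroring the convergence result.

```python
import numpy as np

def truncated_avg_cost(N, p=0.3, q=0.5, iters=5000):
    """Relative value iteration on states 0..N of a discrete-time queue.
    Per slot, at most one event occurs: an arrival (prob p) or a service
    completion (prob mu, action-dependent). Actions: slow service q/2 at
    no extra cost, or fast service q at extra cost 1; holding cost = s.
    All of these modeling choices are hypothetical."""
    h = np.zeros(N + 1)   # relative value function
    g = 0.0               # average-cost estimate
    for _ in range(iters):
        Th = np.empty(N + 1)
        for s in range(N + 1):
            best = np.inf
            for mu, extra in ((q / 2, 0.0), (q, 1.0)):
                mu_eff = mu if s > 0 else 0.0
                up = min(s + 1, N)     # arrival (blocked at N)
                down = max(s - 1, 0)   # service completion
                ev = p * h[up] + mu_eff * h[down] + (1 - p - mu_eff) * h[s]
                best = min(best, s + extra + ev)
            Th[s] = best
        g = Th[0]          # normalize the relative values at state 0
        h = Th - g
    return g

# The average cost stabilizes as the truncation level N increases.
for N in (5, 10, 20, 40):
    print(N, round(float(truncated_avg_cost(N)), 4))
```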


2000 ◽  
Vol 37 (1) ◽  
pp. 300-305 ◽  
Author(s):  
Mark E. Lewis ◽  
Martin L. Puterman

The use of bias optimality to distinguish among gain optimal policies was recently studied by Haviv and Puterman [1] and extended in Lewis et al. [2]. In [1], upon arrival to an M/M/1 queue, customers offer the gatekeeper a reward R. If accepted, the gatekeeper immediately receives the reward, but is charged a holding cost, c(s), depending on the number of customers in the system. The gatekeeper, whose objective is to ‘maximize’ rewards, must decide whether to admit the customer. If the customer is accepted, the customer joins the queue and awaits service. Haviv and Puterman [1] showed there can be only two Markovian, stationary, deterministic gain optimal policies and that only the policy which uses the larger control limit is bias optimal. This showed the usefulness of bias optimality to distinguish between gain optimal policies. In the same paper, they conjectured that if the gatekeeper receives the reward upon completion of a job instead of upon entry, the bias optimal policy will be the lower control limit. This note confirms that conjecture.
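The gain of each control-limit policy in this setting can be evaluated in closed form from the stationary distribution of the resulting M/M/1/L queue. The sketch below does exactly that for a linear holding cost; the arrival rate, service rate, reward R, and cost rate are made-up values, not the paper's.

```python
import numpy as np

def gain(L, lam=1.0, mu=2.0, R=3.0, c=1.0):
    """Long-run average reward of the control-limit-L policy: admit an
    arriving customer iff fewer than L are present. Reward R per admitted
    customer, holding cost c per customer per unit time. Parameter values
    are illustrative assumptions."""
    rho = lam / mu
    # stationary distribution of the M/M/1/L birth-death chain
    pi = np.array([rho ** n for n in range(L + 1)])
    pi /= pi.sum()
    admit_rate = lam * (1 - pi[L])                  # arrivals that find room
    holding = c * np.dot(np.arange(L + 1), pi)      # mean holding cost rate
    return R * admit_rate - holding

# Gain rises and then falls as the control limit L grows.
for L in range(1, 7):
    print(L, round(float(gain(L)), 4))
```

With these particular numbers a single control limit (L = 3) is gain optimal; bias optimality only comes into play when two adjacent limits tie in gain, which is the degenerate case the paper analyzes.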


1994 ◽  
Vol 26 (1) ◽  
pp. 155-171 ◽  
Author(s):  
Panayotis D. Sparaggis ◽  
Don Towsley ◽  
Christos G. Cassandras

We present two forms of weak majorization, namely, very weak majorization and p-weak majorization that can be used as sample path criteria in the analysis of queueing systems. We demonstrate how these two criteria can be used in making comparisons among the joint queue lengths of queueing systems with blocking and/or multiple classes, by capturing an interesting interaction between state and performance descriptors. As a result, stochastic orderings on performance measures such as the cumulative number of losses can be derived. We describe applications that involve the determination of optimal policies in the context of load-balancing and scheduling.
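The baseline notion these criteria weaken is standard weak (sub)majorization, which is easy to check componentwise on sorted partial sums. The helper below is a minimal sketch of that standard definition, for orientation; the paper's 'very weak' and 'p-weak' variants relax it further and are not implemented here.

```python
def weakly_majorizes(x, y):
    """True if x weakly submajorizes y: sorting both vectors in
    decreasing order, every partial sum of x is at least the
    corresponding partial sum of y. Standard textbook definition."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx += a
        sy += b
        if sx < sy:
            return False
    return True

print(weakly_majorizes([4, 1, 1], [2, 2, 2]))  # True
print(weakly_majorizes([2, 2, 2], [4, 1, 1]))  # False
```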


2016 ◽  
Vol 67 (4) ◽  
pp. 629-643 ◽  
Author(s):  
Rob Shone ◽  
Vincent A Knight ◽  
Paul R Harper ◽  
Janet E Williams ◽  
John Minty

1988 ◽  
Vol 20 (2) ◽  
pp. 447-472 ◽  
Author(s):  
Tze Leung Lai ◽  
Zhiliang Ying

Asymptotic approximations are developed herein for the optimal policies in discounted multi-armed bandit problems in which new projects are continually appearing, commonly known as ‘open bandit problems’ or ‘arm-acquiring bandits’. It is shown that under certain stability assumptions the open bandit problem is asymptotically equivalent to a closed bandit problem in which there is no arrival of new projects, as the discount factor approaches 1. Applications of these results to optimal scheduling of queueing networks are given. In particular, Klimov's priority indices for scheduling queueing networks are shown to be limits of the Gittins indices for the associated closed bandit problem, and extensions of Klimov's results to preemptive policies and to unstable queueing systems are given.
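In the simplest special case, with linear holding costs and no feedback routing, Klimov's priority indices reduce to the classic c·μ rule: serve the class with the largest product of holding-cost rate and service rate. The helper below sketches only that special case; the class costs and service rates are made-up numbers.

```python
def cmu_priorities(costs, rates):
    """c*mu priority indices for multiclass single-server scheduling
    (linear holding costs, no feedback), the simplest instance of
    Klimov's index rule. Returns the indices and the class ordering
    from highest to lowest priority (ties broken by class number)."""
    idx = [c * m for c, m in zip(costs, rates)]
    order = sorted(range(len(idx)), key=lambda k: -idx[k])
    return idx, order

# Hypothetical classes: costs 4, 1, 2 and service rates 0.5, 3, 1.
indices, order = cmu_priorities([4.0, 1.0, 2.0], [0.5, 3.0, 1.0])
print(indices)  # [2.0, 3.0, 2.0]
print(order)    # class 1 first; classes 0 and 2 tie
```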


