BIAS OPTIMALITY IN A QUEUE WITH ADMISSION CONTROL

1999 ◽  
Vol 13 (3) ◽  
pp. 309-327 ◽  
Author(s):  
Mark E. Lewis ◽  
Hayriye Ayhan ◽  
Robert D. Foley

We consider a finite capacity queueing system in which each arriving customer offers a reward. A gatekeeper decides based on the reward offered and the space remaining whether each arriving customer should be accepted or rejected. The gatekeeper only receives the offered reward if the customer is accepted. A traditional objective function is to maximize the gain, that is, the long-run average reward. It is quite possible, however, to have several different gain optimal policies that behave quite differently. Bias and Blackwell optimality are more refined objective functions that can distinguish among multiple stationary, deterministic gain optimal policies. This paper focuses on describing the structure of stationary, deterministic, optimal policies and extending this optimality to distinguish between multiple gain optimal policies. We show that these policies are of trunk reservation form and must occur consecutively. We then prove that we can distinguish among these gain optimal policies using the bias or transient reward and extend to Blackwell optimality.
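The gain of a given threshold policy can be evaluated numerically. The following is a minimal sketch, not the authors' method. It assumes a single exponential server with rate `mu`, Poisson arrivals split into a few reward classes, and a policy that accepts class i only while fewer than `thresholds[i]` customers are present. The function names and the brute-force search are illustrative assumptions. Returning every policy that attains the maximal gain shows how ties can arise, which is where bias would be used to discriminate.

```python
from itertools import product


def stationary_gain(thresholds, arrival_rates, rewards, mu, capacity):
    """Long-run average reward of a threshold (trunk reservation style) policy.

    Assumed model: birth-death chain on 0..capacity, service rate mu,
    class i accepted in state n iff n < thresholds[i] and n < capacity.
    """
    def accepted(n):
        return [i for i, t in enumerate(thresholds) if n < min(t, capacity)]

    # Unnormalised stationary weights from the birth-death balance equations.
    weights = [1.0]
    for n in range(capacity):
        birth = sum(arrival_rates[i] for i in accepted(n))
        weights.append(weights[-1] * birth / mu)
    total = sum(weights)

    # Reward accrues at rate lambda_i * r_i in states where class i is accepted.
    gain = 0.0
    for n, w in enumerate(weights):
        rate = sum(arrival_rates[i] * rewards[i] for i in accepted(n))
        gain += (w / total) * rate
    return gain


def gain_optimal_thresholds(arrival_rates, rewards, mu, capacity, tol=1e-9):
    """Enumerate all threshold vectors; return the best gain and every maximiser."""
    k = len(arrival_rates)
    scored = [
        (stationary_gain(t, arrival_rates, rewards, mu, capacity), t)
        for t in product(range(capacity + 1), repeat=k)
    ]
    best = max(g for g, _ in scored)
    return best, [t for g, t in scored if best - g <= tol]
```

Enumeration is only practical for small capacities and few classes; it is used here because it makes the set of gain optimal policies explicit.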

2002 ◽  
Vol 39 (1) ◽  
pp. 20-37 ◽  
Author(s):  
Mark E. Lewis ◽  
Hayriye Ayhan ◽  
Robert D. Foley

We consider a finite-capacity queueing system where arriving customers offer rewards which are paid upon acceptance into the system. The gatekeeper, whose objective is to ‘maximize’ rewards, decides if the reward offered is sufficient to accept or reject the arriving customer. Suppose the arrival rates, service rates, and system capacity are changing over time in a known manner. We show that all bias optimal (a refinement of long-run average reward optimal) policies are of threshold form. Furthermore, we give sufficient conditions for the bias optimal policy to be monotonic in time. We show, via a counterexample, that if these conditions are violated, the optimal policy may not be monotonic in time or of threshold form.


2001 ◽  
Vol 38 (2) ◽  
pp. 369-385 ◽  
Author(s):  
Mark E. Lewis

We consider a controlled M/M/1 queueing system where customers may be subject to two potential rejections. The first occurs upon arrival and is dependent on the number of customers in the queue and the service rate of the customer currently in service. The second, which may or may not occur, occurs immediately prior to the customer receiving service. That is, after each service completion the customer in the front of the queue is assessed and the service rate of that customer is revealed. If the second decision-maker recommends rejection, the customer is denied service with a fixed probability. We show the existence of long-run average optimal monotone switching-curve policies. Further, we show that the average reward is increasing in the probability that the second decision-maker's recommendation of rejection is honored. Applications include call centers with delayed classifications and manufacturing systems when the server is responsible for multiple tasks.


2017 ◽  
Vol 32 (2) ◽  
pp. 163-178 ◽  
Author(s):  
Kenneth C. Chong ◽  
Shane G. Henderson ◽  
Mark E. Lewis

We consider the problem of routing and admission control in a loss system featuring two classes of arriving jobs (high-priority and low-priority jobs) and two types of servers, in which decision-making for high-priority jobs is forced, and rewards influence the desirability of each of the four possible routing decisions. We seek a policy that maximizes expected long-run reward, under both the discounted reward and long-run average reward criteria, and formulate the problem as a Markov decision process. When the reward structure favors high-priority jobs, we demonstrate that there exists an optimal monotone switching curve policy with slope of at least −1. When the reward structure favors low-priority jobs, we demonstrate that the value function, in general, lacks structure, which complicates the search for structure in optimal policies. However, we identify conditions under which optimal policies can be characterized in greater detail. We also examine the performance of heuristic policies in a brief numerical study.


1994 ◽  
Vol 8 (4) ◽  
pp. 463-489 ◽  
Author(s):  
Eugene A. Feinberg ◽  
Martin I. Reiman

We consider a controlled queueing system that is a generalization of the M/M/c/W queue. There are m types of customers that arrive according to independent Poisson processes. Service times are exponential and independent and do not depend on the customer type. There is room in the system for a total of N customers; if there are N customers in the system, new arrivals are lost. Type j customers are more profitable than type (j + 1) customers, j = 2, …, m − 1, and type 1 customers are at least as profitable as type 2 customers. The allowed control is to accept or reject customers at arrival. No preemption of customers in service is allowed. The goal is to maximize the average reward per unit of time subject to a constraint that the blocking probability of type 1 customers is no greater than a given level. For an M/M/c/c system without a constraint, Miller [12] proved that an optimal policy has a simple threshold structure. We show that, for the constrained problem described above, an optimal policy has a similar structure, but one of the thresholds might have to be randomized. We also derive an algorithm that constructs an optimal policy and describe other forms of optimal policies.
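To illustrate what a randomized threshold means in practice, the sketch below evaluates one such policy. It is an illustration under stated assumptions, not the algorithm from the paper: only two customer types are modelled, type 1 is always admitted when there is room, and type 2 is admitted with probability 1 below a hypothetical threshold `t`, with probability `q` exactly at `t`, and never above it. The function name and parameters are invented for this example.

```python
def evaluate_randomized_threshold(lam1, lam2, r1, r2, mu, c, N, t, q):
    """Average reward and type 1 blocking probability for an assumed
    two-type system with c servers, room for N customers, and a
    randomized threshold (t, q) applied to type 2 arrivals."""
    def admit2(n):
        # Probability that a type 2 arrival is admitted in state n.
        if n >= N:
            return 0.0
        if n < t:
            return 1.0
        return q if n == t else 0.0

    # Birth-death stationary weights; the departure rate from state n + 1
    # is min(n + 1, c) * mu because at most c customers are in service.
    weights = [1.0]
    for n in range(N):
        birth = lam1 + lam2 * admit2(n)
        death = min(n + 1, c) * mu
        weights.append(weights[-1] * birth / death)
    total = sum(weights)
    pi = [w / total for w in weights]

    # Type 1 earns reward in every non-full state; type 2 earns it in
    # proportion to its admission probability.
    reward = sum(
        p * (lam1 * r1 + lam2 * r2 * admit2(n))
        for n, p in enumerate(pi[:N])
    )
    # Poisson arrivals see time averages, so type 1 is blocked with
    # probability equal to the stationary mass on the full state.
    blocking1 = pi[N]
    return reward, blocking1
```

Sweeping `q` between 0 and 1 at a fixed `t` moves the type 1 blocking probability continuously between the values for thresholds `t` and `t + 1`, which is why randomization at a single threshold can be enough to meet a blocking constraint with equality.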


2003 ◽  
Vol 17 (2) ◽  
pp. 251-265 ◽  
Author(s):  
I.J.B.F. Adan ◽  
J.A.C. Resing ◽  
V.G. Kulkarni

Stochastic discretization is a technique of representing a continuous random variable as a random sum of i.i.d. exponential random variables. In this article, we apply this technique to study the limiting behavior of a stochastic fluid model. Specifically, we consider an infinite-capacity fluid buffer, where the net input of fluid is regulated by a finite-state irreducible continuous-time Markov chain. Most long-run performance characteristics for such a fluid system can be expressed as the long-run average reward for a suitably chosen reward structure. In this article, we use stochastic discretization of the fluid content process to efficiently determine the long-run average reward. This method transforms the continuous-state Markov process describing the fluid model into a discrete-state quasi-birth–death process. Hence, standard tools, such as the matrix-geometric approach, become available for the analysis of the fluid buffer. To demonstrate this approach, we analyze the output of a buffer processing fluid from K sources on a first-come first-served basis.


1982 ◽  
Vol 19 (2) ◽  
pp. 301-309 ◽  
Author(s):  
Zvi Rosberg

A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.


2003 ◽  
Vol 40 (1) ◽  
pp. 250-256 ◽  
Author(s):  
Erol A. Peköz

We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.

