Bias Optimality

Author(s):  
Mark E. Lewis ◽  
Martin L. Puterman
1990 ◽  
Vol 27 (1) ◽  
pp. 134-145
Author(s):  
Matthias Fassbender

This paper establishes the existence of an optimal stationary strategy in a leavable Markov decision process with countable state space and undiscounted total reward criterion. Besides assumptions of boundedness and continuity, an assumption is imposed on the model which demands the continuity of the mean recurrence times on a subset of the stationary strategies, the so-called ‘good strategies’. For practical applications it is important that this assumption is implied by an assumption about the cost structure and the transition probabilities. In the last part we point out that our results in general cannot be deduced from related works on bias-optimality by Dekker and Hordijk, Wijngaard or Mann.


2000 ◽  
Vol 37 (1) ◽  
pp. 300-305 ◽  
Author(s):  
Mark E. Lewis ◽  
Martin L. Puterman

The use of bias optimality to distinguish among gain optimal policies was recently studied by Haviv and Puterman [1] and extended in Lewis et al. [2]. In [1], upon arrival to an M/M/1 queue, customers offer the gatekeeper a reward R. If accepted, the gatekeeper immediately receives the reward, but is charged a holding cost, c(s), depending on the number of customers in the system. The gatekeeper, whose objective is to ‘maximize’ rewards, must decide whether to admit the customer. If the customer is accepted, the customer joins the queue and awaits service. Haviv and Puterman [1] showed there can be only two Markovian, stationary, deterministic gain optimal policies and that only the policy which uses the larger control limit is bias optimal. This showed the usefulness of bias optimality to distinguish between gain optimal policies. In the same paper, they conjectured that if the gatekeeper receives the reward upon completion of a job instead of upon entry, the bias optimal policy will be the lower control limit. This note confirms that conjecture.
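
For orientation, the two criteria being compared can be written as follows. This is a sketch using standard textbook notation rather than notation quoted from [1] or [2]: the symbols $X_t$, $Y_t$ (state and action at step $t$), $r$, $g^{\pi}$ and $h^{\pi}$ are assumptions, and the bias formula as stated presumes a stationary policy on a finite state space whose induced chain is aperiodic.

```latex
% Gain: long-run average reward of policy \pi started in state s
g^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N}\,
  E_s^{\pi}\!\left[ \sum_{t=0}^{N-1} r(X_t, Y_t) \right]

% Bias: total expected deviation of the reward stream from the gain
h^{\pi}(s) = E_s^{\pi}\!\left[ \sum_{t=0}^{\infty}
  \bigl( r(X_t, Y_t) - g^{\pi}(X_t) \bigr) \right]

% \pi^* is bias optimal if it is gain optimal and
h^{\pi^*}(s) \ge h^{\pi}(s)
  \quad \text{for all states } s \text{ and all gain optimal } \pi
```

Two policies with the same gain earn the same reward per unit time in the long run, so the bias ranks them by what they collect during the transient phase.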


1998 ◽  
Vol 35 (1) ◽  
pp. 136-150 ◽  
Author(s):  
Moshe Haviv ◽  
Martin L. Puterman

This paper studies an admission control M/M/1 queueing system. It shows that the only gain (average) optimal stationary policies with gain and bias which satisfy the optimality equation are of control limit type, that there are at most two and, if there are two, they occur consecutively. Conditions are provided which ensure the existence of two gain optimal control limit policies and are illustrated with an example. The main result is that bias optimality distinguishes these two gain optimal policies and that the larger of the two control limits is the unique bias optimal stationary policy. Consequently it is also Blackwell optimal. This result is established by appealing to the third optimality equation of the Markov decision process and some observations concerning the structure of solutions of the second optimality equation.
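
To make the comparison of control limit policies concrete, the following is a minimal sketch that evaluates the gain and bias of a single control limit policy. It rests on assumptions not taken from the paper: the queue is uniformized so that the arrival and service probabilities satisfy `lam + mu == 1`, the reward is paid on admission, and the holding cost is charged once per step. The function name and all parameters are hypothetical.

```python
def control_limit_gain_bias(limit, lam, mu, reward, holding_cost):
    """Gain and bias of the 'admit while s < limit' policy.

    Assumes a uniformized chain (lam + mu == 1), reward paid on admission,
    and holding_cost(s) charged per step in state s.
    """
    if limit < 1 or abs(lam + mu - 1.0) > 1e-12:
        raise ValueError("need limit >= 1 and lam + mu == 1")
    states = range(limit + 1)
    # Expected one-step reward: admission income minus holding cost.
    r = [(lam * reward if s < limit else 0.0) - holding_cost(s) for s in states]
    # Stationary distribution of the birth-death chain: pi(s) ~ (lam/mu)^s.
    weights = [(lam / mu) ** s for s in states]
    total = sum(weights)
    pi = [w / total for w in weights]
    gain = sum(p * x for p, x in zip(pi, r))
    # Differences d[s] = h(s+1) - h(s), read off the evaluation equation
    # h(s) = r(s) - gain + sum_j p(j|s) h(j).
    d = [(gain - r[0]) / lam]
    for s in range(1, limit):
        d.append((gain - r[s] + mu * d[s - 1]) / lam)
    # Build h up to an additive constant, then shift so that pi . h = 0,
    # which is the normalization that singles out the bias.
    h = [0.0]
    for step in d:
        h.append(h[-1] + step)
    shift = sum(p * x for p, x in zip(pi, h))
    bias = [x - shift for x in h]
    return gain, bias, pi
```

Calling the function for two consecutive values of `limit` that yield the same gain and comparing the returned bias vectors state by state is one way to observe numerically the kind of distinction the abstract describes.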


1999 ◽  
Vol 13 (3) ◽  
pp. 309-327 ◽  
Author(s):  
Mark E. Lewis ◽  
Hayriye Ayhan ◽  
Robert D. Foley

We consider a finite capacity queueing system in which each arriving customer offers a reward. A gatekeeper decides based on the reward offered and the space remaining whether each arriving customer should be accepted or rejected. The gatekeeper only receives the offered reward if the customer is accepted. A traditional objective function is to maximize the gain, that is, the long-run average reward. It is quite possible, however, to have several different gain optimal policies that behave quite differently. Bias and Blackwell optimality are more refined objective functions that can distinguish among multiple stationary, deterministic gain optimal policies. This paper focuses on describing the structure of stationary, deterministic, optimal policies and extending this optimality to distinguish between multiple gain optimal policies. We show that these policies are of trunk reservation form and must occur consecutively. We then prove that we can distinguish among these gain optimal policies using the bias or transient reward and extend to Blackwell optimality.
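
A trunk reservation rule can be sketched as a simple threshold test. This illustrates the general idea only, not the paper's exact formulation: the function name, the `reservation` mapping, and the numbers in it are hypothetical.

```python
def accept(free_space, reward, reservation):
    """Trunk reservation rule: admit only if enough space would remain.

    reservation maps each reward value to the number of slots held back
    from customers offering that reward (hypothetical numbers).
    """
    if free_space <= 0:
        return False  # a full system must reject
    return free_space > reservation[reward]


# Hypothetical levels: nothing is held back from high-reward customers,
# two slots are held back from low-reward customers.
reservation = {10: 0, 3: 2}
decisions = [accept(space, 3, reservation) for space in range(5)]
```

Under these levels a customer offering the low reward is admitted only when more than two slots are free, while a customer offering the high reward is admitted whenever any slot is free.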


2006 ◽  
Vol 45 (1) ◽  
pp. 51-73 ◽  
Author(s):  
Tomás Prieto-Rumeau ◽  
Onésimo Hernández-Lerma
