STOCHASTIC DISCRETIZATION FOR THE LONG-RUN AVERAGE REWARD IN FLUID MODELS

2003 · Vol. 17 (2), pp. 251-265 · Author(s): I.J.B.F. Adan, J.A.C. Resing, V.G. Kulkarni

Stochastic discretization is a technique for representing a continuous random variable as a random sum of i.i.d. exponential random variables. In this article, we apply this technique to study the limiting behavior of a stochastic fluid model. Specifically, we consider an infinite-capacity fluid buffer, where the net input of fluid is regulated by a finite-state irreducible continuous-time Markov chain. Most long-run performance characteristics for such a fluid system can be expressed as the long-run average reward for a suitably chosen reward structure. In this article, we use stochastic discretization of the fluid content process to determine the long-run average reward efficiently. This method transforms the continuous-state Markov process describing the fluid model into a discrete-state quasi-birth–death process. Hence, standard tools, such as the matrix-geometric approach, become available for the analysis of the fluid buffer. To demonstrate this approach, we analyze the output of a buffer processing fluid from K sources on a first-come first-served basis.
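As a concrete illustration of the representation idea (a minimal sketch, not the paper's algorithm), the following Python snippet checks numerically that a geometric random sum of i.i.d. Exp(λ) variables is itself exponential with rate pλ, which is the kind of identity a stochastic-discretization scheme exploits; the parameter values are arbitrary.

```python
# Minimal Monte Carlo sketch (illustrative parameters, not the paper's method):
# an Exp(p * lam) random variable represented as a geometric random sum of
# i.i.d. Exp(lam) "quanta".
import numpy as np

rng = np.random.default_rng(0)
lam, p, n_samples = 2.0, 0.25, 100_000

N = rng.geometric(p, size=n_samples)                      # random number of exponential quanta
S = np.array([rng.exponential(1.0 / lam, n).sum() for n in N])

print("empirical mean  :", S.mean())                      # close to 1 / (p * lam) = 2.0
print("theoretical mean:", 1.0 / (p * lam))
```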

2002 · Vol. 39 (1), pp. 20-37 · Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system where arriving customers offer rewards which are paid upon acceptance into the system. The gatekeeper, whose objective is to ‘maximize’ rewards, decides on the basis of the offered reward whether to accept or reject each arriving customer. Suppose the arrival rates, service rates, and system capacity are changing over time in a known manner. We show that all bias optimal (a refinement of long-run average reward optimal) policies are of threshold form. Furthermore, we give sufficient conditions for the bias optimal policy to be monotonic in time. We show, via a counterexample, that if these conditions are violated, the optimal policy may not be monotonic in time or of threshold form.
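To make the threshold form concrete, here is a minimal simulation sketch, not the authors' analysis: a finite-capacity M/M/1 queue in which an arriving customer offering reward R is accepted only if R meets a threshold that depends on the current occupancy. The rates, capacity, thresholds, and reward distribution below are all illustrative.

```python
# Simulate a threshold-form admission policy in an M/M/1/C queue and estimate
# the long-run average reward per unit time (all constants are illustrative).
import numpy as np

rng = np.random.default_rng(1)
lam, mu, C = 1.0, 1.2, 5                              # arrival rate, service rate, capacity
threshold = np.array([0.0, 0.5, 1.0, 1.5, 2.0])       # one threshold per occupancy 0..C-1

t, n, total_reward, horizon = 0.0, 0, 0.0, 200_000.0
while t < horizon:
    rate = lam + (mu if n > 0 else 0.0)
    t += rng.exponential(1.0 / rate)
    if rng.random() < lam / rate:                     # next event is an arrival
        R = rng.exponential(1.0)                      # offered reward
        if n < C and R >= threshold[n]:
            n += 1
            total_reward += R
    else:                                             # next event is a service completion
        n -= 1

print("estimated long-run average reward per unit time:", total_reward / horizon)
```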


1999 · Vol. 13 (3), pp. 309-327 · Author(s): Mark E. Lewis, Hayriye Ayhan, Robert D. Foley

We consider a finite-capacity queueing system in which each arriving customer offers a reward. A gatekeeper decides, based on the reward offered and the space remaining, whether each arriving customer should be accepted or rejected. The gatekeeper receives the offered reward only if the customer is accepted. A traditional objective function is to maximize the gain, that is, the long-run average reward. It is quite possible, however, to have several different gain optimal policies that behave quite differently. Bias and Blackwell optimality are more refined objective functions that can distinguish among multiple stationary, deterministic gain optimal policies. This paper describes the structure of stationary, deterministic optimal policies and uses these refined criteria to distinguish among multiple gain optimal policies. We show that these policies are of trunk reservation form and must occur consecutively. We then prove that the gain optimal policies can be distinguished using the bias, or transient reward, and extend the results to Blackwell optimality.
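For a fixed stationary policy on a finite state space, gain and bias can be computed by solving two small linear systems. The sketch below uses an illustrative transition matrix and reward vector rather than the queueing model of the paper; it shows the computation that lets one compare gain optimal policies by their bias.

```python
# Gain g = pi @ r and bias h solving (I - P) h = r - g*1, pinned down by pi @ h = 0
# (hypothetical P and r, purely for illustration).
import numpy as np

def gain_and_bias(P, r):
    n = len(r)
    # stationary distribution: pi P = pi, sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    pi = np.linalg.lstsq(A, np.r_[np.zeros(n), 1.0], rcond=None)[0]
    g = pi @ r
    # bias: (I - P) h = r - g*1, with the normalization pi @ h = 0
    B = np.vstack([np.eye(n) - P, pi])
    h = np.linalg.lstsq(B, np.r_[r - g, 0.0], rcond=None)[0]
    return g, h

P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.6, 0.4]])
r = np.array([1.0, 0.0, 2.0])
g, h = gain_and_bias(P, r)
print("gain:", g, "bias:", h)
```

Two policies with the same gain g can then be ranked by comparing their bias vectors h, which is the refinement the abstract refers to.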


1982 · Vol. 19 (2), pp. 301-309 · Author(s): Zvi Rosberg

A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.


2003 · Vol. 40 (1), pp. 250-256 · Author(s): Erol A. Peköz

We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
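A minimal simulation sketch of the restriction, using a policy invented here for illustration rather than one from the paper: stay with the current arm while its running sample mean (the only information the policy is allowed to retain) stays above a fixed target, otherwise switch to the next arm and discard everything learned. Arm distributions and the target are illustrative.

```python
# "Forget on switch" bandit sketch: the policy may only use statistics gathered
# since it last switched arms (all constants are illustrative).
import numpy as np

rng = np.random.default_rng(2)
arm_means = [0.3, 0.9, 0.6]               # unknown to the policy
target, n_pulls = 0.7, 100_000

arm, pulls_on_arm, sum_on_arm, total = 0, 0, 0.0, 0.0
for _ in range(n_pulls):
    reward = rng.exponential(arm_means[arm])
    total += reward
    pulls_on_arm += 1
    sum_on_arm += reward
    if pulls_on_arm >= 10 and sum_on_arm / pulls_on_arm < target:
        arm = (arm + 1) % len(arm_means)  # switch: all learned information is forgotten
        pulls_on_arm, sum_on_arm = 0, 0.0

print("average reward per pull:", total / n_pulls)
```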


1990 · Vol. 22 (2), pp. 494-497 · Author(s): Lam Yeh

In this paper, we study a replacement model in which the successive survival times of the system form a process with non-increasing means, whereas the consecutive repair times after failure constitute a process with non-decreasing means. The system is replaced at the time of the Nth failure since installation or the last replacement. Based on the long-run average cost per unit time, we determine the optimal replacement policy N∗ and the maximum of the long-run average reward explicitly. Under additional conditions, the policy N∗ is even optimal among all replacement policies.
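The structure of the problem can be sketched with the renewal-reward theorem: a cycle runs from one replacement to the next, so the long-run average reward is the expected cycle reward divided by the expected cycle length, and N∗ maximizes that ratio. The snippet below evaluates the ratio for a hypothetical geometric-process parameterization; all constants are illustrative and not taken from the paper.

```python
# Renewal-reward sketch for choosing the replacement epoch N (illustrative model):
# survival-time means decrease and repair-time means increase geometrically; the
# cycle ends at the N-th failure, with repairs only after failures 1..N-1.
mu_x, a = 10.0, 1.1                       # survival means mu_x / a**(k-1), non-increasing
mu_y, b = 1.0, 1.05                       # repair means   mu_y * b**(k-1), non-decreasing
r_up, c_down, c_repl = 5.0, 2.0, 30.0     # reward rate up, cost rate down, replacement cost

def average_reward(N):
    up = sum(mu_x / a**(k - 1) for k in range(1, N + 1))
    down = sum(mu_y * b**(k - 1) for k in range(1, N))
    return (r_up * up - c_down * down - c_repl) / (up + down)

best_N = max(range(1, 50), key=average_reward)
print("optimal N*:", best_N, "average reward:", average_reward(best_N))
```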


1996 · Vol. 10 (4), pp. 569-590 · Author(s): Florin Avram, Fikri Karaesmen

We develop a method for computing the optimal double-band [b, B] policy for switching between two diffusions with continuous rewards and switching costs. The two switch levels [b, B] are obtained as perturbations of the single optimal switching point a of the control problem with no switching costs. More precisely, we find that in the case of average reward problems the optimal switch levels can be obtained by intersecting two curves: (a) the function γ(a), which represents the long-run average reward if we were to switch between the two diffusions at a and switches were free, and (b) a horizontal line whose height depends on the size of the transaction costs. Our semianalytical approach reduces, for example, the solution of a problem posed by Perry and Bar-Lev (1989, Stochastic Analysis and Applications 7: 103–115) to the solution of one nonlinear equation.
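A minimal numerical sketch of the intersection idea, with a hypothetical concave γ(a) rather than the diffusion model of the paper: the band [b, B] is read off where γ crosses a horizontal line placed below its maximum by an amount standing in for the switching-cost term.

```python
# Read the switch band [b, B] off the free-switching reward curve gamma(a)
# (gamma, the grid, and the cost-driven drop are all illustrative).
import numpy as np

a_grid = np.linspace(0.0, 10.0, 2001)
gamma = -(a_grid - 4.0) ** 2 + 5.0        # hypothetical concave gamma(a)

a_star = a_grid[np.argmax(gamma)]         # optimal switch point without switching costs
drop = 0.8                                # stands in for the switching-cost term
level = gamma.max() - drop

crossing = a_grid[gamma >= level]         # contiguous interval for a concave gamma
b, B = crossing[0], crossing[-1]
print("a* =", a_star, " band [b, B] =", (round(b, 3), round(B, 3)))
```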

