Universality of load balancing schemes on the diffusion scale

We consider a system of N parallel queues with identical exponential service rates and a single dispatcher where tasks arrive as a Poisson process. When a task arrives, the dispatcher always assigns it to an idle server, if there is one, and otherwise to a server with the shortest queue among d randomly selected servers (1≤d≤N). This load balancing scheme subsumes the so-called join-the-idle-queue policy (d=1) and the celebrated join-the-shortest-queue policy (d=N) as two crucial special cases. We develop a stochastic coupling construction to obtain the diffusion limit of the queue process in the Halfin–Whitt heavy-traffic regime, and establish that it does not depend on the value of d, implying that assigning tasks to idle servers is sufficient for diffusion level optimality.
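
The dispatching rule described in this abstract can be illustrated with a minimal Python sketch. The function name `dispatch`, the list-of-queue-lengths representation, the uniform choice among idle servers, sampling the d servers without replacement, and the tie-breaking rule are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def dispatch(queues, d, rng=random):
    """Return the index of the server that receives an arriving task.

    queues: list of current queue lengths, one entry per server (hypothetical layout).
    d: number of servers sampled when no server is idle, with 1 <= d <= N.
    """
    n = len(queues)
    if not 1 <= d <= n:
        raise ValueError("d must satisfy 1 <= d <= N")

    # Idle servers always take priority, whatever the value of d.
    idle = [i for i, q in enumerate(queues) if q == 0]
    if idle:
        # Assumption: pick uniformly among idle servers.
        return rng.choice(idle)

    # No idle server: sample d distinct servers (assumption: without replacement)
    # and join the shortest queue among them; ties go to the first one sampled.
    sampled = rng.sample(range(n), d)
    return min(sampled, key=lambda i: queues[i])
```

With d=1 and no idle server the task goes to a uniformly random server, and with d=N the rule reduces to join-the-shortest-queue, matching the two special cases named in the abstract.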

Subdiffusive Load Balancing in Time-Varying Queueing Systems

Operations Research ◽

10.1287/opre.2019.1851 ◽

2019 ◽

Vol 67 (6) ◽

pp. 1678-1698

Author(s):

Rami Atar ◽

Isaac Keslassy ◽

Gal Mendelson

Keyword(s):

Load Balancing ◽

Queue Length ◽

Queueing Systems ◽

Load Condition ◽

Service Rates ◽

Queue Lengths ◽

Join The Shortest Queue ◽

The Difference ◽

Shortest Queue ◽

Diffusion Scale

The degree to which delays or queue lengths equalize under load-balancing algorithms gives a good indication of their performance. Some of the most well-known results in this context are concerned with the asymptotic behavior of the delay or queue length at the diffusion scale under a critical load condition, where arrival and service rates do not vary with time. For example, under the join-the-shortest-queue policy, the queue length deviation process, defined as the difference between the greatest and smallest queue length as it varies over time, is at a smaller scale (subdiffusive) than that of queue lengths (diffusive).
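
In symbols, the queue length deviation process described above can be written as follows. The notation is an assumption introduced here, not taken from the abstract: N is the number of queues and $Q_i(t)$ is the length of queue $i$ at time $t$.

```latex
\[
  D(t) \;=\; \max_{1 \le i \le N} Q_i(t) \;-\; \min_{1 \le i \le N} Q_i(t),
  \qquad t \ge 0 .
\]
```

In this notation, "subdiffusive" says that $D$ is of smaller order than the individual queue lengths $Q_i$ under the critical load condition.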

Steady-state analysis of load-balancing algorithms in the sub-Halfin–Whitt regime

Journal of Applied Probability ◽

10.1017/jpr.2020.13 ◽

2020 ◽

Vol 57 (2) ◽

pp. 578-596 ◽

Cited By ~ 1

Author(s):

Xin Liu ◽

Lei Ying

Keyword(s):

Steady State ◽

Positive Integer ◽

Load Balancing ◽

Heavy Traffic ◽

Sufficient Condition ◽

Steady State Analysis ◽

Server Systems ◽

Join The Shortest Queue ◽

Shortest Queue ◽

Steady State Performance

We study a class of load-balancing algorithms for many-server systems (N servers). Each server has a buffer of size $b-1$ with $b=O(\sqrt{\log N})$, i.e. a server can have at most one job in service and $b-1$ jobs queued. We focus on the steady-state performance of load-balancing algorithms in the heavy traffic regime such that the load of the system is $\lambda = 1 - \gamma N^{-\alpha}$ for $0<\alpha<0.5$ and $\gamma > 0,$ which we call the sub-Halfin–Whitt regime ($\alpha=0.5$ is the so-called Halfin–Whitt regime). We establish a sufficient condition under which the probability that an incoming job is routed to an idle server is 1 asymptotically (as $N \to \infty$) at steady state. The class of load-balancing algorithms that satisfy the condition includes join-the-shortest-queue, idle-one-first, join-the-idle-queue, and power-of-d-choices with $d\geq \frac{r}{\gamma}N^\alpha\log N$ (r a positive integer). The proof of the main result is based on the framework of Stein’s method. A key contribution is to use a simple generator approximation based on state space collapse.
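
To get a feel for the two formulas in this abstract, the sketch below evaluates the load $\lambda = 1 - \gamma N^{-\alpha}$ and the smallest integer d that satisfies $d \geq \frac{r}{\gamma}N^\alpha\log N$. The function names are hypothetical, and reading $\log$ as the natural logarithm is an assumption, since the abstract does not state the base.

```python
import math


def sub_halfin_whitt_load(n, alpha, gamma):
    """Load per server, lambda = 1 - gamma * N**(-alpha), for 0 < alpha < 0.5."""
    if not 0 < alpha < 0.5:
        raise ValueError("the sub-Halfin-Whitt regime requires 0 < alpha < 0.5")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return 1 - gamma * n ** (-alpha)


def min_power_of_d(n, alpha, gamma, r=1):
    """Smallest integer d with d >= (r / gamma) * N**alpha * log(N).

    Assumption: log is the natural logarithm.
    """
    return math.ceil(r / gamma * n ** alpha * math.log(n))
```

For example, with N = 10000, α = 0.25, γ = 1 and r = 1, the load is about 0.9 and the threshold gives d = 93, far fewer servers than the N = 10000 that join-the-shortest-queue inspects.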

Join-the-Shortest Queue diffusion limit in Halfin–Whitt regime: Sensitivity on the heavy-traffic parameter

The Annals of Applied Probability ◽

10.1214/19-aap1496 ◽

2020 ◽

Vol 30 (1) ◽

pp. 80-144 ◽

Cited By ~ 1

Author(s):

Sayan Banerjee ◽

Debankur Mukherjee

Keyword(s):

Heavy Traffic ◽

Diffusion Limit ◽

Join The Shortest Queue ◽

Shortest Queue

Steady-State Analysis of the Join-the-Shortest-Queue Model in the Halfin–Whitt Regime

Mathematics of Operations Research ◽

10.1287/moor.2019.1023 ◽

2020 ◽

Vol 45 (3) ◽

pp. 1069-1103

Author(s):

Anton Braverman

Keyword(s):

Steady State ◽

Diffusion Limit ◽

Fluid Limit ◽

Time Intervals ◽

Dimensional Diffusion ◽

Queue Model ◽

Join The Shortest Queue ◽

Shortest Queue ◽

Process Level ◽

General Tool

This paper studies the steady-state properties of the join-the-shortest-queue model in the Halfin–Whitt regime. We focus on the process tracking the number of idle servers and the number of servers with nonempty buffers. Recently, Eschenfeldt and Gamarnik proved that a scaled version of this process converges, over finite time intervals, to a two-dimensional diffusion limit as the number of servers goes to infinity. In this paper, we prove that the diffusion limit is exponentially ergodic and that the diffusion scaled sequence of the steady-state number of idle servers and nonempty buffers is tight. Combined with the process-level convergence proved by Eschenfeldt and Gamarnik, our results imply convergence of steady-state distributions. The methodology used is the generator expansion framework based on Stein’s method, also referred to as the drift-based fluid limit Lyapunov function approach in Stolyar. One technical contribution to the framework is to show how it can be used as a general tool to establish exponential ergodicity.
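
The two quantities tracked in this abstract can be read off a vector of per-server job counts. The sketch below is illustrative only and rests on several assumptions: the names are hypothetical, each entry of `queues` counts the job in service plus any waiting jobs, so a buffer is nonempty when the count is at least 2, and the $\sqrt{N}$ scaling is the usual diffusion scaling rather than something quoted from the abstract.

```python
import math


def jsq_summary(queues):
    """Return (number of idle servers, number of servers with nonempty buffers)."""
    idle = sum(1 for q in queues if q == 0)
    # A buffer is nonempty when at least one job waits behind the one in service.
    nonempty_buffers = sum(1 for q in queues if q >= 2)
    return idle, nonempty_buffers


def diffusion_scaled(queues):
    """Divide both counts by sqrt(N); this scaling is an assumption."""
    n = len(queues)
    idle, nonempty_buffers = jsq_summary(queues)
    return idle / math.sqrt(n), nonempty_buffers / math.sqrt(n)
```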

Strategic Dynamic Jockeying Between Two Parallel Queues

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964815000273 ◽

2015 ◽

Vol 30 (1) ◽

pp. 41-60 ◽

Cited By ~ 2

Author(s):

Amin Dehghanian ◽

Jeffrey P. Kharoufeh ◽

Mohammad Modarres

Keyword(s):

Poisson Process ◽

Sojourn Time ◽

Queueing System ◽

The Other ◽

Infinite Time ◽

Numerical Examples ◽

Parallel Queues ◽

Time Horizons ◽

Holding Cost ◽

Shortest Queue

Consider a two-station, heterogeneous parallel queueing system in which each station operates as an independent M/M/1 queue with its own infinite-capacity buffer. The input to the system is a Poisson process that splits among the two stations according to a Bernoulli splitting mechanism. However, upon arrival, a strategic customer initially joins one of the queues selectively and decides at subsequent arrival and departure epochs whether to jockey (or switch queues) with the aim of reducing her own sojourn time. There is a holding cost per unit time, and jockeying incurs a fixed non-negative cost while placing the customer at the end of the other queue. We examine individually optimal joining and jockeying policies that minimize the strategic customer's total expected discounted (or undiscounted) costs over finite and infinite time horizons. The main results reveal that, if the strategic customer is in station 1 with ℓ customers in front of her, and q1 and q2 customers in stations 1 and 2, respectively (excluding herself), then the incentive to jockey increases as either ℓ increases or q2 decreases. Numerical examples reveal that it may not be optimal to join, and/or jockey to, the station with the shortest queue or the fastest server.

A diffusion model for two parallel queues with processor sharing: transient behavior and asymptotics

Journal of Applied Mathematics and Stochastic Analysis ◽

10.1155/s1048953399000295 ◽

1999 ◽

Vol 12 (4) ◽

pp. 311-338 ◽

Cited By ~ 1

Author(s):

Charles Knessl

Keyword(s):

Diffusion Approximation ◽

Queue Length ◽

Heavy Traffic ◽

Transient Behavior ◽

Processor Sharing ◽

Parallel Queues ◽

Poisson Arrival ◽

Heavy Traffic Limit ◽

Service Rates ◽

Dependent Probability

We consider two identical, parallel M/M/1 queues. Both queues are fed by a Poisson arrival stream of rate λ and have service rates equal to μ. When both queues are non-empty, the two systems behave independently of each other. However, when one of the queues becomes empty, the corresponding server helps in the other queue. This is called head-of-the-line processor sharing. We study this model in the heavy traffic limit, where ρ=λ/μ→1. We formulate the heavy traffic diffusion approximation and explicitly compute the time-dependent probability of the diffusion approximation to the joint queue length process. We then evaluate the solution asymptotically for large values of space and/or time. This leads to simple expressions that show how the process achieves its steady state and other transient aspects.

Join the shortest queue among $k$ parallel queues: tail asymptotics of its stationary distribution

Queueing Systems ◽

10.1007/s11134-013-9353-y ◽

2013 ◽

Vol 74 (2-3) ◽

pp. 303-332

Author(s):

Masahiro Kobayashi ◽

Yutaka Sakuma ◽

Masakiyo Miyazawa

Keyword(s):

Stationary Distribution ◽

Tail Asymptotics ◽

Parallel Queues ◽

Join The Shortest Queue ◽

Shortest Queue

Multiple-server system with flexible arrivals

Advances in Applied Probability ◽

10.1239/aap/1324045695 ◽

2011 ◽

Vol 43 (4) ◽

pp. 985-1004 ◽

Cited By ~ 8

Author(s):

Osman T. Akgun ◽

Rhonda Righter ◽

Ronald Wolff

Keyword(s):

Finite Buffers ◽

Service Production ◽

Weak Majorization ◽

Service Rates ◽

Join The Shortest Queue ◽

Traffic Systems ◽

Shortest Queue ◽

Performance Gains ◽

Number Of Customers ◽

Server System

In many service, production, and traffic systems there are multiple types of customers requiring different types of ‘servers’, i.e. different services, products, or routes. Often, however, a proportion of the customers are flexible, i.e. they are willing to change their type in order to achieve faster service, and even if this proportion is small, it has the potential of achieving large performance gains. We generalize earlier results on the optimality of ‘join the shortest queue’ (JSQ) for flexible arrivals to the following: arbitrary arrivals where only a subset are flexible, multiple-server stations, and abandonments. Surprisingly, with abandonments, the optimality of JSQ for minimizing the number of customers in the system depends on the relative abandonment and service rates. We extend our model to finite buffers and resequencing. We assume exponential service. Our optimality results are very strong; we minimize the queue length process in the weak majorization sense.

Heavy Traffic Limits for Join-the-Shortest-Estimated-Queue Policy Using Delayed Information

Mathematics of Operations Research ◽

10.1287/moor.2020.1056 ◽

2020 ◽

Author(s):

Rami Atar ◽

David Lipshutz

Keyword(s):

Diffusion Model ◽

Heavy Traffic ◽

Partial Observations ◽

Stochastic Delay ◽

Parallel Queues ◽

Traffic Conditions ◽

Current State ◽

Queue Lengths ◽

Shortest Queue ◽

Stochastic Delay Equation

We consider a load-balancing problem for a network of parallel queues in which information on the state of the queues is subject to a delay. In this setting, adopting a routing policy that performs well when applied to the current state of the queues can perform quite poorly when applied to the delayed state of the queues. Viewing this as a problem of control under partial observations, we propose using an estimate of the current queue lengths as the input to the join-the-shortest-queue policy. For a general class of estimation schemes, under heavy traffic conditions, we prove convergence of the diffusion-scaled process to a solution of a so-called diffusion model, in which an important step toward this goal establishes that the estimated queue lengths undergo state-space collapse. In some cases, our diffusion model is given by a novel stochastic delay equation with reflection, in which the Skorokhod boundary term appears with delay. We illustrate our results with examples of natural estimation schemes, discuss their implementability, and compare their relative performance using simulations.

Asymptotic Optimality of Power-of-d Load Balancing in Large-Scale Systems

Mathematics of Operations Research ◽

10.1287/moor.2019.1042 ◽

2020 ◽

Vol 45 (4) ◽

pp. 1535-1571 ◽

Cited By ~ 1

Author(s):

Debankur Mukherjee ◽

Sem C. Borst ◽

Johan S. H. van Leeuwaarden ◽

Philip A. Whiting

Keyword(s):

Large Scale ◽

Asymptotic Optimality ◽

Diffusion Limit ◽

Fluid Limit ◽

Large Scale Systems ◽

Minimum Number ◽

Join The Shortest Queue ◽

The Difference ◽

Shortest Queue ◽

And Diffusion

We consider a system of N identical server pools and a single dispatcher in which tasks with unit-exponential service requirements arrive at rate [Formula: see text]. In order to optimize the experienced performance, the dispatcher aims to evenly distribute the tasks across the various server pools. Specifically, when a task arrives, the dispatcher assigns it to the server pool with the minimum number of tasks among d(N) randomly selected server pools. We construct a stochastic coupling to bound the difference in the system occupancy processes between the join-the-shortest-queue (JSQ) policy and a scheme with an arbitrary value of d(N). We use the coupling to derive the fluid limit in case [Formula: see text] and [Formula: see text] as [Formula: see text] along with the associated fixed point. The fluid limit turns out to be insensitive to the exact growth rate of d(N) and coincides with that for the JSQ policy. We further establish that the diffusion limit corresponds to that for the JSQ policy as well, as long as [Formula: see text], and characterize the common limiting diffusion process. These results indicate that the JSQ optimality can be preserved at the fluid and diffusion levels while reducing the overhead by nearly a factor O(N) and O([Formula: see text]), respectively.
