decision epoch
Recently Published Documents


TOTAL DOCUMENTS

13
(FIVE YEARS 7)

H-INDEX

3
(FIVE YEARS 1)

Author(s):  
David Gamarnik ◽  
John N. Tsitsiklis ◽  
Martin Zubeldia

We consider a heterogeneous distributed service system consisting of n servers with unknown and possibly different processing rates. Jobs with unit mean arrive as a renewal process of rate proportional to n and are immediately dispatched to one of several queues associated with the servers. We assume that the dispatching decisions are made by a central dispatcher with the ability to exchange messages with the servers and endowed with a finite memory used to store information from one decision epoch to the next, about the current state of the queues and about the service rates of the servers. We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be always stable. First, we present a policy that is always stable while using a positive (but arbitrarily small) message rate and [Formula: see text] bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges [Formula: see text] messages per unit of time, and with [Formula: see text] bits of memory, cannot be always stable.


Author(s):  
Wenjing Guo ◽  
Bilge Atasoy ◽  
Wouter Beelaerts van Blokland ◽  
Rudy R. Negenborn

This paper investigates a dynamic and stochastic shipment matching problem faced by network operators in hinterland synchromodal transportation. We consider a platform that receives contractual and spot shipment requests from shippers, and receives multimodal services from carriers. The platform aims to provide optimal matches between shipment requests and multimodal services within a finite horizon under spot request uncertainty. Due to the capacity limitation of multimodal services, the matching decisions made for current requests will affect the ability to make good matches for future requests. To solve the problem, this paper proposes an anticipatory approach which consists of a rolling horizon framework that handles dynamic events, a sample average approximation method that addresses uncertainties, and a progressive hedging algorithm that generates solutions at each decision epoch. Compared with the greedy approach, which is commonly used in practice, the anticipatory approach achieves total cost savings of up to 8.18% on realistic instances. The experimental results highlight the benefits of incorporating stochastic information in the dynamic decision-making processes of the synchromodal matching system.


2021 ◽  
Author(s):  
Akram Khaleghei ◽  
Michael Jong Kim

In “Optimal Control of Partially Observable Semi-Markovian Failing Systems: An Analysis using a Phase Methodology,” Khaleghei and Kim study a maintenance control problem as a partially observable semi-Markov decision process (POSMDP), a problem class that is typically computationally intractable and not amenable to structural analysis. The authors develop a new approach based on a phase methodology, where the idea is to view the intractable POSMDP as the limiting problem of a sequence of tractable POMDPs. They show that the optimal control policy can be represented as a control limit policy that monitors the estimated conditional reliability at each decision epoch. By exploiting this structure, they develop an efficient computational approach to solve for the optimal control limit and the corresponding optimal value.


2021 ◽  
Vol 229 ◽  
pp. 01047
Author(s):  
Abdellatif Semmouri ◽  
Mostafa Jourhmane ◽  
Bahaa Eddine Elbaghazaoui

In this paper, we consider constrained optimization of discrete-time Markov decision processes (MDPs) with finite state and action spaces, which accumulate both a reward and costs at each decision epoch. We study the problem of finding a policy that maximizes the expected total discounted reward subject to the constraints that the expected total discounted costs do not exceed given values. To this end, we investigate a method that decomposes the state space into strongly communicating classes in order to compute an optimal or nearly optimal stationary policy. The discounted criterion has applications in many areas, such as forest management, energy consumption management, finance, communication systems (mobile networks), and artificial intelligence.
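The constrained discounted criterion described in this abstract can be illustrated with a minimal sketch. It does not implement the paper's decomposition into strongly communicating classes; it only evaluates the expected total discounted reward and cost of each deterministic stationary policy and keeps the best one that respects the cost bound. The two-state MDP, its numbers, and the names `P`, `REWARD`, `COST`, `discounted_value`, and `best_feasible_policy` are all assumptions made for illustration. Note also that optimal policies for constrained MDPs may in general require randomization, which this enumeration ignores.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP; every number here is made up.
# P[s][a] is a list of (next_state, probability) pairs.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
REWARD = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
COST = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 3.0}}


def discounted_value(policy, signal, gamma, tol=1e-12):
    """Iterative policy evaluation of one per-epoch signal (reward or cost).

    Repeats V(s) <- signal[s][a] + gamma * sum_t P(t | s, a) * V(t)
    with a = policy[s] until successive iterates differ by less than tol.
    """
    v = {s: 0.0 for s in P}
    while True:
        new_v = {}
        for s in P:
            a = policy[s]
            new_v[s] = signal[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def best_feasible_policy(start, gamma, budget):
    """Enumerate deterministic stationary policies and keep the one with the
    highest discounted reward whose discounted cost stays within the budget."""
    best = None
    states = sorted(P)
    for actions in product(*(sorted(P[s]) for s in states)):
        policy = dict(zip(states, actions))
        r = discounted_value(policy, REWARD, gamma)[start]
        c = discounted_value(policy, COST, gamma)[start]
        if c <= budget and (best is None or r > best[1]):
            best = (policy, r, c)
    return best


if __name__ == "__main__":
    print(best_feasible_policy(start=0, gamma=0.5, budget=2.5))
```

In this toy instance the policy that always takes action 1 earns the highest reward but violates the cost bound, so the search settles on a policy that alternates between the two states. Enumeration grows exponentially with the number of states, which is one reason structured methods such as the one studied in the paper are of interest.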


Mathematics ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. 2226 ◽  
Author(s):  
José Niño-Mora

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.


Algorithms ◽  
2020 ◽  
Vol 13 (10) ◽  
pp. 241
Author(s):  
Shashank Goyal ◽  
Diwakar Gupta

Many sharing-economy platforms operate as follows. Owners list the availability of resources, prices, and contract-length limits. Customers propose contract start times and lengths. The owners decide immediately whether to accept or decline each proposal, even if the contract is for a future date. Accepted proposals generate revenue. Declined proposals are lost. At any decision epoch, the owner has no information regarding future proposals. The owner seeks easy-to-implement algorithms that achieve the best competitive ratio (CR). We first derive a lower bound on the CR of any algorithm. We then analyze CRs of all intuitive “greedy” algorithms. We propose two new algorithms that have significantly better CRs than that of any greedy algorithm for certain parameter-value ranges. The key idea behind these algorithms is that owners may reserve some amount of capacity for late-arriving higher-value proposals in an attempt to improve revenue. Our contribution lies in operationalizing this idea with the help of algorithms that utilize thresholds. Moreover, we show that if non-optimal thresholds are chosen, then those may lead to poor CRs. We provide a rigorous method by which an owner can decide the best approach in their context by analyzing the CRs of greedy algorithms and those proposed by us.
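The capacity-reservation idea in this abstract can be sketched in a much simplified setting. The sketch below assumes a single pool of capacity and proposals described only by a number of units and a price per unit; it ignores contract start times and lengths, and it is not one of the paper's algorithms. The names `run_policy`, `reserve`, and `high_price`, as well as the proposal sequences, are hypothetical.

```python
from typing import List, Tuple

Proposal = Tuple[int, float]  # (units requested, price per unit); assumed format


def run_policy(proposals: List[Proposal], capacity: int,
               reserve: int, high_price: float) -> float:
    """Accept or decline each proposal on arrival, with no knowledge of the future.

    Proposals priced at or above high_price may use any remaining capacity.
    Lower-priced proposals are accepted only if at least `reserve` units
    remain afterwards. Setting reserve=0 gives the plain greedy rule.
    """
    remaining = capacity
    revenue = 0.0
    for units, price in proposals:
        floor = 0 if price >= high_price else reserve
        if remaining - units >= floor:
            remaining -= units
            revenue += units * price
    return revenue


if __name__ == "__main__":
    # A late high-value proposal rewards holding capacity back.
    late_high = [(6, 1.0), (4, 1.0), (5, 3.0)]
    print(run_policy(late_high, capacity=10, reserve=0, high_price=3.0))
    print(run_policy(late_high, capacity=10, reserve=5, high_price=3.0))

    # If no high-value proposal arrives, the same reserve loses revenue.
    only_low = [(6, 1.0), (4, 1.0)]
    print(run_policy(only_low, capacity=10, reserve=0, high_price=3.0))
    print(run_policy(only_low, capacity=10, reserve=5, high_price=3.0))
```

The second pair of runs shows why the choice of threshold matters: a reserve that pays off on one arrival sequence can do worse than greedy on another, which is the kind of trade-off a competitive-ratio analysis is meant to quantify.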


2019 ◽  
Vol 53 (5) ◽  
pp. 1749-1761
Author(s):  
Zhicong Zhang ◽  
Shuai Li ◽  
Xiaohui Yan ◽  
Liangwei Zhang

We study a time-homogeneous discrete composite-action Markov decision process (CMDP) in which multiple decisions must be made at each state. In this particular Markov decision process, the state variables are divided into two separable sets and a two-dimensional composite action is chosen at each decision epoch. To solve a composite-action Markov decision process, we propose a novel linear programming model (Contracted Linear Programming Model, CLPM). We show that the CLPM obtains the optimal state values of a CMDP. We analyze and compare the number of variables and constraints of the CLPM and the Traditional Linear Programming Model (TLPM). Computational experiments compare running times and memory usage of the two models. The CLPM outperforms the TLPM in both time complexity and space complexity, as shown by theoretical analysis and computational experiments.


2013 ◽  
Vol 45 (1) ◽  
pp. 51-85 ◽  
Author(s):  
K. D. Glazebrook ◽  
D. J. Hodge ◽  
C. Kirkbride

Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.
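The activation rule described in this abstract, where as many projects are activated at each decision epoch as the resource allows, can be sketched as a simple priority rule. This is only an illustration under stated assumptions: the index values are placeholders supplied by hand rather than derived from a Lagrangian relaxation, and the function name `activate`, the project tuples, and the rule of ranking by index alone are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple

# (project name, index in its current state, resource needed in that state)
Project = Tuple[str, float, int]


def activate(projects: List[Project], available: int) -> List[str]:
    """Pick projects to activate at one decision epoch.

    Projects are considered in decreasing order of index; each is activated
    if its state-dependent resource requirement still fits in what is left.
    """
    chosen = []
    for name, _index, need in sorted(projects, key=lambda p: p[1], reverse=True):
        if need <= available:
            chosen.append(name)
            available -= need
    return chosen


if __name__ == "__main__":
    # Placeholder indices and requirements, made up for illustration.
    epoch_state = [("A", 2.0, 3), ("B", 5.0, 4), ("C", 1.0, 2), ("D", -0.5, 1)]
    print(activate(epoch_state, available=6))
```

With six units of resource, the rule activates the highest-index project first, skips the next one because it no longer fits, and then fills the remaining capacity with a smaller project.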



