decision epoch Latest Research Papers

Stability, Memory, and Messaging Trade-Offs in Heterogeneous Service Systems

We consider a heterogeneous distributed service system consisting of n servers with unknown and possibly different processing rates. Jobs with unit mean arrive as a renewal process of rate proportional to n and are immediately dispatched to one of several queues associated with the servers. We assume that the dispatching decisions are made by a central dispatcher with the ability to exchange messages with the servers and endowed with a finite memory used to store information from one decision epoch to the next, about the current state of the queues and about the service rates of the servers. We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be always stable. First, we present a policy that is always stable while using a positive (but arbitrarily small) message rate and [Formula: see text] bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges [Formula: see text] messages per unit of time, and with [Formula: see text] bits of memory, cannot be always stable.


Anticipatory approach for dynamic and stochastic shipment matching in hinterland synchromodal transportation

Flexible Services and Manufacturing Journal ◽

10.1007/s10696-021-09428-5 ◽

2021 ◽

Author(s):

Wenjing Guo ◽

Bilge Atasoy ◽

Wouter Beelaerts van Blokland ◽

Rudy R. Negenborn

Keyword(s):

Cost Savings ◽

Sample Average Approximation ◽

Dynamic Decision Making ◽

Matching Problem ◽

Dynamic Events ◽

Progressive Hedging ◽

Progressive Hedging Algorithm ◽

Sample Average ◽

Decision Epoch ◽

Decision Making Processes

This paper investigates a dynamic and stochastic shipment matching problem faced by network operators in hinterland synchromodal transportation. We consider a platform that receives contractual and spot shipment requests from shippers, and receives multimodal services from carriers. The platform aims to provide optimal matches between shipment requests and multimodal services within a finite horizon under spot request uncertainty. Due to the capacity limitation of multimodal services, the matching decisions made for current requests will affect the ability to make good matches for future requests. To solve the problem, this paper proposes an anticipatory approach which consists of a rolling horizon framework that handles dynamic events, a sample average approximation method that addresses uncertainties, and a progressive hedging algorithm that generates solutions at each decision epoch. Compared with the greedy approach, which is commonly used in practice, the anticipatory approach achieves total cost savings of up to 8.18% on realistic instances. The experimental results highlight the benefits of incorporating stochastic information in the dynamic decision-making processes of the synchromodal matching system.
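The sample average approximation idea mentioned in this abstract can be illustrated with a toy sketch. It is not the authors' model: the function `saa_choose`, the capacity decisions, the demand scenarios, and the cost function are all hypothetical, chosen only to show how a decision is scored by its average cost over a set of scenarios.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence


def saa_choose(decisions: Iterable[int],
               scenarios: Sequence[int],
               cost: Callable[[int, int], float]) -> int:
    """Return the decision with the lowest average cost over the scenarios.

    The average over a finite scenario set stands in for the expected cost,
    which is the core idea of sample average approximation.
    """
    return min(decisions, key=lambda d: mean(cost(d, s) for s in scenarios))


def toy_cost(reserved: int, demand: int) -> float:
    # Hypothetical costs: 2 per unit of unmet demand, 1 per unit of unused capacity.
    return 2.0 * max(demand - reserved, 0) + 1.0 * max(reserved - demand, 0)


# In practice the scenarios would be sampled from a demand distribution;
# a fixed list keeps this example deterministic.
demand_scenarios = [1, 2, 3, 3, 4]
best = saa_choose(range(0, 6), demand_scenarios, toy_cost)
print(best)
```

In a rolling horizon setting, a selection of this kind would be repeated at each decision epoch with refreshed scenarios.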


Optimal Control of Partially Observable Semi-Markovian Failing Systems: An Analysis Using a Phase Methodology

Operations Research ◽

10.1287/opre.2020.2086 ◽

2021 ◽

Author(s):

Akram Khaleghei ◽

Michael Jong Kim

Keyword(s):

Optimal Control ◽

Control Policy ◽

Computational Approach ◽

Control Limit ◽

New Approach ◽

Problem Class ◽

Optimal Value ◽

Markov Decision ◽

Decision Epoch ◽

Partially Observable

In “Optimal Control of Partially Observable Semi-Markovian Failing Systems: An Analysis using a Phase Methodology,” Khaleghei and Kim study a maintenance control problem as a partially observable semi-Markov decision process (POSMDP), a problem class that is typically computationally intractable and not amenable to structural analysis. The authors develop a new approach based on a phase methodology where the idea is to view the intractable POSMDP as the limiting problem of a sequence of tractable POMDPs. They show that the optimal control policy can be represented as a control limit policy that monitors the estimated conditional reliability at each decision epoch. By exploiting this structure, they develop an efficient computational approach to solve for the optimal control limit and the corresponding optimal value.
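The shape of a control limit policy can be shown with a minimal sketch. This is an illustration only: the function name, the reliability values, and the limit are hypothetical, and the direction of the rule (intervene once the estimate drops below the limit) is an assumption rather than something stated in the abstract.

```python
from typing import Iterable, Optional


def first_intervention_epoch(reliabilities: Iterable[float],
                             control_limit: float) -> Optional[int]:
    """Return the first decision epoch at which the estimated conditional
    reliability is below the control limit, or None if that never happens.

    Assumption: the policy intervenes when reliability falls below the limit.
    """
    for epoch, reliability in enumerate(reliabilities):
        if reliability < control_limit:
            return epoch
    return None


# Hypothetical reliability estimates observed at successive decision epochs.
estimates = [0.95, 0.90, 0.70, 0.40]
print(first_intervention_epoch(estimates, control_limit=0.75))
```

The appeal of this structure is that the whole policy is described by a single number, so computing the optimal policy reduces to a search over the limit.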


Discounted Markov Decision Processes with Constrained Costs: the decomposition approach

E3S Web of Conferences ◽

10.1051/e3sconf/202122901047 ◽

2021 ◽

Vol 229 ◽

pp. 01047

Author(s):

Abdellatif Semmouri ◽

Mostafa Jourhmane ◽

Bahaa Eddine Elbaghazaoui

Keyword(s):

Markov Decision Processes ◽

Mobile Networks ◽

Decision Processes ◽

Stationary Policy ◽

Decomposition Approach ◽

Finite State ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Decision Epoch ◽

Discounted Criterion

In this paper, we consider constrained optimization of discrete-time Markov decision processes (MDPs) with finite state and action spaces, which accumulate both a reward and costs at each decision epoch. We study the problem of finding a policy that maximizes the expected total discounted reward subject to the constraints that the expected total discounted costs do not exceed given values. To this end, we investigate a decomposition of the state space into strongly communicating classes for computing an optimal or nearly optimal stationary policy. The discounted criterion has applications in areas such as forest management, energy consumption management, finance, communication systems (mobile networks), and artificial intelligence.
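The constrained problem described in this abstract can be written out as follows. The notation is assumed for illustration and does not come from the abstract: $\beta \in (0,1)$ is the discount factor, $r$ the reward, $c_k$ the $k$-th cost, $V_k$ the given bound on it, and $(s_t, a_t)$ the state and action at decision epoch $t$ under policy $\pi$.

```latex
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, c_k(s_t, a_t)\right] \le V_k,
\qquad k = 1, \dots, K.
\end{aligned}
```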


A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

Mathematics ◽

10.3390/math8122226 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2226 ◽

Cited By ~ 1

Author(s):

José Niño-Mora

Keyword(s):

Numerical Study ◽

Index Policy ◽

State Spaces ◽

Restless Bandit ◽

Restless Bandits ◽

Pivoting Algorithm ◽

Markov Decision ◽

Whittle Index ◽

Decision Epoch ◽

Change State

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. This algorithm also draws on the parametric simplex method, and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.


The Online Reservation Problem

Algorithms ◽

10.3390/a13100241 ◽

2020 ◽

Vol 13 (10) ◽

pp. 241

Author(s):

Shashank Goyal ◽

Diwakar Gupta

Keyword(s):

Lower Bound ◽

Competitive Ratio ◽

Greedy Algorithms ◽

Sharing Economy ◽

Optimal Thresholds ◽

Decision Epoch ◽

Online Reservation ◽

New Algorithms ◽

Future Date ◽

Rigorous Method

Many sharing-economy platforms operate as follows. Owners list the availability of resources, prices, and contract-length limits. Customers propose contract start times and lengths. The owners decide immediately whether to accept or decline each proposal, even if the contract is for a future date. Accepted proposals generate revenue. Declined proposals are lost. At any decision epoch, the owner has no information regarding future proposals. The owner seeks easy-to-implement algorithms that achieve the best competitive ratio (CR). We first derive a lower bound on the CR of any algorithm. We then analyze CRs of all intuitive “greedy” algorithms. We propose two new algorithms that have significantly better CRs than that of any greedy algorithm for certain parameter-value ranges. The key idea behind these algorithms is that owners may reserve some amount of capacity for late-arriving higher-value proposals in an attempt to improve revenue. Our contribution lies in operationalizing this idea with the help of algorithms that utilize thresholds. Moreover, we show that if non-optimal thresholds are chosen, then those may lead to poor CRs. We provide a rigorous method by which an owner can decide the best approach in their context by analyzing the CRs of greedy algorithms and those proposed by us.
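The contrast between a greedy rule and a threshold rule that reserves capacity can be sketched in a deliberately simplified setting. The sketch below is not one of the paper's algorithms: it assumes every proposal uses one unit of capacity and ignores start times and contract lengths, and the names `greedy_revenue`, `threshold_revenue`, `reserved`, and `threshold` are hypothetical.

```python
from typing import Sequence


def greedy_revenue(values: Sequence[float], capacity: int) -> float:
    """Accept every proposal, in arrival order, while capacity remains."""
    revenue, free = 0.0, capacity
    for value in values:
        if free > 0:
            free -= 1
            revenue += value
    return revenue


def threshold_revenue(values: Sequence[float], capacity: int,
                      reserved: int, threshold: float) -> float:
    """Accept any proposal while more than `reserved` units are free; once
    only reserved units remain, accept only proposals worth >= threshold."""
    revenue, free = 0.0, capacity
    for value in values:
        if free == 0:
            break
        if free > reserved or value >= threshold:
            free -= 1
            revenue += value
    return revenue


# A late high-value proposal rewards reservation ...
late_high = [1.0, 1.0, 1.0, 5.0]
print(greedy_revenue(late_high, 2), threshold_revenue(late_high, 2, 1, 5.0))

# ... but the same threshold loses revenue when no such proposal arrives.
all_low = [1.0, 1.0, 1.0, 1.0]
print(greedy_revenue(all_low, 2), threshold_revenue(all_low, 2, 1, 5.0))
```

The second input shows why the choice of threshold matters: reserved capacity that is never claimed is revenue lost relative to the greedy rule.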


A linear programming based approach for composite-action Markov decision processes

RAIRO - Operations Research ◽

10.1051/ro/2018081 ◽

2019 ◽

Vol 53 (5) ◽

pp. 1749-1761

Author(s):

Zhicong Zhang ◽

Shuai Li ◽

Xiaohui Yan ◽

Liangwei Zhang

Keyword(s):

Linear Programming ◽

Markov Decision Process ◽

Decision Process ◽

Programming Model ◽

Linear Programming Model ◽

Computational Experiments ◽

State Variables ◽

Composite Action ◽

Markov Decision ◽

Decision Epoch

We study a time-homogeneous discrete composite-action Markov decision process (CMDP) in which multiple decisions must be made at each state. In this particular Markov decision process, the state variables are divided into two separable sets and a two-dimensional composite action is chosen at each decision epoch. To solve a composite-action Markov decision process, we propose a novel linear programming model (Contracted Linear Programming Model, CLPM). We show that the CLPM obtains the optimal state values of a CMDP. We analyze and compare the number of variables and constraints of the CLPM and the Traditional Linear Programming Model (TLPM). Computational experiments compare running times and memory usage of the two models. The CLPM outperforms the TLPM in both time complexity and space complexity, as shown by theoretical analysis and computational experiments.
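For orientation, the textbook linear program for the optimal state values of a discounted MDP is shown below with a two-dimensional composite action. This is a standard formulation and may differ from both the TLPM and the CLPM in the paper; the discounted criterion and the symbols $\gamma$, $r$, $p$, $S$, $A^{1}$, and $A^{2}$ are assumptions made for illustration.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{s \in S} v(s) \\
\text{s.t.}\quad & v(s) \ge r(s, a) + \gamma \sum_{s' \in S} p(s' \mid s, a)\, v(s'),
\qquad \forall\, s \in S,\ a = (a^{1}, a^{2}) \in A^{1} \times A^{2}.
\end{aligned}
```

If every composite action is available in every state, this program has $|S|$ variables and $|S|\,|A^{1}|\,|A^{2}|$ constraints, so the constraint count grows with the product of the two action sets.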


Decision Epoch

Encyclopedia of Machine Learning and Data Mining ◽

10.1007/978-1-4899-7687-1_198 ◽

2017 ◽

pp. 328-328

Keyword(s):

Decision Epoch


Monotone Policies and Indexability for Bidirectional Restless Bandits

Advances in Applied Probability ◽

10.1239/aap/1363354103 ◽

2013 ◽

Vol 45 (1) ◽

pp. 51-85 ◽

Cited By ~ 4

Author(s):

K. D. Glazebrook ◽

D. J. Hodge ◽

C. Kirkbride

Keyword(s):

Lagrangian Relaxation ◽

Resource Constraints ◽

State Transitions ◽

Structural Requirement ◽

Cost Rate ◽

State Dependent ◽

Wide Range ◽

Decision Epoch ◽

Monotone Policies ◽

System Project

Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.
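The Lagrangian relaxation mentioned in this abstract can be sketched in generic form. The notation is assumed and the time model is simplified to discrete epochs for readability: $x_i(t)$ is the state of project $i$, $a_i(t) \in \{0, 1\}$ indicates activation, $c_i$ is its cost rate, $r_i(x_i)$ is the state-dependent resource it consumes when active, $R$ is the resource available per epoch, and $W \ge 0$ is the multiplier. Relaxing the per-epoch constraint $\sum_i r_i(x_i(t))\, a_i(t) \le R$ to hold only on average and pricing it with $W$ gives:

```latex
\min_{\pi}\ \limsup_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\pi}\!\left[\sum_{t=0}^{T-1} \sum_{i=1}^{N}
\Big( c_i\big(x_i(t), a_i(t)\big) + W\, r_i\big(x_i(t)\big)\, a_i(t) \Big)\right] - W R .
```

Because the objective is a sum over projects, the relaxed problem separates into $N$ single-project problems, each with a charge $W$ per unit of resource consumed. This separation is what makes per-project indices meaningful.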



decision epoch
Recently Published Documents

Stability, Memory, and Messaging Trade-Offs in Heterogeneous Service Systems

Anticipatory approach for dynamic and stochastic shipment matching in hinterland synchromodal transportation

Optimal Control of Partially Observable Semi-Markovian Failing Systems: An Analysis Using a Phase Methodology

Discounted Markov Decision Processes with Constrained Costs: the decomposition approach

A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

The Online Reservation Problem

A linear programming based approach for composite-action Markov decision processes

Decision Epoch

Monotone Policies and Indexability for Bidirectional Restless Bandits

Monotone Policies and Indexability for Bidirectional Restless Bandits
