An Optimization Model of the Single-Leg Air Cargo Space Control Based on Markov Decision Process

2012, Vol. 2012, pp. 1-7
Author(s): Chun-rong Qin, Li Luo, Yang You, Yong-xi Xiao

Based on the single-leg air cargo problem, we establish a dynamic programming model that considers overbooking and space inventory control jointly. We analyze the structure of the optimal booking policy for each type of booking request and show that the optimal booking decision is of threshold type (known as a booking-limit policy). Our research provides theoretical support for air cargo space control.
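As a minimal illustration of how such a threshold rule can arise, the sketch below runs backward dynamic programming on a toy single-leg capacity problem; the capacity, fare classes, and arrival probabilities are invented for illustration and are not the paper's model.

```python
# Toy single-leg capacity control solved by backward dynamic programming.
# Illustrative assumptions: two fare classes, unit-size requests, no overbooking,
# at most one request per period. This is not the paper's actual model.

T = 20                      # number of booking periods before departure
C = 10                      # cargo capacity (units)
fares = [100.0, 60.0]       # revenue per accepted request, by class
probs = [0.3, 0.5]          # per-period arrival probability of each class

# V[t][c]: optimal expected future revenue with c units left at period t.
V = [[0.0] * (C + 1) for _ in range(T + 1)]

for t in range(T - 1, -1, -1):
    for c in range(C + 1):
        value = (1.0 - sum(probs)) * V[t + 1][c]          # no request arrives
        for fare, p in zip(fares, probs):
            reject = V[t + 1][c]
            accept = fare + V[t + 1][c - 1] if c >= 1 else float("-inf")
            value += p * max(accept, reject)              # accept iff fare covers opportunity cost
        V[t][c] = value

def accept_request(t, c, i):
    """Threshold (booking-limit) rule: accept class i iff its fare exceeds
    the marginal value of the last unit of capacity."""
    return c >= 1 and fares[i] >= V[t + 1][c] - V[t + 1][c - 1]

print(accept_request(0, 1, 1), accept_request(T - 1, 1, 1))
```

With one unit of capacity left, the low-fare class is rejected early in the horizon but accepted near departure, which is exactly the booking-limit behavior the abstract describes.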

2021, Vol. 73 (09), pp. 46-47
Author(s): Chris Carpenter

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 201254, “Reinforcement Learning for Field-Development Policy Optimization,” by Giorgio De Paola, SPE, and Cristina Ibanez-Llano, Repsol, and Jesus Rios, IBM, et al., prepared for the 2020 SPE Annual Technical Conference and Exhibition, originally scheduled to be held in Denver, Colorado, 5–7 October. The paper has not been peer reviewed.

A field-development plan consists of a sequence of decisions. Each action taken affects the reservoir and conditions any future decision, and the uncertainty associated with this process is undeniable. The novelty of the approach proposed by the authors in the complete paper is the treatment of the sequential nature of the decisions through the framework of dynamic programming (DP) and reinforcement learning (RL). This methodology moves the focus from a static field-development-plan optimization to a more dynamic framework that the authors call field-development policy optimization. This synopsis focuses on the methodology, while the complete paper also contains a real-field application of the methodology.

Methodology

Deep RL (DRL). RL is considered an important learning paradigm in artificial intelligence (AI) but differs from supervised and unsupervised learning, the most commonly studied types in the field of machine learning. During the last decade, RL has attracted greater attention because of successes in applications related to games and self-driving cars, obtained by combining it with deep-learning architectures (DRL), which has allowed RL to scale to previously unsolvable problems and, therefore, to much larger sequential decision problems.

RL, also referred to as stochastic approximate dynamic programming, is a goal-directed paradigm of sequential learning from interaction. The learner, or agent, is not told what to do but instead has to learn which actions or decisions yield a maximum reward through interaction with an uncertain environment, without losing too much reward along the way. This learning from interaction to achieve a goal must balance exploration and exploitation of possible actions. Another key characteristic of this type of problem is its sequential nature: the actions taken by the agent affect the environment itself and, therefore, the subsequent data it receives and the subsequent actions to be taken. Mathematically, such problems are formulated in the framework of the Markov decision process (MDP), which arises primarily in the field of optimal control.

An RL problem consists of two principal parts: the agent, or decision-making engine, and the environment, the interactive world for the agent (in this case, the reservoir). Sequentially, at each timestep, the agent takes an action (e.g., changing control rates or deciding a well location) that makes the environment (reservoir) transition from one state to another. Next, the agent receives a reward (e.g., a cash flow) and an observation of the state of the environment (partial or total) before taking the next action. All relevant information informing the agent of the state of the system is assumed to be included in the last state observed by the agent (Markov property). If the agent observes the full environment state after acting, the MDP is said to be fully observable; otherwise, the result is a partially observable Markov decision process (POMDP). The agent’s objective is to learn a policy mapping from states (MDPs) or histories (POMDPs) to actions such that the agent’s cumulative (discounted) reward in the long run is maximized.
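The agent-environment loop described above can be summarized in a few lines of Python; the toy environment below is a stand-in for the reservoir simulator and the random policy is a placeholder for a trained DRL policy, both assumptions made for illustration only.

```python
import random

# Schematic agent-environment loop for an MDP as described above.
# ToyEnv is a stand-in for the reservoir simulator; the random policy is a
# placeholder for a learned (deep) RL policy. Both are illustrative assumptions.

class ToyEnv:
    """Minimal episodic environment: the state is a single number the agent nudges."""
    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        self.t, self.state = 0, 0.0
        return self.state

    def step(self, action):
        # The action (e.g., a control-rate change) moves the state; the reward is a cash-flow proxy.
        self.state += action
        self.t += 1
        reward = -abs(self.state - 5.0)        # closer to the target state => higher reward
        done = self.t >= self.horizon
        return self.state, reward, done

def policy(observation):
    # Placeholder for a trained policy mapping states (or histories, in a POMDP) to actions.
    return random.choice([-1.0, 0.0, 1.0])

env = ToyEnv()
obs, total, discount, gamma = env.reset(), 0.0, 1.0, 0.95
done = False
while not done:
    action = policy(obs)                        # agent acts on the current observation
    obs, reward, done = env.step(action)        # environment transitions and returns a reward
    total += discount * reward                  # cumulative discounted reward to be maximized
    discount *= gamma
print(total)
```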


2020, Vol. 3 (4)
Author(s): Yayun Fan, Cheng Li, Yan Hu

Customers are the source of business income, and a stable customer base guarantees an enterprise's survival and development. Using a Markov decision process, the decision-maker, at each new decision point, observes the latest state of the system and makes a decision: an action sequence is chosen from a well-posed option set, and the action in that sequence that creates the greatest value and total revenue is selected. In this way the best marketing strategy is obtained, and a dynamic programming model is formulated for the enterprise's actual customers.
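A minimal sketch of such an MDP solved by value iteration is shown below; the customer states, marketing actions, rewards, and transition probabilities are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative value iteration for a small customer-relationship MDP.
# States, actions, rewards, and transition probabilities are assumptions
# made for illustration, not the paper's data.

states = ["prospect", "active", "churned"]
actions = ["no_contact", "promotion"]

# P[a][s][s']: transition probabilities; R[a][s]: expected immediate revenue.
P = {
    "no_contact": np.array([[0.8, 0.1, 0.1],
                            [0.0, 0.7, 0.3],
                            [0.0, 0.0, 1.0]]),
    "promotion":  np.array([[0.5, 0.4, 0.1],
                            [0.0, 0.9, 0.1],
                            [0.2, 0.0, 0.8]]),
}
R = {"no_contact": np.array([0.0, 10.0, 0.0]),
     "promotion":  np.array([-2.0, 8.0, -2.0])}

gamma, V = 0.9, np.zeros(len(states))
for _ in range(200):                        # value iteration until approximately converged
    Q = np.array([R[a] + gamma * P[a] @ V for a in actions])
    V = Q.max(axis=0)

policy = {s: actions[i] for s, i in zip(states, Q.argmax(axis=0))}
print(policy)                               # best marketing action in each customer state
```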


i-com, 2020, Vol. 19 (3), pp. 227-237
Author(s): Frédéric Logé, Erwan Le Pennec, Habiboulaye Amadou-Boubacar

Abstract: Inefficient interaction, such as long and/or repetitive questionnaires, can be detrimental to user experience, which leads us to investigate the computation of an intelligent questionnaire for a prediction task. Given time and budget constraints (a maximum of q questions asked), this questionnaire adaptively selects the question sequence based on the answers already given. Several use cases with improved user and customer experience are given. The problem is framed as a Markov Decision Process and solved numerically with approximate dynamic programming, exploiting the hierarchical and episodic structure of the problem. The approach, evaluated on toy models and classic supervised-learning datasets, outperforms two baselines: a decision tree with a budget constraint and a model with the q best features systematically asked. The online problem, quite critical for deployment, seems to pose no particular issue under the right exploration strategy. This setting is quite flexible and can easily incorporate initially available data and grouped questions.
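A minimal sketch of the episodic, budget-constrained structure is given below; the question pool, the budget, the random question-selection rule, and the final predictor are illustrative assumptions standing in for the approximate-dynamic-programming policy learned in the paper.

```python
import random

# Sketch of the questionnaire-as-MDP setting: the state is the set of answers
# gathered so far, the action is the next question to ask, and the terminal
# reward is the quality of the prediction made from the collected answers.
# All names and values here are illustrative assumptions.

QUESTIONS = ["age", "income", "owns_car", "region", "tenure"]   # candidate features
BUDGET_Q = 3                                                    # at most q questions

def next_question(answers):
    # Placeholder policy: the paper learns this state-to-action mapping with
    # approximate dynamic programming; here we simply pick at random.
    remaining = [q for q in QUESTIONS if q not in answers]
    return random.choice(remaining)

def ask(user, question):
    return user[question]              # simulate the user answering the question

def predict(answers):
    # Placeholder for the final prediction model fed with the collected answers.
    return int(answers.get("income", 0) > 50)

user = {"age": 34, "income": 72, "owns_car": 1, "region": 2, "tenure": 5}
answers = {}                                       # episode state: answers gathered so far
for _ in range(BUDGET_Q):                          # episodic, budget-constrained horizon
    q = next_question(answers)                     # action: choose the next question adaptively
    answers[q] = ask(user, q)                      # environment returns the answer (new state)
print(predict(answers))                            # terminal reward = prediction quality
```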


2019, Vol. 11 (7), pp. 2060
Author(s): Yu Wu, Bo Zeng, Siming Huang

In this paper, a home-service problem is studied in which a capacitated vehicle collects customers’ parcels in one pickup tour. We consider a situation where customers who have scheduled their services in advance may call to cancel their appointments, and customers without appointments also need to be visited if they request service, as long as capacity allows. To handle the changes that occur over the tour, a dynamic strategy is needed to guide the vehicle to visit customers efficiently. Aiming to minimize the vehicle’s total expected travel distance, we model this problem as a multi-dimensional Markov Decision Process (MDP) with a finite but exponentially large state space. We solve this MDP exactly via dynamic programming, whose computational complexity is exponential. To keep the complexity from growing continually, we develop a fast method for looking up the record of an already-examined state. Although such lookups generally waste a great deal of memory, by exploiting critical structural properties of the state space we obtain an O(1) lookup method without any waste of memory. Computational experiments demonstrate the effectiveness of our model and the developed solution method. For larger instances, two well-performing heuristics are proposed.
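The sketch below illustrates the general idea of a memoized dynamic program over tour states, where each examined state is stored once and looked up on revisit; the simplified state (position, load, pending customers), the deterministic transitions, and the hash-based cache are assumptions for illustration, not the paper's structural O(1) indexing scheme.

```python
from functools import lru_cache

# Illustrative memoized dynamic program over tour states. The state encoding
# (position, load, pending customers) and the generic hash-table cache are
# simplified assumptions; the paper instead exploits structural properties of
# its state space to index records in O(1) without wasted memory.

DIST = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]     # toy travel distances between 3 locations
CAPACITY = 2                                 # vehicle capacity (parcels)

@lru_cache(maxsize=None)                     # each examined state is stored and reused
def best_distance(position, load, pending):
    # pending: frozenset of customers still to visit; stop when none remain or the vehicle is full.
    if not pending or load >= CAPACITY:
        return DIST[position][0]             # return to the depot (location 0)
    best = float("inf")
    for nxt in pending:                      # decision: which customer to visit next
        cost = DIST[position][nxt] + best_distance(nxt, load + 1, pending - {nxt})
        best = min(best, cost)
    return best

print(best_distance(0, 0, frozenset({1, 2})))
```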

