An Application of Markov Decision Processes to the Seat Inventory Control Problem

Author(s):  
Christiane Barz ◽  
Karl-Heinz Waldmann
2015 ◽  
Vol 47 (1) ◽  
pp. 106-127 ◽  
Author(s):  
François Dufour ◽  
Alexei B. Piunovskiy

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space, with both impulsive and continuous controls, for the infinite-horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The Bellman equation associated with this control problem is studied, and sufficient conditions are provided ensuring the existence and uniqueness of a bounded measurable solution to this optimality equation. Moreover, the value function of the optimization problem under consideration is shown to satisfy this optimality equation. Sufficient conditions are also presented ensuring, on the one hand, the existence of an optimal control strategy and, on the other hand, the existence of an ε-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited on which, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of these results is the following: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
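
To fix ideas, here is a minimal finite-state sketch of the optimality equation described above: the Bellman operator takes, at each state, the better of the best gradual action (handled here by uniformization) and the best impulsive jump. All rates, costs, and the uniformization constant are illustrative assumptions; the paper itself works on a general Borel state space under much weaker hypotheses.

```python
import numpy as np

# Toy data: 3 states, 2 gradual actions, 2 impulses.  Everything below
# is an illustrative assumption, not the paper's model.
alpha = 0.5                                  # discount rate
n = 3

# Gradual actions: conservative transition-rate matrices Q[a] (rows sum
# to zero) and running-cost vectors c[a].
Q = {0: np.array([[-1.0, 1.0, 0.0],
                  [0.0, -2.0, 2.0],
                  [0.5, 0.5, -1.0]]),
     1: np.array([[-3.0, 2.0, 1.0],
                  [1.0, -1.0, 0.0],
                  [0.0, 2.0, -2.0]])}
c = {0: np.array([1.0, 2.0, 0.5]),
     1: np.array([2.0, 0.5, 1.5])}

# Impulses: instantaneous jump to a target state at a lump cost.
impulses = {0: (0, 1.0), 1: (2, 0.8)}

Lam = 5.0    # uniformization constant, >= max_a max_x |Q[a][x, x]|

def bellman(V):
    """One sweep of the optimality operator: at each state take the
    better of the best gradual action (uniformized, discounted) and the
    best impulsive jump (lump cost plus value at the target)."""
    new_V = np.empty_like(V)
    for x in range(n):
        grad = min((c[a][x] + Lam * ((np.eye(n) + Q[a] / Lam) @ V)[x])
                   / (alpha + Lam) for a in Q)
        imp = min(cost + V[y] for y, cost in impulses.values())
        new_V[x] = min(grad, imp)
    return new_V

V = np.zeros(n)
for _ in range(300):          # iterate towards the bounded fixed point
    V = bellman(V)
print("approximate value function:", V)
```

For this toy data the iterates increase monotonically from V = 0 to a bounded fixed point; the states at which the impulsive branch attains the minimum correspond to the "impulsive" subset in the decomposition mentioned above.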


2021 ◽  
Vol 53 (2) ◽  
pp. 301-334
Author(s):  
Xin Guo ◽  
Aiko Kurushima ◽  
Alexey Piunovskiy ◽  
Yi Zhang

We consider a gradual-impulse control problem of continuous-time Markov decision processes, where the system performance is measured by the expectation of the exponential utility of the total cost. We show, under natural conditions on the system primitives, the existence of a deterministic stationary optimal policy out of a more general class of policies that allow multiple simultaneous impulses, randomized selection of impulses with random effects, and accumulation of jumps. After characterizing the value function using the optimality equation, we reduce the gradual-impulse control problem to an equivalent simple discrete-time Markov decision process, whose action space is the union of the sets of gradual and impulsive actions.
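
The reduced discrete-time model with the union action space can be sketched in a few lines. Below is an illustrative finite-horizon, risk-sensitive value iteration in which each step minimizes exp(cost) times the expected continuation value over the union of gradual and impulsive actions. Treating each impulse as consuming one period is a simplification (the paper allows multiple simultaneous impulses and accumulation of jumps), and all transition data are toy assumptions.

```python
import numpy as np

# Toy data for the reduced discrete-time model: 3 states; the action set
# at each state is the union of gradual and impulsive actions.  Horizon,
# costs, and transitions are illustrative assumptions only.
n, T = 3, 25

# Gradual actions: row-stochastic transition matrix and per-step cost.
gradual = {
    "g0": (np.array([[0.6, 0.3, 0.1],
                     [0.2, 0.5, 0.3],
                     [0.3, 0.3, 0.4]]), np.array([1.0, 2.0, 0.5])),
    "g1": (np.array([[0.1, 0.8, 0.1],
                     [0.5, 0.4, 0.1],
                     [0.2, 0.2, 0.6]]), np.array([0.8, 1.5, 1.0])),
}
# Impulsive actions: deterministic jump to a target state at a lump cost.
impulsive = {"i0": (1, 0.7), "i1": (2, 1.2)}

# Risk-sensitive (multiplicative) value iteration for E[exp(total cost)]:
#   V_T = exp(0) = 1,
#   V_t(x) = min over the union of actions of exp(cost) * E[V_{t+1}].
V = np.ones(n)
for _ in range(T):
    new_V = np.empty(n)
    for x in range(n):
        grad = min(np.exp(cost[x]) * (P[x] @ V)
                   for P, cost in gradual.values())
        imp = min(np.exp(cost) * V[y] for y, cost in impulsive.values())
        new_V[x] = min(grad, imp)
    V = new_V
print("minimal expected exponential utility per starting state:", V)
```

The multiplicative form of the recursion is what the exponential utility buys: costs enter through exp(cost) as factors rather than summands, so the one-step operator acts on the expected continuation value exactly as in an ordinary discrete-time MDP.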


2012 ◽  
Vol 2012 ◽  
pp. 1-16 ◽  
Author(s):  
H. Cruz-Suárez ◽  
G. Zacarías-Espinoza ◽  
V. Vázquez-Guevara

This paper deals with Markov decision processes (MDPs) on Euclidean spaces with an infinite horizon. These MDPs are studied via the dynamic programming (DP) technique, under which the optimal value function is characterized through the value iteration functions. The paper provides conditions that guarantee the convergence of the maximizers of the value iteration functions to the optimal policy. Then, using the Euler equation and an envelope formula, the optimal solution of the optimal control problem is obtained. Finally, this theory is applied to a linear-quadratic control problem in order to find its optimal policy.
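
For the linear-quadratic application, the convergence of the maximizers of the value iteration functions can be made concrete in the scalar discounted case: each value iteration function is a quadratic V_k(x) = p_k·x², and its minimizer u = -K_k·x converges to the optimal linear policy as p_k approaches the Riccati fixed point. The sketch below uses illustrative parameter values, not an example taken from the paper.

```python
# Scalar discounted LQ problem: dynamics x' = a*x + b*u, stage cost
# q*x**2 + r*u**2, discount factor beta.  Parameter values are
# illustrative assumptions.
a, b = 1.1, 0.5
q, r = 1.0, 0.2
beta = 0.95

# Value iteration on the quadratic coefficient: V_k(x) = p_k * x**2.
p = 0.0                       # V_0 = 0
for k in range(100):
    # Minimizer of q*x^2 + r*u^2 + beta*p*(a*x + b*u)^2 over u is the
    # linear policy u = -K*x (the maximizer of the iteration step):
    K = beta * a * b * p / (r + beta * b**2 * p)
    # Plugging the minimizer back in gives the next coefficient:
    p = q + r * K**2 + beta * p * (a - b * K)**2

K = beta * a * b * p / (r + beta * b**2 * p)   # limiting optimal gain
print(f"Riccati coefficient p ~ {p:.4f}, optimal gain K ~ {K:.4f}")
```

The gains K_k computed along the way are exactly the maximizers of the value iteration functions, so their convergence to the final K illustrates the convergence result described in the abstract.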

