Impulsive Control for Continuous-Time Markov Decision Processes

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of an ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action correspondingly to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.

Model-Free Reinforcement Learning for Branching Markov Decision Processes

Computer Aided Verification - Lecture Notes in Computer Science ◽

10.1007/978-3-030-81688-9_30 ◽

2021 ◽

pp. 651-673

Author(s):

Ernst Moritz Hahn ◽

Mateo Perez ◽

Sven Schewe ◽

Fabio Somenzi ◽

Ashutosh Trivedi ◽

...

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Markov Decision Processes ◽

Control Strategy ◽

Natural Extension ◽

Decision Processes ◽

Optimal Control Strategy ◽

Model Free ◽

Learning Techniques ◽

Markov Decision

Abstract: We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.

Denumerable state continuous time Markov decision processes with unbounded cost and transition rates under average criterion

The ANZIAM Journal ◽

10.1017/s144618110001213x ◽

2002 ◽

Vol 43 (4) ◽

pp. 541-557 ◽

Cited By ~ 10

Author(s):

Xianping Guo ◽

Weiping Zhu

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

Transition Rates ◽

Birth And Death Processes ◽

Optimality Equation ◽

Average Criterion ◽

Markov Decision ◽

Unbounded Cost ◽

Queue Model

Abstract: In this paper, we consider denumerable state continuous time Markov decision processes with (possibly unbounded) transition and cost rates under the average criterion. We present a set of conditions under which we prove the existence of both average cost optimal stationary policies and a solution of the average optimality equation. The results in this paper are applied to an admission control queue model and controlled birth and death processes.

Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model

Advances in Applied Probability ◽

10.2307/1426437 ◽

1983 ◽

Vol 15 (2) ◽

pp. 274-303 ◽

Cited By ~ 28

Author(s):

Arie Hordijk ◽

Frank A. Van Der Duyn Schouten

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Time Parameter ◽

Queueing Model ◽

Replacement Model ◽

Optimal Policies ◽

Markov Decision

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point’ of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.

On the existence of optimal control in continuous time Markov decision processes

Bulletin of Mathematical Statistics ◽

10.5109/13058 ◽

1972 ◽

Vol 15 (1/2) ◽

pp. 7-17 ◽

Cited By ~ 2

Author(s):

Masami Yasuda

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

Markov Decision ◽

Existence Of Optimal Control

Optimal control of average reward constrained continuous-time finite Markov decision processes

Proceedings of the 41st IEEE Conference on Decision and Control, 2002. ◽

10.1109/cdc.2002.1184957 ◽

2004 ◽

Cited By ~ 14

Author(s):

E.A. Feinberg

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

Average Reward ◽

Markov Decision

On the optimal control of a class of continuous time non-Markov decision processes

International Journal of Systems Science ◽

10.1080/00207727908941571 ◽

1979 ◽

Vol 10 (2) ◽

pp. 135-144 ◽

Cited By ~ 2

Author(s):

K. D. Glazebrook

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

Markov Decision

Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model

Advances in Applied Probability ◽

10.1017/s0001867800021182 ◽

1983 ◽

Vol 15 (02) ◽

pp. 274-303 ◽

Cited By ~ 3

Author(s):

Arie Hordijk ◽

Frank A. Van Der Duyn Schouten

Keyword(s):

Markov Decision Processes ◽

Optimal Policy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Time Parameter ◽

Queueing Model ◽

Replacement Model ◽

Optimal Policies ◽

Markov Decision

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point’ of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.

New sufficient conditions for average optimality in continuous-time Markov decision processes

Mathematical Methods of Operations Research ◽

10.1007/s00186-010-0307-4 ◽

2010 ◽

Vol 72 (1) ◽

pp. 75-94 ◽

Cited By ~ 4

Author(s):

Liuer Ye ◽

Xianping Guo

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Markov Decision

Finite-horizon optimality for continuous-time Markov decision processes with unbounded transition rates

Advances in Applied Probability ◽

10.1239/aap/1449859800 ◽

2015 ◽

Vol 47 (4) ◽

pp. 1064-1087 ◽

Cited By ~ 7

Author(s):

Xianping Guo ◽

Xiangxiang Huang ◽

Yonghui Huang

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Value Function ◽

Decision Processes ◽

Finite Horizon ◽

Transition Rates ◽

Optimality Equation ◽

Unbounded Transition Rates ◽

Markov Decision ◽

The Value Function

In this paper we focus on finite-horizon optimality for denumerable continuous-time Markov decision processes, in which the transition and reward/cost rates are allowed to be unbounded, and optimality is over the class of all randomized history-dependent policies. Under mild and reasonable conditions, we first establish the existence of a solution to the finite-horizon optimality equation by designing a technique of approximations from bounded transition rates to unbounded ones. Then we prove the existence of ε (≥ 0)-optimal Markov policies and verify that the value function is the unique solution to the optimality equation by establishing the analog of the Itô-Dynkin formula. Finally, we provide an example in which the transition rates and the value function are all unbounded and, thus, obtain solutions to some of the unsolved problems by Yushkevich (1978).
