Extreme-point solutions in Markov decision processes

1983 ◽  
Vol 20 (4) ◽ 
pp. 835-842
Author(s):  
David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed.
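As a toy illustration of the reduction (not taken from the paper), consider maximizing a convex function over a probability simplex: because the maximum of a convex function over a compact convex set is attained at an extreme point, it suffices to evaluate the vertices. The sketch below uses a pointwise maximum of affine functions, which is convex; all data are hypothetical.

```python
import numpy as np

# Hypothetical convex objective: a pointwise maximum of affine functions
# (such a maximum is always convex).
A = np.array([[1.0, -2.0, 0.5],
              [0.0,  3.0, -1.0]])
b = np.array([0.2, -0.4])

def f(x):
    """Convex function of x: max of finitely many affine functions."""
    return np.max(A @ x + b)

# The extreme points of the probability simplex {x >= 0, sum(x) = 1} in R^3
# are the unit vectors e_1, e_2, e_3.
vertices = np.eye(3)

# Because f is convex and the simplex is compact and convex, the maximum over
# the whole simplex is attained at a vertex, so finitely many evaluations suffice.
values = [f(v) for v in vertices]
best = vertices[int(np.argmax(values))]
print("maximizing vertex:", best, "value:", max(values))
```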


2015 ◽  
Vol 47 (1) ◽  
pp. 106-127
Author(s):  
François Dufour ◽  
Alexei B. Piunovskiy

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite-horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure, on the one hand, the existence of an optimal control strategy and, on the other hand, the existence of an ε-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited in which, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
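A schematic finite-state, discrete-time analogue of such a Bellman equation (illustrative only; the paper works with a continuous-time model on a general Borel space) is V(x) = min{ c(x) + β Σ P(y|x) V(y), K + V(x′) }, where the first branch is a gradual action and the second an impulse with lump cost K and post-impulse state x′. A value-iteration sketch with hypothetical data also exposes the split of the state space into a gradual region and an impulse region:

```python
import numpy as np

# Schematic finite-state analogue (hypothetical data); the paper treats a
# continuous-time model on a general Borel state space.
n_states = 5
beta = 0.9                                    # discount factor
rng = np.random.default_rng(0)

# Gradual action: running cost c(x) and a stochastic transition kernel P.
c = np.linspace(1.0, 5.0, n_states)
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Impulse: pay a lump cost K and jump instantly to state 0.
K = 4.0

V = np.zeros(n_states)
for _ in range(1000):                         # value iteration to the fixed point
    gradual = c + beta * P @ V                # continue with the gradual action
    impulse = K + V[0]                        # intervene immediately
    V_new = np.minimum(gradual, impulse)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# The optimal rule partitions the state space into a gradual region and an
# impulse region.
impulse_region = np.where(K + V[0] < c + beta * P @ V)[0]
print("value function:", np.round(V, 3))
print("states where an impulse is optimal:", impulse_region)
```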


1983 ◽  
Vol 15 (2) ◽  
pp. 274-303
Author(s):  
Arie Hordijk ◽  
Frank A. Van Der Duyn Schouten

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a ‘limit point’ of a sequence of discounted optimal policies with the discount factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.
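The vanishing-discount idea underlying the Abelian-theorem argument can be illustrated on a small finite MDP (a toy sketch with hypothetical data, not the drift-process setting of the paper): compute discounted optimal policies for discount factors approaching 1 and observe that they settle on a single policy, which is the natural candidate for average optimality.

```python
import numpy as np

# Toy finite MDP with hypothetical data: rewards r[s, a], kernel P[a][s, s'].
rng = np.random.default_rng(1)
nS, nA = 3, 2
r = rng.random((nS, nA))
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)

def discounted_optimal_policy(beta):
    """Exact discounted-optimal policy via policy iteration."""
    policy = np.zeros(nS, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) V = r_pi.
        P_pi = P[policy, np.arange(nS), :]
        r_pi = r[np.arange(nS), policy]
        V = np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)
        # Policy improvement.
        Q = r + beta * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return tuple(int(a) for a in policy)
        policy = new_policy

# As beta -> 1 the discounted optimal policies have a limit point; under
# conditions of the kind given in the paper, such a limit point is average optimal.
for beta in (0.9, 0.99, 0.999, 0.9999):
    print(beta, discounted_optimal_policy(beta))
```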


2017 ◽  
Vol 26 (3) ◽ 
pp. 1760014
Author(s):  
Paul Weng ◽  
Olivier Spanjaard

Markov decision processes (MDP) have become one of the standard models for decision-theoretic planning problems under uncertainty. In its standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite horizon case and the case where a maximum operator does not exist. In order to show the potential of our framework, we conclude the paper by presenting several illustrative examples.
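A minimal finite-horizon sketch of the recursion (a hypothetical instance, not the paper's formalism): each state-action pair carries a reward function that is composed with the continuation value, so backward induction reads V_t(s) = max_a f_{s,a}(V_{t+1}(s′)). With additive rewards f(v) = r + v this reduces to the standard MDP recursion; the functions below are all monotone, which is the kind of condition under which dynamic programming remains valid.

```python
import numpy as np

# Tiny deterministic decision problem: states 0..2, horizon 2, two actions,
# deterministic successor next_state[s][a]. All data are hypothetical.
next_state = [[1, 2], [0, 2], [2, 2]]

# Functional rewards: each (state, action) carries a function that is composed
# with the continuation value. All functions below are monotone non-decreasing.
reward_fn = {
    (0, 0): lambda v: 1.0 + v,        # ordinary additive reward
    (0, 1): lambda v: 2.0 + 0.5 * v,  # reward that discounts the future
    (1, 0): lambda v: 3.0 + v,
    (1, 1): lambda v: max(v, 1.0),    # reward acting as a floor on the value
    (2, 0): lambda v: v,              # absorbing state: value passes through
    (2, 1): lambda v: v,
}

H = 2
V = np.zeros(3)                       # terminal values
for t in reversed(range(H)):
    V = np.array([
        max(reward_fn[(s, a)](V[next_state[s][a]]) for a in (0, 1))
        for s in range(3)
    ])
print("values at time 0:", V)
```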


2006 ◽  
Vol 43 (2) ◽ 
pp. 318-334
Author(s):  
Xianping Guo ◽  
Quanxin Zhu

In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we ensure the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are imposed on the primitive data of the model. Moreover, our approach is slightly different from the well-known ‘optimality inequality approach’ widely used in Markov decision processes. Finally, we illustrate our results in two examples.
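For orientation, the two optimality inequalities of opposing directions have the following generic shape in the discrete-time Borel-space setting (notation assumed here, not quoted from the paper); roughly speaking, and under suitable conditions, the first yields a stationary policy whose average cost is at most ρ₁, while the second shows that no policy can achieve an average cost below ρ₂.

```latex
% Generic shape of the two inequalities (notation assumed, not a verbatim
% statement of the paper's conditions): rho_i are constants, h_i real-valued
% functions on the state space X, c the cost, Q the transition kernel.
\rho_1 + h_1(x) \;\ge\; \inf_{a \in A(x)} \Bigl\{\, c(x,a) + \int_{X} h_1(y)\, Q(\mathrm{d}y \mid x,a) \Bigr\},
\qquad
\rho_2 + h_2(x) \;\le\; \inf_{a \in A(x)} \Bigl\{\, c(x,a) + \int_{X} h_2(y)\, Q(\mathrm{d}y \mid x,a) \Bigr\}.
```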


1979 ◽  
Vol 16 (3) ◽ 
pp. 618-630
Author(s):  
Bharat T. Doshi

Various authors have derived necessary and sufficient conditions for optimality in semi-Markov decision processes in which the state remains constant between jumps. In this paper similar results are presented for a generalized semi-Markov decision process in which the state varies between jumps according to a Markov process with continuous sample paths. These results are specialized to a general storage model, and an application to service-rate control in a GI/G/1 queue is indicated.


1994 ◽  
Vol 31 (4) ◽ 
pp. 979-990
Author(s):  
Jean B. Lasserre

We present two sufficient conditions for the detection of optimal and non-optimal actions in (ergodic) average-cost MDPs. They are easily interpreted and can be implemented as detection tests in both policy iteration and linear programming methods. An efficient implementation of a recently introduced policy iteration scheme is discussed.
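As a rough illustration of where such detection tests plug in (a generic sketch with hypothetical data, not the specific conditions of the paper), the loop below runs average-cost policy iteration for a unichain finite MDP and, at the resulting solution of the optimality equation, reports the state-action pairs whose one-step lookahead does not attain the minimum; tests of the kind discussed in the paper aim to certify optimal and non-optimal actions already during the iterations.

```python
import numpy as np

# Toy unichain average-cost MDP with hypothetical data: costs c[s, a] and
# transition kernel P[a][s, s'] (strictly positive, hence unichain).
rng = np.random.default_rng(2)
nS, nA = 4, 3
c = rng.random((nS, nA))
P = rng.random((nA, nS, nS)) + 0.1
P /= P.sum(axis=2, keepdims=True)

def evaluate(policy):
    """Solve g + h(s) = c_pi(s) + sum_t P_pi(s, t) h(t) with h(0) = 0."""
    P_pi = P[policy, np.arange(nS), :]
    c_pi = c[np.arange(nS), policy]
    A = np.zeros((nS, nS))
    A[:, 0] = 1.0                                   # coefficient of the gain g
    A[:, 1:] = np.eye(nS)[:, 1:] - P_pi[:, 1:]      # coefficients of h(1..)
    x = np.linalg.solve(A, c_pi)
    return x[0], np.concatenate(([0.0], x[1:]))     # gain g, bias h

policy = np.zeros(nS, dtype=int)
while True:                                         # Howard's policy iteration
    g, h = evaluate(policy)
    lookahead = c + np.einsum('ast,t->sa', P, h)    # c(s,a) + sum_t P(t|s,a) h(t)
    new_policy = lookahead.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

# At the solution (g, h), actions whose lookahead exceeds the row minimum do
# not attain the minimum in the average-cost optimality equation.
gaps = lookahead - lookahead.min(axis=1, keepdims=True)
print("optimal average cost:", round(float(g), 4))
print("non-minimizing (state, action) pairs:",
      [(int(s), int(a)) for s, a in zip(*np.where(gaps > 1e-9))])
```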

