Generalized semi-Markov decision processes

1979 · Vol. 16 (3) · pp. 618-630 · Author(s): Bharat T. Doshi

Various authors have derived necessary and sufficient conditions for optimality in semi-Markov decision processes in which the state remains constant between jumps. In this paper, similar results are presented for a generalized semi-Markov decision process in which the state varies between jumps according to a Markov process with continuous sample paths. These results are specialized to a general storage model, and an application to service rate control in a GI/G/1 queue is indicated.
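
As a rough illustration of the kind of system to which such results specialize (a toy example, not the paper's model), the sketch below simulates a queue workload that drains continuously at a controlled service rate between arrival jumps; the Poisson arrivals, exponential job sizes, and the threshold policy rate_policy are assumptions made purely for the example.

import random

# Toy workload process: the state drains continuously between jumps and
# jumps upward at arrival epochs; the service rate is chosen from the
# current state and held fixed until the next arrival (a simplification).
def simulate_workload(horizon=100.0, arrival_rate=1.0, mean_job=0.8,
                      rate_policy=lambda w: 1.0 if w < 5.0 else 2.0, seed=0):
    rng = random.Random(seed)
    t, w, history = 0.0, 0.0, []
    while t < horizon:
        dt = rng.expovariate(arrival_rate)       # time to the next arrival
        r = rate_policy(w)                       # controlled service (drain) rate
        w = max(0.0, w - r * dt)                 # continuous evolution between jumps
        w += rng.expovariate(1.0 / mean_job)     # upward jump at the arrival epoch
        t += dt
        history.append((t, w))
    return history

print(simulate_workload()[:5])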


2006 · Vol. 2006 · pp. 1-8 · Author(s): Quanxin Zhu, Xianping Guo

This paper deals with discrete-time Markov decision processes with Borel state and action spaces. The criterion to be minimized is the average expected cost, and the costs may have neither upper nor lower bounds. In our earlier paper (to appear in the Journal of Applied Probability), weaker conditions were proposed to ensure the existence of average optimal stationary policies. In this paper we further study some properties of optimal policies. Under these weaker conditions, we not only obtain two necessary and sufficient conditions for optimal policies but also give a “semimartingale characterization” of an average optimal stationary policy.
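
For orientation, the criterion and optimality equation involved here typically take the following generic form (standard notation, not necessarily the authors'): the average expected cost of a policy π from state x is

\[
J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n}\, \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{n-1} c(x_t, a_t)\right],
\]

and, under suitable conditions, an average optimal stationary policy with optimal cost \(g^*\) is characterized through the average-cost optimality equation

\[
g^* + h(x) \;=\; \inf_{a \in A(x)} \left\{ c(x, a) + \int_X h(y)\, Q(dy \mid x, a) \right\}.
\]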


1988 · Vol. 20 (4) · pp. 836-851 · Author(s): K. D. Glazebrook

Whittle enunciated an important reduction principle in dynamic programming when he showed that, under certain conditions, optimal strategies for Markov decision processes (MDPs) placed in parallel to one another take actions in a way that is consistent with the optimal strategies for the individual MDPs. However, the necessary and sufficient conditions given by Whittle are by no means always satisfied. We explore the status of this computationally attractive reduction principle when these conditions fail.
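
To see why the reduction principle is computationally attractive, note that N parallel MDPs with S states each induce a product problem on S^N states, whereas under Whittle's conditions each factor can be treated on its own, for example by standard value iteration; the function below is a generic discounted-MDP solver for one factor (an illustrative sketch, not Whittle's construction).

import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Generic discounted value iteration for one finite MDP factor.
    P[a] is an (S, S) transition matrix and r[a] an (S,) reward vector."""
    S = P[0].shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = np.array([r[a] + gamma * P[a] @ V for a in range(len(P))])  # (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return Q.argmax(axis=0), V  # greedy policy and value for this factor

# Under Whittle's conditions each of the N factors can be solved like this
# separately (roughly N * A * S^2 work per sweep) instead of solving the
# product MDP on S**N states.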


1983 · Vol. 20 (4) · pp. 835-842 · Author(s): David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function attains its maximum at an extreme point, these conditions may greatly simplify a problem. In some cases a full solution can be obtained once the reduction is made. Some illustrative examples are discussed.
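
The reduction rests on the standard fact that a convex function on a compact convex set attains its maximum at an extreme point. For instance (a generic statement in standard notation, not the paper's),

\[
\max_{x \in [0,1]^n} f(x) \;=\; \max_{x \in \{0,1\}^n} f(x) \qquad \text{for convex } f,
\]

so a continuous optimization collapses to a search over finitely many vertices.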


2015 · Vol. 47 (1) · pp. 106-127 · Author(s): François Dufour, Alexei B. Piunovskiy

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite-horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied, and sufficient conditions ensuring the existence and uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure, on the one hand, the existence of an optimal control strategy and, on the other hand, the existence of an ε-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited on which, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
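
In standard impulse-control notation (not necessarily the authors'), the kind of criterion studied here combines a running cost under the gradual control with lump costs at intervention times:

\[
V(\pi, x) \;=\; \mathbb{E}_x^{\pi}\!\left[ \int_0^{\infty} e^{-\alpha t}\, c(\xi_t, a_t)\, dt \;+\; \sum_{n \ge 1} e^{-\alpha \tau_n}\, C\bigl(\xi_{\tau_n-}, b_n\bigr) \right], \qquad \alpha > 0,
\]

where \(a_t\) is the gradual action, \(b_n\) is the impulse applied at the intervention time \(\tau_n\), and \(C\) is its lump cost. In such problems the Bellman equation typically compares, state by state, the cost of continuing under a gradual action with the cost of intervening immediately, which is what produces the two-subset decomposition described above.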


1987 · Vol. 24 (1) · pp. 270-276 · Author(s): Masami Kurano

This study is concerned with finite Markov decision processes whose dynamics and reward structure are unknown but whose state is exactly observable. We establish a learning algorithm that yields an optimal policy and construct an adaptive policy that is optimal under the average expected reward criterion.
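
A minimal certainty-equivalence sketch of this kind of adaptive control (not Kurano's algorithm): estimate the transition law and rewards from observed data and act greedily with respect to the estimated model, re-solved here by average-reward relative value iteration.

import numpy as np

def relative_value_iteration(P, r, iters=10_000, tol=1e-9):
    """Average-reward relative value iteration for a finite MDP.
    Convergence assumes a unichain (and aperiodic) model.
    P[a] is an (S, S) transition matrix, r[a] an (S,) reward vector."""
    A, S = len(P), P[0].shape[0]
    h = np.zeros(S)
    for _ in range(iters):
        Q = np.array([r[a] + P[a] @ h for a in range(A)])   # (A, S)
        Th = Q.max(axis=0)
        gain = Th[0]                  # value at a reference state estimates the gain
        h_new = Th - gain
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return Q.argmax(axis=0), gain, h  # greedy policy, gain, relative values

# In the adaptive setting P and r are unknown: replace them by empirical
# estimates built from observed transitions and rewards, re-run the solver
# as data accumulate, and mix in occasional exploratory actions so that
# every state-action pair keeps being visited.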


1983 · Vol. 15 (2) · pp. 274-303 · Author(s): Arie Hordijk, Frank A. Van Der Duyn Schouten

Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions which guarantee that a 'limit point' of a sequence of discounted optimal policies, with the discount factor approaching 1, is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes; the latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.
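
The Abelian theorem invoked here has the following generic form (standard notation, not the paper's): for a bounded measurable cost path c(t), if the long-run average exists then the normalized discounted cost converges to the same limit as the discount rate vanishes,

\[
\lim_{T \to \infty} \frac{1}{T} \int_0^{T} c(t)\, dt = g
\quad \Longrightarrow \quad
\lim_{\alpha \downarrow 0} \alpha \int_0^{\infty} e^{-\alpha t} c(t)\, dt = g,
\]

which is what makes 'limit points' of discounted optimal policies, as the discount factor tends to 1, natural candidates for average optimality.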


2017 · Vol. 26 (3) · pp. 1760014 · Author(s): Paul Weng, Olivier Spanjaard

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model that allows rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite-horizon case and the case where a maximum operator does not exist. To show the potential of our framework, we conclude the paper with several illustrative examples.
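
A minimal sketch of the idea (my notation and code, not the authors'): in finite-horizon backward induction, each transition carries a reward function f that is applied to the downstream value instead of a scalar being added to it; with f(v) = r + γ·v the usual discounted backup is recovered, and validity of the recursion requires monotonicity-type conditions on the f's of the kind the paper identifies.

from typing import Callable, Dict, List, Tuple

State, Action = str, str
# model[(s, a)] = list of (probability, next_state, reward_function)
Model = Dict[Tuple[State, Action], List[Tuple[float, State, Callable[[float], float]]]]

def backward_induction(model: Model, states: List[State],
                       actions: List[Action], horizon: int,
                       terminal_value: float = 0.0) -> Dict[State, float]:
    """Finite-horizon backup in which each transition's reward function is
    composed with (applied to) the value of the next state."""
    V = {s: terminal_value for s in states}
    for _ in range(horizon):
        V_new = {}
        for s in states:
            best = None
            for a in actions:
                if (s, a) not in model:
                    continue
                q = sum(p * f(V[s2]) for p, s2, f in model[(s, a)])
                best = q if best is None else max(best, q)
            V_new[s] = best if best is not None else terminal_value
        V = V_new
    return V

# Toy usage: a two-state chain whose reward functions are affine, so the
# recursion coincides with an ordinary discounted backup.
model: Model = {
    ("A", "go"): [(1.0, "B", lambda v: 1.0 + 0.9 * v)],
    ("B", "go"): [(1.0, "A", lambda v: 2.0 + 0.9 * v)],
}
print(backward_induction(model, ["A", "B"], ["go"], horizon=10))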

