The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.

Download Full-text

The Expected Total Cost Criterion for Markov Decision Processes under Constraints

Advances in Applied Probability ◽

10.1239/aap/1377868541 ◽

2013 ◽

Vol 45 (3) ◽

pp. 837-859 ◽

Cited By ~ 5

Author(s):

François Dufour ◽

A. B. Piunovskiy

Keyword(s):

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Programming Approach ◽

Stationary Policy ◽

Total Cost ◽

Optimal Value ◽

Markov Decision ◽

Expected Total Cost

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.

Download Full-text

The Expected Total Cost Criterion for Markov Decision Processes under Constraints

Advances in Applied Probability ◽

10.1017/s0001867800006601 ◽

2013 ◽

Vol 45 (03) ◽

pp. 837-859 ◽

Cited By ~ 1

Author(s):

François Dufour ◽

A. B. Piunovskiy

Keyword(s):

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Programming Approach ◽

Stationary Policy ◽

Total Cost ◽

Optimal Value ◽

Markov Decision ◽

Expected Total Cost

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.

Download Full-text

Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program

Mathematical Methods of Operations Research ◽

10.1007/s001860050035 ◽

1998 ◽

Vol 48 (3) ◽

pp. 387-417 ◽

Cited By ~ 6

Author(s):

Eitan Altman

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Linear Program ◽

Lagrangian Approach ◽

Total Cost ◽

Constrained Markov Decision Processes ◽

Markov Decision

Download Full-text

Temporal concatenation for Markov decision processes

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964821000206 ◽

2021 ◽

pp. 1-28

Author(s):

Ruiyang Song ◽

Kuang Xu

Keyword(s):

Markov Decision Processes ◽

Large Scale ◽

Optimal Solution ◽

Upper Bounds ◽

Black Box ◽

Decision Processes ◽

Optimal Solutions ◽

Wide Range ◽

Markov Decision ◽

Speed Up

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDP), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential of substantial speed-up at the expense of some performance degradation.

Download Full-text

A Theory for Semi-Markov Decision Processes with Unbounded Costs and Its Application to the Optimal Control of Queueing Systems

10.21236/ada030649 ◽

1976 ◽

Author(s):

Peter Orkenyi

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Queueing Systems ◽

Decision Processes ◽

Markov Decision

Download Full-text

Impulsive Control for Continuous-Time Markov Decision Processes

Advances in Applied Probability ◽

10.1239/aap/1427814583 ◽

2015 ◽

Vol 47 (1) ◽

pp. 106-127 ◽

Cited By ~ 6

Author(s):

François Dufour ◽

Alexei B. Piunovskiy

Keyword(s):

Optimal Control ◽

Control Problem ◽

Markov Decision Processes ◽

Control Strategy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Optimal Control Strategy ◽

Optimality Equation ◽

Markov Decision

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated to this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of a ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action correspondingly to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.

Download Full-text

New discount and average optimality conditions for continuous-time Markov decision processes

Advances in Applied Probability ◽

10.1017/s000186780000447x ◽

2010 ◽

Vol 42 (04) ◽

pp. 953-985 ◽

Cited By ~ 2

Author(s):

Xianping Guo ◽

Liuer Ye

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Average Cost ◽

Nonnegative Solution ◽

Decision Processes ◽

Stationary Policy ◽

Discounted Cost ◽

Markov Decision ◽

Optimal Stationary Policy ◽

Bounded Below

This paper deals with continuous-time Markov decision processes in Polish spaces, under the discounted and average cost criteria. All underlying Markov processes are determined by given transition rates which are allowed to be unbounded, and the costs are assumed to be bounded below. By introducing an occupation measure of a randomized Markov policy and analyzing properties of occupation measures, we first show that the family of all randomized stationary policies is ‘sufficient’ within the class of all randomized Markov policies. Then, under the semicontinuity and compactness conditions, we prove the existence of a discounted cost optimal stationary policy by providing a value iteration technique. Moreover, by developing a new average cost, minimum nonnegative solution method, we prove the existence of an average cost optimal stationary policy under some reasonably mild conditions. Finally, we use some examples to illustrate applications of our results. Except that the costs are assumed to be bounded below, the conditions for the existence of discounted cost (or average cost) optimal policies are much weaker than those in the previous literature, and the minimum nonnegative solution approach is new.

Download Full-text

Optimal control in Markov decision processes via distributed optimization

2015 54th IEEE Conference on Decision and Control (CDC) ◽

10.1109/cdc.2015.7403398 ◽

2015 ◽

Cited By ~ 4

Author(s):

Jie Fu ◽

Shuo Han ◽

Ufuk Topcu

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Distributed Optimization ◽

Decision Processes ◽

Markov Decision

Download Full-text

Impulsive Control for Continuous-Time Markov Decision Processes

Advances in Applied Probability ◽

10.1017/s0001867800007722 ◽

2015 ◽

Vol 47 (01) ◽

pp. 106-127 ◽

Cited By ~ 2

Author(s):

François Dufour ◽

Alexei B. Piunovskiy

Keyword(s):

Optimal Control ◽

Control Problem ◽

Markov Decision Processes ◽

Control Strategy ◽

Continuous Time ◽

Sufficient Conditions ◽

Decision Processes ◽

Optimal Control Strategy ◽

Optimality Equation ◽

Markov Decision

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated to this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of a ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action correspondingly to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.

Download Full-text