Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.

Download Full-text

Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP

Mathematical Methods of Operations Research ◽

10.1007/bf01303434 ◽

1996 ◽

Vol 43 (1) ◽

pp. 45-72 ◽

Cited By ~ 14

Author(s):

Eitan Altman

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Total Cost ◽

Occupation Measures ◽

Constrained Markov Decision Processes ◽

Markov Decision

Download Full-text

The Expected Total Cost Criterion for Markov Decision Processes under Constraints

Advances in Applied Probability ◽

10.1239/aap/1377868541 ◽

2013 ◽

Vol 45 (3) ◽

pp. 837-859 ◽

Cited By ~ 5

Author(s):

François Dufour ◽

A. B. Piunovskiy

Keyword(s):

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Programming Approach ◽

Stationary Policy ◽

Total Cost ◽

Optimal Value ◽

Markov Decision ◽

Expected Total Cost

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.

Download Full-text

The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

Advances in Applied Probability ◽

10.1017/s0001867800005875 ◽

2012 ◽

Vol 44 (03) ◽

pp. 774-793 ◽

Cited By ~ 4

Author(s):

François Dufour ◽

M. Horiguchi ◽

A. B. Piunovskiy

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Occupation Measure ◽

Stationary Policy ◽

Total Cost ◽

Markov Decision ◽

Expected Total Cost

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.

Download Full-text

The Expected Total Cost Criterion for Markov Decision Processes under Constraints

Advances in Applied Probability ◽

10.1017/s0001867800006601 ◽

2013 ◽

Vol 45 (03) ◽

pp. 837-859 ◽

Cited By ~ 1

Author(s):

François Dufour ◽

A. B. Piunovskiy

Keyword(s):

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Programming Approach ◽

Stationary Policy ◽

Total Cost ◽

Optimal Value ◽

Markov Decision ◽

Expected Total Cost

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.

Download Full-text