On Linear Programming for Constrained and Unconstrained Average-Cost Markov Decision Processes with Countable Action Spaces and Strictly Unbounded Costs

Average Cost

Decision Processes

The State

Programming Approach

One Stage

Markov Decision

Action Spaces

Borel Measurable

We consider the linear programming approach for constrained and unconstrained Markov decision processes (MDPs) under the long-run average-cost criterion, where the MDPs in our study have Borel state spaces and discrete countable action spaces. Under a strict unboundedness condition on the one-stage costs and a recently introduced majorization condition on the state transition stochastic kernel, we study infinite-dimensional linear programs for average-cost MDPs and prove the absence of a duality gap, together with other optimality results. Our results do not require a lower semicontinuous MDP model. Thus, they can be applied to countable-action-space MDPs whose dynamics and one-stage costs are discontinuous in the state variable. Our proofs make use of the continuity property of Borel measurable functions asserted by Lusin’s theorem.
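
For orientation, the standard textbook primal/dual pair for the average-cost linear program of a Borel-space MDP is sketched below. The notation is assumed here and is not taken from the abstract: X is the state space, A the action space, K ⊆ X × A the set of admissible state-action pairs, Q the transition kernel and c the one-stage cost. The function and measure spaces used in the paper, and its handling of the constrained case, may differ from this sketch.

```latex
% Assumed notation (not from the abstract): \mathbb{X}, \mathbb{A}, \mathbb{K}, Q, c.
\begin{aligned}
&\text{(P)}\quad \inf_{\mu}\ \int_{\mathbb{K}} c\,d\mu
  \quad\text{s.t.}\quad \hat{\mu}(B) = \int_{\mathbb{K}} Q(B \mid x,a)\,\mu(d(x,a))
  \ \ \forall B \in \mathcal{B}(\mathbb{X}),\quad \mu(\mathbb{K}) = 1,\quad \mu \ge 0,\\
&\text{(D)}\quad \sup_{\rho \in \mathbb{R},\,h}\ \rho
  \quad\text{s.t.}\quad \rho + h(x) \le c(x,a) + \int_{\mathbb{X}} h(y)\,Q(dy \mid x,a)
  \ \ \forall (x,a) \in \mathbb{K},\\
&\text{where } \hat{\mu}(B) := \mu\big((B \times \mathbb{A}) \cap \mathbb{K}\big).
\end{aligned}
```

Integrating the dual constraint against a primal-feasible μ (assuming all integrals are finite) makes the h-terms cancel by the primal constraint, leaving ρ ≤ ∫ c dμ. This is weak duality; the absence of a duality gap means that the infimum of (P) equals the supremum of (D).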

Average Cost Semi-Markov Decision Processes and the Control of Queueing Systems

Probability in the Engineering and Informational Sciences

10.1017/s0269964800001121

1989

Vol 3 (2)

pp. 247-272

Author(s):

Linn I. Sennott

Keyword(s):

Average Cost

Queueing Systems

Decision Processes

Single Server

Stationary Policy

Markov Decision

Optimal Stationary Policy

Poisson Arrivals

Action Spaces

Semi-Markov decision processes underlie the control of many queueing systems. In this paper, we deal with infinite-state semi-Markov decision processes with nonnegative, unbounded costs and finite action sets. Axioms for the existence of an expected average cost optimal stationary policy are presented. These conditions generalize the work in Sennott [22] for Markov decision processes. Verifiable conditions for the axioms to hold are obtained. The theory is applied to the control of the M/G/1 queue with a variable service parameter, with an on-off server, and with batch processing, and to the control of the G/M/m queue with a variable arrival parameter and customer rejection. It is also applied to a timesharing network of queues with a single server and, finally, to the optimal routing of Poisson arrivals to parallel exponential servers. The final section extends the existence result to compact action spaces.
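
To make the average-cost criterion for semi-Markov models concrete: for a fixed stationary policy in a finite, irreducible, aperiodic SMDP, the renewal-reward argument gives the long-run expected cost per unit time as a ratio of stationary means of the embedded jump chain. The sketch below is a hypothetical finite-state illustration only (the paper treats infinite state spaces and unbounded costs, where existence requires its axioms). The function names and the inputs P (embedded transition matrix under the policy), cost (expected cost per stage) and tau (expected sojourn time per stage) are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def stationary_distribution(P: Matrix, tol: float = 1e-13, max_iter: int = 100_000) -> List[float]:
    """Stationary distribution of a finite irreducible, aperiodic chain via power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
        if max(abs(u - v) for u, v in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def smdp_average_cost(P: Matrix, cost: List[float], tau: List[float]) -> float:
    """Long-run expected cost per unit time of a fixed stationary policy.

    Renewal-reward ratio: sum_x pi(x) * cost(x) / sum_x pi(x) * tau(x),
    where pi is the stationary distribution of the embedded jump chain.
    """
    pi = stationary_distribution(P)
    expected_cost = sum(p * k for p, k in zip(pi, cost))
    expected_time = sum(p * t for p, t in zip(pi, tau))
    return expected_cost / expected_time


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]
    print(smdp_average_cost(P, cost=[0.0, 3.0], tau=[1.0, 1.0]))  # approx. 1.0
```

In the example the stationary distribution is (2/3, 1/3), so the average cost is 3 · 1/3 = 1. Power iteration keeps the sketch short; for larger chains a direct linear solve of πP = π with Σπ = 1 would be the more robust choice.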

On the Minimum Pair Approach for Average Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs

SIAM Journal on Control and Optimization

10.1137/19m1247395

2020

Vol 58 (2)

pp. 660-685

Author(s):

Huizhen Yu

Keyword(s):

Average Cost

Decision Processes

Markov Decision

Discrete Action

Action Spaces

Detecting optimal and non-optimal actions in average-cost Markov decision processes

Journal of Applied Probability

10.1017/s0021900200099502

1994

Vol 31 (04)

pp. 979-990

Author(s):

Jean B. Lasserre

Keyword(s):

Linear Programming

Average Cost

Sufficient Conditions

Iteration Scheme

Policy Iteration

Decision Processes

Ergodic Average

Linear Programming Methods

We present two sufficient conditions for detecting optimal and non-optimal actions in (ergodic) average-cost MDPs. They are easily interpreted and can be implemented as detection tests in both policy iteration and linear programming methods. An efficient implementation of a recently proposed policy iteration scheme is also discussed.
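
The detection tests described above are meant to be embedded in policy iteration. For reference, a plain (unaccelerated) average-cost policy iteration for a finite unichain MDP is sketched below. It does not implement the paper's two sufficient conditions, which the abstract does not state. The data layout P[x][a][y] and c[x][a], the reference state used to pin the bias h, the initial policy (all zeros, assumed unichain) and all function names are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def evaluate(policy: List[int], P, c, ref: int = 0) -> Tuple[float, List[float]]:
    """Solve g + h(x) = c(x, f(x)) + sum_y P(y | x, f(x)) h(y) with h(ref) = 0."""
    n = len(policy)
    A, b = [], []
    for x in range(n):
        a = policy[x]
        row = [-P[x][a][y] for y in range(n)] + [1.0]  # unknowns: h(0..n-1), then g
        row[x] += 1.0
        A.append(row)
        b.append(c[x][a])
    A.append([1.0 if y == ref else 0.0 for y in range(n)] + [0.0])  # pin h(ref) = 0
    b.append(0.0)
    z = solve_linear(A, b)
    return z[n], z[:n]


def policy_iteration(P, c, max_iter: int = 1000):
    """Average-cost policy iteration for a finite unichain MDP (P[x][a][y], c[x][a])."""
    n = len(c)
    policy = [0] * n
    for _ in range(max_iter):
        g, h = evaluate(policy, P, c)
        new_policy = []
        for x in range(n):
            q = [c[x][a] + sum(P[x][a][y] * h[y] for y in range(n)) for a in range(len(c[x]))]
            best = min(q)
            # Keep the current action on (near-)ties: the usual safeguard against cycling.
            keep = q[policy[x]] <= best + 1e-12
            new_policy.append(policy[x] if keep else q.index(best))
        if new_policy == policy:
            return g, h, policy
        policy = new_policy
    return g, h, policy


if __name__ == "__main__":
    # Hypothetical two-state, two-action example.
    P = [
        [[0.5, 0.5], [1.0, 0.0]],  # state 0: action 0, action 1
        [[0.5, 0.5], [1.0, 0.0]],  # state 1: action 0, action 1
    ]
    c = [[1.0, 3.0], [4.0, 5.0]]
    g, h, policy = policy_iteration(P, c)
    print(round(g, 4), policy)  # 2.3333 [0, 1]
```

Starting from policy [0, 0] (gain 2.5), one improvement step switches state 1 to action 1, and the resulting policy [0, 1] with gain 7/3 is stable. A detection test of the kind the paper proposes would be applied inside the improvement loop to discard actions early.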

Impulsive Control for Continuous-Time Markov Decision Processes: A Linear Programming Approach

Applied Mathematics & Optimization

10.1007/s00245-015-9310-8

2015

Vol 74 (1)

pp. 129-161

Author(s):

F. Dufour

A. B. Piunovskiy

Keyword(s):

Linear Programming

Continuous Time

Impulsive Control

Decision Processes

Programming Approach

Linear Programming Approach

A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Operations Research

10.1287/opre.1120.1121

2013

Vol 61 (2)

pp. 413-425

Author(s):

Archis Ghate

Robert L. Smith

Keyword(s):

Linear Programming

Infinite Horizon

Decision Processes

Programming Approach

Linear Programming Approach

A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion

SIAM Journal on Control and Optimization

10.1137/19m1255811

2020

Vol 58 (4)

pp. 2535-2566

Author(s):

François Dufour

Alexandre Genadot

Keyword(s):

Convex Programming

Discrete Time

Decision Processes

Programming Approach

Total Reward

Markov Decision

Reward Criterion

Linear programming formulations of Markov decision processes

Operations Research Letters

10.1016/0167-6377(86)90094-5

1986

Vol 5 (1)

pp. 13-16

Author(s):

J.L. Nazareth

R.B. Kulkarni

Keyword(s):

Linear Programming

Decision Processes

Learning algorithms for Markov decision processes

Journal of Applied Probability

10.1017/s0021900200030825

1987

Vol 24 (01)

pp. 270-276

Author(s):

Masami Kurano

Keyword(s):

Optimal Policy

Learning Algorithm

Learning Algorithms

Decision Processes

The State

Reward Structure

Adaptive Policy

Markov Decision

Reward Criterion

This study is concerned with finite Markov decision processes whose dynamics and reward structure are unknown but whose state is exactly observable. We establish a learning algorithm that yields an optimal policy and construct an adaptive policy that is optimal under the average expected reward criterion.
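
A common ingredient of adaptive schemes for MDPs with unknown dynamics and rewards is an empirical model built from observed transitions, which is re-solved as data accumulate (certainty equivalence). The sketch below shows only that estimation step. It is a generic illustration, not Kurano's algorithm, and the function name and the (state, action, reward, next_state) tuple format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable
Transition = Tuple[State, Action, float, State]


def estimate_model(
    transitions: Iterable[Transition],
) -> Tuple[Dict[Tuple[State, Action], Dict[State, float]], Dict[Tuple[State, Action], float]]:
    """Empirical (maximum-likelihood) estimates of P(y | x, a) and mean reward r(x, a)."""
    next_counts: Dict[Tuple[State, Action], Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    reward_sums: Dict[Tuple[State, Action], float] = defaultdict(float)
    for x, a, r, y in transitions:
        next_counts[(x, a)][y] += 1
        reward_sums[(x, a)] += r

    P_hat: Dict[Tuple[State, Action], Dict[State, float]] = {}
    r_hat: Dict[Tuple[State, Action], float] = {}
    for sa, counts in next_counts.items():
        n = sum(counts.values())
        P_hat[sa] = {y: k / n for y, k in counts.items()}
        r_hat[sa] = reward_sums[sa] / n
    # Pairs never visited are absent from both dictionaries; an adaptive policy
    # typically keeps trying every action so that all estimates converge.
    return P_hat, r_hat


if __name__ == "__main__":
    data = [(0, "a", 1.0, 1), (0, "a", 3.0, 0), (0, "a", 2.0, 1)]
    P_hat, r_hat = estimate_model(data)
    print(r_hat[(0, "a")])  # 2.0
```

In a certainty-equivalence scheme, P_hat and r_hat would be handed to a solver for the average-reward problem and the resulting policy followed, with occasional exploratory actions mixed in.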

New discount and average optimality conditions for continuous-time Markov decision processes

Advances in Applied Probability

10.1017/s000186780000447x

2010

Vol 42 (04)

pp. 953-985

Author(s):

Xianping Guo

Liuer Ye

Keyword(s):

Continuous Time

Average Cost

Nonnegative Solution

Decision Processes

Stationary Policy

Discounted Cost

Markov Decision

Optimal Stationary Policy

Bounded Below

This paper deals with continuous-time Markov decision processes in Polish spaces under the discounted and average cost criteria. All underlying Markov processes are determined by given transition rates, which are allowed to be unbounded, and the costs are assumed to be bounded below. By introducing an occupation measure of a randomized Markov policy and analyzing the properties of occupation measures, we first show that the family of all randomized stationary policies is ‘sufficient’ within the class of all randomized Markov policies. Then, under semicontinuity and compactness conditions, we prove the existence of a discounted-cost optimal stationary policy by means of a value iteration technique. Moreover, by developing a new average-cost minimum nonnegative solution method, we prove the existence of an average-cost optimal stationary policy under reasonably mild conditions. Finally, we use examples to illustrate applications of our results. Apart from the assumption that the costs are bounded below, the conditions for the existence of discounted-cost (or average-cost) optimal policies are much weaker than those in the previous literature, and the minimum nonnegative solution approach is new.
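
In the simplest setting, with finite state and action sets and bounded transition rates, discounted-cost value iteration for a continuous-time MDP can be run on a uniformized version of the optimality equation α V(x) = min_a [c(x,a) + Σ_y q(y|x,a) V(y)]. The sketch below assumes exactly that setting. Guo and Ye work in Polish spaces with possibly unbounded rates, where a single uniformization constant need not exist, so this illustrates the equation rather than their technique. The function name and the data layout q[x][a][y] (rates, rows summing to zero) and c[x][a] (cost rates) are assumptions.

```python
from typing import List


def ctmdp_discounted_vi(
    q: List[List[List[float]]],
    c: List[List[float]],
    alpha: float,
    tol: float = 1e-10,
    max_iter: int = 1_000_000,
) -> List[float]:
    """Discounted-cost value function of a finite CTMDP by uniformized value iteration.

    q[x][a][y]: transition rate from x to y under action a (off-diagonal >= 0, rows sum to 0).
    c[x][a]:    cost rate in state x under action a.
    alpha:      discount rate (> 0).
    """
    n = len(c)
    # Uniformization constant: the largest total exit rate -q[x][a][x].
    lam = max(-q[x][a][x] for x in range(n) for a in range(len(c[x])))
    V = [0.0] * n
    for _ in range(max_iter):
        # (alpha + lam) V(x) = min_a [ c(x,a) + sum_y q(y|x,a) V(y) + lam V(x) ],
        # a contraction with modulus lam / (alpha + lam) < 1.
        V_new = [
            min(
                (c[x][a] + sum(q[x][a][y] * V[y] for y in range(n)) + lam * V[x]) / (alpha + lam)
                for a in range(len(c[x]))
            )
            for x in range(n)
        ]
        if max(abs(u - v) for u, v in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


if __name__ == "__main__":
    # Hypothetical example: in state 0, action 0 (cost rate 1) jumps to the
    # absorbing, cost-free state 1 at rate 1; action 1 stays put at cost rate 0.8.
    q = [[[-1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0]]]
    c = [[1.0, 0.8], [0.0]]
    print(ctmdp_discounted_vi(q, c, alpha=1.0))  # [0.5, 0.0]
```

With α = 1, leaving state 0 gives V(0) = 1/(α + 1) = 0.5, which beats staying forever at cost 0.8/α = 0.8, so the iteration settles on action 0.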