Detecting optimal and non-optimal actions in average-cost Markov decision processes

1994 · Vol. 31 (4) · pp. 979-990
Author(s): Jean B. Lasserre

We present two sufficient conditions for detecting optimal and non-optimal actions in (ergodic) average-cost MDPs. They are easily interpreted and can be implemented as detection tests in both policy iteration and linear programming methods. An efficient implementation of a recently proposed policy iteration scheme is also discussed.
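
As context for how such detection tests slot into policy iteration, the sketch below is a minimal illustration for a finite unichain MDP; the function names (evaluate_policy, improvement_quantities) and the simple improvement quantity are illustrative and do not reproduce the paper's specific sufficient conditions. It evaluates the gain and bias of a fixed policy and computes the one-step quantities on which elimination tests of this kind are typically built.

import numpy as np

def evaluate_policy(P, c, policy, ref=0):
    # Gain/bias evaluation for a finite unichain average-cost MDP.
    # P: (S, A, S) transition probabilities, c: (S, A) one-stage costs,
    # policy: (S,) action indices.  Returns (g, h) with h[ref] = 0.
    S = c.shape[0]
    P_pi = P[np.arange(S), policy]          # (S, S) transition matrix of the policy
    c_pi = c[np.arange(S), policy]          # (S,) cost vector of the policy
    # Solve  g*1 + (I - P_pi) h = c_pi  with the normalisation h[ref] = 0.
    M = np.hstack([np.ones((S, 1)), np.eye(S) - P_pi])
    M = np.delete(M, 1 + ref, axis=1)       # drop the column belonging to h[ref]
    sol = np.linalg.solve(M, c_pi)
    g = sol[0]
    h = np.insert(sol[1:], ref, 0.0)
    return g, h

def improvement_quantities(P, c, g, h):
    # Delta(s, a) = c(s, a) + sum_j P(j | s, a) h(j) - (g + h(s)).
    # For a minimisation problem, actions with Delta < 0 strictly improve the
    # current policy; elimination tests of the kind described in the abstract
    # are built on quantities of this type.
    return c + P @ h - (g + h)[:, None]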


Author(s): Huizhen Yu

We consider the linear programming approach to constrained and unconstrained Markov decision processes (MDPs) under the long-run average-cost criterion, where the MDPs in our study have Borel state spaces and countable action spaces. Under a strict unboundedness condition on the one-stage costs and a recently introduced majorization condition on the state transition stochastic kernel, we study infinite-dimensional linear programs for the average-cost MDPs, prove the absence of a duality gap, and establish other optimality results. Our results do not require a lower-semicontinuous MDP model; they can therefore be applied to countable-action-space MDPs whose dynamics and one-stage costs are discontinuous in the state variable. Our proofs make use of the continuity property of Borel measurable functions asserted by Lusin's theorem.
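
The linear programs studied in the paper are infinite-dimensional (Borel state space), but the structure is easiest to see in the finite-state, finite-action analogue. The sketch below is a minimal illustration under that finiteness assumption, using scipy rather than anything from the paper: it solves the standard occupation-measure LP for the long-run average cost.

import numpy as np
from scipy.optimize import linprog

def average_cost_lp(P, c):
    # Finite-state analogue of the average-cost occupation-measure LP:
    #   minimise   sum_{s,a} c(s,a) x(s,a)
    #   subject to sum_a x(j,a) = sum_{s,a} P(j|s,a) x(s,a)  for every state j,
    #              sum_{s,a} x(s,a) = 1,  x >= 0.
    # P: (S, A, S) transition kernel, c: (S, A) one-stage costs.
    S, A = c.shape
    n = S * A
    A_eq = np.zeros((S + 1, n))
    for j in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[j, s * A + a] = (1.0 if s == j else 0.0) - P[s, a, j]
    A_eq[S, :] = 1.0                      # normalisation: x is a probability measure
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0
    res = linprog(c.reshape(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun, res.x.reshape(S, A)   # optimal average cost, occupation measure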


1994 · Vol. 31 (1) · pp. 268-273
Author(s): J. B. Lasserre

Given a family of Markov chains with a single recurrent class, we present a potential application of Schweitzer's exact formula relating the steady-state probability and fundamental matrices of any two chains in the family. We propose a new policy iteration scheme for Markov decision processes in which, in contrast to standard policy iteration, the criterion for selecting an action ensures the maximal one-step average-cost improvement. Its computational complexity and storage requirements are analysed.
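
For context, the scheme rests on two standard objects for an ergodic chain: the steady-state probability vector and the fundamental matrix. The sketch below computes both for a single chain; it is illustrative only and does not reproduce Schweitzer's exact formula relating these matrices across two chains, which is what the proposed selection criterion exploits.

import numpy as np

def stationary_and_fundamental(P):
    # Stationary distribution pi (pi P = pi, sum pi = 1) and fundamental matrix
    # Z = (I - P + 1 pi)^(-1) of an ergodic Markov chain with (S, S) matrix P.
    S = P.shape[0]
    A = np.vstack([P.T - np.eye(S), np.ones((1, S))])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    Z = np.linalg.inv(np.eye(S) - P + np.outer(np.ones(S), pi))
    return pi, Z

# The average cost of a policy with chain P and cost vector c_pi is g = pi @ c_pi;
# the effect of a one-action change on g can be evaluated through these matrices,
# which is the quantity the proposed scheme seeks to maximise at each step.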


1983 · Vol. 20 (4) · pp. 835-842
Author(s): David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function attains its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained once this reduction is made. Some illustrative examples are discussed.
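
A toy instance of the reduction the abstract relies on (illustrative only, not drawn from the paper): to maximise a convex function over a compact interval it suffices to compare the two endpoints, since those are the extreme points.

def max_convex_on_interval(f, a, b):
    # For convex f on [a, b], the maximum is attained at an endpoint,
    # so the search collapses to two function evaluations.
    return max(f(a), f(b))

# Example: f(x) = (x - 0.3) ** 2 is convex on [0, 1]; its maximum over the
# interval is f(1) = 0.49, attained at the endpoint x = 1.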


2015 · Vol. 47 (1) · pp. 106-127
Author(s): François Dufour, Alexei B. Piunovskiy

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite-horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure, on the one hand, the existence of an optimal control strategy and, on the other hand, the existence of an ε-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
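
To make the two-region decomposition concrete, the following schematic shows the generic quasi-variational structure such impulse-control problems have; the notation is illustrative only (the operator M, the impulse set Y(x), and the intervention cost c^imp are placeholders, not the paper's definitions). The state space splits into a continuation region, where gradual actions are applied, and an intervention region, where an impulse is applied.

\[
  (MV)(x) \;=\; \inf_{y \in Y(x)} \bigl\{\, c^{\mathrm{imp}}(x,y) + V(y) \,\bigr\},
  \qquad
  \Gamma_{\mathrm{grad}} = \{\, x : V(x) < (MV)(x) \,\}, \quad
  \Gamma_{\mathrm{imp}} = \{\, x : V(x) = (MV)(x) \,\}.
\]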

