The policy iteration algorithm for average reward Markov decision processes with general state space

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then under two slightly different sets of conditions we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
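As a rough illustration of the evaluation and improvement steps that make up a PIA, the sketch below treats a finite-state, finite-action continuous-time model. This is a strong simplification of the general state space, unbounded-rate setting of the abstract, and it assumes every stationary policy induces a single recurrent class. The names `RATES`, `REWARDS`, `evaluate`, and `policy_iteration`, and the numbers in the example, are hypothetical and not taken from the paper.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy, rates, rewards):
    """Policy evaluation: solve g = r(i) + sum_j q(j|i) h(j) with h(0) = 0."""
    n = len(policy)
    A, b = [], []
    for i in range(n):
        q = rates[i][policy[i]]
        # Unknowns are x[0] = g and x[j] = h(j) for j >= 1.
        A.append([1.0] + [-q[j] for j in range(1, n)])
        b.append(rewards[i][policy[i]])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(rates, rewards, tol=1e-9):
    """Alternate evaluation and improvement until no action changes."""
    n = len(rates)
    policy = [0] * n
    while True:
        g, h = evaluate(policy, rates, rewards)
        changed = False
        for i in range(n):
            def score(a):
                return rewards[i][a] + sum(rates[i][a][j] * h[j] for j in range(n))
            best = max(range(len(rates[i])), key=score)
            # Switch only on a strict improvement, to avoid cycling on ties.
            if score(best) > score(policy[i]) + tol:
                policy[i] = best
                changed = True
        if not changed:
            return policy, g, h


# Hypothetical two-state example: RATES[i][a][j] is the rate q(j | i, a),
# with each row summing to zero; REWARDS[i][a] is the reward rate.
RATES = [[[-1.0, 1.0], [-3.0, 3.0]],
         [[2.0, -2.0], [1.0, -1.0]]]
REWARDS = [[1.0, 0.0],
           [4.0, 3.0]]

policy, gain, bias = policy_iteration(RATES, REWARDS)
print(policy, gain, bias)
```

In this toy model the returned `gain` and `bias` satisfy the finite-state form of the average reward optimality equation, g = max_a { r(i, a) + sum_j q(j | i, a) h(j) }, at every state.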

Singular control for discounted Markov Decision Processes in a general state space

IEEE Conference on Decision and Control and European Control Conference ◽

10.1109/cdc.2011.6160377 ◽

2011 ◽

Author(s):

O.L.V. Costa ◽

F. Dufour

Keyword(s):

State Space ◽

Markov Decision Processes ◽

Singular Control ◽

Decision Processes ◽

General State ◽

Markov Decision ◽

General State Space

A policy iteration algorithm for Markov decision processes skip-free in one direction

Proceedings of the 2nd International ICST Conference on Performance Evaluation Methodologies and Tools ◽

10.4108/smctools.2007.1948 ◽

2007 ◽

Cited By ~ 3

Author(s):

J. Lambert ◽

B. Van Houdt ◽

C. Blondia

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Iteration Algorithm ◽

Markov Decision ◽

Policy Iteration Algorithm

Approximation of Markov decision processes with general state space

Journal of Mathematical Analysis and Applications ◽

10.1016/j.jmaa.2011.11.015 ◽

2012 ◽

Vol 388 (2) ◽

pp. 1254-1267 ◽

Cited By ~ 23

Author(s):

F. Dufour ◽

T. Prieto-Rumeau

Keyword(s):

State Space ◽

Markov Decision Processes ◽

Decision Processes ◽

General State ◽

Markov Decision ◽

General State Space

Continuous-time zero-sum games for Markov decision processes with discounted risk-sensitive cost criterion on a general state space

Stochastic Analysis and Applications ◽

10.1080/07362994.2021.2013889 ◽

2021 ◽

pp. 1-31

Author(s):

Subrata Golui ◽

Chandan Pal

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Processes ◽

General State ◽

Zero Sum Games ◽

Cost Criterion ◽

Risk Sensitive ◽

Markov Decision ◽

General State Space ◽

Zero Sum

Vector-valued Markov decision processes with average reward criterion: the multichain case

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964800144092 ◽

2000 ◽

Vol 14 (4) ◽

pp. 533-548

Author(s):

Kazuyoshi Wakuta

Keyword(s):

Decision Process ◽

Decision Processes ◽

Iteration Algorithm ◽

Average Reward ◽

Markov Decision ◽

Policy Iteration Algorithm ◽

Average Reward Criterion ◽

Systems Of Linear Inequalities ◽

Vector Valued ◽

Reward Criterion

We study the multichain case of a vector-valued Markov decision process with average reward criterion. We characterize optimal deterministic stationary policies via systems of linear inequalities and discuss a policy iteration algorithm for finding all optimal deterministic stationary policies.

Convergence of simulation-based policy iteration

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964803172051 ◽

2003 ◽

Vol 17 (2) ◽

pp. 213-234 ◽

Cited By ~ 18

Author(s):

William L. Cooper ◽

Shane G. Henderson ◽

Mark E. Lewis

Keyword(s):

Markov Decision Processes ◽

Decision Rules ◽

Almost Sure Convergence ◽

Policy Iteration ◽

Decision Processes ◽

Optimal Decision ◽

Iteration Algorithm ◽

Simulation Based ◽

Markov Decision ◽

Run Lengths

Simulation-based policy iteration (SBPI) is a modification of the policy iteration algorithm for computing optimal policies for Markov decision processes. At each iteration, rather than solving the average evaluation equations, SBPI employs simulation to estimate a solution to these equations. For recurrent average-reward Markov decision processes with finite state and action spaces, we provide easily verifiable conditions that ensure that simulation-based policy iteration almost-surely eventually never leaves the set of optimal decision rules. We analyze three simulation estimators for solutions to the average evaluation equations. Using our general results, we derive simple conditions on the simulation run lengths that guarantee the almost-sure convergence of the algorithm.
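The idea of replacing the exact evaluation step by simulation can be sketched as follows for a fixed decision rule on a two-state discrete-time chain. The regenerative-style estimator used here is an assumption chosen for illustration and is not necessarily one of the three estimators analyzed in the paper. The matrix `P`, the rewards `R`, the run lengths, and the function names are all hypothetical.

```python
import random

# Hypothetical chain induced by one fixed decision rule.
P = [[0.5, 0.5],
     [0.2, 0.8]]   # P[i][j] = probability of moving from state i to state j
R = [1.0, 3.0]     # one-step reward in each state


def step(state, rng):
    """Sample the next state of the two-state chain."""
    return 0 if rng.random() < P[state][0] else 1


def estimate_gain(n_steps, rng, start=0):
    """Estimate the average reward by a long-run sample mean."""
    state, total = start, 0.0
    for _ in range(n_steps):
        total += R[state]
        state = step(state, rng)
    return total / n_steps


def estimate_bias(state, gain, n_episodes, rng, ref=0):
    """Estimate h(state) with h(ref) = 0 by summing r - gain until ref is hit."""
    if state == ref:
        return 0.0
    total = 0.0
    for _ in range(n_episodes):
        s = state
        while s != ref:
            total += R[s] - gain
            s = step(s, rng)
    return total / n_episodes


rng = random.Random(0)  # fixed seed so the run is reproducible
gain_hat = estimate_gain(200_000, rng)
bias_hat = [estimate_bias(s, gain_hat, 20_000, rng) for s in range(2)]
print(gain_hat, bias_hat)
```

For this chain the exact evaluation equations g + h(i) = r(i) + sum_j P(i, j) h(j) with h(0) = 0 give g = 17/7 and h(1) = 20/7, so the estimates can be compared against those values. In a full SBPI loop the estimates would feed an improvement step in place of the exact solution, and the run lengths would control how often that step picks a suboptimal action.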

A unified approach to adaptive control of average reward Markov decision processes

OR Spectrum ◽

10.1007/bf01740510 ◽

1988 ◽

Vol 10 (3) ◽

pp. 161-166 ◽

Cited By ~ 5

Author(s):

G. Hübner

Keyword(s):

Adaptive Control ◽

Markov Decision Processes ◽

Decision Processes ◽

Average Reward ◽

Unified Approach ◽

Markov Decision
