A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then under two slightly different sets of conditions we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.

CONVERGENCE OF SIMULATION-BASED POLICY ITERATION

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964803172051 ◽

2003 ◽

Vol 17 (2) ◽

pp. 213-234 ◽

Cited By ~ 18

Author(s):

William L. Cooper ◽

Shane G. Henderson ◽

Mark E. Lewis

Keyword(s):

Markov Decision Processes ◽

Decision Rules ◽

Almost Sure Convergence ◽

Policy Iteration ◽

Decision Processes ◽

Optimal Decision ◽

Iteration Algorithm ◽

Simulation Based ◽

Markov Decision ◽

Run Lengths

Simulation-based policy iteration (SBPI) is a modification of the policy iteration algorithm for computing optimal policies for Markov decision processes. At each iteration, rather than solving the average evaluation equations, SBPI employs simulation to estimate a solution to these equations. For recurrent average-reward Markov decision processes with finite state and action spaces, we provide easily verifiable conditions that ensure that simulation-based policy iteration almost-surely eventually never leaves the set of optimal decision rules. We analyze three simulation estimators for solutions to the average evaluation equations. Using our general results, we derive simple conditions on the simulation run lengths that guarantee the almost-sure convergence of the algorithm.
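
The idea described in this abstract can be illustrated with a minimal Python sketch, which is not the authors' implementation. It assumes a finite, recurrent MDP given as dense arrays `P[s][a][t]` (transition probabilities) and `R[s][a]` (rewards); the function names, the reference state `ref`, and the every-visit regenerative estimator are hypothetical choices made for illustration and are not claimed to match any of the three estimators analyzed in the paper. The run length is held fixed here for simplicity, which is not meant to satisfy the paper's run-length conditions.

```python
import random

def simulate_evaluation(P, R, policy, run_length, rng, ref=0):
    """Estimate the gain g and bias h of `policy` from one simulated path."""
    n = len(P)
    states, rewards = [], []
    s = ref
    for _ in range(run_length):
        a = policy[s]
        states.append(s)
        rewards.append(R[s][a])
        s = rng.choices(range(n), weights=P[s][a])[0]
    g = sum(rewards) / run_length  # long-run average reward estimate
    # Backward pass: for each visit to a non-reference state, accumulate
    # (reward - g) until the next visit to `ref`; h(ref) is fixed at 0.
    totals, counts = [0.0] * n, [0] * n
    acc, seen_ref = 0.0, False
    for t in range(run_length - 1, -1, -1):
        if states[t] == ref:
            acc, seen_ref = 0.0, True
        elif seen_ref:  # skip the incomplete cycle at the end of the path
            acc += rewards[t] - g
            totals[states[t]] += acc
            counts[states[t]] += 1
    h = [totals[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
    return g, h

def sbpi(P, R, policy, run_length=5000, max_iter=20, seed=0):
    """Policy iteration with a simulated evaluation step (illustrative)."""
    rng = random.Random(seed)
    n = len(P)
    g = 0.0
    for _ in range(max_iter):
        g, h = simulate_evaluation(P, R, policy, run_length, rng)
        # Improvement step uses the known model with the estimated bias.
        new_policy = [
            max(range(len(P[s])),
                key=lambda a: R[s][a] + sum(p * h[t] for t, p in enumerate(P[s][a])))
            for s in range(n)
        ]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, g

# Toy two-state example (made up): state 1 pays reward 1, state 0 pays 0.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.1, 0.9], [0.9, 0.1]]]
R = [[0.0, 0.0], [1.0, 1.0]]
final_policy, final_gain = sbpi(P, R, policy=[0, 1])
print(final_policy, round(final_gain, 3))
```

In this toy problem the best stationary rule moves quickly to state 1 and then stays there, so the sketch should settle on action 1 in state 0 and action 0 in state 1. Only the evaluation step is simulated; the improvement step still reads the transition probabilities directly.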

Policy iteration for parameterized Markov decision processes and its application

2013 9th Asian Control Conference (ASCC) ◽

10.1109/ascc.2013.6606023 ◽

2013 ◽

Cited By ~ 2

Author(s):

Li Xia ◽

Qing-Shan Jia

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Markov Decision

Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Automated Technology for Verification and Analysis - Lecture Notes in Computer Science ◽

10.1007/978-3-319-46520-3_2 ◽

2016 ◽

pp. 13-31 ◽

Cited By ~ 2

Author(s):

Alessandro Abate ◽

Milan Češka ◽

Marta Kwiatkowska

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Markov Decision ◽

Approximate Policy Iteration

Accelerating Procedures of the Value Iteration Algorithm for Discounted Markov Decision Processes, Based on a One-Step Lookahead Analysis

Operations Research ◽

10.1287/opre.42.5.940 ◽

1994 ◽

Vol 42 (5) ◽

pp. 940-946 ◽

Cited By ~ 10

Author(s):

Meir Herzberg ◽

Uri Yechiali

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Markov Decision ◽

One Step ◽

Value Iteration Algorithm

A Modified Value Iteration Algorithm for Discounted Markov Decision Processes

Journal of Electronic Commerce in Organizations ◽

10.4018/jeco.2015070104 ◽

2015 ◽

Vol 13 (3) ◽

pp. 47-57 ◽

Cited By ~ 1

Author(s):

Sanaa Chafik ◽

Cherki Daoui

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Decomposition Technique ◽

Artificial Data ◽

Markov Decision ◽

Speed Up ◽

Value Iteration Algorithm

Because many real applications involve a large number of states, classical methods are intractable for solving large Markov Decision Processes. The decomposition technique, based on the topology of each state in the associated graph, and the parallelization technique are very useful methods for coping with this problem. In this paper, the authors propose a Modified Value Iteration algorithm that adds the parallelism technique. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.
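
For orientation, the classical value iteration that such modifications start from can be sketched as follows. This is the textbook sequential algorithm for a discounted MDP, not the authors' modified version: it includes neither the decomposition nor the parallelization described above. The data layout (`P[s][a][t]` for transition probabilities, `R[s][a]` for rewards), the parameter names, and the toy problem are assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Classical value iteration for a finite discounted MDP."""
    n = len(P)

    def q(s, a, V):
        # One-step lookahead value of taking action a in state s.
        return R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))

    V = [0.0] * n
    for _ in range(max_iter):
        new_V = [max(q(s, a, V) for a in range(len(P[s]))) for s in range(n)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:  # stop once successive iterates barely change
            break
    # Greedy policy with respect to the final value estimate.
    policy = [max(range(len(P[s])), key=lambda a: q(s, a, V)) for s in range(n)]
    return V, policy

# Toy two-state example (made up): deterministic transitions.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0], [2.0, 0.0]]
V, policy = value_iteration(P, R, gamma=0.9)
print([round(v, 4) for v in V], policy)
```

Each sweep updates every state independently from the previous iterate, which is the property that makes the update loop a natural candidate for parallel execution.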

Policy iteration for robust nonstationary Markov decision processes

Optimization Letters ◽

10.1007/s11590-016-1040-6 ◽

2016 ◽

Vol 10 (8) ◽

pp. 1613-1628 ◽

Cited By ~ 3

Author(s):

Saumya Sinha ◽

Archis Ghate

Keyword(s):

Markov Decision Processes ◽

Policy Iteration ◽

Decision Processes ◽

Markov Decision
