Semi-Markov decision processes with limiting ratio average rewards

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.

Download Full-text

Uniformization for semi-Markov decision processes under stationary policies

Journal of Applied Probability ◽

10.2307/3214096 ◽

1987 ◽

Vol 24 (3) ◽

pp. 644-656 ◽

Cited By ~ 16

Author(s):

Frederick J. Beutler ◽

Keith W. Ross

Keyword(s):

Markov Decision Processes ◽

Continuous Time ◽

Decision Process ◽

Decision Processes ◽

Stationary Processes ◽

Original Process ◽

Markov Decision ◽

Time Optimal ◽

Randomized Policies ◽

Average Rewards

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies.We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.

Download Full-text

Constrained Semi-Markov decision processes with average rewards

Mathematical Methods of Operations Research ◽

10.1007/bf01435458 ◽

1994 ◽

Vol 39 (3) ◽

pp. 257-288 ◽

Cited By ~ 56

Author(s):

Eugene A. Feinberg

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision ◽

Average Rewards

Download Full-text

Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2012.00673 ◽

2012 ◽

Vol 38 (5) ◽

pp. 673-687 ◽

Cited By ~ 1

Author(s):

Xin XU ◽

Dong SHEN ◽

Yan-Qing GAO ◽

Kai WANG

Keyword(s):

Dynamical Systems ◽

Markov Decision Processes ◽

Learning Control ◽

Decision Processes ◽

Markov Decision ◽

Research Frontiers

Download Full-text

A Framework for Modeling Bounded Rationality: Mis-Specified Bayesian-Markov Decision Processes

SSRN Electronic Journal ◽

10.2139/ssrn.2710475 ◽

2016 ◽

Cited By ~ 1

Author(s):

Ignacio Esponda ◽

Demian Pouzo

Keyword(s):

Bounded Rationality ◽

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision

Download Full-text

A Vector Minimum Superharmonic Approach to Solving Infinite-Horizon Discounted Markov Decision Processes

Journal of the Operational Research Society ◽

10.1038/sj/jors/0431109 ◽

1992 ◽

Vol 43 (11) ◽

pp. 1095-1102

Author(s):

D J White

Keyword(s):

Markov Decision Processes ◽

Infinite Horizon ◽

Decision Processes ◽

Markov Decision

Download Full-text

A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion

SIAM Journal on Control and Optimization ◽

10.1137/19m1255811 ◽

2020 ◽

Vol 58 (4) ◽

pp. 2535-2566

Author(s):

François Dufour ◽

Alexandre Genadot

Keyword(s):

Convex Programming ◽

Markov Decision Processes ◽

Discrete Time ◽

Decision Processes ◽

Programming Approach ◽

Total Reward ◽

Markov Decision ◽

Reward Criterion

Download Full-text

Extreme-point solutions in Markov decision processes

Journal of Applied Probability ◽

10.1017/s002190020002413x ◽

1983 ◽

Vol 20 (04) ◽

pp. 835-842

Author(s):

David Assaf

Keyword(s):

Convex Function ◽

Extreme Point ◽

Markov Decision Processes ◽

Convex Functions ◽

Sufficient Conditions ◽

Decision Processes ◽

Markov Decision ◽

Full Solution

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed.

Download Full-text