Theory of Infinite Horizon Markov Decision Processes

Author(s): Nicole Bäuerle, Ulrich Rieder
2017, Vol 26 (03), pp. 1760014
Author(s): Paul Weng, Olivier Spanjaard

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model that allows rewards to be functional. The value of a history is computed recursively by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite horizon case and the case where a maximum operator does not exist. To show the potential of our framework, we conclude the paper by presenting several illustrative examples.
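The idea of composing reward functions can be illustrated with a minimal Python sketch. It is not the authors' formulation: the names `history_value`, `backward_induction`, and `additive`, the transition layout, and the choice to take the expectation after applying each reward function are all assumptions made for illustration. The sketch also assumes a finite horizon and non-decreasing reward functions, under which a backward recursion of this shape is plausible.

```python
from typing import Callable, Dict, List, Tuple

State = str
Action = str
RewardFn = Callable[[float], float]
# Hypothetical layout: transitions[s][a] is a list of
# (probability, next_state, reward_function) triples.
Transitions = Dict[State, Dict[Action, List[Tuple[float, State, RewardFn]]]]


def additive(reward: float, gamma: float = 1.0) -> RewardFn:
    """Standard (discounted) additive reward written as a function of the future value."""
    return lambda future_value: reward + gamma * future_value


def history_value(reward_fns: List[RewardFn], terminal_value: float = 0.0) -> float:
    """Value of a history: apply its reward functions from the last step back to the first."""
    value = terminal_value
    for reward_fn in reversed(reward_fns):
        value = reward_fn(value)
    return value


def backward_induction(
    transitions: Transitions, horizon: int, terminal_value: float = 0.0
) -> Dict[State, float]:
    """Finite-horizon backward induction with functional rewards.

    Assumes every state has at least one action and every next state
    is itself a key of `transitions`.
    """
    values = {state: terminal_value for state in transitions}
    for _ in range(horizon):
        # Build the new value table from the previous one (no in-place update).
        values = {
            state: max(
                sum(prob * reward_fn(values[next_state])
                    for prob, next_state, reward_fn in outcomes)
                for outcomes in actions.values()
            )
            for state, actions in transitions.items()
        }
    return values
```

With `additive(r, gamma)` as the reward function on every transition, the recursion reduces to the usual discounted additive case; other variants would be obtained by swapping in different functions of the future value.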

