On Polynomial Sized MDP Succinct Policies

2004 ◽  
Vol 21 ◽  
pp. 551-577 ◽  
Author(s):  
P. Liberatore

Policies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the past states). When the number of states is large, succinct representations are often used to encode both the MDP and its policies compactly. In this paper, some problems related to the size of succinctly represented policies are analyzed. Namely, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding the existence of a succinct representation of a value function of a given size and reward is also considered.
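By way of a toy illustration (not from the paper; every name below is hypothetical), the sketch contrasts a succinct policy, written as a small rule over n boolean state variables, with its explicit tabular representation, whose size is exponential in n. The result summarised above says that, unless the polynomial hierarchy collapses, such a small rule does not always exist, which is what makes the size-bounded decision problems interesting.

```python
from itertools import product

n = 20  # number of boolean state variables; the flat state space has 2**n states

# Succinct policy: a decision rule whose description is polynomial in n.
# Here: "reset" when an odd number of variables are set, otherwise "advance".
def succinct_policy(state):                 # state is a tuple of n bits
    return "reset" if sum(state) % 2 else "advance"

# Explicit (flat) policy: one table entry per state, i.e. 2**n entries,
# so merely writing it down takes space exponential in n.
def explicit_policy_table(policy, n):
    return {s: policy(s) for s in product((0, 1), repeat=n)}

print(succinct_policy((1,) * n))            # constant-size rule, answers instantly: 'advance'
# explicit_policy_table(succinct_policy, n) # about 10**6 entries for n = 20, 2**n in general
```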

2015 ◽  
Vol 47 (4) ◽  
pp. 1088-1107 ◽  
Author(s):  
H. Blok ◽  
F. M. Spieksma

This paper considers Markov decision processes (MDPs) whose transition rates are unbounded as a function of the state. We are especially interested in studying structural properties of optimal policies and of the value function. A common method to derive such properties is value iteration applied to the uniformised MDP. However, because the rates are unbounded, uniformisation is not possible, and so value iteration cannot be applied in the way we need. To circumvent this, one can perturb the MDP. We then need two results for the perturbed sequence of MDPs: (1) for each perturbation, as well as for the original MDP, there exists a unique solution to the discounted cost optimality equation; (2) if the perturbed sequence of MDPs converges in a suitable manner, then the associated optimal policies and the value function converge as well. We can model both the original MDP and the perturbed MDPs as a collection of parametrised Markov processes; both of the results above are then essentially implied by certain continuity properties of the process as a function of the parameter. In this paper we deduce tight, verifiable conditions that imply the necessary continuity properties. The most important of these are drift conditions that are strongly related to nonexplosiveness.
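To make the uniformisation step concrete in the bounded-rate case (a minimal sketch under assumed inputs, not the authors' construction: the queueing model, truncation level, and all names below are invented), the code uniformises a continuous-time MDP whose rates have been truncated, then runs discounted-cost value iteration; re-solving for a sequence of increasing truncation levels mimics the perturbed sequence of MDPs discussed above.

```python
import numpy as np

def uniformised_value_iteration(rates, costs, alpha=0.1, tol=1e-8):
    """Discounted-cost value iteration for a continuous-time MDP with *bounded* rates.
    rates[a][s, s']: transition rates (s' != s) under action a; costs[a][s]: cost rate;
    alpha: continuous-time discount rate."""
    n_actions, n_states = len(rates), rates[0].shape[0]
    Lam = max(r.sum(axis=1).max() for r in rates)     # uniformisation constant
    beta = Lam / (Lam + alpha)                        # discrete-time discount factor
    P, c = [], []
    for a in range(n_actions):
        Pa = rates[a] / Lam
        np.fill_diagonal(Pa, 0.0)
        np.fill_diagonal(Pa, 1.0 - Pa.sum(axis=1))    # fictitious self-loops complete each row
        P.append(Pa)
        c.append(costs[a] / (Lam + alpha))
    V = np.zeros(n_states)
    while True:
        Q = np.array([c[a] + beta * P[a] @ V for a in range(n_actions)])
        V_new = Q.min(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmin(axis=0)
        V = V_new

# Truncated queueing example: arrival rates would grow with the state, but capping them
# at the truncation level is exactly what keeps the uniformisation constant finite.
N = 50
births  = np.minimum(np.arange(1.0, N + 1.0), 10.0)   # bounded by the truncation
admit   = np.diag(births, k=1)                        # action 0: arrivals only
serve   = admit + np.diag(np.full(N, 5.0), k=-1)      # action 1: arrivals plus service
holding = np.arange(N + 1.0)                          # holding cost = queue length
V, policy = uniformised_value_iteration([admit, serve], [holding, holding + 1.0])
```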


2015 ◽  
Vol 47 (4) ◽  
pp. 1064-1087 ◽  
Author(s):  
Xianping Guo ◽  
Xiangxiang Huang ◽  
Yonghui Huang

In this paper we focus on finite-horizon optimality for denumerable continuous-time Markov decision processes in which the transition and reward/cost rates are allowed to be unbounded, and in which optimality is over the class of all randomized history-dependent policies. Under mild, reasonable conditions, we first establish the existence of a solution to the finite-horizon optimality equation by designing an approximation technique that passes from bounded transition rates to unbounded ones. Then we prove the existence of ε (≥ 0)-optimal Markov policies and verify that the value function is the unique solution to the optimality equation by establishing an analog of the Itô-Dynkin formula. Finally, we provide an example in which the transition rates and the value function are all unbounded, thereby obtaining solutions to some of the problems left unsolved by Yushkevich (1978).
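For a numerical feel of how bounded-rate approximations can be exploited (an illustrative sketch only, under an assumed truncation and time discretisation; it is not the approximation technique of the paper), the code below integrates the finite-horizon optimality equation backwards in time on a truncated birth-death model whose untruncated rates would be unbounded.

```python
import numpy as np

def finite_horizon_ctmdp(Q, c, T, n_steps, terminal=None):
    """Finite-horizon cost minimisation for a continuous-time MDP on a truncated
    (hence bounded-rate) state space, by explicit Euler stepping backwards from T.
    Q[a][i, j]: generator under action a (rows sum to 0); c[a][i]: cost rate."""
    n_actions, n_states = len(Q), Q[0].shape[0]
    dt = T / n_steps
    assert dt * max(-Qa.diagonal().min() for Qa in Q) <= 1.0, "step too large for these rates"
    V = np.zeros(n_states) if terminal is None else terminal.astype(float)
    for _ in range(n_steps):              # V(t - dt) = V(t) + dt * min_a [ c_a + Q_a V(t) ]
        H = np.array([c[a] + Q[a] @ V for a in range(n_actions)])
        V = V + dt * H.min(axis=0)
    return V                              # approximate value function at time 0

# Truncated birth-death example: the 'true' birth rates grow without bound in the state;
# truncating the chain at N is the bounded approximation.
N = 100
up = np.arange(1.0, N + 1.0)                               # birth rate i + 1 in state i
Q_idle  = np.diag(up, 1) - np.diag(np.append(up, 0.0))     # action 0: accept arrivals only
Q_serve = Q_idle + np.diag(np.full(N, 3.0), -1) - np.diag(np.append(0.0, np.full(N, 3.0)))
hold = np.arange(N + 1.0)                                  # holding cost = queue length
V0 = finite_horizon_ctmdp([Q_idle, Q_serve], [hold, hold + 1.0], T=1.0, n_steps=2000)
```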


2010 ◽  
Vol 2010 ◽  
pp. 1-19 ◽  
Author(s):  
Cherki Daoui ◽  
Mohamed Abbad ◽  
Mohamed Tkiouat

As classical methods are intractable for solving Markov decision processes (MDPs) with large state spaces, decomposition and aggregation techniques are very useful for coping with large problems. These techniques are, in general, special cases of the classic divide-and-conquer framework: split a large, unwieldy problem into smaller components and solve the parts in order to construct the global solution. This paper reviews most of the decomposition approaches encountered in the associated literature over the past two decades, weighing their pros and cons. We consider several categories of MDPs (average, discounted, and weighted MDPs) and briefly present a variety of methodologies to find or approximate optimal strategies.
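As one concrete, deliberately simple instance of the divide-and-conquer idea (a sketch under assumed inputs; it is not any particular method from the survey), the code below splits a discounted MDP along the strongly connected components of its reachability graph and runs value iteration component by component, in an order that guarantees each sub-problem only needs the already-computed values of its successor components.

```python
import numpy as np
from graphlib import TopologicalSorter
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def solve_by_scc(P, R, gamma=0.95, tol=1e-9):
    """P[a][s, s']: transition probabilities; R[a][s]: rewards.
    Solves a discounted MDP one strongly connected component at a time."""
    n_actions, n = len(P), P[0].shape[0]
    reach = sum(P) > 0                                 # edge s -> s' if some action can move there
    _, comp = connected_components(csr_matrix(reach.astype(int)),
                                   directed=True, connection='strong')
    deps = {k: set() for k in range(comp.max() + 1)}   # component -> components it can reach
    for s, t in zip(*np.nonzero(reach)):
        if comp[s] != comp[t]:
            deps[comp[s]].add(comp[t])
    V = np.zeros(n)
    for k in TopologicalSorter(deps).static_order():   # downstream components come out first
        idx = np.flatnonzero(comp == k)
        while True:                                    # value iteration restricted to component k
            Q = np.array([R[a][idx] + gamma * P[a][idx] @ V for a in range(n_actions)])
            new = Q.max(axis=0)
            delta = np.abs(new - V[idx]).max()
            V[idx] = new
            if delta < tol:
                break
    return V
```

The ordering works because every transition out of a component leads either back into it or into a component solved earlier, so the component-wise fixed points assemble into the global value function.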


2006 ◽  
Vol 43 (3) ◽  
pp. 603-621 ◽  
Author(s):  
Huw W. James ◽  
E. J. Collins

This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below which holds, in particular, if the set of states is countable.
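To make the role of termination concrete in the simplest finite setting (an invented two-state example; the paper's point is that the familiar finite-state results illustrated below carry over to general Borel state spaces once the optimal value function is bounded below), the sketch runs value iteration for an MDP with nonnegative costs and an absorbing, cost-free termination state, started from the lower bound V = 0.

```python
import numpy as np

def ssp_value_iteration(P, c, terminal, tol=1e-10, max_iter=100_000):
    """Value iteration for an MDP with nonnegative costs and an absorbing,
    cost-free termination state.  P[a][s, s']: transition probabilities;
    c[a][s]: one-stage costs; terminal: index of the termination state."""
    n_actions = len(P)
    V = np.zeros(P[0].shape[0])          # V = 0 is a lower bound because costs are >= 0
    for _ in range(max_iter):
        Q = np.array([c[a] + P[a] @ V for a in range(n_actions)])
        new = Q.min(axis=0)
        new[terminal] = 0.0              # terminating costs nothing from then on
        if np.abs(new - V).max() < tol:
            break
        V = new
    return new, Q.argmin(axis=0)

# Two states: 0 is "working", 1 is the termination state.
# Action 0: pay 1 and terminate surely.  Action 1: pay 0.05 and terminate with probability 0.1.
P = [np.array([[0.0, 1.0], [0.0, 1.0]]), np.array([[0.9, 0.1], [0.0, 1.0]])]
c = [np.array([1.0, 0.0]), np.array([0.05, 0.0])]
V, policy = ssp_value_iteration(P, c, terminal=1)   # V[0] -> 0.5, policy[0] -> 1 (gamble)
```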


2020 ◽  
Vol 34 (10) ◽  
pp. 13845-13846
Author(s):  
Nishanth Kumar ◽  
Michael Fishman ◽  
Natasha Danas ◽  
Stefanie Tellex ◽  
Michael Littman ◽  
...  

We propose an abstraction method for open-world environments expressed as Factored Markov Decision Processes (FMDPs) with very large state and action spaces. Our method prunes state and action variables that are irrelevant to the optimal value function on the state subspace the agent would visit when following any optimal policy from the initial state. This pruning thus enables tractable, fast planning in large open-world FMDPs.
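In the same spirit, though drastically simplified relative to the method described above (which reasons about the optimal value function on the reachable state subspace), the sketch below merely backward-chains from the variables the reward reads to decide which state variables and actions can be pruned; the domain, variable names, and actions are all hypothetical.

```python
def relevant_variables(reward_vars, parents, affected_by_action):
    """Backward-chain from the variables the reward depends on: a variable is kept if the
    reward reads it or if some kept variable's transition function reads it; everything
    else is pruned.  parents[v]: variables that v's next-state distribution depends on."""
    relevant, frontier = set(reward_vars), list(reward_vars)
    while frontier:
        v = frontier.pop()
        for p in parents.get(v, ()):
            if p not in relevant:
                relevant.add(p)
                frontier.append(p)
    # Keep only actions that can change at least one relevant variable.
    actions = {a for a, changed in affected_by_action.items() if changed & relevant}
    return relevant, actions

# Hypothetical open-world domain: the goal only reads "door_open" and "robot_room".
parents  = {"door_open": {"robot_room", "door_open"}, "robot_room": {"robot_room"},
            "tv_on": {"tv_on"}, "weather": set()}
affected = {"move": {"robot_room"}, "open_door": {"door_open"}, "watch_tv": {"tv_on"}}
print(relevant_variables({"door_open", "robot_room"}, parents, affected))
# keeps door_open, robot_room, move, open_door; prunes tv_on, weather, watch_tv
```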

