Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular, we consider problems involving (i) a single stationary parameter, (ii) multiple stationary parameters, and (iii) multiple nonstationary parameters. We illustrate the applicability of this work through a capacitated stochastic lot-sizing problem.
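As a hedged illustration of the single stationary parameter case, suppose (an assumption made here for exposition, not a statement of the paper's derivation) that the reward is affine in a scalar parameter \theta and the horizon T is finite. The symbols r_0, r_1, \alpha and \beta below are illustrative names. The Bellman equations can then be written with \theta carried through explicitly:

```latex
% Assumption (not from the abstract): r(s,a;\theta) = r_0(s,a) + \theta\, r_1(s,a),
% finite state set S, finite action set A, finite horizon T.
\begin{align*}
V_T(s;\theta) &= 0, \\
V_t(s;\theta) &= \max_{a \in A}\Big\{ r_0(s,a) + \theta\, r_1(s,a)
                 + \sum_{s' \in S} p(s' \mid s,a)\, V_{t+1}(s';\theta) \Big\},
                 \quad t = T-1,\dots,0.
\end{align*}
% For a fixed policy \pi the value is affine in \theta, so the optimal value
% is a maximum of finitely many affine functions of \theta:
\begin{align*}
V_t^{\pi}(s;\theta) &= \alpha_t^{\pi}(s) + \theta\, \beta_t^{\pi}(s), \\
V_t(s;\theta) &= \max_{\pi}\big\{ \alpha_t^{\pi}(s) + \theta\, \beta_t^{\pi}(s) \big\}.
\end{align*}
```

Under this assumption the optimal value is piecewise linear and convex in \theta, and each policy remains optimal on an interval of parameter values, which is the kind of range a sensitivity analysis would report.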

Temporal concatenation for Markov decision processes

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964821000206 ◽

2021 ◽

pp. 1-28

Author(s):

Ruiyang Song ◽

Kuang Xu

Keyword(s):

Markov Decision Processes ◽

Large Scale ◽

Optimal Solution ◽

Upper Bounds ◽

Black Box ◽

Decision Processes ◽

Optimal Solutions ◽

Wide Range ◽

Markov Decision ◽

Speed Up

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
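The idea described in this abstract can be sketched on a toy instance. The following is a minimal illustration, not the authors' implementation: it assumes a stationary finite MDP, a single split point, and a zero terminal value for every sub-problem, and all function names (`backward_induction`, `evaluate`, `temporal_concatenation`) and the two-state instance are hypothetical.

```python
def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP with zero terminal value.

    P[a][s][j] is the probability of moving from s to j under action a,
    R[s][a] is the one-step reward. Returns (policy, V0), where
    policy[t][s] is the action taken at time t in state s.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    policy = []
    for _ in range(horizon):  # iterate from the last period backwards
        Q = [[R[s][a] + sum(P[a][s][j] * V[j] for j in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        policy.append([max(range(n_actions), key=Q[s].__getitem__)
                       for s in range(n_states)])
        V = [max(row) for row in Q]
    policy.reverse()  # decisions were collected last-to-first
    return policy, V


def evaluate(P, R, policy):
    """Expected total reward of a time-dependent deterministic policy."""
    n_states = len(R)
    V = [0.0] * n_states
    for decision in reversed(policy):
        V = [R[s][decision[s]]
             + sum(P[decision[s]][s][j] * V[j] for j in range(n_states))
             for s in range(n_states)]
    return V


def temporal_concatenation(P, R, horizon, split):
    """Solve two sub-problems independently and concatenate their policies.

    Assumption: each sub-problem ignores what happens after it ends
    (zero terminal value).
    """
    first, _ = backward_induction(P, R, split)
    second, _ = backward_induction(P, R, horizon - split)
    return first + second


# Hypothetical toy instance: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0],   # state 0: small reward for staying
     [3.0, 0.0]]   # state 1: larger reward for staying
horizon, split = 3, 1

_, v_opt = backward_induction(P, R, horizon)
v_tc = evaluate(P, R, temporal_concatenation(P, R, horizon, split))
regret = [o - c for o, c in zip(v_opt, v_tc)]
print(regret)
```

On this instance the first sub-problem is one period long, so it acts myopically in state 0 and the concatenated policy loses reward relative to the optimal one when started there. This is the performance degradation that the regret bounds in the paper quantify, under whatever sub-problem definition the authors actually use.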

Performance sensitivity analysis and optimization for a class of countable semi-Markov decision processes

2011 9th World Congress on Intelligent Control and Automation ◽

10.1109/wcica.2011.5970625 ◽

2011 ◽

Author(s):

Yu Kang ◽

Baoqun Yin ◽

Weike Shang ◽

Hongsheng Xi

Keyword(s):

Sensitivity Analysis ◽

Markov Decision Processes ◽

Decision Processes ◽

Performance Sensitivity ◽

Markov Decision

The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

Advances in Applied Probability ◽

10.1239/aap/1346955264 ◽

2012 ◽

Vol 44 (3) ◽

pp. 774-793 ◽

Cited By ~ 4

Author(s):

François Dufour ◽

M. Horiguchi ◽

A. B. Piunovskiy

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Occupation Measure ◽

Stationary Policy ◽

Total Cost ◽

Markov Decision ◽

Expected Total Cost

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.
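For orientation, a finite-state, finite-action analogue of the linear program over occupation measures can be written as follows. This is a textbook-style sketch under simplifying assumptions, not the paper's formulation, which works with measures on general Borel spaces. The symbols \nu (initial distribution), c_0, \dots, c_q (cost functions) and d_1, \dots, d_q (constraint bounds) are illustrative.

```latex
% Assumption: finite state and action sets; \mu(s,a) is the expected total
% number of visits to the pair (s,a).
\begin{align*}
\min_{\mu \ge 0}\quad & \sum_{s,a} c_0(s,a)\,\mu(s,a) \\
\text{s.t.}\quad & \sum_{a} \mu(s',a) = \nu(s') + \sum_{s,a} p(s' \mid s,a)\,\mu(s,a)
                   \quad \text{for all } s', \\
& \sum_{s,a} c_i(s,a)\,\mu(s,a) \le d_i, \quad i = 1,\dots,q.
\end{align*}
% A randomized stationary policy is read off a feasible \mu:
\[
\pi(a \mid s) = \frac{\mu(s,a)}{\sum_{a'} \mu(s,a')}
\quad \text{whenever } \sum_{a'} \mu(s,a') > 0.
\]
```

In this simplified setting the last formula shows why randomized stationary policies arise naturally: a feasible point of the linear program determines one directly by normalization.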

Enforcing Almost-Sure Reachability in POMDPs

Computer Aided Verification - Lecture Notes in Computer Science ◽

10.1007/978-3-030-81688-9_28 ◽

2021 ◽

pp. 602-625

Author(s):

Sebastian Junges ◽

Nils Jansen ◽

Sanjit A. Seshia

Keyword(s):

Markov Decision Processes ◽

Empirical Evaluation ◽

Decision Processes ◽

Limited Information ◽

Sequential Decision ◽

Goal State ◽

Learning Agent ◽

Markov Decision ◽

System Configurations ◽

Partially Observable

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.

The Expected Total Cost Criterion for Markov Decision Processes under Constraints

Advances in Applied Probability ◽

10.1239/aap/1377868541 ◽

2013 ◽

Vol 45 (3) ◽

pp. 837-859 ◽

Cited By ~ 5

Author(s):

François Dufour ◽

A. B. Piunovskiy

Keyword(s):

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Programming Approach ◽

Stationary Policy ◽

Total Cost ◽

Optimal Value ◽

Markov Decision ◽

Expected Total Cost

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.

Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations

Automata, Languages, and Programming - Lecture Notes in Computer Science ◽

10.1007/978-3-642-31594-7_27 ◽

2012 ◽

pp. 314-326 ◽

Cited By ~ 5

Author(s):

Kousha Etessami ◽

Alistair Stewart ◽

Mihalis Yannakakis

Keyword(s):

Markov Decision Processes ◽

Polynomial Time ◽

Decision Processes ◽

Bellman Equations ◽

Polynomial Time Algorithms ◽

Markov Decision

Quantile Markov Decision Processes

Operations Research ◽

10.1287/opre.2021.2123 ◽

2021 ◽

Author(s):

Xiaocheng Li ◽

Huaiyang Zhong ◽

Margaret L. Brandeau

Keyword(s):

Markov Decision Process ◽

Markov Decision Processes ◽

Decision Process ◽

Value At Risk ◽

Infinite Horizon ◽

Decision Processes ◽

Conditional Value At Risk ◽

Sequential Decision ◽

Optimal Drug ◽

Markov Decision

Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.

Partially Observable Markov Decision Processes and Performance Sensitivity Analysis

IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics) ◽

10.1109/tsmcb.2008.927711 ◽

2008 ◽

Vol 38 (6) ◽

pp. 1645-1651 ◽

Cited By ~ 9

Author(s):

Yanjie Li ◽

Baoqun Yin ◽

Hongsheng Xi

Keyword(s):

Sensitivity Analysis ◽

Markov Decision Processes ◽

Decision Processes ◽

Performance Sensitivity ◽

Markov Decision ◽

Partially Observable Markov ◽

And Performance ◽

Partially Observable

The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

Advances in Applied Probability ◽

10.1017/s0001867800005875 ◽

2012 ◽

Vol 44 (03) ◽

pp. 774-793 ◽

Cited By ~ 4

Author(s):

François Dufour ◽

M. Horiguchi ◽

A. B. Piunovskiy

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Optimal Solution ◽

Decision Processes ◽

Linear Program ◽

Occupation Measure ◽

Stationary Policy ◽

Total Cost ◽

Markov Decision ◽

Expected Total Cost
