First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function

2020, Vol 92 (1), pp. 165-197
Author(s):  
Patrick Kern ◽  
Axel Simroth ◽  
Henryk Zähle

Abstract: Markov decision models (MDMs) used in practical applications are most often less complex than the underlying ‘true’ MDM. Model complexity is reduced for several reasons. It is therefore of interest to know which kinds of model reduction are reasonable (with regard to the optimal value) and which are not. In this article we propose a way to address this question. We introduce a sort of derivative of the optimal value as a function of the transition probabilities, which can be used to measure the (first-order) sensitivity of the optimal value with respect to changes in the transition probabilities. ‘Differentiability’ is obtained for a fairly broad class of MDMs, and the ‘derivative’ is specified explicitly. Our theoretical findings are illustrated by means of optimization problems in inventory control and mathematical finance.
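The ‘derivative’ described above can be probed numerically. The sketch below builds a small toy MDM (all transition probabilities and rewards are illustrative assumptions, not from the paper), solves it by value iteration, and estimates the first-order sensitivity of the optimal value to a shift of transition mass by a finite difference:

```python
import numpy as np

def optimal_value(P, R, gamma=0.9, tol=1e-10):
    """Discounted optimal value via value iteration.
    P[a, s, s'] transition probabilities, R[a, s] expected rewards."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V          # Q[a, s]
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Toy 2-state, 2-action MDM (numbers are illustrative).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],
              [0.7, 0.4]])
V = optimal_value(P, R)

# Finite-difference proxy for the first-order sensitivity of V(0)
# w.r.t. shifting transition mass from state 1 to state 0 under (a=0, s=0).
eps = 1e-6
P2 = P.copy()
P2[0, 0] = P[0, 0] + eps * np.array([1.0, -1.0])  # still a distribution
sens = (optimal_value(P2, R)[0] - V[0]) / eps
print(round(sens, 4))
```

The perturbation direction is chosen so the row stays a probability distribution; the paper's ‘derivative’ makes this directional-derivative idea precise for general deviations.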

Author(s):  
Bar Light

In multiperiod stochastic optimization problems, the future optimal decision is a random variable whose distribution depends on the parameters of the optimization problem. I analyze how the expected value of this random variable changes as a function of the dynamic optimization parameters in the context of Markov decision processes. I call this analysis stochastic comparative statics. I derive both comparative statics results and stochastic comparative statics results showing how the current and future optimal decisions change in response to changes in the single-period payoff function, the discount factor, the initial state of the system, and the transition probability function. I apply my results to various models from the economics and operations research literature, including investment theory, dynamic pricing models, controlled random walks, and comparisons of stationary distributions.
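A minimal numerical illustration of such comparative statics (a toy model, not from the paper): in a two-state ‘consume vs. invest’ MDP, the optimal current decision flips as the discount factor crosses a threshold.

```python
import numpy as np

def optimal_policy_at(P, R, gamma, tol=1e-10):
    """Greedy policy and values from value iteration on a discounted MDP."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=0), V_new
        V = V_new

# States: 0 = "low", 1 = "high". Action 0 "consume" pays 1 and stays low;
# action 1 "invest" pays 0 but moves to the high state, which pays 2 forever.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[1.0, 2.0],
              [0.0, 2.0]])

low_patience, _ = optimal_policy_at(P, R, gamma=0.3)   # impatient agent
high_patience, _ = optimal_policy_at(P, R, gamma=0.7)  # patient agent
print(low_patience[0], high_patience[0])  # decision in the low state
```

Here consuming yields 1/(1-γ) while investing yields 2γ/(1-γ), so the optimal decision switches exactly at γ = 1/2, a one-line instance of comparative statics in the discount factor.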


1978, Vol 15 (2), pp. 356-373
Author(s):  
A. Federgruen ◽  
H. C. Tijms

This paper is concerned with the optimality equation for the average costs in a denumerable state semi-Markov decision model. It will be shown that under each of a number of recurrency conditions on the transition probability matrices associated with the stationary policies, the optimality equation has a bounded solution. This solution indeed yields a stationary policy which is optimal for a strong version of the average cost optimality criterion. Besides the existence of a bounded solution to the optimality equation, we will show that both the value-iteration method and the policy-iteration method can be used to determine such a solution. For the latter method we will prove that the average costs and the relative cost functions of the policies generated converge to a solution of the optimality equation.
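The value-iteration route mentioned above can be sketched for the discrete-time analogue: relative value iteration drives the Bellman residual to a solution (g, h) of the average-cost optimality equation. The example model below is illustrative, not from the paper, and the convergence relies on the kind of recurrency condition the paper discusses.

```python
import numpy as np

def relative_value_iteration(P, C, tol=1e-10, ref=0):
    """Solve the average-cost optimality equation
       g + h(s) = min_a [ c(s,a) + sum_s' p(s'|s,a) h(s') ]
    by relative value iteration, normalizing h(ref) = 0."""
    h = np.zeros(P.shape[1])
    while True:
        T = (C + P @ h).min(axis=0)   # one Bellman backup, T[s]
        g = T[ref]                    # current gain estimate
        h_new = T - g
        if np.max(np.abs(h_new - h)) < tol:
            return g, h_new
        h = h_new

# Illustrative 2-state, 2-action model with strictly positive transition
# rows (a strong recurrency condition, ensuring convergence).
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.7, 0.3]]])   # P[a, s, s']
C = np.array([[1.0, 3.0],
              [2.0, 0.5]])                 # C[a, s]
g, h = relative_value_iteration(P, C)
print(round(g, 4))
```

On return, (g, h) satisfies the optimality equation, so the greedy policy with respect to h is average-cost optimal.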


Author(s):  
Daniel Bartl ◽  
Samuel Drapeau ◽  
Jan Obłój ◽  
Johannes Wiesel

We consider sensitivity of a generic stochastic optimization problem to model uncertainty. We take a non-parametric approach and capture model uncertainty using Wasserstein balls around the postulated model. We provide explicit formulae for the first-order correction to both the value function and the optimizer and further extend our results to optimization under linear constraints. We present applications to statistics, machine learning, mathematical finance and uncertainty quantification. In particular, we provide an explicit first-order approximation for square-root LASSO regression coefficients and deduce coefficient shrinkage compared to the ordinary least-squares regression. We consider robustness of call option pricing and deduce a new Black–Scholes sensitivity, a non-parametric version of the so-called Vega. We also compute sensitivities of optimized certainty equivalents in finance and propose measures to quantify robustness of neural networks to adversarial examples.
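For orientation, the classical parametric Vega that the non-parametric sensitivity generalizes can be computed directly from the Black–Scholes formula; the parameter values below are illustrative.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black–Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Classical (parametric) Vega as a central finite difference in sigma --
# the object whose non-parametric, Wasserstein-ball analogue the paper derives.
S, K, T, r, sigma = 100.0, 100.0, 1.0, 0.02, 0.2
eps = 1e-5
vega = (bs_call(S, K, T, r, sigma + eps)
        - bs_call(S, K, T, r, sigma - eps)) / (2 * eps)
print(round(vega, 2))
```

The paper's point is that this sensitivity can be defined without committing to the lognormal family at all, by perturbing the model within a Wasserstein ball.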


2018, Vol 12 (4), pp. 351-360
Author(s):  
Lili Tang

How to obtain maximal benefit within a given range of risk in the securities market is an interesting and widely studied issue. At the same time, many complex factors affect securities activity, such as the risk and uncertainty of returns, which makes it difficult to establish an appropriate investment model. To address the resulting curse of dimensionality and model complexity, we use approximate dynamic programming to set up a Markov decision model for the multi-period portfolio problem with transaction costs. A model-based actor-critic algorithm under an uncertain environment is proposed, in which the optimal value function is obtained by iteration on the basis of the constrained risk range and a limited amount of funds, and the optimal investment for each period is found by dynamic programming over a finite set of fund ratios. Experiments indicate that the algorithm yields stable investment decisions with steadily growing returns.
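The dynamic-programming-over-fund-ratios step can be sketched in a much-simplified deterministic form (this is not the paper's actor-critic; the return, cost, and grid below are illustrative assumptions): backward dynamic programming over a discretized fund ratio with a proportional transaction cost.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's calibration).
grid = np.linspace(0.0, 1.0, 5)    # admissible risky-asset fractions
mu = 0.05                          # expected per-period excess return
cost = 0.01                        # proportional transaction cost
T = 5                              # number of periods

V = np.zeros(len(grid))            # terminal value function
policy = []
for t in range(T):                 # backward induction
    Q = (grid[None, :] * mu                               # holding reward
         - cost * np.abs(grid[None, :] - grid[:, None])   # rebalancing cost
         + V[None, :])             # Q[current ratio, next ratio]
    policy.append(Q.argmax(axis=1))
    V = Q.max(axis=1)

first_move = grid[policy[-1][0]]   # optimal ratio chosen at t=0 from w=0
print(first_move)
```

With a positive expected excess return and a small one-off cost, the DP immediately moves to the full risky position; a stochastic return model and a risk constraint, as in the paper, would change this trade-off.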


2006, Vol 27, pp. 153-201
Author(s):  
B. Kveton ◽  
M. Hauskrecht ◽  
C. Guestrin

Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
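The central LP step behind HALP can be illustrated in the purely discrete case (a sketch under assumed dynamics): approximate V(s) by a weighted sum of basis functions and choose the weights by linear programming subject to the Bellman inequalities. With the full basis used here the LP recovers the exact optimal value; HALP's point is to use far fewer basis functions than states.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])   # P[a, s, s'] (illustrative)
R = np.array([[1.0, 0.0],
              [0.7, 0.4]])                 # R[a, s]
nA, nS, _ = P.shape
Phi = np.eye(nS)   # full (identity) basis: one feature per state

# Bellman inequalities: (Phi - gamma * P_a Phi) w >= R_a for every action a,
# written as A_ub @ w <= b_ub for linprog.
A_ub = np.vstack([-(Phi - gamma * P[a] @ Phi) for a in range(nA)])
b_ub = np.concatenate([-R[a] for a in range(nA)])
c = Phi.sum(axis=0)  # uniform state-relevance weights in the objective

w = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * nS).x
V_alp = Phi @ w
print(np.round(V_alp, 3))
```

HALP extends this template to hybrid state spaces, where the continuous variables make the constraint set infinite and further approximation of the constraints is needed.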



Author(s):  
Juan Xiong ◽  
Qiyu Fang ◽  
Jialing Chen ◽  
Yingxin Li ◽  
Huiyi Li ◽  
...  

Background: Postpartum depression (PPD) has been recognized as a severe public health problem worldwide due to its high incidence and its detrimental consequences not only for the mother but also for the infant and the family. However, the pattern of natural transition trajectories of PPD has rarely been explored. Methods: In this research, a quantitative longitudinal study was conducted to explore the PPD progression process, providing information on the transition probabilities, hazard ratios, and mean sojourn times in three postnatal mental states, namely the normal state, mild PPD, and severe PPD. The multi-state Markov model was built on 912 depression status assessments of 304 Chinese primiparous women at three time points: six weeks, three months, and six months postpartum. Results: Among the 608 PPD status transitions from one visit to the next, 6.2% (38/608) showed deterioration of mental status from the level at the previous visit, while 40.0% (243/608) showed improvement at the next visit. A subject in the normal state who does transition has a 49.8% probability of worsening to mild PPD and a 50.2% probability of worsening to severe PPD. A subject with mild PPD who does transition has a 20.0% chance of worsening to severe PPD. A subject with severe PPD is more likely to improve to mild PPD than to return to the normal state. On average, the sojourn times in the normal state, mild PPD, and severe PPD were 64.12, 6.29, and 9.37 weeks, respectively. Women in the normal state had 6.0%, 8.5%, 8.7%, and 8.8% chances of progressing to severe PPD within three months, nine months, one year, and three years, respectively. Increases in all kinds of support were associated with a decreased risk of deterioration from the normal state to severe PPD (hazard ratio, HR: 0.42–0.65), while increased informational support, evaluation of support, and maternal age were associated with improvement from severe PPD to the normal state (HR: 1.46–2.27).
Conclusions: The PPD state transition probabilities call for greater attention to, and awareness of, regular PPD screening for postnatal women and timely intervention for women with mild or severe PPD. Preventive actions against PPD should be taken in the early stages and continued for three years; at least one screening per year is strongly recommended. Emotional support, material support, informational support, and evaluation of support had significant positive associations with preventing PPD progression. The derived transition probabilities and sojourn times can serve as an important reference for health professionals in making proactive plans and targeted interventions for PPD.
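The reported quantities (mean sojourn times and jump probabilities between states) are standard outputs of a continuous-time multi-state Markov model. The sketch below shows how they derive from a transition-intensity matrix, using made-up rates rather than the study's fitted values.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative transition-intensity (generator) matrix, weekly rates.
# States: 0 = normal, 1 = mild PPD, 2 = severe PPD. These rates are
# invented for illustration, not the values estimated in the study.
Q = np.array([[-0.016,  0.008,  0.008],
              [ 0.120, -0.160,  0.040],
              [ 0.060,  0.050, -0.110]])

# Mean sojourn time in state i is -1 / q_ii (in weeks).
sojourn = -1.0 / np.diag(Q)

# Embedded jump chain: where you go *given* that you leave a state.
jump = Q / -np.diag(Q)[:, None]
np.fill_diagonal(jump, 0.0)

# Transition probabilities over a 12-week interval: P(t) = expm(Q t).
P_12weeks = expm(Q * 12)
print(np.round(sojourn, 1), np.round(jump[0], 2))
```

With these rates, a woman leaving the normal state is equally likely to jump to mild or severe PPD (0.008 each), mirroring the near 50/50 split reported above; the same machinery yields the interval probabilities used for the screening recommendations.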

