Learning parametric policies and transition probability models of Markov decision processes from data

Author(s):  
Tingting Xu ◽  
Henghui Zhu ◽  
Ioannis Ch. Paschalidis


Author(s):  
Bar Light

In multiperiod stochastic optimization problems, the future optimal decision is a random variable whose distribution depends on the parameters of the optimization problem. I analyze how the expected value of this random variable changes as a function of the dynamic optimization parameters in the context of Markov decision processes. I call this analysis stochastic comparative statics. I derive both comparative statics results and stochastic comparative statics results showing how the current and future optimal decisions change in response to changes in the single-period payoff function, the discount factor, the initial state of the system, and the transition probability function. I apply my results to various models from the economics and operations research literature, including investment theory, dynamic pricing models, controlled random walks, and comparisons of stationary distributions.
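As a rough numerical companion to the abstract above, the sketch below (not taken from the paper) solves an invented three-state, two-action MDP by value iteration for two discount factors and reports the optimal action in each state. This is the flavor of comparative-statics question being asked: how does the optimal decision respond when a parameter such as the discount factor changes? The transition probabilities and the payoff matrix are illustrative assumptions, not data from the paper.

```python
# Toy comparative statics: how the optimal action changes with the discount factor.
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# P[a, s, s']: transition probabilities; r[s, a]: single-period payoffs (invented).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [2.0, 0.0]])

def optimal_policy(discount, tol=1e-10):
    """Value iteration; returns the greedy (optimal) action in each state."""
    v = np.zeros(n_states)
    while True:
        q = r + discount * np.einsum("ast,t->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return q.argmax(axis=1)
        v = v_new

for beta in (0.5, 0.95):
    print(f"discount {beta}: optimal action per state = {optimal_policy(beta)}")
```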


1998 ◽  
Vol 35 (2) ◽  
pp. 293-302 ◽  
Author(s):  
Masami Kurano ◽  
Jinjie Song ◽  
Masanori Hosaka ◽  
Youqiang Huang

In the framework of discounted Markov decision processes, we consider the case in which the transition probability varies within a given domain at each time and its variation is unknown or unobservable. To this end we introduce a new model, named controlled Markov set-chains, based on Markov set-chains, and discuss its optimization under a partial order. A numerical example is also given to illustrate the theoretical results and the computation.
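A hedged sketch in the spirit of the model above: pessimistic (worst-case) value iteration for an MDP whose transition rows are only known to lie in componentwise intervals. This is an interval-uncertainty computation related to controlled Markov set-chains, not the paper's exact algorithm or its partial-order framework, and the problem instance is invented for illustration.

```python
# Pessimistic value iteration with interval-bounded transition probabilities.
import numpy as np

def worst_case_distribution(p_lo, p_hi, values):
    """Feasible distribution minimizing the expected continuation value:
    start at the lower bounds, then assign the leftover mass to successor
    states in increasing order of `values` (worst successors first)."""
    p = p_lo.astype(float).copy()
    slack = 1.0 - p.sum()
    for s in np.argsort(values):
        add = min(p_hi[s] - p[s], slack)
        p[s] += add
        slack -= add
        if slack <= 1e-12:
            break
    return p

def robust_value_iteration(r, P_lo, P_hi, discount=0.9, tol=1e-8):
    """r[s, a]: payoffs; P_lo/P_hi[a, s, s']: interval bounds on transitions."""
    n_states, n_actions = r.shape
    v = np.zeros(n_states)
    while True:
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p = worst_case_distribution(P_lo[a, s], P_hi[a, s], v)
                q[s, a] = r[s, a] + discount * p @ v
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

# Invented example: nominal transition matrices perturbed by +/- 0.1.
nominal = np.array([[[0.8, 0.2], [0.3, 0.7]],
                    [[0.5, 0.5], [0.9, 0.1]]])
P_lo = np.clip(nominal - 0.1, 0.0, 1.0)
P_hi = np.clip(nominal + 0.1, 0.0, 1.0)
r = np.array([[1.0, 0.0], [0.0, 2.0]])
values, policy = robust_value_iteration(r, P_lo, P_hi)
print("worst-case values:", values, "policy:", policy)
```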


1983 ◽  
Vol 20 (04) ◽  
pp. 835-842
Author(s):  
David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed.
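A small numerical illustration (not from the paper) of the reduction the abstract relies on: a convex function attains its maximum over a compact convex set at an extreme point, so only the vertices need to be checked. The function f and the feasible box [0,1]^2 below are invented for illustration; the vertex search is confirmed against a brute-force grid search.

```python
# Maximizing a convex function over a box: checking the vertices suffices.
import itertools
import numpy as np

def f(x):
    # Convex: positive-definite quadratic plus a maximum of affine functions.
    x = np.asarray(x, dtype=float)
    return x @ np.array([[2.0, 0.5], [0.5, 1.0]]) @ x + max(x[0] - x[1], 0.3 * x[1])

vertices = list(itertools.product([0.0, 1.0], repeat=2))        # extreme points of [0,1]^2
best_vertex = max(vertices, key=f)

grid = itertools.product(np.linspace(0.0, 1.0, 101), repeat=2)  # brute-force comparison
best_grid = max(grid, key=f)

print("max over vertices :", best_vertex, f(best_vertex))
print("max over fine grid:", best_grid, f(best_grid))
```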

