Learning parametric policies and transition probability models of Markov decision processes from data

Author(s):  
Tingting Xu ◽  
Henghui Zhu ◽  
Ioannis Ch. Paschalidis


Author(s):  
Bar Light

In multiperiod stochastic optimization problems, the future optimal decision is a random variable whose distribution depends on the parameters of the optimization problem. I analyze how the expected value of this random variable changes as a function of the dynamic optimization parameters in the context of Markov decision processes. I call this analysis stochastic comparative statics. I derive both comparative statics results and stochastic comparative statics results showing how the current and future optimal decisions change in response to changes in the single-period payoff function, the discount factor, the initial state of the system, and the transition probability function. I apply my results to various models from the economics and operations research literature, including investment theory, dynamic pricing models, controlled random walks, and comparisons of stationary distributions.
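As a rough numerical companion to the abstract above, the sketch below (not taken from the paper) solves an invented three-state, two-action MDP by value iteration for two discount factors and reports the optimal action in each state. This is the flavor of comparative-statics question being asked: how does the optimal decision respond when a parameter such as the discount factor changes? The transition probabilities and the payoff matrix are illustrative assumptions, not data from the paper.

```python
# Toy comparative statics: how the optimal action changes with the discount factor.
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# P[a, s, s']: transition probabilities; r[s, a]: single-period payoffs (invented).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [2.0, 0.0]])

def optimal_policy(discount, tol=1e-10):
    """Value iteration; returns the greedy (optimal) action in each state."""
    v = np.zeros(n_states)
    while True:
        q = r + discount * np.einsum("ast,t->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return q.argmax(axis=1)
        v = v_new

for beta in (0.5, 0.95):
    print(f"discount {beta}: optimal action per state = {optimal_policy(beta)}")
```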


1998 ◽  
Vol 35 (2) ◽  
pp. 293-302 ◽  
Author(s):  
Masami Kurano ◽  
Jinjie Song ◽  
Masanori Hosaka ◽  
Youqiang Huang

In the framework of discounted Markov decision processes, we consider the case in which the transition probability varies within a given domain at each time and its variation is unknown or unobservable. To this end we introduce a new model, named controlled Markov set-chains, based on Markov set-chains, and discuss its optimization under a partial order. A numerical example is also given to illustrate the theoretical results and the computation.
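A hedged sketch in the spirit of the model above: pessimistic (worst-case) value iteration for an MDP whose transition rows are only known to lie in componentwise intervals. This is an interval-uncertainty computation related to controlled Markov set-chains, not the paper's exact algorithm or its partial-order framework, and the problem instance is invented for illustration.

```python
# Pessimistic value iteration with interval-bounded transition probabilities.
import numpy as np

def worst_case_distribution(p_lo, p_hi, values):
    """Feasible distribution minimizing the expected continuation value:
    start at the lower bounds, then assign the leftover mass to successor
    states in increasing order of `values` (worst successors first)."""
    p = p_lo.astype(float).copy()
    slack = 1.0 - p.sum()
    for s in np.argsort(values):
        add = min(p_hi[s] - p[s], slack)
        p[s] += add
        slack -= add
        if slack <= 1e-12:
            break
    return p

def robust_value_iteration(r, P_lo, P_hi, discount=0.9, tol=1e-8):
    """r[s, a]: payoffs; P_lo/P_hi[a, s, s']: interval bounds on transitions."""
    n_states, n_actions = r.shape
    v = np.zeros(n_states)
    while True:
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p = worst_case_distribution(P_lo[a, s], P_hi[a, s], v)
                q[s, a] = r[s, a] + discount * p @ v
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

# Invented example: nominal transition matrices perturbed by +/- 0.1.
nominal = np.array([[[0.8, 0.2], [0.3, 0.7]],
                    [[0.5, 0.5], [0.9, 0.1]]])
P_lo = np.clip(nominal - 0.1, 0.0, 1.0)
P_hi = np.clip(nominal + 0.1, 0.0, 1.0)
r = np.array([[1.0, 0.0], [0.0, 2.0]])
values, policy = robust_value_iteration(r, P_lo, P_hi)
print("worst-case values:", values, "policy:", policy)
```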


1983 ◽  
Vol 20 (04) ◽  
pp. 835-842
Author(s):  
David Assaf

The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed.
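A small numerical illustration (not from the paper) of the reduction the abstract relies on: a convex function attains its maximum over a compact convex set at an extreme point, so only the vertices need to be checked. The function f and the feasible box [0,1]^2 below are invented for illustration; the vertex search is confirmed against a brute-force grid search.

```python
# Maximizing a convex function over a box: checking the vertices suffices.
import itertools
import numpy as np

def f(x):
    # Convex: positive-definite quadratic plus a maximum of affine functions.
    x = np.asarray(x, dtype=float)
    return x @ np.array([[2.0, 0.5], [0.5, 1.0]]) @ x + max(x[0] - x[1], 0.3 * x[1])

vertices = list(itertools.product([0.0, 1.0], repeat=2))        # extreme points of [0,1]^2
best_vertex = max(vertices, key=f)

grid = itertools.product(np.linspace(0.0, 1.0, 101), repeat=2)  # brute-force comparison
best_grid = max(grid, key=f)

print("max over vertices :", best_vertex, f(best_vertex))
print("max over fine grid:", best_grid, f(best_grid))
```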

