Quantile Markov Decision Processes

2021 ◽  
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

Title: Sequential Decision Making Using Quantiles. The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In "Quantile Markov Decision Processes," X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
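
The following is not the authors' dynamic program; it is a minimal sketch of the objective itself, estimating the 0.10 quantile of cumulative reward under a fixed policy in a small, made-up MDP by simulation. The transition matrices P, rewards R, policy pi, and horizon are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: Monte Carlo estimate of the 0.10 quantile of
# cumulative reward under a FIXED policy in a toy finite-horizon MDP.
# P[a][s, s'] = transition probability, R[a][s] = immediate reward, pi[s] = action.
rng = np.random.default_rng(0)

P = [np.array([[0.9, 0.1], [0.2, 0.8]]),      # action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]      # action 1
R = [np.array([1.0, 0.0]), np.array([2.0, -1.0])]
pi = np.array([0, 1])                          # fixed deterministic policy
horizon, n_runs, start = 20, 10_000, 0

def rollout():
    s, total = start, 0.0
    for _ in range(horizon):
        a = pi[s]
        total += R[a][s]
        s = rng.choice(len(P[a][s]), p=P[a][s])
    return total

samples = np.array([rollout() for _ in range(n_runs)])
print("0.10 quantile of cumulative reward:", np.quantile(samples, 0.10))
print("mean cumulative reward:", samples.mean())
```

The gap between the estimated 0.10 quantile and the mean is what separates the quantile objective from the traditional expected-reward objective for a risk-averse decision maker.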

2019 ◽  
Author(s):  
Denis Pais ◽  
Valdinei Freire ◽  
Karina Valdivia-Delgado

Markov decision processes (MDPs) are widely used to solve sequential decision-making problems. The most common performance criterion in MDPs is minimization of the expected total cost. However, this approach does not account for fluctuations around the mean, which can significantly affect the overall performance of the process. MDPs that handle this kind of concern are called risk-sensitive MDPs. A special type of risk-sensitive MDP is the CVaR MDP, which incorporates the CVaR (Conditional Value-at-Risk) metric commonly used in finance. One algorithm that finds the optimal policy for CVaR MDPs is the Value Iteration with Linear Interpolation algorithm, CVaRVILI. The CVaRVILI algorithm must solve linear programming problems many times, which gives it a high computational cost. In this work, we propose an algorithm that evaluates a stationary policy for constant-cost CVaR MDPs without solving linear programming problems; this algorithm is called PECVaR. In addition, experiments were carried out in which the expected total cost, and the cost computed by the PECVaR algorithm for a risk-neutral policy, were used to initialize the CVaRVILI algorithm. The results show that these initializations reduce the convergence time of CVaRVILI in most cases.
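
As background for the CVaR metric only (this is neither PECVaR nor CVaRVILI), the sketch below estimates the CVaR of the total discounted cost of a fixed policy from simulated trajectories, using the Rockafellar-Uryasev representation CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha }, where alpha is the tail probability. The chain, costs, and parameters are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: empirical CVaR of the total discounted cost of a
# fixed policy, via the Rockafellar-Uryasev representation
#   CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha },  alpha = tail probability.
rng = np.random.default_rng(1)

P = np.array([[0.7, 0.3], [0.4, 0.6]])   # transitions under the fixed policy (assumed)
c = np.array([1.0, 3.0])                 # per-step cost of each state (assumed)
gamma, horizon, n_runs, alpha = 0.95, 100, 20_000, 0.05

def total_cost(start=0):
    s, cost = start, 0.0
    for t in range(horizon):
        cost += (gamma ** t) * c[s]
        s = rng.choice(2, p=P[s])
    return cost

Z = np.array([total_cost() for _ in range(n_runs)])
w = np.quantile(Z, 1.0 - alpha)                       # Value-at-Risk estimate
cvar = w + np.maximum(Z - w, 0.0).mean() / alpha      # plug-in CVaR estimate
print(f"expected cost {Z.mean():.3f}, VaR_{alpha} {w:.3f}, CVaR_{alpha} {cvar:.3f}")
```

The expected total cost printed here is the kind of risk-neutral quantity the abstract describes using to warm-start CVaRVILI.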


1982 ◽  
Vol 19 (4) ◽  
pp. 794-802 ◽  
Author(s):  
Matthew J. Sobel

Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
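
For a fixed stationary policy with a deterministic state-dependent reward r_i and discount factor beta, the discounted value v_i and the variance sigma_i^2 of the discounted total reward satisfy linear recursions of the kind the abstract refers to (the notation below is ours, not necessarily Sobel's):

\[
v_i = r_i + \beta \sum_j p_{ij} v_j, \qquad
\sigma_i^2 = \beta^2 \sum_j p_{ij}\,\sigma_j^2
  + \beta^2 \Bigl( \sum_j p_{ij} v_j^2 - \bigl(\textstyle\sum_j p_{ij} v_j\bigr)^{2} \Bigr).
\]

Because the variance, and hence the standard deviation, is not additive across stages, the mean-minus-a-multiple-of-the-standard-deviation criterion does not satisfy a Bellman optimality equation; this is a key obstacle of the kind the abstract alludes to.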


1998 ◽  
Vol 12 (2) ◽  
pp. 177-187 ◽  
Author(s):  
Kazuyoshi Wakuta

We consider a discounted cost Markov decision process with a constraint. Relating this to a vector-valued Markov decision process, we prove that there exists a constrained optimal randomized semistationary policy if there exists at least one policy satisfying a constraint. Moreover, we present an algorithm by which we can find the constrained optimal randomized semistationary policy, or we can discover that there exist no policies satisfying a given constraint.
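
The paper works through a vector-valued MDP and semistationary policies; as background only (not Wakuta's algorithm), the sketch below solves a finite constrained discounted MDP through the standard occupancy-measure linear program, recovers a randomized stationary policy, and reports infeasibility when no policy meets the constraint. All model data (P, r, cost, budget) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Background sketch only: occupancy-measure LP for a finite discounted MDP
# with one cost constraint.  Variables x[s, a] are discounted state-action
# frequencies; the induced policy is pi(a|s) = x[s, a] / sum_a x[s, a].
nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s'] (assumed data)
r = rng.uniform(0, 1, size=(nS, nA))              # reward to maximize
cost = rng.uniform(0, 1, size=(nS, nA))           # cost to keep below budget
mu = np.full(nS, 1.0 / nS)                        # initial distribution
budget = 6.0                                      # bound on expected discounted cost

# Flow constraints: sum_a x[s',a] - gamma * sum_{s,a} P[s,a,s'] x[s,a] = mu[s']
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (sp == s) - gamma * P[s, a, sp]

res = linprog(
    c=-r.reshape(-1),                             # maximize reward
    A_ub=cost.reshape(1, -1), b_ub=[budget],      # expected discounted cost <= budget
    A_eq=A_eq, b_eq=mu, bounds=(0, None), method="highs",
)
if res.status == 0:
    x = res.x.reshape(nS, nA)
    pi = x / x.sum(axis=1, keepdims=True)
    print("constrained optimal randomized policy pi(a|s):\n", np.round(pi, 3))
else:
    print("no policy satisfies the constraint (LP infeasible)")
```

The final branch mirrors the dichotomy in the abstract: either a constrained optimal randomized policy is found, or it is discovered that no policy satisfies the constraint.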


1983 ◽  
Vol 20 (2) ◽  
pp. 368-379 ◽  
Author(s):  
Lam Yeh ◽  
L. C. Thomas

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.
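
The abstract does not spell out the control; as an illustration of the kind of structure involved (and not the model analysed by Lam and Thomas), the sketch below uses uniformization to solve a discounted admission-control M/M/1 problem with a linear holding cost and a lump rejection penalty, for which the optimal policy is a monotone threshold rule in the queue length. All parameters are assumptions for illustration.

```python
import numpy as np

# Illustration only (assumed model): uniformized value iteration for discounted
# admission control of an M/M/1 queue.  State = queue length, holding cost rate
# h*x, lump penalty p per rejected arrival, arrival rate lam, service rate mu,
# discount rate alpha, buffer truncation N.
lam, mu, h, p, alpha, N = 1.0, 1.2, 1.0, 10.0, 0.1, 60
Lam = lam + mu                                    # uniformization constant

V = np.zeros(N + 1)
for _ in range(5000):
    admit = np.append(V[1:], p + V[N])            # admit (forced rejection at x = N)
    reject = p + V
    arrival = np.minimum(admit, reject)           # best response to an arrival
    service = np.concatenate(([V[0]], V[:-1]))    # move to (x - 1)^+
    V_new = (h * np.arange(N + 1) + lam * arrival + mu * service) / (alpha + Lam)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

admit_opt = V[1:] <= p + V[:-1]                   # admit an arrival in state x?
threshold = N if admit_opt.all() else int(np.argmin(admit_opt))
print("optimal rule: admit an arrival iff queue length <", threshold)
```

The computed rule is monotone in the queue length, which is the qualitative property the abstract establishes for the continuous-time problem.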


2020 ◽  
Vol 40 (1) ◽  
pp. 117-137
Author(s):  
R. Israel Ortega-Gutiérrez ◽  
H. Cruz-Suárez

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that coincides with the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as differentiability and strict convexity of its optimal value function and uniqueness of the optimal policy; moreover, the optimal value function and the optimal policy of both processes are the same. To complement the theory presented, an example is provided.
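
For reference, the Moreau-Yosida regularization (Moreau envelope) of a one-stage cost c(., a) with parameter lambda > 0 is

\[
c_\lambda(x, a) \;=\; \inf_{y}\Bigl\{\, c(y, a) + \tfrac{1}{2\lambda}\lVert x - y \rVert^{2} \,\Bigr\}, \qquad \lambda > 0 .
\]

When c(., a) is proper, lower semicontinuous, and convex, the envelope is convex and continuously differentiable with a (1/lambda)-Lipschitz gradient, which is the mechanism behind the smoothness and convexity properties listed above; the exact construction and assumptions used in the paper may differ in detail.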

