Quantile Markov Decision Processes

2021 ◽  
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

Title: Sequential Decision Making Using Quantiles. The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In "Quantile Markov Decision Processes," X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
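
The following is not the authors' dynamic program; it is a minimal sketch of the objective itself, estimating the 0.10 quantile of cumulative reward under a fixed policy in a small, made-up MDP by simulation. The transition matrices P, rewards R, policy pi, and horizon are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: Monte Carlo estimate of the 0.10 quantile of
# cumulative reward under a FIXED policy in a toy finite-horizon MDP.
# P[a][s, s'] = transition probability, R[a][s] = immediate reward, pi[s] = action.
rng = np.random.default_rng(0)

P = [np.array([[0.9, 0.1], [0.2, 0.8]]),      # action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]      # action 1
R = [np.array([1.0, 0.0]), np.array([2.0, -1.0])]
pi = np.array([0, 1])                          # fixed deterministic policy
horizon, n_runs, start = 20, 10_000, 0

def rollout():
    s, total = start, 0.0
    for _ in range(horizon):
        a = pi[s]
        total += R[a][s]
        s = rng.choice(len(P[a][s]), p=P[a][s])
    return total

samples = np.array([rollout() for _ in range(n_runs)])
print("0.10 quantile of cumulative reward:", np.quantile(samples, 0.10))
print("mean cumulative reward:", samples.mean())
```

The gap between the estimated 0.10 quantile and the mean is what separates the quantile objective from the traditional expected-reward objective for a risk-averse decision maker.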

2019 ◽  
Author(s):  
Denis Pais ◽  
Valdinei Freire ◽  
Karina Valdivia-Delgado

Markov decision processes (MDPs) are widely used to solve sequential decision-making problems. The most common performance criterion in MDPs is minimization of the expected total cost. However, this approach does not account for fluctuations around the mean, which can significantly affect the overall performance of the process. MDPs that handle this kind of concern are called risk-sensitive MDPs. A special type of risk-sensitive MDP is the CVaR MDP, which incorporates the CVaR (Conditional Value-at-Risk) metric commonly used in finance. One algorithm that finds the optimal policy for CVaR MDPs is the Value Iteration with Linear Interpolation algorithm, CVaRVILI. The CVaRVILI algorithm must solve linear programming problems many times, which gives it a high computational cost. In this work, we propose an algorithm that evaluates a stationary policy for constant-cost CVaR MDPs without solving linear programming problems; this algorithm is called PECVaR. In addition, experiments were carried out in which the expected total cost, and the cost computed by the PECVaR algorithm for a risk-neutral policy, were used to initialize the CVaRVILI algorithm. The results show that these initializations reduce the convergence time of CVaRVILI in most cases.
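
As background for the CVaR metric only (this is neither PECVaR nor CVaRVILI), the sketch below estimates the CVaR of the total discounted cost of a fixed policy from simulated trajectories, using the Rockafellar-Uryasev representation CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha }, where alpha is the tail probability. The chain, costs, and parameters are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: empirical CVaR of the total discounted cost of a
# fixed policy, via the Rockafellar-Uryasev representation
#   CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha },  alpha = tail probability.
rng = np.random.default_rng(1)

P = np.array([[0.7, 0.3], [0.4, 0.6]])   # transitions under the fixed policy (assumed)
c = np.array([1.0, 3.0])                 # per-step cost of each state (assumed)
gamma, horizon, n_runs, alpha = 0.95, 100, 20_000, 0.05

def total_cost(start=0):
    s, cost = start, 0.0
    for t in range(horizon):
        cost += (gamma ** t) * c[s]
        s = rng.choice(2, p=P[s])
    return cost

Z = np.array([total_cost() for _ in range(n_runs)])
w = np.quantile(Z, 1.0 - alpha)                       # Value-at-Risk estimate
cvar = w + np.maximum(Z - w, 0.0).mean() / alpha      # plug-in CVaR estimate
print(f"expected cost {Z.mean():.3f}, VaR_{alpha} {w:.3f}, CVaR_{alpha} {cvar:.3f}")
```

The expected total cost printed here is the kind of risk-neutral quantity the abstract describes using to warm-start CVaRVILI.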


1982 ◽  
Vol 19 (4) ◽  
pp. 794-802 ◽  
Author(s):  
Matthew J. Sobel

Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
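
For a fixed stationary policy with a deterministic state-dependent reward r_i and discount factor beta, the discounted value v_i and the variance sigma_i^2 of the discounted total reward satisfy linear recursions of the kind the abstract refers to (the notation below is ours, not necessarily Sobel's):

\[
v_i = r_i + \beta \sum_j p_{ij} v_j, \qquad
\sigma_i^2 = \beta^2 \sum_j p_{ij}\,\sigma_j^2
  + \beta^2 \Bigl( \sum_j p_{ij} v_j^2 - \bigl(\textstyle\sum_j p_{ij} v_j\bigr)^{2} \Bigr).
\]

Because the variance, and hence the standard deviation, is not additive across stages, the mean-minus-a-multiple-of-the-standard-deviation criterion does not satisfy a Bellman optimality equation; this is a key obstacle of the kind the abstract alludes to.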


1998 ◽  
Vol 12 (2) ◽  
pp. 177-187 ◽  
Author(s):  
Kazuyoshi Wakuta

We consider a discounted cost Markov decision process with a constraint. Relating this to a vector-valued Markov decision process, we prove that there exists a constrained optimal randomized semistationary policy if there exists at least one policy satisfying a constraint. Moreover, we present an algorithm by which we can find the constrained optimal randomized semistationary policy, or we can discover that there exist no policies satisfying a given constraint.
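
The paper works through a vector-valued MDP and semistationary policies; as background only (not Wakuta's algorithm), the sketch below solves a finite constrained discounted MDP through the standard occupancy-measure linear program, recovers a randomized stationary policy, and reports infeasibility when no policy meets the constraint. All model data (P, r, cost, budget) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Background sketch only: occupancy-measure LP for a finite discounted MDP
# with one cost constraint.  Variables x[s, a] are discounted state-action
# frequencies; the induced policy is pi(a|s) = x[s, a] / sum_a x[s, a].
nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[s, a, s'] (assumed data)
r = rng.uniform(0, 1, size=(nS, nA))              # reward to maximize
cost = rng.uniform(0, 1, size=(nS, nA))           # cost to keep below budget
mu = np.full(nS, 1.0 / nS)                        # initial distribution
budget = 6.0                                      # bound on expected discounted cost

# Flow constraints: sum_a x[s',a] - gamma * sum_{s,a} P[s,a,s'] x[s,a] = mu[s']
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (sp == s) - gamma * P[s, a, sp]

res = linprog(
    c=-r.reshape(-1),                             # maximize reward
    A_ub=cost.reshape(1, -1), b_ub=[budget],      # expected discounted cost <= budget
    A_eq=A_eq, b_eq=mu, bounds=(0, None), method="highs",
)
if res.status == 0:
    x = res.x.reshape(nS, nA)
    pi = x / x.sum(axis=1, keepdims=True)
    print("constrained optimal randomized policy pi(a|s):\n", np.round(pi, 3))
else:
    print("no policy satisfies the constraint (LP infeasible)")
```

The final branch mirrors the dichotomy in the abstract: either a constrained optimal randomized policy is found, or it is discovered that no policy satisfies the constraint.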


1983 ◽  
Vol 20 (2) ◽  
pp. 368-379 ◽  
Author(s):  
Lam Yeh ◽  
L. C. Thomas

By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.
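
The abstract does not spell out the control; as an illustration of the kind of structure involved (and not the model analysed by Lam and Thomas), the sketch below uses uniformization to solve a discounted admission-control M/M/1 problem with a linear holding cost and a lump rejection penalty, for which the optimal policy is a monotone threshold rule in the queue length. All parameters are assumptions for illustration.

```python
import numpy as np

# Illustration only (assumed model): uniformized value iteration for discounted
# admission control of an M/M/1 queue.  State = queue length, holding cost rate
# h*x, lump penalty p per rejected arrival, arrival rate lam, service rate mu,
# discount rate alpha, buffer truncation N.
lam, mu, h, p, alpha, N = 1.0, 1.2, 1.0, 10.0, 0.1, 60
Lam = lam + mu                                    # uniformization constant

V = np.zeros(N + 1)
for _ in range(5000):
    admit = np.append(V[1:], p + V[N])            # admit (forced rejection at x = N)
    reject = p + V
    arrival = np.minimum(admit, reject)           # best response to an arrival
    service = np.concatenate(([V[0]], V[:-1]))    # move to (x - 1)^+
    V_new = (h * np.arange(N + 1) + lam * arrival + mu * service) / (alpha + Lam)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

admit_opt = V[1:] <= p + V[:-1]                   # admit an arrival in state x?
threshold = N if admit_opt.all() else int(np.argmin(admit_opt))
print("optimal rule: admit an arrival iff queue length <", threshold)
```

The computed rule is monotone in the queue length, which is the qualitative property the abstract establishes for the continuous-time problem.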


2020 ◽  
Vol 40 (1) ◽  
pp. 117-137
Author(s):  
R. Israel Ortega-Gutiérrez ◽  
H. Cruz-Suárez

This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that coincides with the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as differentiability and strict convexity of its optimal value function and uniqueness of the optimal policy; moreover, the optimal value function and the optimal policy of both processes are the same. To complement the theory presented, an example is provided.
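
For reference, the Moreau-Yosida regularization (Moreau envelope) of a one-stage cost c(., a) with parameter lambda > 0 is

\[
c_\lambda(x, a) \;=\; \inf_{y}\Bigl\{\, c(y, a) + \tfrac{1}{2\lambda}\lVert x - y \rVert^{2} \,\Bigr\}, \qquad \lambda > 0 .
\]

When c(., a) is proper, lower semicontinuous, and convex, the envelope is convex and continuously differentiable with a (1/lambda)-Lipschitz gradient, which is the mechanism behind the smoothness and convexity properties listed above; the exact construction and assumptions used in the paper may differ in detail.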

