Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives

Author(s): Krishnendu Chatterjee, Adrián Elgyütt, Petr Novotný, Owen Rouillé

Partially observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework for modeling a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low-probability events with extreme payoff can significantly affect the expectation, so the obtained policies are not necessarily risk averse. An alternative approach is to optimize the probability that the payoff is above a certain threshold, which yields risk-averse policies but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.
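A minimal formalization of the EOPG objective may help fix ideas; the symbols below (the discounted-sum payoff Disc, the threshold t, and the probability bound alpha) are assumed notation, since the abstract fixes none:

% EOPG: maximize the expected discounted-sum payoff over policies that
% also give a probabilistic guarantee on the payoff itself.
% Disc(rho) = sum_{i >= 0} gamma^i r_i is the payoff of a run rho,
% t is the payoff threshold, and alpha the required probability.
\[
  \sup_{\sigma} \; \mathbb{E}^{\sigma}\!\big[\mathrm{Disc}\big]
  \quad \text{subject to} \quad
  \mathbb{P}^{\sigma}\!\big[\mathrm{Disc} \ge t\big] \;\ge\; \alpha,
\]
where \(\sigma\) ranges over observation-based policies of the POMDP. Pure expectation maximization is recovered as the special case \(\alpha = 0\), while pure threshold (risk-averse) optimization keeps only the constraint and ignores the objective.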

Author(s):  
Pascal Poupart

The goal of this chapter is to provide an introduction to Markov decision processes as a framework for sequential decision making under uncertainty. The aim is to give practitioners a basic understanding of the common modeling and solution techniques. Hence, we will not delve into the details of the most recent algorithms, but rather focus on the main concepts and the issues that impact deployment in practice. More precisely, we will review fully and partially observable Markov decision processes, describe basic algorithms for finding good policies, and discuss the modeling and computational issues that arise in practice.


2020, Vol. 34 (06), pp. 9794–9801
Author(s): Tomáš Brázdil, Krishnendu Chatterjee, Petr Novotný, Jiří Vahala

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
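The risk-constrained action selection can be illustrated with a small linear program; the sketch below is an assumed formulation (the per-action value and risk estimates would come from the search statistics and the learned predictor), not the paper's implementation:

# A minimal sketch of risk-constrained action selection via linear
# programming, as one could use at a decision node of a UCT-like search.
# The value and risk estimates and all names here are illustrative only.
import numpy as np
from scipy.optimize import linprog

def select_action_distribution(values, risks, risk_bound):
    """Return a distribution over actions that maximizes expected value
    while keeping the expected failure probability below risk_bound."""
    values = np.asarray(values, dtype=float)
    risks = np.asarray(risks, dtype=float)
    n = len(values)
    res = linprog(
        c=-values,                      # linprog minimizes, so negate values
        A_ub=risks.reshape(1, -1),      # expected risk <= risk_bound
        b_ub=[risk_bound],
        A_eq=np.ones((1, n)),           # probabilities sum to one
        b_eq=[1.0],
        bounds=[(0.0, 1.0)] * n,
    )
    return res.x if res.success else None

# Example: two actions, the riskier one has the higher value estimate.
dist = select_action_distribution(values=[1.0, 3.0],
                                  risks=[0.01, 0.30],
                                  risk_bound=0.10)

The solution of such a program is in general a randomized choice among actions, which is what lets the policy stay under the risk threshold while still exploiting high-payoff actions.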


Author(s): Ruiyang Song, Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions of these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation relative to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances for which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-ups at the expense of some performance degradation.
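A minimal sketch of the temporal-concatenation idea on a tabular finite-horizon MDP, assuming backward induction as the per-segment solver and zero terminal values at segment boundaries (both are illustrative assumptions, not details taken from the paper):

# Temporal concatenation sketch: split the horizon into segments, solve
# each segment independently, and concatenate the per-segment policies.
import numpy as np

def backward_induction(P, R, horizon, terminal_value=None):
    """Solve one finite-horizon segment of a tabular MDP.
    P: (A, S, S) transition probabilities; R: (S, A) rewards.
    Returns the time-dependent policy (one action array per step)
    and the value function at the start of the segment."""
    A, S, _ = P.shape
    V = np.zeros(S) if terminal_value is None else terminal_value
    policy = []
    for _ in range(horizon):
        Q = R + (P @ V).T          # Q[s, a] = R[s, a] + sum_s' P[a, s, s'] V[s']
        policy.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policy.reverse()               # computed backwards in time
    return policy, V

def temporal_concatenation(P, R, horizon, num_segments):
    """Solve each horizon segment with zero terminal value and
    concatenate the resulting policies into one horizon-long policy."""
    seg = horizon // num_segments
    lengths = [seg] * num_segments
    lengths[-1] += horizon - seg * num_segments
    full_policy = []
    for length in lengths:
        segment_policy, _ = backward_induction(P, R, length)
        full_policy.extend(segment_policy)
    return full_policy

Ignoring the value beyond each segment boundary is what makes the concatenated policy suboptimal relative to solving the full horizon at once; it is this gap that the regret bounds described in the abstract quantify.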

