Computational Approaches for Stochastic Shortest Path on Succinct MDPs

Author(s):  
Krishnendu Chatterjee ◽  
Hongfei Fu ◽  
Amir Goharshady ◽  
Nastaran Okati

We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a) for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b) for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature.

Author(s):  
M. de G. García-Hernández ◽  
J. Ruiz-Pinales ◽  
E. Onaindía ◽  
S. Ledesma-Orozco ◽  
J. G. Aviña-Cervantes ◽  
...  

In this paper we propose combining accelerated variants of value iteration with improved prioritized sweeping for the fast solution of stochastic shortest-path Markov decision processes. Value iteration is a classical algorithm for solving Markov decision processes, but this algorithm and its variants are quite slow on considerably large problems. To improve the solution time, acceleration techniques such as asynchronous updates, prioritization and prioritized sweeping are explored in this paper. A topological reordering algorithm is also compared with static reordering. Experimental results obtained on stochastic shortest-path problems with finite state and action spaces show that our approach achieves a considerable reduction in solution time with respect to the tested variants of value iteration. For instance, in one test the experiments showed a 5.7-fold reduction with respect to value iteration with asynchronous updates.
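As a rough illustration of the prioritization idea, the sketch below runs value iteration on a tiny stochastic shortest-path MDP and always backs up the state with the largest Bellman residual first. It is not the authors' accelerated algorithm: the function name `prioritized_value_iteration`, the dictionary layout of `transitions` and `costs`, and the toy MDP are all assumptions made for this example.

```python
import heapq

def prioritized_value_iteration(transitions, costs, goal, eps=1e-8, max_updates=100_000):
    """Hypothetical layout: transitions[s][a] = [(prob, next_state), ...],
    costs[s][a] = immediate cost. The goal state has value 0 and no actions."""
    states = list(transitions)
    values = {s: 0.0 for s in states}

    # Predecessor sets: whose backup can change when a state's value changes.
    preds = {s: set() for s in states}
    for s in states:
        for outcomes in transitions[s].values():
            for prob, nxt in outcomes:
                if prob > 0:
                    preds[nxt].add(s)

    def backup(s):
        # One Bellman backup: minimum expected cost-to-go over actions.
        if s == goal:
            return 0.0
        return min(
            costs[s][a] + sum(p * values[t] for p, t in outcomes)
            for a, outcomes in transitions[s].items()
        )

    # Max-priority queue on the Bellman residual (heapq is a min-heap, so negate).
    heap = []
    for s in states:
        residual = abs(backup(s) - values[s])
        if residual > eps:
            heapq.heappush(heap, (-residual, s))

    updates = 0
    while heap and updates < max_updates:
        _, s = heapq.heappop(heap)
        new_value = backup(s)
        if abs(new_value - values[s]) <= eps:
            continue  # stale queue entry, nothing left to do for this state
        values[s] = new_value
        updates += 1
        # Only predecessors of s can have a changed residual.
        for q in preds[s]:
            residual = abs(backup(q) - values[q])
            if residual > eps:
                heapq.heappush(heap, (-residual, q))
    return values

# Toy MDP (assumed for illustration): from "a", either walk via "b" or take a
# cheaper action that reaches the goal only half of the time.
toy_transitions = {
    "a": {"walk": [(1.0, "b")], "risky": [(0.5, "goal"), (0.5, "a")]},
    "b": {"walk": [(1.0, "goal")]},
    "goal": {},
}
toy_costs = {"a": {"walk": 1.0, "risky": 0.5}, "b": {"walk": 1.0}, "goal": {}}
toy_values = prioritized_value_iteration(toy_transitions, toy_costs, "goal")
```

In this toy instance the expected cost from "b" is 1, and the expected cost from "a" solves V = 0.5 + 0.5 V, so it is also 1; the sketch should approach these values up to the tolerance `eps`.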


2020 ◽  
Vol 26 (2) ◽  
pp. 131-161
Author(s):  
Florian Bourgey ◽  
Stefano De Marco ◽  
Emmanuel Gobet ◽  
Alexandre Zhou

The multilevel Monte Carlo (MLMC) method developed by M. B. Giles [Multilevel Monte Carlo path simulation, Oper. Res. 56 (2008), no. 3, 607–617] has a natural application to the evaluation of nested expectations $\mathbb{E}[g(\mathbb{E}[f(X,Y)|X])]$, where $f, g$ are functions and $(X,Y)$ is a pair of independent random variables. Apart from the pricing of American-type derivatives, such computations arise in a large variety of risk valuations (VaR or CVaR of a portfolio, CVA), and in the assessment of margin costs for centrally cleared portfolios. In this work, we focus on the computation of initial margin. We analyze the properties of the corresponding MLMC estimators, for which we provide results of asymptotic optimality; at the technical level, we have to deal with the limited regularity of the outer function $g$ (which might fail to be everywhere differentiable). In parallel, we investigate upper and lower bounds for nested expectations as above, in the spirit of primal-dual algorithms for stochastic control problems.
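To make the nested structure concrete, here is a minimal sketch of a plain nested Monte Carlo estimator of $\mathbb{E}[g(\mathbb{E}[f(X,Y)|X])]$. This is the single-level baseline, not the MLMC estimator analyzed in the paper. The function name `nested_monte_carlo` and the test case (standard normal $X$ and $Y$, $f(x,y) = x + y$, $g(z) = \max(z, 0)$) are assumptions chosen because the exact value $1/\sqrt{2\pi}$ is known and $g$ is not differentiable at 0.

```python
import math
import random

def nested_monte_carlo(f, g, sample_x, sample_y, n_outer, n_inner, rng):
    """Estimate E[g(E[f(X, Y) | X])] for independent X and Y."""
    total = 0.0
    for _ in range(n_outer):
        x = sample_x(rng)
        # Inner loop: approximate the conditional expectation E[f(X, Y) | X = x].
        inner_mean = sum(f(x, sample_y(rng)) for _ in range(n_inner)) / n_inner
        # Outer loop: average g over the outer samples.
        total += g(inner_mean)
    return total / n_outer

rng = random.Random(0)  # fixed seed so the run is reproducible
estimate = nested_monte_carlo(
    f=lambda x, y: x + y,
    g=lambda z: max(z, 0.0),          # kink at 0: limited regularity
    sample_x=lambda r: r.gauss(0.0, 1.0),
    sample_y=lambda r: r.gauss(0.0, 1.0),
    n_outer=4000,
    n_inner=100,
    rng=rng,
)
# Here E[f(X, Y) | X] = X, so the target is E[max(X, 0)] = 1 / sqrt(2 * pi).
exact = 1.0 / math.sqrt(2.0 * math.pi)
```

The inner average is noisy, so the estimator carries a bias that shrinks as `n_inner` grows; reducing the cost of controlling that bias is the kind of trade-off multilevel schemes are designed to address.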


Author(s):  
Ruiyang Song ◽  
Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
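The heuristic can be sketched on a toy problem as follows. This is only an illustration under stated assumptions: a time-homogeneous MDP, each sub-problem solved by backward induction with zero terminal value, and hypothetical names (`backward_induction`, `evaluate`, `temporal_concatenation`) and data layout (`P[s][a][t]` for transition probabilities, `R[s][a]` for rewards). The paper's actual construction may differ.

```python
def backward_induction(P, R, horizon):
    """Optimal policy for a finite horizon; policy[t][s] is the action at time t."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states          # zero terminal value (assumption)
    policy = []
    for _ in range(horizon):
        new_V, step = [], []
        for s in range(n_states):
            q = [R[s][a] + sum(P[s][a][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=q.__getitem__)
            step.append(best)
            new_V.append(q[best])
        V = new_V
        policy.append(step)
    policy.reverse()              # built from the last step backwards
    return policy, V

def evaluate(P, R, policy):
    """Expected total reward of a (possibly suboptimal) time-dependent policy."""
    n_states = len(R)
    V = [0.0] * n_states
    for step in reversed(policy):
        V = [R[s][step[s]] + sum(P[s][step[s]][t] * V[t] for t in range(n_states))
             for s in range(n_states)]
    return V

def temporal_concatenation(P, R, horizon, n_segments):
    """Solve each time segment independently and glue the policies together."""
    base, extra = divmod(horizon, n_segments)
    policy = []
    for k in range(n_segments):
        length = base + (1 if k < extra else 0)
        segment_policy, _ = backward_induction(P, R, length)
        policy.extend(segment_policy)
    return policy

# Toy MDP (assumed): in state 0, action 0 pays 1 and stays, action 1 pays 0 and
# moves to state 1; state 1 pays 3 per step forever.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [3.0, 3.0]]
optimal_policy, optimal_V = backward_induction(P, R, 4)
myopic_policy = temporal_concatenation(P, R, 4, 4)   # four segments of length 1
regret = optimal_V[0] - evaluate(P, R, myopic_policy)[0]
```

With four one-step segments the concatenated policy never makes the initial investment, so it collects 4 from state 0 while the optimal policy collects 9. With two segments of length two the concatenated policy already matches the optimum in this toy instance, which shows how the regret depends on segment length.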


2012 ◽  
Vol 29 (3) ◽  
pp. 642-658 ◽  
Author(s):  
Benedikt M. Pötscher

Upper and lower bounds on the order of magnitude of $\sum_{t=1}^{n} \left| x_t \right|^{-\alpha}$, where $x_t$ is an integrated process, are obtained. Furthermore, upper bounds for the order of magnitude of the related quantity $\sum_{t=1}^{n} v_t \left| x_t \right|^{-\alpha}$, where $v_t$ are random variables satisfying certain conditions, are also derived.


2008 ◽  
Vol 19 (03) ◽  
pp. 549-563 ◽  
Author(s):  
LAURENT DOYEN ◽  
THOMAS A. HENZINGER ◽  
JEAN-FRANÇOIS RASKIN

We consider the equivalence problem for labeled Markov chains (LMCs), where each state is labeled with an observation. Two LMCs are equivalent if every finite sequence of observations has the same probability of occurrence in the two LMCs. We show that equivalence can be decided in polynomial time, using a reduction to the equivalence problem for probabilistic automata, which is known to be solvable in polynomial time. We provide an alternative algorithm to solve the equivalence problem, which is based on a new definition of bisimulation for probabilistic automata. We also extend the technique to decide the equivalence of weighted probabilistic automata. Then, we consider the equivalence problem for labeled Markov decision processes (LMDPs), which asks, given two LMDPs, whether for every scheduler (i.e. way of resolving the nondeterministic decisions) of either process there exists a scheduler for the other process such that the resulting LMCs are equivalent. The decidability of this problem remains open. We show that the schedulers can be restricted to be observation-based, but may require infinite memory.


2016 ◽  
Vol 54 (6) ◽  
pp. 545-587
Author(s):  
Aaron Bohy ◽  
Véronique Bruyère ◽  
Jean-François Raskin ◽  
Nathalie Bertrand

1991 ◽  
Vol 234 ◽  
Author(s):  
David J. Bergman ◽  
Ohad Levy

A theoretical study of composite thermoelectric media has resulted in the development of a number of simple approximations, as well as some exact results. The latter include exact upper and lower bounds on the bulk effective thermoelectric transport coefficients of the composite and upper bounds on the bulk effective thermoelectric quality factor $Z_e$. In particular, as a result of some exact theorems and computer simulations we conclude that $Z_e$ can never be greater than the largest value of $Z$ in the different components that make up the composite.

