Computational Approaches for Stochastic Shortest Path on Succinct MDPs

We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a) for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b) for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature.

Combination of acceleration procedures for solving stochastic shortest-path Markov decision processes

2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering ◽

10.1109/iske.2010.5680801 ◽

2010 ◽

Cited By ~ 1

Author(s):

M.G. Garcia-Hernandez ◽

J. Ruiz-Pinales ◽

S. Ledesma-Orozco ◽

G. Avina-Cervantes ◽

E. Onaindia ◽

...

Keyword(s):

Markov Decision Processes ◽

Shortest Path ◽

Decision Processes ◽

Stochastic Shortest Path ◽

Markov Decision

Mixed Acceleration Techniques for Solving Quickly Stochastic Shortest-Path Markov Decision Processes

Journal of Applied Research and Technology ◽

10.22201/icat.16656423.2011.9.02.439 ◽

2011 ◽

Vol 9 (02) ◽

Author(s):

M. de G. García-Hernández ◽

J. Ruiz-Pinales ◽

E. Onaindía ◽

S. Ledesma-Orozco ◽

J. G. Aviña-Cervantes ◽

...

Keyword(s):

Markov Decision Processes ◽

Shortest Path ◽

Decision Processes ◽

Solution Time ◽

Value Iteration ◽

Stochastic Shortest Path ◽

Acceleration Techniques ◽

Finite State ◽

Markov Decision ◽

Fast Solution

In this paper we propose the combination of accelerated variants of value iteration mixed with improved prioritized sweeping for the fast solution of stochastic shortest-path Markov decision processes. Value iteration is a classical algorithm for solving Markov decision processes, but this algorithm and its variants are quite slow for solving considerably large problems. In order to improve the solution time, acceleration techniques such as asynchronous updates, prioritization and prioritized sweeping have been explored in this paper. A topological reordering algorithm was also compared with static reordering. Experimental results obtained on finite state and action-space stochastic shortest-path problems show that our approach achieves a considerable reduction in the solution time with respect to the tested variants of value iteration. For instance, the experiments showed in one test a reduction of 5.7 times with respect to value iteration with asynchronous updates.
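As a rough illustration of the baseline this abstract compares against, the sketch below runs value iteration with asynchronous (in-place) updates on a tiny stochastic shortest-path problem. It is not the authors' combined method and does no prioritization or reordering. The toy model `TOY_SSP`, the state and action names, and the function names are all hypothetical, chosen only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy SSP: state -> action -> list of (probability, next_state, cost).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TOY_SSP: Transitions = {
    "s0": {
        "safe": [(1.0, "s1", 2.0)],
        "risky": [(0.5, "goal", 1.0), (0.5, "s0", 1.0)],
    },
    "s1": {"go": [(1.0, "goal", 1.0)]},
}
GOAL = "goal"


def q_value(mdp: Transitions, values: Dict[str, float], state: str, action: str) -> float:
    """Expected cost of taking `action` in `state`, then following `values`."""
    return sum(p * (cost + values[nxt]) for p, nxt, cost in mdp[state][action])


def async_value_iteration(mdp: Transitions, goal: str,
                          tol: float = 1e-9, max_sweeps: int = 10_000):
    """Value iteration with in-place updates: each state update immediately
    sees the values already refreshed earlier in the same sweep."""
    values = {s: 0.0 for s in mdp}
    values[goal] = 0.0  # the goal is absorbing and cost-free
    for sweep in range(1, max_sweeps + 1):
        residual = 0.0
        for state in mdp:
            best = min(q_value(mdp, values, state, a) for a in mdp[state])
            residual = max(residual, abs(best - values[state]))
            values[state] = best  # in-place (asynchronous) update
        if residual < tol:
            return values, sweep
    return values, max_sweeps


values, sweeps = async_value_iteration(TOY_SSP, GOAL)
print(values, sweeps)
```

In this toy model the "risky" action satisfies V = 1 + 0.5 V, so the expected cost from `s0` converges to 2, which beats the "safe" route of cost 3. Prioritized sweeping would replace the fixed `for state in mdp` order with a priority queue keyed on the size of each state's pending change.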

Risk-Sensitive Piecewise-Linear Policy Iteration for Stochastic Shortest Path Markov Decision Processes

Advances in Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60884-2_28 ◽

2020 ◽

pp. 383-395

Author(s):

Henrique Dias Pastor ◽

Igor Oliveira Borges ◽

Valdinei Freire ◽

Karina Valdivia Delgado ◽

Leliane Nunes de Barros

Keyword(s):

Markov Decision Processes ◽

Shortest Path ◽

Piecewise Linear ◽

Policy Iteration ◽

Decision Processes ◽

Stochastic Shortest Path ◽

Risk Sensitive ◽

Markov Decision

Multilevel Monte Carlo methods and lower–upper bounds in initial margin computations

Monte Carlo Methods and Applications ◽

10.1515/mcma-2020-2062 ◽

2020 ◽

Vol 26 (2) ◽

pp. 131-161

Author(s):

Florian Bourgey ◽

Stefano De Marco ◽

Emmanuel Gobet ◽

Alexandre Zhou

Keyword(s):

Monte Carlo ◽

Lower Bounds ◽

Asymptotic Optimality ◽

Upper Bounds ◽

Upper And Lower Bounds ◽

Independent Random Variables ◽

Outer Function ◽

Control Problems ◽

Multilevel Monte Carlo ◽

Primal Dual

The multilevel Monte Carlo (MLMC) method developed by M. B. Giles [Multilevel Monte Carlo path simulation, Oper. Res. 56 (2008), no. 3, 607–617] has a natural application to the evaluation of nested expectations $\mathbb{E}[g(\mathbb{E}[f(X,Y) \mid X])]$, where $f, g$ are functions and $(X,Y)$ is a pair of independent random variables. Apart from the pricing of American-type derivatives, such computations arise in a large variety of risk valuations (VaR or CVaR of a portfolio, CVA), and in the assessment of margin costs for centrally cleared portfolios. In this work, we focus on the computation of initial margin. We analyze the properties of corresponding MLMC estimators, for which we provide results of asymptotic optimality; at the technical level, we have to deal with limited regularity of the outer function $g$ (which might fail to be everywhere differentiable). Parallel to this, we investigate upper and lower bounds for nested expectations as above, in the spirit of primal-dual algorithms for stochastic control problems.
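To make the nested expectation concrete, here is a minimal sketch of the plain (single-level) nested Monte Carlo estimator, which is the baseline that MLMC improves on rather than the MLMC estimator itself. The choices f(x, y) = x + y, g(z) = max(z, 0), standard normal X and Y, and the function name `nested_mc` are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import random


def nested_mc(f, g, sample_x, sample_y, n_outer, n_inner, rng):
    """Plain nested Monte Carlo estimate of E[g(E[f(X, Y) | X])].

    For each outer draw of X, the inner conditional expectation is
    approximated by averaging f(x, Y) over fresh independent draws of Y.
    """
    total = 0.0
    for _ in range(n_outer):
        x = sample_x(rng)
        inner = sum(f(x, sample_y(rng)) for _ in range(n_inner)) / n_inner
        total += g(inner)
    return total / n_outer


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = nested_mc(
    f=lambda x, y: x + y,            # assumed toy inner function
    g=lambda z: max(z, 0.0),         # outer function, not differentiable at 0
    sample_x=lambda r: r.gauss(0.0, 1.0),
    sample_y=lambda r: r.gauss(0.0, 1.0),
    n_outer=5000,
    n_inner=20,
    rng=rng,
)

# With these choices E[f(X, Y) | X] = X, so the target is E[max(X, 0)].
exact = 1.0 / math.sqrt(2.0 * math.pi)
print(estimate, exact)
```

Because g is convex here, the finite inner sample biases the estimate upward (Jensen's inequality), and the bias shrinks as `n_inner` grows. MLMC combines estimators with different inner sample sizes to reduce the total cost of reaching a given accuracy.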

Temporal concatenation for Markov decision processes

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964821000206 ◽

2021 ◽

pp. 1-28

Author(s):

Ruiyang Song ◽

Kuang Xu

Keyword(s):

Markov Decision Processes ◽

Large Scale ◽

Optimal Solution ◽

Upper Bounds ◽

Black Box ◽

Decision Processes ◽

Optimal Solutions ◽

Wide Range ◽

Markov Decision ◽

Speed Up

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential of substantial speed-up at the expense of some performance degradation.
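The idea can be sketched on a toy reward-maximizing MDP as follows. This is only an illustration under stated assumptions: the two-state model `P`/`R`, the zero terminal value used for every sub-problem, and all function names are hypothetical, and backward induction stands in for the "black box" solver.

```python
# Hypothetical toy MDP. P[s][a] = list of (probability, next_state),
# R[s][a] = immediate reward. Action 0 = stay, action 1 = move.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0
    [[(1.0, 1)], [(1.0, 0)]],  # state 1
]
R = [
    [1.0, -2.0],  # state 0: staying pays 1, moving costs 2
    [3.0, -2.0],  # state 1: staying pays 3, moving costs 2
]


def backward_induction(P, R, horizon):
    """Optimal finite-horizon policy (one decision list per time step)
    and the optimal values at time 0, with terminal value zero."""
    n = len(P)
    values = [0.0] * n
    policy = []
    for _ in range(horizon):
        new_values, decision = [], []
        for s in range(n):
            q = [R[s][a] + sum(p * values[t] for p, t in P[s][a])
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            decision.append(best)
            new_values.append(q[best])
        values = new_values
        policy.insert(0, decision)  # built from the last step backward
    return policy, values


def evaluate(P, R, policy):
    """Expected total reward at time 0 of a given time-dependent policy."""
    values = [0.0] * len(P)
    for decision in reversed(policy):
        values = [R[s][decision[s]]
                  + sum(p * values[t] for p, t in P[s][decision[s]])
                  for s in range(len(P))]
    return values


def temporal_concatenation(P, R, horizon, pieces):
    """Solve one sub-problem of length horizon // pieces and repeat it.
    Repeating is valid here only because the toy MDP is time-homogeneous,
    so every sub-problem has the same solution."""
    assert horizon % pieces == 0
    sub_policy, _ = backward_induction(P, R, horizon // pieces)
    return sub_policy * pieces


opt_policy, opt_values = backward_induction(P, R, 4)
tc_values = evaluate(P, R, temporal_concatenation(P, R, 4, 2))
print(opt_values, tc_values)
```

With horizon 4 split into two pieces, the concatenated policy never pays the moving cost from state 0, because within a length-2 sub-problem the move does not pay off. The gap between the optimal and concatenated values from state 0 is the kind of regret the abstract refers to.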

PageRank Optimization in Polynomial Time by Stochastic Shortest Path Reformulation

Lecture Notes in Computer Science - Algorithmic Learning Theory ◽

10.1007/978-3-642-16108-7_11 ◽

2010 ◽

pp. 89-103 ◽

Cited By ~ 7

Author(s):

Balázs Csanád Csáji ◽

Raphaël M. Jungers ◽

Vincent D. Blondel

Keyword(s):

Polynomial Time ◽

Shortest Path ◽

Stochastic Shortest Path

ON THE ORDER OF MAGNITUDE OF SUMS OF NEGATIVE POWERS OF INTEGRATED PROCESSES

Econometric Theory ◽

10.1017/s0266466612000503 ◽

2012 ◽

Vol 29 (3) ◽

pp. 642-658 ◽

Cited By ~ 1

Author(s):

Benedikt M. Pötscher

Keyword(s):

Lower Bounds ◽

Random Variables ◽

Upper Bounds ◽

Upper And Lower Bounds ◽

Integrated Process ◽

Order Of Magnitude ◽

Image Position ◽

Integrated Processes

Upper and lower bounds on the order of magnitude of $\sum\nolimits_{t = 1}^n \left| x_t \right|^{-\alpha}$, where $x_t$ is an integrated process, are obtained. Furthermore, upper bounds for the order of magnitude of the related quantity $\sum\nolimits_{t = 1}^n v_t \left| x_t \right|^{-\alpha}$, where $v_t$ are random variables satisfying certain conditions, are also derived.

EQUIVALENCE OF LABELED MARKOV CHAINS

International Journal of Foundations of Computer Science ◽

10.1142/s0129054108005814 ◽

2008 ◽

Vol 19 (03) ◽

pp. 549-563 ◽

Cited By ~ 28

Author(s):

LAURENT DOYEN ◽

THOMAS A. HENZINGER ◽

JEAN-FRANÇOIS RASKIN

Keyword(s):

Markov Chains ◽

Markov Decision Processes ◽

Polynomial Time ◽

Finite Sequence ◽

Equivalence Problem ◽

Decision Processes ◽

Probabilistic Automata ◽

Alternative Algorithm ◽

Markov Decision ◽

Definition Of

We consider the equivalence problem for labeled Markov chains (LMCs), where each state is labeled with an observation. Two LMCs are equivalent if every finite sequence of observations has the same probability of occurrence in the two LMCs. We show that equivalence can be decided in polynomial time, using a reduction to the equivalence problem for probabilistic automata, which is known to be solvable in polynomial time. We provide an alternative algorithm to solve the equivalence problem, which is based on a new definition of bisimulation for probabilistic automata. We also extend the technique to decide the equivalence of weighted probabilistic automata. Then, we consider the equivalence problem for labeled Markov decision processes (LMDPs), which asks, given two LMDPs, whether for every scheduler (i.e., way of resolving the nondeterministic decisions) for each of the processes, there exists a scheduler for the other process such that the resulting LMCs are equivalent. The decidability of this problem remains open. We show that the schedulers can be restricted to be observation-based, but may require infinite memory.

Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes

Acta Informatica ◽

10.1007/s00236-016-0255-4 ◽

2016 ◽

Vol 54 (6) ◽

pp. 545-587

Author(s):

Aaron Bohy ◽

Véronique Bruyère ◽

Jean-François Raskin ◽

Nathalie Bertrand

Keyword(s):

Markov Decision Processes ◽

Shortest Path ◽

Decision Processes ◽

Markov Decision ◽

Mean Payoff

Composite Thermoelectrics - Exact Results and Calculational Methods

MRS Proceedings ◽

10.1557/proc-234-39 ◽

1991 ◽

Vol 234 ◽

Cited By ~ 6

Author(s):

David J. Bergman ◽

Ohad Levy

Keyword(s):

Quality Factor ◽

Computer Simulations ◽

Lower Bounds ◽

Theoretical Study ◽

Transport Coefficients ◽

Upper Bounds ◽

Upper And Lower Bounds ◽

Exact Results ◽

Thermoelectric Transport

A theoretical study of composite thermoelectric media has resulted in the development of a number of simple approximations, as well as some exact results. The latter include exact upper and lower bounds on the bulk effective thermoelectric transport coefficients of the composite and upper bounds on the bulk effective thermoelectric quality factor Ze. In particular, as a result of some exact theorems and computer simulations we conclude that Ze can never be greater than the largest value of Z in the different components that make up the composite.
