The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes

2017 ◽  
Vol 60 ◽  
pp. 263-285 ◽  
Author(s):  
Nikolaos Kariotoglou ◽  
Maryam Kamgarpour ◽  
Tyler H. Summers ◽  
John Lygeros

One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. We consider the stochastic reach-avoid problem, in which the objective is to synthesize a control policy to maximize the probability of reaching a target set at a given time, while staying in a safe set at all prior times. We characterize the solution to this problem through an infinite dimensional linear program. We then develop a tractable approximation to the infinite dimensional linear program through finite dimensional approximations of the decision space and constraints. For a large class of Markov decision processes modeled by Gaussian mixture kernels we show that, through a proper selection of the finite dimensional space, one can further reduce the computational complexity of the resulting linear program. We validate the proposed method and analyze its potential with numerical case studies.
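The reach-avoid value function described in this abstract satisfies a backward recursion, which the infinite dimensional linear program characterizes. A minimal sketch of that recursion on a hypothetical five-state chain (the transition kernel, safe and target sets, and horizon are all illustrative assumptions, not taken from the paper):

```python
import numpy as np

n_states, n_actions, horizon = 5, 2, 10
target = {4}          # target set K'
safe = {1, 2, 3, 4}   # safe set K; state 0 is the "avoid" region

# P[a, x, y]: assumed toy random-walk kernel; action 1 pushes toward
# the target more reliably than action 0.
P = np.zeros((n_actions, n_states, n_states))
p_up = [0.4, 0.8]
for a in range(n_actions):
    for x in range(n_states):
        up, down = min(x + 1, n_states - 1), max(x - 1, 0)
        P[a, x, up] += p_up[a]
        P[a, x, down] += 1 - p_up[a]

# Backward recursion for the reach-avoid probability:
# V(x) = 1 on the target, 0 outside the safe set,
# and max_a sum_y P(y|x,a) V(y) elsewhere.
V = np.array([1.0 if x in target else 0.0 for x in range(n_states)])
for _ in range(horizon):
    Q = P @ V              # Q[a, x] = sum_y P(y|x,a) V(y)
    V_new = Q.max(axis=0)  # maximize over actions
    for x in range(n_states):
        if x in target:
            V_new[x] = 1.0
        elif x not in safe:
            V_new[x] = 0.0
    V = V_new
```

Each entry of `V` is then the maximal probability of reaching the target within the horizon while remaining safe, started from that state.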

2013 ◽  
Vol 45 (3) ◽  
pp. 837-859 ◽  
Author(s):  
François Dufour ◽  
A. B. Piunovskiy

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results.
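The linear programming approach mentioned here optimizes over occupation measures. A minimal sketch for a hypothetical two-state, two-action total-cost MDP with an absorbing cost-free state (the toy model is transient for simplicity, even though the paper's point is precisely that transience is not required; all numbers are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical MDP: transient states {0, 1}, absorbing cost-free state 2.
# P[a, x, y] = transition probability; c[x, a] = one-step cost.
P = np.zeros((2, 2, 3))
P[0, 0, 1] = 1.0   # action 0 in state 0 -> state 1
P[1, 0, 2] = 1.0   # action 1 in state 0 -> absorbing
P[0, 1, 2] = 1.0   # action 0 in state 1 -> absorbing
P[1, 1, 2] = 1.0   # action 1 in state 1 -> absorbing
c = np.array([[1.0, 3.0],    # costs at state 0
              [1.0, 2.5]])   # costs at state 1
nu = np.array([1.0, 0.0])    # initial distribution over transient states

# Variables mu(x, a) >= 0, flattened as [mu(0,0), mu(0,1), mu(1,0), mu(1,1)].
# Balance: sum_a mu(y,a) - sum_{x,a} P(y|x,a) mu(x,a) = nu(y), y transient.
A_eq = np.zeros((2, 4))
for y in range(2):
    for x in range(2):
        for a in range(2):
            j = 2 * x + a
            if x == y:
                A_eq[y, j] += 1.0
            A_eq[y, j] -= P[a, x, y]

res = linprog(c=c.ravel(), A_eq=A_eq, b_eq=nu, bounds=(0, None))
mu = res.x.reshape(2, 2)
print(res.fun)   # optimal expected total cost of the LP
```

In this toy instance the LP selects action 0 in both states (pass through state 1, then absorb), and its optimal value coincides with the optimal expected total cost of the control problem, as in the result stated above.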


2005 ◽  
Vol 02 (03) ◽  
pp. 251-258
Author(s):  
HANLIN HE ◽  
QIAN WANG ◽  
XIAOXIN LIAO

Using a result from Fenchel duality, we derive the dual formulation of the maximal-minimal problem for an objective function of the error response to a fixed input in continuous-time systems. This formulation transforms the original problem, posed in an infinite-dimensional space, into a maximization problem with constraints in a finite-dimensional space, which can be studied with finite-dimensional theory. When the objective function is the norm, the maximum, or the minimum of the error response, dual formulations are obtained for the problems of L1-optimal control, minimization of the maximal error response, and minimal overshoot, among others, which gives a method for studying these problems.
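As a hedged sketch of the duality presumably at work (the paper's exact pairing and spaces are not reproduced here), the Fenchel duality theorem states that for suitable convex functions $f$ on $X$ and $g$ on $Y$ and a bounded linear map $A \colon X \to Y$,

```latex
\inf_{x \in X} \bigl\{ f(x) + g(Ax) \bigr\}
  \;=\; \sup_{y^* \in Y^*} \bigl\{ -f^*(A^* y^*) - g^*(-y^*) \bigr\},
```

where $f^*$, $g^*$ are the convex conjugates and $A^*$ is the adjoint of $A$. When the dual variable $y^*$ ranges over a finite-dimensional space, the right-hand side is a constrained finite-dimensional maximization, which matches the reduction the abstract describes.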


2012 ◽  
Vol 44 (3) ◽  
pp. 774-793 ◽  
Author(s):  
François Dufour ◽  
M. Horiguchi ◽  
A. B. Piunovskiy

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.
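The passage from an optimal occupation measure to the randomized stationary policy it generates, mentioned in this abstract, amounts to a row normalization: $\pi(a \mid x) = \mu(x,a) / \sum_{a'} \mu(x,a')$ wherever the state mass is positive. A minimal illustration with an assumed measure (the numbers are not from the paper):

```python
import numpy as np

# Hypothetical optimal occupation measure mu(x, a) on 2 states x 2 actions.
mu = np.array([[0.8, 0.2],
               [0.0, 1.5]])

# Randomized stationary policy: pi(a | x) = mu(x, a) / sum_a' mu(x, a'),
# with an arbitrary (here: uniform) choice at states carrying zero mass.
mass = mu.sum(axis=1, keepdims=True)
pi = np.where(mass > 0, mu / np.where(mass > 0, mass, 1.0), 1.0 / mu.shape[1])
print(pi)
```

Each row of `pi` is a probability distribution over actions, i.e. a randomized stationary policy in the sense of the sufficiency result stated above.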


2018 ◽  
Vol 63 (4) ◽  
pp. 1185-1191 ◽  
Author(s):  
Chandrashekar Lakshminarayanan ◽  
Shalabh Bhatnagar ◽  
Csaba Szepesvari

2017 ◽  
Vol 20 (10) ◽  
pp. 74-83
Author(s):  
V.L. Pasikov

For a conflict-controlled differential system with delay, the study of the dynamic convergence-evasion game with respect to a functional target set is continued, now concerning evasion and the existence of an alternative in the case under consideration. In this work, the saddle-point condition on the right-hand side of the controlled system is not assumed. Similar problems were earlier posed and solved in finite-dimensional spaces by the scientific school of Academician N. N. Krasovsky; for the infinite-dimensional space of continuous functions, similar problems were considered by the author. In the present work, the norm of a Hilbert space is used in proving the convergence-evasion theorem.

