Temporal concatenation for Markov decision processes

Author(s):  
Ruiyang Song ◽  
Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
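
The splitting step is simple enough to sketch. Below is a minimal, illustrative Python example (not the paper's code), assuming a stationary finite-horizon MDP given as a transition array P of shape (states, actions, states) and a reward array R of shape (states, actions); each sub-horizon is solved by standard backward induction with a zero terminal value, and the resulting decision rules are concatenated in time order.

import numpy as np

def backward_induction(P, R, horizon, terminal_value):
    """Solve a finite-horizon MDP exactly; returns per-step decision rules and V_0."""
    V = terminal_value.copy()
    policies = []
    for _ in range(horizon):
        Q = R + P @ V                     # Q[s, a] = R[s, a] + E[V(next state)]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policies.reverse()                    # policies[t] is the decision rule at step t
    return policies, V

def temporal_concatenation(P, R, horizon, split):
    """Split the horizon at `split`, solve each piece independently, concatenate."""
    zero_terminal = np.zeros(P.shape[0])
    head_policies, _ = backward_induction(P, R, split, zero_terminal)
    tail_policies, _ = backward_induction(P, R, horizon - split, zero_terminal)
    return head_policies + tail_policies  # one decision rule per time step

# Tiny random instance (hypothetical sizes) to exercise the heuristic.
rng = np.random.default_rng(0)
n_states, n_actions, horizon = 5, 3, 8
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))
policy = temporal_concatenation(P, R, horizon, split=horizon // 2)

Because the sub-problems are solved without knowledge of each other's value functions (here the head piece uses a zero terminal value), some performance is given up in exchange for the ability to solve the pieces independently, which is where the speed-up comes from.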

2015 ◽  
Vol 13 (3) ◽  
pp. 47-57 ◽  
Author(s):  
Sanaa Chafik ◽  
Cherki Daoui

Because many real applications involve a large number of states, classical methods are intractable for solving large Markov Decision Processes. Decomposition techniques based on the topology of each state in the associated graph, together with parallelization, are useful ways to cope with this problem. In this paper, the authors propose a Modified Value Iteration algorithm augmented with parallelism. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.
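
The authors' implementation uses OpenMP in C; the Python sketch below is only an illustration, with hypothetical names, of the structural point the parallelization exploits: within one sweep of value iteration, the Bellman backup of each state depends only on the previous value vector, so the per-state backups can be distributed across workers.

from concurrent.futures import ProcessPoolExecutor
import numpy as np

def backup_state(args):
    """Bellman backup for one state; independent of all other states' backups."""
    s, P, R, V, gamma = args
    return np.max(R[s] + gamma * (P[s] @ V))

def parallel_value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=1000):
    n_states = P.shape[0]
    V = np.zeros(n_states)
    with ProcessPoolExecutor() as pool:
        for _ in range(max_iter):
            tasks = [(s, P, R, V, gamma) for s in range(n_states)]
            V_new = np.fromiter(pool.map(backup_state, tasks), dtype=float)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new
    return V

In practice the pickling overhead of a Python process pool can outweigh the gains; the point of the sketch is only that the inner loop over states has no cross-state dependencies, which is exactly what an OpenMP parallel-for exploits.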


2012 ◽  
Vol 44 (3) ◽  
pp. 774-793 ◽  
Author(s):  
François Dufour ◽  
M. Horiguchi ◽  
A. B. Piunovskiy

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.


Author(s):  
Krishnendu Chatterjee ◽  
Adrián Elgyütt ◽  
Petr Novotný ◽  
Owen Rouillé

Partially observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework for modeling a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low-probability events with extreme payoff can significantly affect the expectation, so the resulting policies are not necessarily risk averse. An alternative approach is to optimize the probability that the payoff is above a certain threshold, which yields risk-averse policies but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff exceeds a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.
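
Stated formally, with generic notation that may differ from the paper's symbols, the EOPG problem for a policy $\sigma$, discount factor $\gamma$, per-step rewards $r_i$, threshold $t$, and probability bound $\alpha$ is

\[
  \sup_{\sigma} \; \mathbb{E}^{\sigma}\!\left[\sum_{i \ge 0} \gamma^{i} r_i\right]
  \quad \text{subject to} \quad
  \mathbb{P}^{\sigma}\!\left(\sum_{i \ge 0} \gamma^{i} r_i \ge t\right) \ge \alpha .
\]

Setting $\alpha = 0$ makes the constraint vacuous and recovers plain expectation maximization, while the threshold-probability (risk-averse) objective corresponds to maximizing the left-hand side of the constraint instead.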


Author(s):  
Zhiwei Chen ◽  
Xiaopeng Li ◽  
Xiaobo Qu

The “asymmetry” between spatiotemporally varying passenger demand and fixed-capacity transportation supply has been a long-standing problem in urban mass transportation (UMT) systems around the world. The emerging modular autonomous vehicle (MAV) technology offers an opportunity to close the substantial gap between passenger demand and vehicle capacity through station-wise docking and undocking operations. However, an approach that can efficiently solve the operational design problem for UMT corridor systems with MAVs is still lacking. To bridge this methodological gap, this paper proposes a continuum approximation (CA) model that offers near-optimal solutions to the operational design of MAV-based transit corridors very efficiently. We investigate the theoretical properties of the optimal solutions to the problem in a certain (yet not uncommon) case. These properties allow us to estimate the seat demand of each time neighborhood from the arrival demand curves, recovering the “local impact” property of the problem. With this property, a CA model is formulated that decomposes the original problem into a finite number of subproblems that can be solved analytically. A discretization heuristic is then proposed to convert the analytical solution of the CA model into feasible solutions to the original problem. With two sets of numerical experiments, we show that the proposed CA model achieves near-optimal solutions (with gaps below 4% in most cases) in almost no time (less than 10 ms) for large-scale instances across a wide range of parameter settings, whereas a commercial solver may not even find a feasible solution within several hours. The theoretical properties are verified, and managerial insights into how input parameters affect system performance are provided through these numerical results. The results also reveal that, although the CA model does not incorporate vehicle repositioning decisions, the timetabling decisions obtained by solving it can be applied to obtain near-optimal repositioning decisions (with gaps below 5% in most instances) very efficiently (within 10 ms). Thus, the proposed CA model provides a foundation for developing solution approaches for related problems (e.g., MAV repositioning) with more complex operational constraints, whose exact optimal solutions can hardly be found with discrete modeling methods.


2017 ◽  
Vol 36 (2) ◽  
pp. 231-258 ◽  
Author(s):  
Shayegan Omidshafiei ◽  
Ali-Akbar Agha-Mohammadi ◽  
Christopher Amato ◽  
Shih-Yuan Liu ◽  
Jonathan P How ◽  
...  

This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability, given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions, which allow for asynchronous decision-making by the robots, a crucial capability in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically generated macro-actions enables the use of efficient discrete-space search algorithms. The paper presents algorithms for solving Dec-POSMDPs that are more scalable than previous methods because they can incorporate closed-loop belief-space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.


2013 ◽  
Vol 45 (3) ◽  
pp. 837-859 ◽  
Author(s):  
François Dufour ◽  
A. B. Piunovskiy

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results in the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results.
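
For orientation, the linear program referred to here is the standard occupation-measure formulation of a constrained total-cost MDP; the notation below is generic and not copied from the paper. With initial distribution $\nu$, transition kernel $Q$, objective cost $c_0$, and constraint costs $c_k$ with bounds $d_k$, the program over measures $\mu$ on $X \times A$ is

\[
\begin{aligned}
  \text{minimize}\quad & \int_{X \times A} c_0 \, d\mu \\
  \text{subject to}\quad & \int_{X \times A} c_k \, d\mu \le d_k, \qquad k = 1, \dots, K,\\
  & \mu(\Gamma \times A) = \nu(\Gamma) + \int_{X \times A} Q(\Gamma \mid x, a)\, \mu(dx, da)
    \quad \text{for all measurable } \Gamma \subseteq X,
\end{aligned}
\]

where the last condition characterizes $\mu$ as an occupation measure; the results above say that an optimal $\mu$, when it exists, can be taken to be the occupation measure generated by a randomized stationary policy.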


2011 ◽  
Vol 48 (04) ◽  
pp. 954-967 ◽  
Author(s):  
Chin Hon Tan ◽  
Joseph C. Hartman

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular, we consider problems involving (i) a single stationary parameter, (ii) multiple stationary parameters, and (iii) multiple nonstationary parameters. We illustrate the applicability of this work through a capacitated stochastic lot-sizing problem.
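
By way of contrast with the paper's direct use of the Bellman equations, the brute-force sketch below (illustrative Python with hypothetical names such as build_reward; not the authors' procedure) sweeps a single stationary reward parameter theta, re-solves the discounted MDP at each value, and records the range of theta over which the nominal optimal policy remains optimal.

import numpy as np

def solve_policy(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for an infinite-horizon discounted MDP; returns a greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)          # Q[s, a] = R[s, a] + gamma * E[V(next state)]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)

def stability_range(P, build_reward, thetas, gamma=0.9):
    """Thetas (from a grid) at which the policy optimal for thetas[0] stays optimal."""
    nominal = solve_policy(P, build_reward(thetas[0]), gamma)
    return [t for t in thetas
            if np.array_equal(solve_policy(P, build_reward(t), gamma), nominal)]

The paper's point is that such ranges can be obtained directly from the Bellman equations rather than by re-solving the model for every candidate parameter value.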

