Temporal concatenation for Markov decision processes

Author(s):  
Ruiyang Song ◽  
Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
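
The splitting step is simple enough to sketch. Below is a minimal, illustrative Python example (not the paper's code), assuming a stationary finite-horizon MDP given as a transition array P of shape (states, actions, states) and a reward array R of shape (states, actions); each sub-horizon is solved by standard backward induction with a zero terminal value, and the resulting decision rules are concatenated in time order.

import numpy as np

def backward_induction(P, R, horizon, terminal_value):
    """Solve a finite-horizon MDP exactly; returns per-step decision rules and V_0."""
    V = terminal_value.copy()
    policies = []
    for _ in range(horizon):
        Q = R + P @ V                     # Q[s, a] = R[s, a] + E[V(next state)]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policies.reverse()                    # policies[t] is the decision rule at step t
    return policies, V

def temporal_concatenation(P, R, horizon, split):
    """Split the horizon at `split`, solve each piece independently, concatenate."""
    zero_terminal = np.zeros(P.shape[0])
    head_policies, _ = backward_induction(P, R, split, zero_terminal)
    tail_policies, _ = backward_induction(P, R, horizon - split, zero_terminal)
    return head_policies + tail_policies  # one decision rule per time step

# Tiny random instance (hypothetical sizes) to exercise the heuristic.
rng = np.random.default_rng(0)
n_states, n_actions, horizon = 5, 3, 8
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))
policy = temporal_concatenation(P, R, horizon, split=horizon // 2)

Because the sub-problems are solved without knowledge of each other's value functions (here the head piece uses a zero terminal value), some performance is given up in exchange for the ability to solve the pieces independently, which is where the speed-up comes from.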

2015 ◽  
Vol 13 (3) ◽  
pp. 47-57 ◽  
Author(s):  
Sanaa Chafik ◽  
Cherki Daoui

Because many real applications involve a large number of states, classical methods are intractable for solving large Markov Decision Processes. Decomposition techniques based on the topology of each state in the associated graph, together with parallelization, are useful ways to cope with this problem. In this paper, the authors propose a Modified Value Iteration algorithm augmented with parallelism. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.
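
The authors' implementation uses OpenMP in C; the Python sketch below is only an illustration, with hypothetical names, of the structural point the parallelization exploits: within one sweep of value iteration, the Bellman backup of each state depends only on the previous value vector, so the per-state backups can be distributed across workers.

from concurrent.futures import ProcessPoolExecutor
import numpy as np

def backup_state(args):
    """Bellman backup for one state; independent of all other states' backups."""
    s, P, R, V, gamma = args
    return np.max(R[s] + gamma * (P[s] @ V))

def parallel_value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=1000):
    n_states = P.shape[0]
    V = np.zeros(n_states)
    with ProcessPoolExecutor() as pool:
        for _ in range(max_iter):
            tasks = [(s, P, R, V, gamma) for s in range(n_states)]
            V_new = np.fromiter(pool.map(backup_state, tasks), dtype=float)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new
    return V

In practice the pickling overhead of a Python process pool can outweigh the gains; the point of the sketch is only that the inner loop over states has no cross-state dependencies, which is exactly what an OpenMP parallel-for exploits.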


2012 ◽  
Vol 44 (3) ◽  
pp. 774-793 ◽  
Author(s):  
François Dufour ◽  
M. Horiguchi ◽  
A. B. Piunovskiy

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.


Author(s):  
Krishnendu Chatterjee ◽  
Adrián Elgyütt ◽  
Petr Novotný ◽  
Owen Rouillé

Partially observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework for modeling a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low-probability events with extreme payoff can significantly affect the expectation, so the resulting policies are not necessarily risk averse. An alternative approach is to optimize the probability that the payoff is above a certain threshold, which yields risk-averse policies but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff exceeds a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.
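
Stated formally, with generic notation that may differ from the paper's symbols, the EOPG problem for a policy $\sigma$, discount factor $\gamma$, per-step rewards $r_i$, threshold $t$, and probability bound $\alpha$ is

\[
  \sup_{\sigma} \; \mathbb{E}^{\sigma}\!\left[\sum_{i \ge 0} \gamma^{i} r_i\right]
  \quad \text{subject to} \quad
  \mathbb{P}^{\sigma}\!\left(\sum_{i \ge 0} \gamma^{i} r_i \ge t\right) \ge \alpha .
\]

Setting $\alpha = 0$ makes the constraint vacuous and recovers plain expectation maximization, while the threshold-probability (risk-averse) objective corresponds to maximizing the left-hand side of the constraint instead.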


Author(s):  
Zhiwei Chen ◽  
Xiaopeng Li ◽  
Xiaobo Qu

The “asymmetry” between spatiotemporally varying passenger demand and fixed-capacity transportation supply has been a long-standing problem in urban mass transportation (UMT) systems around the world. The emerging modular autonomous vehicle (MAV) technology offers an opportunity to close the substantial gap between passenger demand and vehicle capacity through station-wise docking and undocking operations. However, an approach that can efficiently solve the operational design problem for UMT corridor systems with MAVs is still lacking. To bridge this methodological gap, this paper proposes a continuum approximation (CA) model that offers near-optimal solutions to the operational design of MAV-based transit corridors very efficiently. We investigate the theoretical properties of the optimal solutions to the problem in a certain (yet not uncommon) case. These properties allow us to estimate the seat demand of each time neighborhood from the arrival demand curves, recovering the “local impact” property of the problem. With this property, a CA model is formulated that decomposes the original problem into a finite number of subproblems that can be solved analytically. A discretization heuristic is then proposed to convert the analytical solution of the CA model into feasible solutions to the original problem. With two sets of numerical experiments, we show that the proposed CA model achieves near-optimal solutions (with gaps below 4% in most cases) in almost no time (less than 10 ms) for large-scale instances across a wide range of parameter settings, whereas a commercial solver may not even find a feasible solution within several hours. The theoretical properties are verified, and managerial insights into how input parameters affect system performance are provided through these numerical results. The results also reveal that, although the CA model does not incorporate vehicle repositioning decisions, the timetabling decisions obtained by solving it can be applied to obtain near-optimal repositioning decisions (with gaps below 5% in most instances) very efficiently (within 10 ms). Thus, the proposed CA model provides a foundation for developing solution approaches for related problems (e.g., MAV repositioning) with more complex operational constraints, whose exact optimal solutions can hardly be found with discrete modeling methods.


2017 ◽  
Vol 36 (2) ◽  
pp. 231-258 ◽  
Author(s):  
Shayegan Omidshafiei ◽  
Ali-Akbar Agha-Mohammadi ◽  
Christopher Amato ◽  
Shih-Yuan Liu ◽  
Jonathan P How ◽  
...  

This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability, given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions, which allow for asynchronous decision-making by the robots, a crucial capability in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically generated macro-actions enables the use of efficient discrete-space search algorithms. The paper presents algorithms for solving Dec-POSMDPs that are more scalable than previous methods because they can incorporate closed-loop belief-space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.


2013 ◽  
Vol 45 (3) ◽  
pp. 837-859 ◽  
Author(s):  
François Dufour ◽  
A. B. Piunovskiy

In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results in the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results.
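
For orientation, the linear program referred to here is the standard occupation-measure formulation of a constrained total-cost MDP; the notation below is generic and not copied from the paper. With initial distribution $\nu$, transition kernel $Q$, objective cost $c_0$, and constraint costs $c_k$ with bounds $d_k$, the program over measures $\mu$ on $X \times A$ is

\[
\begin{aligned}
  \text{minimize}\quad & \int_{X \times A} c_0 \, d\mu \\
  \text{subject to}\quad & \int_{X \times A} c_k \, d\mu \le d_k, \qquad k = 1, \dots, K,\\
  & \mu(\Gamma \times A) = \nu(\Gamma) + \int_{X \times A} Q(\Gamma \mid x, a)\, \mu(dx, da)
    \quad \text{for all measurable } \Gamma \subseteq X,
\end{aligned}
\]

where the last condition characterizes $\mu$ as an occupation measure; the results above say that an optimal $\mu$, when it exists, can be taken to be the occupation measure generated by a randomized stationary policy.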


2011 ◽  
Vol 48 (04) ◽  
pp. 954-967 ◽  
Author(s):  
Chin Hon Tan ◽  
Joseph C. Hartman

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular, we consider problems involving (i) a single stationary parameter, (ii) multiple stationary parameters, and (iii) multiple nonstationary parameters. We illustrate the applicability of this work through a capacitated stochastic lot-sizing problem.
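
By way of contrast with the paper's direct use of the Bellman equations, the brute-force sketch below (illustrative Python with hypothetical names such as build_reward; not the authors' procedure) sweeps a single stationary reward parameter theta, re-solves the discounted MDP at each value, and records the range of theta over which the nominal optimal policy remains optimal.

import numpy as np

def solve_policy(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for an infinite-horizon discounted MDP; returns a greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)          # Q[s, a] = R[s, a] + gamma * E[V(next state)]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)

def stability_range(P, build_reward, thetas, gamma=0.9):
    """Thetas (from a grid) at which the policy optimal for thetas[0] stays optimal."""
    nominal = solve_policy(P, build_reward(thetas[0]), gamma)
    return [t for t in thetas
            if np.array_equal(solve_policy(P, build_reward(t), gamma), nominal)]

The paper's point is that such ranges can be obtained directly from the Bellman equations rather than by re-solving the model for every candidate parameter value.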

