Massively parallel motion planning algorithms under uncertainty using POMDP

We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize it using a multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms that exploit the massive data parallelism available on modern GPUs. Our GPU-based method uses two workload distribution techniques, compute/data interleaving and workload balancing, to obtain the maximum parallel performance at the highest level. We also present a CPU–GPU hybrid method that takes advantage of both CPU and GPU parallelism to solve highly complex POMDP planning problems. The CPU is responsible for data preparation, while the GPU performs Monte Carlo simulations; these operations are performed concurrently using the compute/data overlap technique between the CPU and GPU. To the best of the authors’ knowledge, our algorithms are the first parallel algorithms that efficiently solve POMDPs in a massively parallel fashion utilizing the GPU or a hybrid of the GPU and CPU. Our algorithms outperform the existing CPU-based algorithm by a factor of 75–99 on the chosen benchmark.
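
The Monte Carlo simulations mentioned above are independent of one another, which is what makes them a natural fit for data-parallel hardware. The following is a minimal serial sketch of that idea, not the paper's gPOMDP or hPOMDP implementation: it estimates the discounted return of a policy from a set of sampled belief particles. The names `rollout`, `mc_value`, `step` and `policy` are hypothetical, and the policy is assumed to act on the state directly for brevity, whereas MCVI simulates a policy graph driven by observations.

```python
import random


def rollout(state, policy, step, gamma, horizon, rng):
    """Simulate one trajectory and return its discounted reward sum."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = step(state, action, rng)  # hypothetical generative model
        total += discount * reward
        discount *= gamma
    return total


def mc_value(particles, policy, step, gamma, horizon, rng):
    """Average the returns of one rollout per belief particle.

    Each rollout touches only its own particle, so the list
    comprehension below is the loop a parallel version would
    distribute across threads.
    """
    returns = [rollout(s, policy, step, gamma, horizon, rng) for s in particles]
    return sum(returns) / len(returns)


def demo_step(state, action, rng):
    """Toy model (an assumption): move by `action`, always earn reward 1."""
    return state + action, 1.0


demo_value = mc_value(
    particles=[0.0, 1.0, 2.0],
    policy=lambda s: 1.0,
    step=demo_step,
    gamma=0.5,
    horizon=3,
    rng=random.Random(0),
)
```

With a constant reward of 1, a discount of 0.5 and a horizon of 3, every rollout returns 1 + 0.5 + 0.25, so the estimate does not depend on the particles in this toy case.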

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.4623 ◽

2016 ◽

Vol 55 ◽

pp. 443-497 ◽

Cited By ~ 4

Author(s):

Jilles Steeve Dibangoye ◽

Christopher Amato ◽

Olivier Buffet ◽

François Charpillet

Keyword(s):

Heuristic Search ◽

Piecewise Linear ◽

Optimal Solution ◽

Value Iteration ◽

Compact Representations ◽

Continuous State ◽

Markov Decision ◽

Feature Based ◽

Multi Agent ◽

Partially Observable

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.

Maintenance planning using continuous-state partially observable Markov decision processes and non-linear action models

Structure and Infrastructure Engineering ◽

10.1080/15732479.2015.1076485 ◽

2015 ◽

Vol 12 (8) ◽

pp. 977-994 ◽

Cited By ~ 15

Author(s):

Roland Schöbi ◽

Eleni N. Chatzi

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Maintenance Planning ◽

Linear Action ◽

Continuous State ◽

Non Linear ◽

Markov Decision ◽

Action Models ◽

Partially Observable Markov ◽

Partially Observable

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/662 ◽

2018 ◽

Cited By ~ 1

Author(s):

Karel Horák ◽

Branislav Bošanský ◽

Krishnendu Chatterjee

Keyword(s):

Heuristic Search ◽

Infinite Horizon ◽

Decision Processes ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Total Cost ◽

Markov Decision ◽

Standard Models ◽

Target States ◽

Partially Observable

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (also known as Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel and heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We make the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
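
The indefinite-horizon objective described above can be written out as follows. The notation is an assumption chosen for illustration and is not taken from the paper: $G$ is the set of target states, $c > 0$ is the transition cost, $b_0$ is the initial belief, and $\tau$ is the first time a target state is reached.

```latex
\tau = \min \{\, t \ge 0 : s_t \in G \,\}, \qquad
V^{*}(b_0) = \min_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\tau - 1} c(s_t, a_t, s_{t+1}) \;\middle|\; s_0 \sim b_0 \right]
```

Unlike the discounted-sum objective, there is no discount factor here, so the sum is finite only along trajectories that eventually reach $G$.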

A Framework for Sequential Planning in Multi-Agent Settings

Journal of Artificial Intelligence Research ◽

10.1613/jair.1579 ◽

2005 ◽

Vol 24 ◽

pp. 49-79 ◽

Cited By ~ 93

Author(s):

P. J. Gmytrasiewicz ◽

P. Doshi

Keyword(s):

Traditional Approach ◽

Value Functions ◽

Value Iteration ◽

Markov Decision ◽

Multi Agent ◽

Carry Over ◽

The Cost ◽

Partially Observable ◽

Belief States ◽

Do So

This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian updates to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents' autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piece-wise linearity and convexity of the value functions carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria which may be non-unique and do not capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continuously revise models of other agents. Since the agent's beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.
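
The Bayesian belief update mentioned above can be sketched for the plain single-agent POMDP case. This is a simplified illustration under stated assumptions, not the paper's interactive framework: it updates a belief over physical states only and leaves out models of other agents. The tables `T[a][s][s2]` (transition probabilities) and `O[a][s2][o]` (observation probabilities) and the function name `belief_update` are hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """Return b'(s2) proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Prediction step: probability of landing in s2 before observing.
        predicted = sum(T[action][s][s2] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Toy two-state example (an assumption): one action that leaves the state
# unchanged and reports it correctly 85% of the time.
T_demo = [[[1.0, 0.0], [0.0, 1.0]]]
O_demo = [[[0.85, 0.15], [0.15, 0.85]]]
posterior = belief_update([0.5, 0.5], 0, 0, T_demo, O_demo)
```

Starting from a uniform belief, one observation of `0` moves the belief to 0.85 for state 0 and 0.15 for state 1.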

Perception-Aware Point-Based Value Iteration for Partially Observable Markov Decision Processes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/329 ◽

2019 ◽

Author(s):

Mahsa Ghasemi ◽

Ufuk Topcu

Keyword(s):

Markov Decision Processes ◽

Active Role ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Greedy Strategy ◽

Markov Decision ◽

Partially Observable Markov ◽

Observation Selection ◽

Partially Observable

In conventional partially observable Markov decision processes, the observations that the agent receives originate from fixed known distributions. However, in a variety of real-world scenarios, the agent has an active role in its perception by selecting which observations to receive. We avoid the combinatorial expansion of the action space that results from integrating planning and perception decisions by using a greedy strategy for observation selection that minimizes an information-theoretic measure of the state uncertainty. We develop a novel point-based value iteration algorithm that incorporates this greedy strategy to pick perception actions for each sampled belief point in each iteration. As a result, the solver not only requires fewer belief points to approximate the reachable subspace of the belief simplex, but also requires less computation per iteration. Further, we prove that the proposed algorithm achieves a near-optimal guarantee on the value function with respect to an optimal perception strategy, and demonstrate its performance empirically.
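
A greedy observation-selection step of the kind described above might look like the sketch below. The abstract does not name the uncertainty measure or the perception model, so several things here are assumptions: Shannon entropy of the posterior is used as the measure, each sensor is a table `sensor[s][o] = P(o | s)`, sensors are treated as conditionally independent given the state, and the names `greedy_select` and `expected_posterior_entropy` are hypothetical.

```python
import math
from itertools import product


def entropy(b):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in b if p > 0.0)


def expected_posterior_entropy(belief, sensors):
    """Expected entropy of the belief after reading every sensor in `sensors`."""
    if not sensors:
        return entropy(belief)
    total = 0.0
    obs_ranges = [range(len(sensor[0])) for sensor in sensors]
    for joint_obs in product(*obs_ranges):
        # Joint probability of each state together with this joint observation.
        joint = [
            b * math.prod(sensor[s][o] for sensor, o in zip(sensors, joint_obs))
            for s, b in enumerate(belief)
        ]
        p_obs = sum(joint)
        if p_obs > 0.0:
            total += p_obs * entropy([j / p_obs for j in joint])
    return total


def greedy_select(belief, candidates, k):
    """Pick up to k sensor names, each time adding the one that most
    reduces the expected posterior entropy given those already chosen."""
    chosen = []
    remaining = dict(candidates)
    for _ in range(min(k, len(remaining))):
        best = min(
            remaining,
            key=lambda name: expected_posterior_entropy(
                belief, [candidates[c] for c in chosen] + [remaining[name]]
            ),
        )
        chosen.append(best)
        del remaining[best]
    return chosen


demo_sensors = {
    "noisy": [[0.5, 0.5], [0.5, 0.5]],  # uninformative readings
    "sharp": [[0.9, 0.1], [0.1, 0.9]],  # correct 90% of the time
}
demo_choice = greedy_select([0.5, 0.5], demo_sensors, 1)
```

Selecting sensors one at a time keeps the search linear in the number of candidates per step, instead of enumerating every subset of size `k`.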

Cooperation and coordination between fuzzy reinforcement learning agents in continuous state partially observable Markov decision processes

FUZZ-IEEE'99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315) ◽

10.1109/fuzzy.1999.793014 ◽

1999 ◽

Cited By ~ 10

Author(s):

H.R. Berenji ◽

D. Vengerov

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Learning Agents ◽

Continuous State ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Perseus: Randomized Point-based Value Iteration for POMDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.1659 ◽

2005 ◽

Vol 24 ◽

pp. 195-220 ◽

Cited By ~ 209

Author(s):

M. T.J. Spaan ◽

N. Vlassis

Keyword(s):

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Finite Set ◽

Partially Observable ◽

Set Of Points ◽

Action Spaces ◽

Belief Set

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to deal with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
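
One backup stage of the kind described above can be sketched for a small discrete POMDP. This is an illustrative reading of the abstract rather than the authors' code, and it covers only discrete actions. The value function is a list of alpha vectors; the tables `T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]` and all function names are assumptions made for this example.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def value(b, alphas):
    """Value of belief b under a set of alpha vectors."""
    return max(dot(a, b) for a in alphas)


def backup(b, alphas, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one new alpha vector."""
    n = len(b)
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        vec = list(R[a])
        for o in range(len(O[a][0])):
            # Project every alpha vector back through (action, observation).
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in alphas
            ]
            best_proj = max(projected, key=lambda g: dot(g, b))
            vec = [v + gamma * g for v, g in zip(vec, best_proj)]
        val = dot(vec, b)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


def perseus_stage(beliefs, alphas, T, O, R, gamma, rng):
    """Back up randomly chosen beliefs until no belief has lost value."""
    new_alphas = []
    todo = list(range(len(beliefs)))  # beliefs whose value is not yet restored
    while todo:
        b = beliefs[rng.choice(todo)]
        alpha = backup(b, alphas, T, O, R, gamma)
        if dot(alpha, b) < value(b, alphas):
            # The backup did not help here: keep the old maximizing vector.
            alpha = max(alphas, key=lambda v: dot(v, b))
        new_alphas.append(alpha)
        todo = [i for i in todo
                if value(beliefs[i], new_alphas) < value(beliefs[i], alphas)]
    return new_alphas


# Toy problem (an assumption): static state, action i pays 1 in state i.
T_demo = [[[1.0, 0.0], [0.0, 1.0]]] * 2
O_demo = [[[0.8, 0.2], [0.2, 0.8]]] * 2
R_demo = [[1.0, 0.0], [0.0, 1.0]]
B_demo = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
V0 = [[0.0, 0.0]]
V1 = perseus_stage(B_demo, V0, T_demo, O_demo, R_demo, 0.9, random.Random(0))
```

Because a single new vector often raises the value of several beliefs at once, the stage usually ends with fewer vectors than beliefs, which is the source of the savings the abstract describes.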

Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/572 ◽

2020 ◽

Author(s):

Michael H. Lim ◽

Claire Tomlin ◽

Zachary N. Sunberg

Keyword(s):

Optimal Solution ◽

Tree Search ◽

Continuous Observation ◽

Theoretical Justification ◽

Continuous State ◽

Markov Decision ◽

Simplified Algorithm ◽

Partially Observable ◽

Online Sampling ◽

And Control

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
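
Observation likelihood weighting, as referred to above, can be illustrated with a single weighted particle update. This sketch shows only the weighting step and is not the POWSS tree recursion itself. The Gaussian observation model, the deterministic toy transition, and the names `weighted_update`, `weighted_mean` and `demo_density` are assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_update(particles, weights, action, observation,
                    transition, obs_density, rng):
    """Propagate each particle, then scale its weight by the likelihood
    of the received observation instead of rejecting mismatched particles."""
    new_particles = [transition(s, action, rng) for s in particles]
    new_weights = [
        w * obs_density(observation, action, s2)
        for w, s2 in zip(weights, new_particles)
    ]
    return new_particles, new_weights


def weighted_mean(values, weights):
    """Self-normalized weighted average of `values`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def demo_density(observation, action, state):
    """Toy sensor (an assumption): the state plus unit-variance Gaussian noise."""
    return gaussian_pdf(observation, state, 1.0)


demo_particles, demo_weights = weighted_update(
    particles=[0.0, 10.0],
    weights=[0.5, 0.5],
    action=1.0,
    observation=1.0,
    transition=lambda s, a, rng: s + a,  # toy deterministic dynamics
    obs_density=demo_density,
    rng=random.Random(0),
)
demo_estimate = weighted_mean(demo_particles, demo_weights)
```

With a continuous observation space, two sampled trajectories almost never share the same observation, so weighting every particle by its likelihood keeps all of them useful; in the toy run the particle near the observation ends up with nearly all of the weight.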
