Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs

This article presents the state of the art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building on the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
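
A minimal sketch of what "lossless" clustering can mean. It assumes the criterion is probabilistic equivalence: two individual histories of one agent are merged when they induce identical conditional distributions over the state and the other agents' histories. The data layout, `cluster_histories`, and the toy distribution are hypothetical illustrations, not code from GMAA*-ICE.

```python
from collections import defaultdict
from fractions import Fraction


def cluster_histories(joint, agent):
    """Group the individual histories of `agent` that are probabilistically
    equivalent, i.e. that induce the same conditional distribution over
    (state, other agents' histories).

    `joint` maps (state, joint_history) -> probability, where joint_history
    is a tuple with one individual history per agent (hypothetical layout).
    """
    # Marginal probability of each individual history of `agent`.
    marginal = defaultdict(Fraction)
    for (state, hists), p in joint.items():
        marginal[hists[agent]] += p

    # Conditional distribution P(state, others | own history), skipping zeros.
    conditional = defaultdict(dict)
    for (state, hists), p in joint.items():
        if p == 0:
            continue
        own = hists[agent]
        others = hists[:agent] + hists[agent + 1:]
        conditional[own][(state, others)] = p / marginal[own]

    # Histories with identical conditionals can share one cluster (one type).
    clusters = defaultdict(list)
    for own, dist in conditional.items():
        clusters[frozenset(dist.items())].append(own)
    return sorted(sorted(members) for members in clusters.values())


# Toy joint distribution: agent 0's histories "a" and "b" carry the same
# information about the state and about agent 1's history.
joint = {
    ("s0", ("a", "x")): Fraction(1, 8),
    ("s0", ("b", "x")): Fraction(1, 8),
    ("s1", ("a", "y")): Fraction(1, 8),
    ("s1", ("b", "y")): Fraction(1, 8),
    ("s0", ("c", "x")): Fraction(1, 2),
}
print(cluster_histories(joint, 0))  # [['a', 'b'], ['c']]
```

Exact `Fraction` arithmetic is used so that "identical conditional distribution" is an exact equality test. With floating-point probabilities, a tolerance would be needed instead.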

Partially Observable Markov Decision Processes and Robotics

Annual Review of Control, Robotics, and Autonomous Systems, Vol 5 (1), 2022. DOI: 10.1146/annurev-control-042920-092451

Author(s): Hanna Kurniawati

Keyword(s): Autonomous Systems; Optimal Solution; Lessons Learned; Annual Review; Publication Date; Mathematical Framework; Planning Under Uncertainty; Markov Decision; Partially Observable Markov; Partially Observable

Planning under uncertainty is critical to robotics. The partially observable Markov decision process (POMDP) is a mathematical framework for such planning problems. POMDPs are powerful because of their careful quantification of the nondeterministic effects of actions and the partial observability of the states. But for the same reason, they are notorious for their high computational complexity and have been deemed impractical for robotics. However, over the past two decades, the development of sampling-based approximate solvers has led to tremendous advances in POMDP-solving capabilities. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems within reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This article presents a review of POMDPs, emphasizing computational issues that have hindered their practicality in robotics and ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots. Expected final online publication date for the Annual Review of Control, Robotics, and Autonomous Systems, Volume 5 is May 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
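
The partial observability that makes POMDPs expensive is handled by maintaining a belief, a probability distribution over states, and updating it after every action and observation. Sampling-based solvers approximate this update with particles. The sketch below shows the exact discrete version for reference. The dictionary layout and the tiger-style numbers are illustrative assumptions, not taken from the review.

```python
def belief_update(belief, action, observation, T, Z):
    """Exact Bayes-filter update for a discrete POMDP:
    b'(s2) is proportional to Z(o | s2, a) * sum_s T(s2 | s, a) * b(s).

    belief: dict state -> probability (covers every state)
    T[s][a][s2]: transition probability; Z[s2][a][o]: observation probability.
    """
    unnormalized = {}
    for s2 in belief:
        # Prediction step: probability of landing in s2 after the action.
        predicted = sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
        # Correction step: weight by how likely the observation is in s2.
        unnormalized[s2] = Z[s2][action].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return {s: p / total for s, p in unnormalized.items()}


# Tiger-style example: listening leaves the state unchanged and reports the
# tiger's side correctly with probability 0.85 (illustrative numbers).
T = {"left": {"listen": {"left": 1.0}}, "right": {"listen": {"right": 1.0}}}
Z = {
    "left": {"listen": {"hear-left": 0.85, "hear-right": 0.15}},
    "right": {"listen": {"hear-left": 0.15, "hear-right": 0.85}},
}
b = belief_update({"left": 0.5, "right": 0.5}, "listen", "hear-left", T, Z)
b2 = belief_update(b, "listen", "hear-left", T, Z)
```

Starting from a uniform belief, one "hear-left" observation moves the belief in "left" to about 0.85. A second one moves it to about 0.97. The loop over all state pairs is what grows quickly with the state space, which is the cost that particle-based solvers sidestep.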

COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017. DOI: 10.24963/ijcai.2017/638

Author(s): Madison Clark-Turner, Christopher Amato

Keyword(s): Markov Decision Process; Real World; Decision Process; Extended Version; Continuous Observation; Solution Methods; Markov Decision; Multi Agent; Partially Observable Markov; Partially Observable

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018. DOI: 10.24963/ijcai.2018/662. Cited by ~1

Author(s): Karel Horák, Branislav Bošanský, Krishnendu Chatterjee

Keyword(s): Heuristic Search; Infinite Horizon; Decision Processes; Value Iteration; Planning Under Uncertainty; Total Cost; Markov Decision; Standard Models; Target States; Partially Observable

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, the indefinite-horizon objective (Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel and heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We make the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
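
To make the indefinite-horizon objective concrete, the sketch below computes the minimum expected total cost to reach a target in the simplest, fully observable case (a goal MDP). It uses plain value iteration without discounting. This is not Goal-HSVI and ignores partial observability entirely. The names, the dictionary layout, and the toy problem are hypothetical.

```python
def goal_value_iteration(P, cost, targets, tol=1e-10, max_iter=100_000):
    """Minimum expected total cost to reach a target in a goal MDP.

    P[s][a][s2]: transition probabilities; cost[s][a] > 0 for non-target s.
    Targets are absorbing and cost-free. No discounting is used, so this
    assumes every state can reach a target (otherwise it does not converge).
    """
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        new_V, delta = {}, 0.0
        for s in P:
            if s in targets:
                new_V[s] = 0.0
                continue
            # Bellman update for the total-cost objective: pay now, then expected cost-to-go.
            new_V[s] = min(
                cost[s][a] + sum(p * V[s2] for s2, p in P[s][a].items())
                for a in P[s]
            )
            delta = max(delta, abs(new_V[s] - V[s]))
        V = new_V
        if delta < tol:
            return V
    raise RuntimeError("value iteration did not converge")


# "try" costs 1 and reaches the goal with probability 0.5 (expected cost 2);
# "safe" costs 3 and always reaches it, so the optimal expected cost is 2.
P = {
    "start": {"try": {"goal": 0.5, "start": 0.5}, "safe": {"goal": 1.0}},
    "goal": {},
}
cost = {"start": {"try": 1.0, "safe": 3.0}}
V = goal_value_iteration(P, cost, targets={"goal"})
```

Because there is no discount factor, convergence depends on the targets being reachable. This is the same kind of issue the abstract identifies as breaking the original HSVI on Goal-POMDPs.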

Importance sampling for online planning under uncertainty

The International Journal of Robotics Research, Vol 38 (2-3), pp. 162-181, 2018. DOI: 10.1177/0278364918780322. Cited by ~2

Author(s): Yuanfu Luo, Haoyu Bai, David Hsu, Wee Sun Lee

Keyword(s): Importance Sampling; Autonomous Vehicles; State Of The Art; Monte Carlo Sampling; Planning Under Uncertainty; Online Planning; Markov Decision; Partially Observable; Robotic Tasks; General Method

The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT's performance when there are critical but rare events that are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.
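
The core estimator behind the rare-event argument can be shown on a toy problem: estimating the tiny probability that a standard normal variable exceeds 4. Plain Monte Carlo rarely sees the event. Sampling from a proposal that makes the event common, then reweighting by p(x)/q(x), keeps the estimate unbiased with far lower variance. This is only the generic importance-sampling estimator, not IS-DESPOT. The threshold, proposal, and sample size are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def tail_estimates(threshold=4.0, n=100_000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) in two ways."""
    rng = random.Random(seed)

    # Plain Monte Carlo: the event is so rare that almost every sample misses it.
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    naive = hits / n

    # Importance sampling: draw from a proposal N(threshold, 1) under which
    # the event is common, then reweight each hit by p(x) / q(x).
    weighted_sum = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            weighted_sum += normal_pdf(x) / normal_pdf(x, mean=threshold)
    return naive, weighted_sum / n


exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # about 3.17e-5
naive, weighted = tail_estimates()
print(f"exact={exact:.3e} naive={naive:.3e} importance={weighted:.3e}")
```

With 100,000 samples the naive estimate rests on only a handful of hits. The importance-weighted estimate uses roughly half of its samples.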

Point-Based Monte Carlo Online Planning in POMDPs

Advanced Materials Research, Vol 846-847, pp. 1388-1391, 2013. DOI: 10.4028/www.scientific.net/amr.846-847.1388

Author(s): Bo Wu, Yan Peng Feng, Hong Yan Zheng

Keyword(s): Mean Squared Error; Search Algorithm; Search Tree; Real Time System; Monte Carlo Tree Search; Online Planning; Markov Decision; Partially Observable; Belief States; Tree Search Algorithm

Online planning and learning in partially observable Markov decision processes are often intractable because the belief state space suffers from two curses: dimensionality and history. To address this problem, this paper proposes a point-based Monte Carlo online planning approach for POMDPs. The approach performs value backups at specific reachable belief points, rather than over the entire belief simplex, to speed up computation. A Monte Carlo tree search algorithm is then used to share the value of actions across each subtree of the search tree so as to minimize the mean squared error. Experimental results show that the proposed algorithm is effective in real-time systems.
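
The abstract does not state how actions are selected inside the Monte Carlo tree search. A common choice in UCT-style POMDP planners is UCB1, which balances an action's mean simulated return against an exploration bonus. The sketch below assumes UCB1 and simulates repeated visits to a single tree node, with Bernoulli rewards standing in for rollouts. The function names, reward probabilities, and exploration constant are all hypothetical.

```python
import math
import random


def ucb1_choose(counts, sums, c=math.sqrt(2.0)):
    """UCB1: try every action once, then maximize mean + c * sqrt(ln N / n_a)."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + c * math.sqrt(math.log(total) / counts[a]),
    )


def run_node(success_probs, n_sims=2000, seed=0):
    """Visit one tree node n_sims times; each 'rollout' returns a Bernoulli
    reward with the given success probability (hypothetical stand-in)."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    sums = [0.0] * len(success_probs)
    for _ in range(n_sims):
        a = ucb1_choose(counts, sums)
        reward = 1.0 if rng.random() < success_probs[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts, sums


counts, sums = run_node([0.2, 0.5, 0.8])
```

Over many simulations, most visits go to the action with the highest mean return. Weaker actions are still sampled occasionally, so their value estimates keep improving.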

Perseus: Randomized Point-based Value Iteration for POMDPs

Journal of Artificial Intelligence Research, Vol 24, pp. 195-220, 2005. DOI: 10.1613/jair.1659. Cited by ~209

Author(s): M. T. J. Spaan, N. Vlassis

Keyword(s): Large Scale; Iteration Algorithm; Value Iteration; Planning Under Uncertainty; Markov Decision; Finite Set; Partially Observable; Set Of Points; Action Spaces; Belief Set

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Unlike other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to deal with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
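
A compact sketch of one backup stage as the abstract describes it. Pick a random belief that has not yet been improved and back it up. If the backup does not help that belief, keep its best old vector. Then drop every belief whose value has not decreased. "Improved" is read here as "not decreased". The classic tiger problem and the pessimistic initial vector min R / (1 - gamma) are illustrative choices, and all function names are hypothetical.

```python
import random


def dot(b, alpha):
    return sum(x * y for x, y in zip(b, alpha))


def value(b, V):
    """Value of belief b under the alpha-vector set V."""
    return max(dot(b, alpha) for alpha in V)


def backup(b, V, T, Z, R, gamma):
    """Point-based Bellman backup at belief b; returns one new alpha vector.

    T[a][s][s2] = P(s2 | s, a), Z[a][s2][o] = P(o | s2, a), R[a][s] = reward.
    """
    states = range(len(b))
    best_alpha, best_val = None, float("-inf")
    for a in range(len(R)):
        g_a = list(R[a])
        for o in range(len(Z[a][0])):
            # Back-project every alpha vector through (a, o); keep the best for b.
            projections = [
                [sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in states) for s in states]
                for alpha in V
            ]
            g_ao = max(projections, key=lambda g: dot(b, g))
            g_a = [r + gamma * g for r, g in zip(g_a, g_ao)]
        if dot(b, g_a) > best_val:
            best_alpha, best_val = g_a, dot(b, g_a)
    return best_alpha


def perseus_stage(B, V, T, Z, R, gamma, rng):
    """One Perseus stage: back up random beliefs until no value in B decreases."""
    new_V = []
    todo = list(B)
    while todo:
        b = rng.choice(todo)
        alpha = backup(b, V, T, Z, R, gamma)
        if dot(b, alpha) < value(b, V):
            # The backup did not help b: reuse its best vector from V instead.
            alpha = max(V, key=lambda vec: dot(b, vec))
        new_V.append(alpha)
        # Keep only beliefs whose value under new_V is still below V.
        todo = [bb for bb in todo if value(bb, new_V) < value(bb, V)]
    return new_V


# Tiger problem: states (tiger-left, tiger-right);
# actions (listen, open-left, open-right); observations (hear-left, hear-right).
listen_T = [[1.0, 0.0], [0.0, 1.0]]
reset_T = [[0.5, 0.5], [0.5, 0.5]]
T = [listen_T, reset_T, reset_T]
Z = [[[0.85, 0.15], [0.15, 0.85]], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
gamma = 0.95

rng = random.Random(0)
B = [[i / 10, 1 - i / 10] for i in range(11)]  # fixed belief set
V = [[min(min(r) for r in R) / (1 - gamma)] * 2]  # pessimistic initial vector
history = [value([0.5, 0.5], V)]
for _ in range(30):
    V = perseus_stage(B, V, T, Z, R, gamma, rng)
    history.append(value([0.5, 0.5], V))
```

Each stage usually adds fewer vectors than there are beliefs, because one backed-up vector often lifts several beliefs at once. This is the observation the abstract highlights.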

Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020. DOI: 10.24963/ijcai.2020/572

Author(s): Michael H. Lim, Claire Tomlin, Zachary N. Sunberg

Keyword(s): Optimal Solution; Tree Search; Continuous Observation; Theoretical Justification; Continuous State; Markov Decision; Simplified Algorithm; Partially Observable; Online Sampling; And Control

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
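
With a continuous observation, a sampled next state essentially never reproduces the received observation exactly, so particles cannot be filtered by matching. Observation likelihood weighting instead multiplies each propagated particle's weight by the observation density Z(o | a, s'). The sketch shows only this weighting step for one action and one observation, not POWSS's recursive Q-value estimate. The 1-D motion model, the Gaussian noise levels, and the function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_belief_update(particles, weights, step, observation, rng,
                           motion_std=0.1, obs_std=0.5):
    """Propagate particles through a hypothetical 1-D motion model and
    reweight them by the likelihood of a continuous observation.

    Each particle moves by `step` plus Gaussian noise; the observation is
    assumed to be the true position plus Gaussian noise with std obs_std.
    """
    moved = [x + step + rng.gauss(0.0, motion_std) for x in particles]
    # Likelihood weighting: w' = w * Z(o | s'), no exact observation match needed.
    new_weights = [w * gaussian_pdf(observation, x, obs_std) for w, x in zip(weights, moved)]
    total = sum(new_weights)
    return moved, [w / total for w in new_weights]


rng = random.Random(1)
particles = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # prior belief N(0, 1)
weights = [1.0 / len(particles)] * len(particles)
particles, weights = weighted_belief_update(particles, weights, 0.0, 2.0, rng)
posterior_mean = sum(w * x for w, x in zip(weights, particles))
```

In this linear-Gaussian toy the exact posterior mean is 2 * 1.01 / 1.26, about 1.60. The weighted particle estimate lands close to it, which provides a quick sanity check on the weighting step.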

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Journal of Artificial Intelligence Research, Vol 55, pp. 443-497, 2016. DOI: 10.1613/jair.4623. Cited by ~4

Author(s): Jilles Steeve Dibangoye, Christopher Amato, Olivier Buffet, François Charpillet

Keyword(s): Heuristic Search; Piecewise Linear; Optimal Solution; Value Iteration; Compact Representations; Continuous State; Markov Decision; Feature Based; Multi Agent; Partially Observable

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
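
Why the occupancy MDP is deterministic can be seen in its transition function. An occupancy state is a distribution over (state, joint history) pairs. Given a decentralized decision rule, where each agent maps only its own history to an action, the next occupancy state follows by pure bookkeeping, with no observation drawn by the planner. The sketch assumes a hypothetical dictionary layout for the models and a two-agent toy problem. It is not FB-HSVI and uses none of its compact representations.

```python
from collections import defaultdict


def next_occupancy(occupancy, decision_rule, T, Z):
    """Deterministic occupancy-state update for a Dec-POMDP.

    occupancy: dict (state, joint_history) -> probability, where joint_history
        holds one tuple of (action, observation) pairs per agent.
    decision_rule: one function per agent, mapping that agent's own history
        to its action (this is what keeps execution decentralized).
    T[s][joint_action][s2] and Z[s2][joint_action][joint_obs]: hypothetical
        dictionary layout for the transition and observation models.
    """
    successor = defaultdict(float)
    for (s, joint_hist), p in occupancy.items():
        joint_action = tuple(rule(h) for rule, h in zip(decision_rule, joint_hist))
        for s2, p_t in T[s][joint_action].items():
            for joint_obs, p_o in Z[s2][joint_action].items():
                # Each agent appends only its own (action, observation) pair.
                new_hist = tuple(
                    h + ((a, o),) for h, a, o in zip(joint_hist, joint_action, joint_obs)
                )
                successor[(s2, new_hist)] += p * p_t * p_o
    return dict(successor)


# Two agents, two states; the joint action ("go", "go") moves to either state
# with probability 0.5, and both agents then observe which state they are in.
go = ("go", "go")
T = {s: {go: {"s0": 0.5, "s1": 0.5}} for s in ("s0", "s1")}
Z = {"s0": {go: {("x", "x"): 1.0}}, "s1": {go: {("y", "y"): 1.0}}}
rule = (lambda h: "go", lambda h: "go")

eta = {("s0", ((), ())): 1.0}  # initial occupancy: known state, empty histories
for _ in range(2):
    eta = next_occupancy(eta, rule, T, Z)
```

After two steps the occupancy state holds four (state, joint history) pairs, each with probability 0.25. The number of joint histories grows exponentially with the horizon, which is why the paper turns to compact, feature-based representations.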

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Journal of Artificial Intelligence Research, Vol 14, pp. 29-51, 2001. DOI: 10.1613/jair.761. Cited by ~39

Author(s): N. L. Zhang, W. Zhang

Keyword(s): Markov Decision Processes; Decision Processes; Benchmark Problems; Test Problems; Value Iteration; Planning Under Uncertainty; Markov Decision; Partially Observable Markov; Partially Observable; Number Of Iterations

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.

DESPOT: Online POMDP Planning with Regularization

Journal of Artificial Intelligence Research, Vol 58, pp. 231-266, 2017. DOI: 10.1613/jair.5328. Cited by ~27

Author(s): Nan Ye, Adhiraj Somani, David Hsu, Wee Sun Lee

Keyword(s): Autonomous Driving; Vehicle Control; Planning Under Uncertainty; Driving System; Online Planning; Markov Decision; Planning Algorithm; Regret Bound; Partially Observable; Autonomous Driving System

The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the "curse of dimensionality" and the "curse of history". To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the "execution" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.
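
The scenario idea and the regularized objective can be sketched separately from the tree search. A scenario fixes a sampled start state plus a stream of random numbers. Driving a deterministic simulator with that stream makes every policy's outcome under the scenario fixed. Policies can then be compared on identical scenarios, with a penalty proportional to policy size. This is a simplified sketch only: DESPOT's sparse tree construction and anytime search are not shown. The function names, the penalty weight `lam`, and passing the policy size in as a number are assumptions.

```python
import random


def make_scenarios(sample_start, k, depth, seed=0):
    """Fix k scenarios: a sampled start state plus a stream of random numbers."""
    rng = random.Random(seed)
    return [(sample_start(rng), [rng.random() for _ in range(depth)]) for _ in range(k)]


def scenario_value(policy, scenarios, step, gamma=0.95):
    """Average discounted return of `policy` when every scenario is replayed.

    `step(state, action, u)` is a deterministic simulator driven by the random
    number u, so the same scenarios give the same outcome for the same policy.
    """
    total = 0.0
    for start, stream in scenarios:
        state, ret, discount, history = start, 0.0, 1.0, ()
        for u in stream:
            action = policy(history)
            state, obs, reward = step(state, action, u)
            ret += discount * reward
            discount *= gamma
            history += ((action, obs),)
        total += ret
    return total / len(scenarios)


def regularized_value(policy, policy_size, scenarios, step, lam, gamma=0.95):
    """Scenario value minus a penalty proportional to the policy size."""
    return scenario_value(policy, scenarios, step, gamma) - lam * policy_size


def step(state, action, u):
    """Toy one-state problem: "risky" pays 1 with probability 0.7, "safe" pays 0.5."""
    reward = (1.0 if u < 0.7 else 0.0) if action == "risky" else 0.5
    return state, ("paid" if reward > 0 else "nothing"), reward


def risky(history):
    return "risky"


def safe(history):
    return "safe"


scenarios = make_scenarios(lambda rng: "s", k=500, depth=10)
```

Because both policies are scored on the same 500 scenarios, differences between them reflect the policies rather than sampling noise. The size penalty discourages large policies that merely fit the particular sampled scenarios.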