scholarly journals Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces

Author(s):  
Michael H. Lim ◽  
Claire Tomlin ◽  
Zachary N. Sunberg

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.

2016 ◽  
Vol 55 ◽  
pp. 443-497 ◽  
Author(s):  
Jilles Steeve Dibangoye ◽  
Christopher Amato ◽  
Olivier Buffet ◽  
François Charpillet

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.


Author(s):  
Chuande Liu ◽  
Chuang Yu ◽  
Bingtuan Gao ◽  
Syed Awais Ali Shah ◽  
Adriana Tapus

AbstractTelemanipulation in power stations commonly require robots first to open doors and then gain access to a new workspace. However, the opened doors can easily close by disturbances, interrupt the operations, and potentially lead to collision damages. Although existing telemanipulation is a highly efficient master–slave work pattern due to human-in-the-loop control, it is not trivial for a user to specify the optimal measures to guarantee safety. This paper investigates the safety-critical motion planning and control problem to balance robotic safety against manipulation performance during work emergencies. Based on a dynamic workspace released by door-closing, the interactions between the workspace and robot are analyzed using a partially observable Markov decision process, thereby making the balance mechanism executed as belief tree planning. To act the planning, apart from telemanipulation actions, we clarify other three safety-guaranteed actions: on guard, defense and escape for self-protection by estimating collision risk levels to trigger them. Besides, our experiments show that the proposed method is capable of determining multiple solutions for balancing robotic safety and work efficiency during telemanipulation tasks.


Author(s):  
Madison Clark-Turner ◽  
Christopher Amato

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.


Author(s):  
Hanna Kurniawati

Planning under uncertainty is critical to robotics. The partially observable Markov decision process (POMDP) is a mathematical framework for such planning problems. POMDPs are powerful because of their careful quantification of the nondeterministic effects of actions and the partial observability of the states. But for the same reason, they are notorious for their high computational complexity and have been deemed impractical for robotics. However, over the past two decades, the development of sampling-based approximate solvers has led to tremendous advances in POMDP-solving capabilities. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems within reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This article presents a review of POMDPs, emphasizing computational issues that have hindered their practicality in robotics and ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots. Expected final online publication date for the Annual Review of Control, Robotics, and Autonomous Systems, Volume 5 is May 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2013 ◽  
Vol 46 ◽  
pp. 449-509 ◽  
Author(s):  
F. A. Oliehoek ◽  
M. T. J. Spaan ◽  
C. Amato ◽  
S. Whiteson

This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.


Author(s):  
Tuan Dam ◽  
Pascal Klink ◽  
Carlo D'Eramo ◽  
Jan Peters ◽  
Joni Pajarinen

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.


Sign in / Sign up

Export Citation Format

Share Document