Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces

Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
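The observation likelihood weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the POWSS algorithm from the paper: it shows only a single reweighting step, and it assumes a one-dimensional state, a Gaussian observation model, and hypothetical helper names (`gaussian_pdf`, `reweight`, `weighted_estimate`) that do not come from the source.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution; stands in for an assumed observation model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def reweight(particles, weights, observation, obs_std):
    """Multiply each particle's weight by the likelihood of the observation.

    Assumption: the observation is the particle's state plus Gaussian noise.
    """
    return [
        w * gaussian_pdf(observation, state, obs_std)
        for state, w in zip(particles, weights)
    ]


def weighted_estimate(values, weights):
    """Self-normalized weighted average of per-particle value estimates."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all particle weights are zero")
    return sum(w * v for v, w in zip(values, weights)) / total


# Two particles: one close to the observation, one far away.
particles = [0.0, 10.0]
weights = reweight(particles, [1.0, 1.0], observation=0.0, obs_std=1.0)
# Hypothetical per-particle values; the estimate is dominated by the first particle.
estimate = weighted_estimate([1.0, 0.0], weights)
```

A particle whose state explains the observation poorly keeps a near-zero weight, so it contributes little to the estimate without having to be rejected or resampled.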

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Journal of Artificial Intelligence Research ◽ 10.1613/jair.4623 ◽ 2016 ◽ Vol 55 ◽ pp. 443-497 ◽ Cited By ~ 4

Author(s): Jilles Steeve Dibangoye ◽ Christopher Amato ◽ Olivier Buffet ◽ François Charpillet

Keyword(s): Heuristic Search ◽ Piecewise Linear ◽ Optimal Solution ◽ Value Iteration ◽ Compact Representations ◽ Continuous State ◽ Markov Decision ◽ Feature Based ◽ Multi Agent ◽ Partially Observable

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.

Towards a balancing safety against performance approach in human–robot co-manipulation for door-closing emergencies

Complex & Intelligent Systems ◽ 10.1007/s40747-021-00420-y ◽ 2021

Author(s): Chuande Liu ◽ Chuang Yu ◽ Bingtuan Gao ◽ Syed Awais Ali Shah ◽ Adriana Tapus

Keyword(s): Loop Control ◽ Risk Levels ◽ Human In The Loop ◽ Planning And Control ◽ Power Stations ◽ Markov Decision ◽ Partially Observable ◽ And Control ◽ Self Protection ◽ Balance Mechanism

Telemanipulation in power stations commonly requires robots first to open doors and then to gain access to a new workspace. However, opened doors can easily be closed by disturbances, interrupting operations and potentially causing collision damage. Although existing telemanipulation is a highly efficient master–slave work pattern thanks to human-in-the-loop control, it is not trivial for a user to specify the optimal measures to guarantee safety. This paper investigates the safety-critical motion planning and control problem of balancing robotic safety against manipulation performance during work emergencies. Based on the dynamic workspace released by door closing, the interactions between the workspace and the robot are analyzed using a partially observable Markov decision process, so that the balance mechanism is executed as belief tree planning. To carry out the plan, we define, in addition to telemanipulation actions, three safety-guaranteed actions for self-protection (on guard, defense, and escape), which are triggered by estimated collision risk levels. Our experiments show that the proposed method is capable of determining multiple solutions for balancing robotic safety and work efficiency during telemanipulation tasks.

Maintenance planning using continuous-state partially observable Markov decision processes and non-linear action models

Structure and Infrastructure Engineering ◽ 10.1080/15732479.2015.1076485 ◽ 2015 ◽ Vol 12 (8) ◽ pp. 977-994 ◽ Cited By ~ 15

Author(s): Roland Schöbi ◽ Eleni N. Chatzi

Keyword(s): Markov Decision Processes ◽ Decision Processes ◽ Maintenance Planning ◽ Linear Action ◽ Continuous State ◽ Non Linear ◽ Markov Decision ◽ Action Models ◽ Partially Observable Markov ◽ Partially Observable

COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2017/638 ◽ 2017

Author(s): Madison Clark-Turner ◽ Christopher Amato

Keyword(s): Markov Decision Process ◽ Real World ◽ Decision Process ◽ Extended Version ◽ Continuous Observation ◽ Solution Methods ◽ Markov Decision ◽ Multi Agent ◽ Partially Observable Markov ◽ Partially Observable

The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.

Partially Observable Markov Decision Processes and Robotics

Annual Review of Control Robotics and Autonomous Systems ◽ 10.1146/annurev-control-042920-092451 ◽ 2022 ◽ Vol 5 (1)

Author(s): Hanna Kurniawati

Keyword(s): Autonomous Systems ◽ Optimal Solution ◽ Lessons Learned ◽ Annual Review ◽ Publication Date ◽ Mathematical Framework ◽ Planning Under Uncertainty ◽ Markov Decision ◽ Partially Observable Markov ◽ Partially Observable

Planning under uncertainty is critical to robotics. The partially observable Markov decision process (POMDP) is a mathematical framework for such planning problems. POMDPs are powerful because of their careful quantification of the nondeterministic effects of actions and the partial observability of the states. But for the same reason, they are notorious for their high computational complexity and have been deemed impractical for robotics. However, over the past two decades, the development of sampling-based approximate solvers has led to tremendous advances in POMDP-solving capabilities. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems within reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This article presents a review of POMDPs, emphasizing computational issues that have hindered their practicality in robotics and ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots. Expected final online publication date for the Annual Review of Control, Robotics, and Autonomous Systems, Volume 5 is May 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Cooperation and coordination between fuzzy reinforcement learning agents in continuous state partially observable Markov decision processes

FUZZ-IEEE'99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315) ◽ 10.1109/fuzzy.1999.793014 ◽ 1999 ◽ Cited By ~ 10

Author(s): H.R. Berenji ◽ D. Vengerov

Keyword(s): Reinforcement Learning ◽ Markov Decision Processes ◽ Decision Processes ◽ Learning Agents ◽ Continuous State ◽ Markov Decision ◽ Partially Observable Markov ◽ Partially Observable

Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs

Journal of Artificial Intelligence Research ◽ 10.1613/jair.3804 ◽ 2013 ◽ Vol 46 ◽ pp. 449-509 ◽ Cited By ~ 9

Author(s): F. A. Oliehoek ◽ M. T. J. Spaan ◽ C. Amato ◽ S. Whiteson

Keyword(s): Optimal Solution ◽ Search Tree ◽ Planning Under Uncertainty ◽ Incremental Clustering ◽ Bayesian Games ◽ Worst Case ◽ Solution Methods ◽ Markov Decision ◽ Expansion Yield ◽ Partially Observable

This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.

Continuous-Observation Partially Observable Semi-Markov Decision Processes for Machine Maintenance

IEEE Transactions on Reliability ◽ 10.1109/tr.2016.2626477 ◽ 2017 ◽ Vol 66 (1) ◽ pp. 202-218 ◽ Cited By ~ 7

Author(s): Mimi Zhang ◽ Matthew Revie

Keyword(s): Markov Decision Processes ◽ Decision Processes ◽ Continuous Observation ◽ Machine Maintenance ◽ Markov Decision ◽ Partially Observable

Maintenance strategy optimization using a continuous-state partially observable semi-Markov decision process

Microelectronics Reliability ◽ 10.1016/j.microrel.2010.09.023 ◽ 2011 ◽ Vol 51 (2) ◽ pp. 300-309 ◽ Cited By ~ 9

Author(s): Yifan Zhou ◽ Lin Ma ◽ Joseph Mathew ◽ Yong Sun ◽ Rodney Wolff

Keyword(s): Markov Decision Process ◽ Decision Process ◽ Maintenance Strategy ◽ Continuous State ◽ Markov Decision ◽ Partially Observable

Generalized Mean Estimation in Monte-Carlo Tree Search

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2020/332 ◽ 2020

Author(s): Tuan Dam ◽ Pascal Klink ◽ Carlo D'Eramo ◽ Jan Peters ◽ Joni Pajarinen

Keyword(s): Monte Carlo ◽ Tree Search ◽ Power Mean ◽ Monte Carlo Tree Search ◽ Average Value ◽ Mean Estimation ◽ Markov Decision ◽ Speed Up ◽ Upper Confidence Bound ◽ Partially Observable

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.
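The claim that the power mean lies between the average and the maximum can be checked with a minimal sketch. It assumes non-negative child values and weights proportional to child visit counts, neither of which the abstract specifies; the function name `power_mean` and the example numbers are hypothetical, and this is not the Power-UCT implementation.

```python
def power_mean(values, weights, p):
    """Weighted power mean (sum_i w_i * x_i**p) ** (1/p) with normalized weights.

    Assumptions: values are non-negative and p >= 1. With p = 1 this is the
    weighted average; as p grows it approaches the largest value.
    """
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    acc = sum((w / total) * (v ** p) for v, w in zip(values, weights))
    return acc ** (1.0 / p)


# Hypothetical child values and visit counts for one tree node.
child_values = [0.2, 0.8]
visit_counts = [1, 1]

average_backup = power_mean(child_values, visit_counts, p=1.0)
power_backup = power_mean(child_values, visit_counts, p=2.0)
near_max_backup = power_mean(child_values, visit_counts, p=200.0)
```

Here `average_backup` equals the plain mean of the child values, `power_backup` falls strictly between the mean and the maximum, and `near_max_backup` approaches the maximum, which is the interpolation the abstract describes.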
