Partially Observable Markov Decision Processes and Robotics

Author(s):  
Hanna Kurniawati

Planning under uncertainty is critical to robotics. The partially observable Markov decision process (POMDP) is a mathematical framework for such planning problems. POMDPs are powerful because they explicitly quantify the nondeterministic effects of actions and the partial observability of states. For the same reason, however, they are notorious for their high computational complexity and have long been deemed impractical for robotics. Over the past two decades, the development of sampling-based approximate solvers has led to tremendous advances in POMDP-solving capabilities. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems with reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This article presents a review of POMDPs, emphasizing the computational issues that have hindered their practicality in robotics and the ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots.
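
At the core of any POMDP solver is the belief, a probability distribution over states that is updated by Bayes' rule after each action and observation. The sketch below shows this update for a discrete model; the array layout and the toy numbers are illustrative, not taken from any particular solver.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes-filter belief update for a discrete POMDP (sketch).

    b : (S,) current belief over states
    a : action index
    o : observation index
    T : (A, S, S) transition model, T[a, s, s'] = P(s' | s, a)
    Z : (A, S, O) observation model, Z[a, s', o] = P(o | s', a)
    """
    # Predict: push the belief through the transition model.
    predicted = b @ T[a]                  # (S,) = sum_s b(s) T[a, s, s']
    # Correct: weight by the likelihood of the observation received.
    unnormalized = predicted * Z[a][:, o]
    norm = unnormalized.sum()
    if norm == 0.0:
        raise ValueError("Observation has zero probability under the model.")
    return unnormalized / norm

# Tiny two-state example (numbers illustrative).
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])       # one action; states persist
Z = np.array([[[0.85, 0.15], [0.15, 0.85]]])   # noisy observation of the state
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=0, T=T, Z=Z))    # belief shifts toward state 0
```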

Author(s):  
Jonathan D. Gammell, Marlin P. Strub

Motion planning is a fundamental problem in autonomous robotics that requires finding a path to a specified goal that avoids obstacles and takes into account a robot's limitations and constraints. It is often desirable for this path to also optimize a cost function, such as path length. Formal path-quality guarantees for continuously valued search spaces are an active area of research. Recent results have proven that some sampling-based planning methods probabilistically converge toward the optimal solution as computational effort approaches infinity. This article summarizes the assumptions behind these popular asymptotically optimal techniques and provides an introduction to the significant ongoing research on this topic.
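
The best-known asymptotically optimal technique in this family is RRT*, which grows a random tree while rewiring it inside a connection radius that shrinks as the tree grows. Below is a minimal, illustrative sketch in a 2D unit square; the obstacle, step size, and constants are ours, and a practical planner would use spatial indexing, exact collision checking, and cost propagation to descendants after rewiring.

```python
import math, random

START, GOAL = (0.1, 0.1), (0.9, 0.9)
OBSTACLE, RADIUS = (0.5, 0.5), 0.2
STEP, GAMMA = 0.05, 1.0                # extension step, rewiring-radius constant

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def collision_free(p, q, n=10):
    # Coarse check: sample a few points along the segment p-q.
    return all(dist(((1 - t / n) * p[0] + t / n * q[0],
                     (1 - t / n) * p[1] + t / n * q[1]), OBSTACLE) > RADIUS
               for t in range(n + 1))

def rrt_star(iters=2000):
    nodes, parent, cost = [START], {0: None}, {0: 0.0}
    for _ in range(iters):
        rand = (random.random(), random.random())
        near = min(range(len(nodes)), key=lambda i: dist(nodes[i], rand))
        d = dist(nodes[near], rand)
        if d == 0.0:
            continue
        new = rand if d < STEP else tuple(
            nodes[near][k] + STEP * (rand[k] - nodes[near][k]) / d for k in (0, 1))
        if not collision_free(nodes[near], new):
            continue
        # Shrinking connection radius: the ingredient behind asymptotic optimality.
        n = len(nodes)
        r = min(GAMMA * math.sqrt(math.log(n + 1) / (n + 1)), 3 * STEP)
        neighbors = [i for i in range(n) if dist(nodes[i], new) < r]
        # Connect the new node to its cheapest collision-free neighbor ...
        best, best_cost = near, cost[near] + dist(nodes[near], new)
        for i in neighbors:
            c = cost[i] + dist(nodes[i], new)
            if c < best_cost and collision_free(nodes[i], new):
                best, best_cost = i, c
        nodes.append(new)
        idx = len(nodes) - 1
        parent[idx], cost[idx] = best, best_cost
        # ... then rewire: adopt the new node as a parent wherever it is cheaper
        # (a full implementation would also propagate cost changes downstream).
        for i in neighbors:
            c = best_cost + dist(new, nodes[i])
            if c < cost[i] and collision_free(new, nodes[i]):
                parent[i], cost[i] = idx, c
    reachable = [cost[i] + dist(p, GOAL) for i, p in enumerate(nodes)
                 if dist(p, GOAL) < 2 * STEP and collision_free(p, GOAL)]
    return min(reachable) if reachable else math.inf

print("approximate best cost:", rrt_star())
```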


AI Magazine, 2012, Vol. 33 (4), p. 82
Author(s):  
Prashant J. Doshi

Decision making is a key feature of autonomous systems. It involves choosing optimally between different lines of action in various information contexts that range from perfectly knowing all aspects of the decision problem to having just partial knowledge about it. The physical context often includes other interacting autonomous systems, typically called agents. In this article, I focus on decision making in a multiagent context with partial information about the problem. Relevant research in this complex but realistic setting has converged around two complementary, general frameworks and has introduced myriad specializations along the way. I put the two frameworks, the decentralized partially observable Markov decision process (Dec-POMDP) and the interactive partially observable Markov decision process (I-POMDP), in context and review the foundational algorithms for these frameworks, while briefly discussing the advances in their specializations. I conclude by examining the avenues that research pertaining to these frameworks is pursuing.
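
For concreteness, a Dec-POMDP is the tuple ⟨I, S, {A_i}, T, R, {Ω_i}, O⟩: a set of agents, global states, per-agent actions and observations, joint transition and observation models, and a single shared reward. A minimal container sketch (field names are ours, not from any library) is:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecPOMDP:
    """Container for the Dec-POMDP tuple <I, S, {A_i}, T, R, {Omega_i}, O>."""
    agents: List[str]                                  # I: the set of agents
    states: List[str]                                  # S: global states
    actions: Dict[str, List[str]]                      # A_i per agent
    observations: Dict[str, List[str]]                 # Omega_i per agent
    transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]           # T(s, a) -> P(s')
    observe: Callable[[Tuple[str, ...], str], Dict[Tuple[str, ...], float]]  # O(a, s') -> P(o)
    reward: Callable[[str, Tuple[str, ...]], float]    # R(s, a): one shared reward
```

The single shared reward is what makes the setting collaborative; the I-POMDP instead equips each agent with explicit models of the other agents, nested to some finite depth.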


2013, Vol. 46, pp. 449-509
Author(s):  
F. A. Oliehoek, M. T. J. Spaan, C. Amato, S. Whiteson

This article presents the state of the art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building on the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
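
To see what GMAA* solves at each node, consider a one-shot CBG: each agent privately knows its own action-observation history (its type) and must commit to an action for every type, with all agents sharing one payoff. The brute-force solver sketched below (toy model and numbers are ours) enumerates all joint policies, which is exactly the blow-up that lossless clustering and incremental expansion are designed to tame.

```python
import itertools

types = {"a1": ["t0", "t1"], "a2": ["t0", "t1"]}
actions = {"a1": ["left", "right"], "a2": ["left", "right"]}

def joint_type_prob(joint_type):
    return 0.25                        # uniform over the 4 joint types (toy)

def payoff(joint_type, joint_action):
    # Reward coordination on matching actions, more so for matching types.
    bonus = 2.0 if joint_type[0] == joint_type[1] else 1.0
    return bonus if joint_action[0] == joint_action[1] else 0.0

def solve_cbg():
    agents = sorted(types)
    # A policy for one agent assigns an action to each of its types.
    per_agent_policies = [
        [dict(zip(types[i], choice))
         for choice in itertools.product(actions[i], repeat=len(types[i]))]
        for i in agents]
    best, best_value = None, float("-inf")
    # Enumerate joint policies: this set grows doubly exponentially with
    # the horizon, hence the need for clustering and incremental expansion.
    for joint_policy in itertools.product(*per_agent_policies):
        value = sum(
            joint_type_prob(jt) *
            payoff(jt, tuple(pol[t] for pol, t in zip(joint_policy, jt)))
            for jt in itertools.product(*(types[i] for i in agents)))
        if value > best_value:
            best, best_value = joint_policy, value
    return best, best_value

policy, value = solve_cbg()
print("expected payoff:", value)
```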


2001, Vol. 14, pp. 29-51
Author(s):  
N. L. Zhang, W. Zhang

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.
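
Value iteration represents the POMDP value function as a set Γ of α-vectors, one per conditional plan, and improves it by dynamic-programming backups. The sketch below shows the backup at a single belief point, the form used by later point-based solvers; exact value iteration performs the analogous backup over the entire belief simplex, which is what makes each iteration so expensive. The model arrays are hypothetical.

```python
import numpy as np

def point_based_backup(b, Gamma, R, T, Z, gamma=0.95):
    """One value-iteration backup at belief b (sketch).

    b     : (S,) belief
    Gamma : list of (S,) alpha-vectors representing the current value function
    R     : (A, S) immediate rewards
    T     : (A, S, S) transitions, T[a, s, s'] = P(s' | s, a)
    Z     : (A, S, O) observations, Z[a, s', o] = P(o | s', a)
    Returns the new alpha-vector and the greedy action at b.
    """
    A, S = R.shape
    O = Z.shape[2]
    best_vec, best_val, best_act = None, -np.inf, None
    for a in range(A):
        g_a = R[a].astype(float)
        for o in range(O):
            # Back-project every alpha-vector through (a, o) ...
            candidates = [T[a] @ (Z[a][:, o] * alpha) for alpha in Gamma]
            # ... and keep the one that is best at this particular belief.
            g_a += gamma * max(candidates, key=lambda v: b @ v)
        if b @ g_a > best_val:
            best_vec, best_val, best_act = g_a, b @ g_a, a
    return best_vec, best_act
```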


Author(s):  
Yasuyoshi Yokokohji

It has been 10 years since the Fukushima Daiichi Nuclear Power Station (NPS) accident. This article begins by discussing the robots used during the responses to the Three Mile Island and Chernobyl nuclear accidents. It then reviews the robots used to respond to the Fukushima Daiichi NPS accident, while considering the lessons learned from the previous accidents. Such discussions will hopefully lead to the further development of robots for decommissioning the Fukushima Daiichi NPS.


Author(s):  
Yaodong Ni, Zhi-Qiang Liu

Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to specify a POMDP's parameters exactly enough to model a real-life situation precisely, for reasons such as limited data for learning the model and the inability of a fixed, exact model to capture dynamic situations. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called the U-set and the L-set. We propose four strategies for computing the U-set and the L-set. We theoretically analyze the algorithm's computational complexity and reward loss. The effectiveness and robustness of the algorithm are shown empirically.
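
The UL-based algorithm itself is not reproduced here, but the flavor of planning with bounded parameters can be seen in the classic interval backup for a fully observable bounded-parameter MDP: within the given bounds, nature shifts probability mass toward low-value successors, and the agent plans against that worst case. A sketch in our notation, assuming each state-action's lower bounds sum to at most 1 and upper bounds to at least 1:

```python
import numpy as np

def worst_case_distribution(lo, hi, values):
    """Pick P in [lo, hi] (summing to 1) that minimizes expected value.

    Order-statistics construction: start every successor at its lower
    bound, then pour the remaining probability mass into successors in
    increasing order of value.
    """
    p = lo.astype(float)
    remaining = 1.0 - p.sum()
    for s in np.argsort(values):
        extra = min(hi[s] - p[s], remaining)
        p[s] += extra
        remaining -= extra
    return p

def interval_value_iteration(R, T_lo, T_hi, gamma=0.95, iters=200):
    """Pessimistic value iteration for a bounded-parameter MDP (sketch).

    R          : (A, S) rewards
    T_lo, T_hi : (A, S, S) lower/upper bounds on P(s' | s, a)
    Returns a value lower bound that is robust to the imprecision.
    """
    A, S = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((A, S))
        for a in range(A):
            for s in range(S):
                p = worst_case_distribution(T_lo[a, s], T_hi[a, s], V)
                Q[a, s] = R[a, s] + gamma * (p @ V)
        V = Q.max(axis=0)              # the agent still picks the best action
    return V
```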


2020, Vol. 34 (06), pp. 10061-10068
Author(s):  
Maxime Bouton, Jana Tumova, Mykel J. Kochenderfer

Autonomous systems are often required to operate in partially observable environments. They must reliably achieve a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
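
The standard route from LTL to a planning problem is to compose the model with an automaton for the formula, after which satisfying the formula reduces to reaching accepting states in the product with maximum probability. The sketch below shows that recursion for a fully observable product MDP (toy numbers are ours); in the POMDP setting the same backup runs over beliefs, which point-based value iteration approximates on sampled belief points.

```python
import numpy as np

def max_reach_probability(T, accepting, iters=500):
    """Maximum probability of reaching an accepting set in an MDP (sketch).

    T         : (A, S, S) transition probabilities of the product model
    accepting : boolean (S,) mask of accepting states
    """
    V = accepting.astype(float)        # value 1 once the formula is satisfied
    for _ in range(iters):
        # Bellman backup: best action's probability of eventually reaching
        # an accepting state; accepting states stay at value 1.
        V = np.where(accepting, 1.0, (T @ V).max(axis=0))
    return V

# Tiny 3-state, 2-action example: state 1 is accepting, state 2 is a trap.
T = np.array([
    [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # cautious action
    [[0.4, 0.0, 0.6], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # risky action
])
accepting = np.array([False, True, False])
print(max_reach_probability(T, accepting))  # the cautious action wins at state 0
```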


Author(s):  
Marta Kwiatkowska, Gethin Norman, David Parker

The design and control of autonomous systems that operate in uncertain or adversarial environments can be facilitated by formal modeling and analysis. Probabilistic model checking is a technique to automatically verify, for a given temporal logic specification, that a system model satisfies the specification, as well as to synthesize an optimal strategy for its control. This method has recently been extended to multiagent systems that exhibit competitive or cooperative behavior, modeled via stochastic games, and to the synthesis of equilibrium strategies. In this article, we provide an overview of probabilistic model checking, focusing on models supported by the PRISM and PRISM-games model checkers. This overview includes fully observable and partially observable Markov decision processes, as well as turn-based and concurrent stochastic games, together with associated probabilistic temporal logics. We demonstrate the applicability of the framework through illustrative examples from autonomous systems. Finally, we highlight research challenges and suggest directions for future work in this area.
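
At the core of such tools is a value recursion over the model. For a turn-based stochastic game with a reachability objective, the controller maximizes and the adversary minimizes at the states they own; a bare-bones sketch follows (our notation, not PRISM's input language or API, which implement far richer logics and equilibrium computations on top of recursions like this one).

```python
import numpy as np

def game_reach_value(T, owner, target, iters=500):
    """Value of a reachability objective in a turn-based stochastic game.

    T      : (A, S, S) transition probabilities per action index
    owner  : (S,) 0 if the controller moves in the state, 1 if the adversary
    target : boolean (S,) states the controller tries to reach
    """
    V = target.astype(float)
    for _ in range(iters):
        Q = T @ V                              # (A, S) action values
        # Controller states take the max over actions, adversary states the min;
        # target states are fixed at value 1.
        V = np.where(target, 1.0,
                     np.where(owner == 0, Q.max(axis=0), Q.min(axis=0)))
    return V
```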


2000, Vol. 13, pp. 33-94
Author(s):  
M. Hauskrecht

Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations, and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
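
One of the simplest and most widely known heuristics in this family is QMDP: solve the underlying MDP as if the state were fully observable, then act on beliefs by weighting the resulting Q-values. A sketch in our notation:

```python
import numpy as np

def qmdp_policy(R, T, gamma=0.95, iters=200):
    """QMDP heuristic (sketch).

    Solve the underlying MDP as if the state were observable, then act on
    beliefs by weighting the resulting Q-values. Fast, but it implicitly
    assumes all uncertainty vanishes after one step, so it never takes
    actions purely to gather information.
    R : (A, S) rewards, T : (A, S, S) transitions.
    """
    Q = np.zeros(R.shape)
    for _ in range(iters):
        V = Q.max(axis=0)
        Q = R + gamma * (T @ V)            # standard MDP Bellman backup
    def act(belief):                       # belief: (S,) distribution
        return int(np.argmax(Q @ belief))  # greedy on belief-weighted Q-values
    return act
```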

