Partially Observable Markov
Recently Published Documents


TOTAL DOCUMENTS: 261 (five years: 51)

H-INDEX: 22 (five years: 2)

Author(s):  
Hanna Kurniawati

Planning under uncertainty is critical to robotics. The partially observable Markov decision process (POMDP) is a mathematical framework for such planning problems. POMDPs are powerful because of their careful quantification of the nondeterministic effects of actions and the partial observability of the states. But for the same reason, they are notorious for their high computational complexity and have been deemed impractical for robotics. However, over the past two decades, the development of sampling-based approximate solvers has led to tremendous advances in POMDP-solving capabilities. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems within reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This article presents a review of POMDPs, emphasizing computational issues that have hindered their practicality in robotics and ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots. Expected final online publication date for the Annual Review of Control, Robotics, and Autonomous Systems, Volume 5 is May 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
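For a quick illustration of the sampling-based idea this review centers on (not any specific solver it covers), the sketch below approximates a POMDP belief with a particle set and updates it after an action and an observation. The two-state tiger-style model, the probabilities, and the function names are illustrative assumptions.

```python
import random

# Minimal sketch of a sampled (particle-based) belief update for a POMDP.
# The two-state "tiger-left / tiger-right" model and all probabilities below
# are illustrative assumptions, not the models used by the reviewed solvers.

STATES = ["tiger-left", "tiger-right"]

def transition(state, action):
    # "listen" leaves the world unchanged; opening a door resets the tiger.
    if action == "listen":
        return state
    return random.choice(STATES)

def observation_prob(obs, state, action):
    # Listening is correct 85% of the time; other actions are uninformative.
    if action != "listen":
        return 0.5
    correct = (obs == "hear-left") == (state == "tiger-left")
    return 0.85 if correct else 0.15

def update_belief(particles, action, obs):
    """Weight particles by observation likelihood, then resample."""
    moved = [transition(s, action) for s in particles]
    weights = [observation_prob(obs, s, action) for s in moved]
    total = sum(weights)
    if total == 0:  # observation impossible under every particle
        return moved
    return random.choices(moved, weights=weights, k=len(particles))

belief = [random.choice(STATES) for _ in range(1000)]
belief = update_belief(belief, "listen", "hear-left")
print(belief.count("tiger-left") / len(belief))  # roughly 0.85
```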


2021, Vol. 72, pp. 819-847
Author(s):  
Steven Carr, Nils Jansen, Ufuk Topcu

Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNNs) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether a POMDP driven by such an RNN-based policy satisfies safety constraints, for instance those given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints: applied to a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. If the Markov chain does not satisfy the specification, verification yields, as a byproduct, diagnostic information about the POMDP states that are critical for the specification. The method exploits this diagnostic information either to adjust the complexity of the extracted FSC or to improve the policy by focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, three orders of magnitude larger than those handled by comparable approaches.
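As a rough sketch of the verification step described above (using assumed toy ingredients, not the authors' implementation), the snippet below shows how a small FSC composed with a POMDP induces a Markov chain over (state, memory node) pairs; a real pipeline would extract the FSC from the trained RNN and hand the induced chain to a probabilistic model checker.

```python
from itertools import product

# Minimal sketch of how a finite-state controller (FSC) induces a Markov chain
# on a POMDP. The tiny POMDP, the FSC, and all names are illustrative
# assumptions chosen only to make the construction explicit.

# POMDP ingredients: T[(s, a)] = P(s' | s, a) and O[s'] = P(o | s').
states = ["s0", "s1"]
T = {("s0", "a"): {"s0": 0.7, "s1": 0.3},
     ("s1", "a"): {"s0": 0.4, "s1": 0.6}}
O = {"s0": {"o0": 0.9, "o1": 0.1},
     "s1": {"o0": 0.2, "o1": 0.8}}

# FSC: each memory node picks an action and moves to a next node per observation.
fsc_action = {"n0": "a", "n1": "a"}
fsc_next = {("n0", "o0"): "n0", ("n0", "o1"): "n1",
            ("n1", "o0"): "n0", ("n1", "o1"): "n1"}

# Induced Markov chain over product states (POMDP state, FSC node).
chain = {}
for s, n in product(states, fsc_action):
    a = fsc_action[n]
    dist = {}
    for s2, p_t in T[(s, a)].items():
        for o, p_o in O[s2].items():
            key = (s2, fsc_next[(n, o)])
            dist[key] = dist.get(key, 0.0) + p_t * p_o
    chain[(s, n)] = dist

print(chain[("s0", "n0")])
```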


Author(s):  
Iadine Chades, Luz V. Pascal, Sam Nicol, Cameron S. Fletcher, Jonathan Ferrer Mestres

2021, Vol. 3 (3), pp. 554-581
Author(s):  
Xuanchen Xiang, Simon Foo

The first part of this two-part series of papers surveys recent advances in applying Deep Reinforcement Learning (DRL) to partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach to simulating the human learning process, whose key idea is to let an agent learn by interacting with a stochastic environment. Because the agent requires only limited access to information about the environment, this approach can be applied efficiently in most fields that require self-learning. Although efficient algorithms are widely used, an organized investigation remains essential: it enables good comparisons and helps in choosing the best structures or algorithms when applying DRL to various applications. In this overview, we introduce Markov Decision Process (MDP) problems and Reinforcement Learning, and review applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will cover applications in transportation, communications and networking, and industries.
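To make the partial-observability point concrete, here is a minimal sketch, not tied to any surveyed method, of a recurrent policy that keeps a hidden state summarizing the observation history instead of the unobserved state; the network sizes, environment interface, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Policy that conditions on the observation history via a GRU hidden state."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs, h=None):
        # obs: (batch, 1, obs_dim) -- one observation step at a time.
        out, h = self.rnn(obs, h)
        logits = self.head(out[:, -1])
        return logits, h

# Acting loop: sample actions while carrying the hidden state, which plays the
# role of an approximate belief over the state the agent cannot observe.
policy = RecurrentPolicy(obs_dim=4, n_actions=2)
h = None
obs = torch.zeros(1, 1, 4)      # placeholder first observation
for _ in range(5):
    logits, h = policy(obs, h)
    action = torch.distributions.Categorical(logits=logits).sample()
    obs = torch.randn(1, 1, 4)  # stand-in for the environment's next observation
```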


2021
Author(s):  
Shirin Akbarinasaji

Background: Bug tracking systems receive many bug reports daily. Although the software quality team aims to identify and resolve these bugs, it is never able to fix all of the reported bugs in the issue tracking system before the release deadline, and postponing bug fixes has consequences. Prioritization of bug reports helps the software manager decide which bugs to fix and which to postpone. Typically, bug reports are prioritized based on severity, priority, time and effort to fix, customer pressure, etc. Aim: Previous studies have shown that these factors may not be appropriate for prioritization, so relying on them to automate bug prioritization might be misleading. In this dissertation, we aim to prioritize bug reports with respect to the consequence of not fixing them, in terms of their relative importance in the issue tracking system. Method: To measure the relative importance of bugs in the issue tracking system, we propose constructing a dependency graph from the dependency-blocking information reported in the issue tracking system. Two metrics, namely depth and degree, are used to measure the relative importance of the bugs. However, there is uncertainty in the dependency graph structure because the dependency information is discovered manually and gradually. Owing to this uncertainty, prioritizing bugs in descending order of depth and degree may be misleading. To handle the uncertainty, we propose a novel approach based on a partially observable Markov decision process (POMDP) and partially observable Monte Carlo planning (POMCP). Result: To check the feasibility of the proposed approach, we analyzed seven years of data from an open source project, Firefox, and a commercial project. We compared the proposed policy with the developer policy, maximum policy, and random policy. Conclusion: The results suggest that software practitioners do not consider the relative importance of bugs in their current practice. The proposed framework can be combined with practitioners' expertise to prioritize bugs more effectively, taking the depth and degree of bugs into account. In practice, the POMDP framework with the POMCP planner can help practitioners sequentially select bugs to minimize the connectivity of the dependency graph.
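As a minimal sketch of the dependency-graph metrics described above, assuming "degree" counts the bugs a bug directly blocks and "depth" is the longest blocking chain starting at a bug (both readings and the toy data are illustrative assumptions, not the dissertation's definitions), one could compute and rank as follows.

```python
from functools import lru_cache

# Blocking graph: bug id -> list of bugs it blocks. Sample data is illustrative.
blocks = {
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": [],
    "B4": [],
}

def degree(bug):
    # Number of bugs this bug directly blocks.
    return len(blocks.get(bug, []))

@lru_cache(maxsize=None)
def depth(bug):
    # Length of the longest blocking chain starting at this bug.
    children = blocks.get(bug, [])
    if not children:
        return 0
    return 1 + max(depth(c) for c in children)

# Rank bugs by (depth, degree): fixing B1 releases the longest blocking chain.
ranking = sorted(blocks, key=lambda b: (depth(b), degree(b)), reverse=True)
print(ranking)  # ['B1', 'B2', 'B3', 'B4'] under this toy data
```

The POMDP/POMCP layer in the dissertation then accounts for the fact that this graph is only partially known at any time, which a deterministic ranking like the one above ignores.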

