Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints

Recurrent neural networks (RNNs) have emerged as an effective representation of control policies in sequential decision-making problems. However, a major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications, e.g. safety and/or reachability. By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller (FSC) from an RNN, which, when composed with a finite-state system model, is amenable to existing formal verification tools. Specifically, we introduce an iterative modification to the so-called quantized bottleneck insertion technique to create an FSC as a randomized policy with memory. For the cases in which the resulting FSC fails to satisfy the specification, verification generates diagnostic information. We utilize this information to either adjust the amount of memory in the extracted FSC or perform focused retraining of the RNN. While generally applicable, we detail the resulting iterative procedure in the context of policy synthesis for partially observable Markov decision processes (POMDPs), which is known to be notoriously hard. The numerical experiments show that the proposed approach outperforms traditional POMDP synthesis methods by 3 orders of magnitude within 2% of optimal benchmark values.

Download Full-text

POMDP-Based Coding of Child–Robot Interaction within a Robot-Assisted ASD Diagnostic Protocol

International Journal of Humanoid Robotics ◽

10.1142/s0219843618500111 ◽

2018 ◽

Vol 15 (02) ◽

pp. 1850011 ◽

Cited By ~ 9

Author(s):

Frano Petric ◽

Damjan Miklić ◽

Zdenko Kovačić

Keyword(s):

Humanoid Robots ◽

Autism Diagnostic Observation Schedule ◽

Autism Spectrum ◽

Diagnostic Process ◽

Sequential Decision ◽

Robot Assisted ◽

Robot Interaction ◽

Fine Grained ◽

Markov Decision ◽

Partially Observable

The existing procedures for autism spectrum disorder (ASD) diagnosis are often time consuming and tiresome both for highly-trained human evaluators and children, which may be alleviated by using humanoid robots in the diagnostic process. Hence, this paper proposes a framework for robot-assisted ASD evaluation based on partially observable Markov decision process (POMDP) modeling, specifically POMDPs with mixed observability (MOMDPs). POMDP is broadly used for modeling optimal sequential decision making tasks under uncertainty. Spurred by the widely accepted autism diagnostic observation schedule (ADOS), we emulate ADOS through four tasks, whose models incorporate observations of multiple social cues such as eye contact, gestures and utterances. Relying only on those observations, the robot provides an assessment of the child’s ASD-relevant functioning level (which is partially observable) within a particular task and provides human evaluators with readable information by partitioning its belief space. Finally, we evaluate the proposed MOMDP task models and demonstrate that chaining the tasks provides fine-grained outcome quantification, which could also increase the appeal of robot-assisted diagnostic protocols in the future.

Download Full-text

A Continuous Internal-State Controller for Partially Observable Markov Decision Processes

Artificial Neural Networks - ICANN 2008 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-87536-9_41 ◽

2008 ◽

pp. 397-406

Author(s):

Yuki Taniguchi ◽

Takeshi Mori ◽

Shin Ishii

Keyword(s):

Markov Decision Processes ◽

Internal State ◽

Decision Processes ◽

State Controller ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Download Full-text

Enforcing Almost-Sure Reachability in POMDPs

Computer Aided Verification - Lecture Notes in Computer Science ◽

10.1007/978-3-030-81688-9_28 ◽

2021 ◽

pp. 602-625

Author(s):

Sebastian Junges ◽

Nils Jansen ◽

Sanjit A. Seshia

Keyword(s):

Markov Decision Processes ◽

Empirical Evaluation ◽

Decision Processes ◽

Limited Information ◽

Sequential Decision ◽

Goal State ◽

Learning Agent ◽

Markov Decision ◽

System Configurations ◽

Partially Observable

AbstractPartially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.

Download Full-text

Dynamic Risk Measures for Finite-State Partially Observable Markov Decision Problems

2015 Proceedings of the Conference on Control and its Applications ◽

10.1137/1.9781611974072.22 ◽

2015 ◽

pp. 153-158

Author(s):

Jingnan Fan ◽

Andrzej Ruszczyński

Keyword(s):

Risk Measures ◽

Decision Problems ◽

Dynamic Risk Measures ◽

Dynamic Risk ◽

Markov Decision Problems ◽

Finite State ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Download Full-text

Online Planning Algorithms for POMDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.2567 ◽

2008 ◽

Vol 32 ◽

pp. 663-704 ◽

Cited By ~ 165

Author(s):

S. Ross ◽

J. Pineau ◽

S. Paquet ◽

B. Chaib-draa

Keyword(s):

Heuristic Search ◽

State Of The Art ◽

Sequential Decision Making ◽

Sequential Decision ◽

Time Step ◽

Advantages And Disadvantages ◽

Online Planning ◽

Markov Decision ◽

Heuristic Search Methods ◽

Partially Observable

Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.

Download Full-text

A Novel Point-Based Incremental Pruning Algorithm for POMDP

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1088 ◽

2014 ◽

Vol 513-517 ◽

pp. 1088-1091

Author(s):

Bo Wu ◽

Hong Yan Zheng ◽

Yan Peng Feng

Keyword(s):

Large Scale ◽

Piecewise Linear ◽

Sequential Decision ◽

Pruning Algorithm ◽

Piecewise Linearity ◽

Markov Decision ◽

Heuristic Search Methods ◽

Partially Observable ◽

Belief States ◽

The Value Function

Partially Observable Markov Decision Processes (POMDP) provides piecewise-linear a natural and principled framework for sequential decision-making under uncertainty. However, large-scale POMDP suffers from the exponential growth of the belief points and policy trees space. We present a new point-based incremental pruning algorithm based on the piecewise linearity and convexity of the value function. Instead of reasoning about the whole belief space when pruning the cross-sums in POMDP policy construction, our algorithm uses belief points to perform approximate pruning by generating policy trees, and get the optimal policy in real-time belief states. The empirical results indicate that point-based incremental pruning for heuristic search methods can handle large POMDP domains efficiently.

Download Full-text

Kalman Based Finite State Controller for Partially Observable Domains

International Journal of Advanced Robotic Systems ◽

10.5772/5723 ◽

2006 ◽

Vol 3 (4) ◽

pp. 45 ◽

Cited By ~ 2

Author(s):

Alp Sardag ◽

H. Levent Akin

Keyword(s):

State Controller ◽

Finite State ◽

Partially Observable

Download Full-text

Interaction as an Emergent Property of a Partially Observable Markov Decision Process

10.1093/oso/9780198799603.003.0011 ◽

2018 ◽

Cited By ~ 1

Author(s):

Andrew Howes ◽

Xiuli Chen ◽

Aditya Acharya ◽

Richard L. Lewis

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Explanatory Power ◽

Human Vision ◽

Human Interaction ◽

Human Cognition ◽

Sequential Decision ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

In this chapter we explore the potential advantages of modeling the interaction between a human and a computer as a consequence of a Partially Observable Markov Decision Process (POMDP) that models human cognition. POMDPs can be used to model human perceptual mechanisms, such as human vision, as partial (uncertain) observers of a hidden state are possible. In general, POMDPs permit a rigorous definition of interaction as the outcome of a reward maximizing stochastic sequential decision processes. They have been shown to explain interaction between a human and an environment in a range of scenarios, including visual search, interactive search and sense-making. The chapter uses these scenarios to illustrate the explanatory power of POMDPs in HCI. It also shows that POMDPs embrace the embodied, ecological and adaptive nature of human interaction.

Download Full-text

An Introduction to Fully and Partially Observable Markov Decision Processes

Decision Theory Models for Applications in Artificial Intelligence ◽

10.4018/978-1-60960-165-2.ch003 ◽

2012 ◽

pp. 33-62 ◽

Cited By ~ 2

Author(s):

Pascal Poupart

Keyword(s):

Decision Making ◽

Markov Decision Processes ◽

Decision Processes ◽

Sequential Decision Making ◽

Decision Making Under Uncertainty ◽

Sequential Decision ◽

Markov Decision ◽

The Common ◽

Partially Observable Markov ◽

Partially Observable

The goal of this chapter is to provide an introduction to Markov decision processes as a framework for sequential decision making under uncertainty. The aim of this introduction is to provide practitioners with a basic understanding of the common modeling and solution techniques. Hence, we will not delve into the details of the most recent algorithms, but rather focus on the main concepts and the issues that impact deployment in practice. More precisely, we will review fully and partially observable Markov decision processes, describe basic algorithms to find good policies and discuss modeling/computational issues that arise in practice.

Download Full-text

Point-Based Value Iteration for Finite-Horizon POMDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11324 ◽

2019 ◽

Vol 65 ◽

pp. 307-341 ◽

Cited By ~ 1

Author(s):

Erwin Walraven ◽

Matthijs T. J. Spaan

Keyword(s):

Planning Horizon ◽

Finite Horizon ◽

Iteration Algorithm ◽

General Point ◽

Value Iteration ◽

Sequential Decision ◽

Solution Quality ◽

Markov Decision ◽

Horizon Problems ◽

Partially Observable

Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.

Download Full-text