Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.

Download Full-text

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Journal of Artificial Intelligence Research ◽

10.1613/jair.761 ◽

2001 ◽

Vol 14 ◽

pp. 29-51 ◽

Cited By ~ 39

Author(s):

N. L. Zhang ◽

W. Zhang

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Benchmark Problems ◽

Test Problems ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable ◽

Number Of Iterations

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.

Download Full-text

BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488513500396 ◽

2013 ◽

Vol 21 (06) ◽

pp. 821-863 ◽

Cited By ~ 2

Author(s):

YAODONG NI ◽

ZHI-QIANG LIU

Keyword(s):

Markov Decision Processes ◽

Real Life ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Real Life Situation ◽

Partially Observable Markov ◽

Partially Observable

Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model the real-life situation precisely, due to various reasons such as limited data for learning the model, inability of exact POMDPs to model dynamic situations, etc. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called U-set and L-set. We propose four strategies for computing U-set and L-set. We analyze theoretically the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.

Download Full-text

Functional Reward Markov Decision Processes: Theory and Applications

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213017600144 ◽

2017 ◽

Vol 26 (03) ◽

pp. 1760014

Author(s):

Paul Weng ◽

Olivier Spanjaard

Keyword(s):

Markov Decision Processes ◽

Infinite Horizon ◽

Standard Form ◽

Sufficient Conditions ◽

Decision Processes ◽

Markov Decision ◽

Standard Models ◽

Reward Functions ◽

Planning Problems ◽

Horizon Case

Markov decision processes (MDP) have become one of the standard models for decisiontheoretic planning problems under uncertainty. In its standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite horizon case and the case where a maximum operator does not exist. In order to show the potential of our framework, we conclude the paper by presenting several illustrative examples.

Download Full-text

Perception-Aware Point-Based Value Iteration for Partially Observable Markov Decision Processes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/329 ◽

2019 ◽

Author(s):

Mahsa Ghasemi ◽

Ufuk Topcu

Keyword(s):

Markov Decision Processes ◽

Active Role ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Greedy Strategy ◽

Markov Decision ◽

Partially Observable Markov ◽

Observation Selection ◽

Partially Observable

In conventional partially observable Markov decision processes, the observations that the agent receives originate from fixed known distributions. However, in a variety of real-world scenarios, the agent has an active role in its perception by selecting which observations to receive. We avoid combinatorial expansion of the action space from integration of planning and perception decisions, through a greedy strategy for observation selection that minimizes an information-theoretic measure of the state uncertainty. We develop a novel point-based value iteration algorithm that incorporates this greedy strategy to pick perception actions for each sampled belief point in each iteration. As a result, not only the solver requires less belief points to approximate the reachable subspace of the belief simplex, but it also requires less computation per iteration. Further, we prove that the proposed algorithm achieves a near-optimal guarantee on value function with respect to an optimal perception strategy, and demonstrate its performance empirically.

Download Full-text

Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

Mathematics of Operations Research ◽

10.1287/moor.2015.0746 ◽

2016 ◽

Vol 41 (2) ◽

pp. 656-681 ◽

Cited By ~ 21

Author(s):

Eugene A. Feinberg ◽

Pavlo O. Kasyanov ◽

Michael Z. Zgurovsky

Keyword(s):

Markov Decision Processes ◽

Transition Probabilities ◽

Decision Processes ◽

Continuous Transition ◽

Total Cost ◽

Weakly Continuous ◽

Markov Decision ◽

Partially Observable

Download Full-text

On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes

SIAM Journal on Control and Optimization ◽

10.1137/141000294 ◽

2015 ◽

Vol 53 (4) ◽

pp. 1982-2016 ◽

Cited By ~ 2

Author(s):

Huizhen Yu

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Value Iteration ◽

Total Cost ◽

Markov Decision

Download Full-text

Perseus: Randomized Point-based Value Iteration for POMDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.1659 ◽

2005 ◽

Vol 24 ◽

pp. 195-220 ◽

Cited By ~ 209

Author(s):

M. T.J. Spaan ◽

N. Vlassis

Keyword(s):

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Finite Set ◽

Partially Observable ◽

Set Of Points ◽

Action Spaces ◽

Belief Set

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.

Download Full-text

Region-based value iteration for partially observable Markov decision processes

Proceedings of the 23rd international conference on Machine learning - ICML '06 ◽

10.1145/1143844.1143915 ◽

2006 ◽

Cited By ~ 3

Author(s):

Hui Li ◽

Xuejun Liao ◽

Lawrence Carin

Keyword(s):

Markov Decision Processes ◽

Decision Processes ◽

Value Iteration ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Download Full-text

OPTIMAL CONTROL FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES OVER AN INFINITE HORIZON

Journal of the Operations Research Society of Japan ◽

10.15807/jorsj.21.1 ◽

1978 ◽

Vol 21 (1) ◽

pp. 1-16 ◽

Cited By ~ 7

Author(s):

Katsushige Sawaki ◽

Akira Ichikawa

Keyword(s):

Optimal Control ◽

Markov Decision Processes ◽

Infinite Horizon ◽

Decision Processes ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

Download Full-text

PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES AND PERIODIC POLICIES WITH APPLICATIONS

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622011004762 ◽

2011 ◽

Vol 10 (06) ◽

pp. 1175-1197 ◽

Cited By ~ 1

Author(s):

JOHN GOULIONIS ◽

D. STENGOS

Keyword(s):

Markov Decision Processes ◽

Piecewise Linear ◽

Linear Equations ◽

Infinite Horizon ◽

Decision Processes ◽

Value Functions ◽

Discounted Cost ◽

Markov Decision ◽

Partially Observable Markov ◽

Partially Observable

This paper treats the infinite horizon discounted cost control problem for partially observable Markov decision processes. Sondik studied the class of finitely transient policies and showed that their value functions over an infinite time horizon are piecewise linear (p.w.l) and can be computed exactly by solving a system of linear equations. However, the condition for finite transience is stronger than is needed to ensure p.w.l. value functions. In this paper, we introduce alternatively the class of periodic policies whose value functions turn out to be also p.w.l. Moreover, we examine a more general condition than finite transience and periodicity that ensures p.w.l. value functions. We implement these ideas in a replacement problem under Markovian deterioration, investigate for periodic policies and give numerical examples.

Download Full-text