Towards Applying Interactive POMDPs to Real-World Adversary Modeling

2021 · Vol 24 (2) · pp. 1814-1820
Author(s): Brenda Ng, Carol Meyers, Kofi Boakye, John Nitao

We examine the suitability of using decision processes to model real-world systems of intelligent adversaries. Decision processes have long been used to study cooperative multiagent interactions, but their practical applicability to adversarial problems has received minimal study. We address the pros and cons of applying sequential decision-making in this area, using the crime of money laundering as a specific example. Motivated by case studies, we abstract out a model of the money laundering process in the framework of interactive partially observable Markov decision processes (I-POMDPs), and explain why this framework is well suited to modeling adversarial interactions. Particle filtering and value iteration are used to solve the model, applying different pruning and look-ahead strategies to assess the tradeoff between solution quality and algorithmic run time. Our results show that there is a large gap in the level of realism such decision models can currently achieve, largely due to computational demands that limit the size of problems that can be solved. While these results represent solutions to a simplified model of money laundering, they nonetheless illustrate the kinds of agent interactions that standard approaches such as anomaly detection cannot capture. This suggests that I-POMDP methods may prove valuable once algorithmic capabilities have further evolved.
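As a concrete illustration of the solution machinery mentioned above, the following is a minimal sketch, not the authors' implementation, of the sequential importance resampling step a particle-filter belief update performs inside this kind of solver; `transition` and `obs_likelihood` are hypothetical stand-ins for a concrete model.

```python
import random

def particle_filter_update(particles, action, observation,
                           transition, obs_likelihood, n_particles=1000):
    """Sequential importance resampling for a POMDP belief update.

    particles: sampled states representing the current belief;
    transition(s, a) -> a sampled next state (hypothetical model function);
    obs_likelihood(o, s2, a) -> P(o | s2, a) (hypothetical model function).
    """
    # Propagate each particle through the (stochastic) transition model.
    propagated = [transition(s, action) for s in particles]
    # Weight each propagated particle by how well it explains the observation.
    weights = [obs_likelihood(observation, s2, action) for s2 in propagated]
    if sum(weights) == 0:
        raise ValueError("observation has zero likelihood under all particles")
    # Resample to recover an unweighted particle set for the next belief.
    return random.choices(propagated, weights=weights, k=n_particles)
```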

2019 · Vol 65 · pp. 307-341
Author(s): Erwin Walraven, Matthijs T. J. Spaan

Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
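To make the operation at the heart of point-based methods concrete, here is a minimal sketch of the standard point-based Bellman backup at a single belief; in the finite-horizon, undiscounted setting this paper targets, gamma is 1 and a separate alpha-vector set is kept per stage. The model arrays `T`, `Z`, `R` are hypothetical.

```python
import numpy as np

def point_based_backup(b, alpha_next, T, Z, R, gamma=1.0):
    """One point-based backup at belief b (finite horizon: gamma = 1).

    b: belief, shape (S,); alpha_next: next-stage alpha vectors, each (S,);
    T[a][s, s'] = P(s' | s, a); Z[a][s', o] = P(o | s', a); R[a]: rewards, (S,).
    Returns the one-step backup alpha vector maximizing b @ alpha.
    """
    best_alpha, best_val = None, -np.inf
    for a in range(len(T)):
        alpha_a = R[a].astype(float).copy()
        for o in range(Z[a].shape[1]):
            # Back-project each next-stage vector through (a, o) ...
            g = [gamma * (T[a] @ (Z[a][:, o] * alpha)) for alpha in alpha_next]
            # ... and keep the one that scores best at this particular belief.
            alpha_a += max(g, key=lambda v: float(b @ v))
        val = float(b @ alpha_a)
        if val > best_val:
            best_alpha, best_val = alpha_a, val
    return best_alpha
```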


2015 · Vol 13 (3) · pp. 47-57
Author(s): Sanaa Chafik, Cherki Daoui

As many real applications involve a large number of states, classical methods are intractable for solving large Markov Decision Processes. Decomposition techniques, based on the topology of each state in the associated graph, and parallelization are useful ways to cope with this problem. In this paper, the authors propose a Modified Value Iteration algorithm that incorporates parallelism. They test their implementation on artificial data using OpenMP, which yields a significant speed-up.
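The parallelization opportunity here is that, within one sweep of value iteration, the Bellman backups of different states are independent. The paper uses OpenMP in shared memory; the sketch below is only a rough Python analogue using a process pool (the MDP arrays are hypothetical, and each sweep ships a copy of V to the workers rather than sharing it).

```python
import numpy as np
from multiprocessing import Pool

def backup_state(args):
    """Bellman backup for one state; per-state backups within a sweep are
    independent, which is what the paper's OpenMP loop exploits."""
    s, V, T, R, gamma = args
    return max(R[s, a] + gamma * (T[a][s] @ V) for a in range(len(T)))

def parallel_value_iteration(T, R, gamma=0.95, eps=1e-6, workers=4):
    """Synchronous value iteration with backups farmed out to a pool.
    T[a] is an (S, S) transition matrix, R an (S, A) reward matrix.
    Call from under `if __name__ == "__main__":` for multiprocessing."""
    V = np.zeros(R.shape[0])
    with Pool(workers) as pool:
        while True:
            V_new = np.array(pool.map(
                backup_state, [(s, V, T, R, gamma) for s in range(len(V))]))
            if np.max(np.abs(V_new - V)) < eps:
                return V_new
            V = V_new
```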


Author(s): Karel Horák, Branislav Bošanský, Krishnendu Chatterjee

Partially observable Markov decision processes (POMDPs) are the standard model for planning under uncertainty over both finite and infinite horizons. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical one: given a set of target states and a positive cost for each transition, the aim is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel and heuristic search value iteration (HSVI) have been used to solve Goal-POMDPs. Neither algorithm has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We make the following contributions: (1) we discuss the challenges introduced by Goal-POMDPs and illustrate how they prevent the original HSVI from converging; (2) we present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that it has convergence guarantees; (3) we show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
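For orientation, the indefinite-horizon objective induces the following Bellman backup on beliefs (expected total cost until a target state is reached). This is a plain sketch with hypothetical model arrays; Goal-HSVI itself maintains upper and lower bounds on this value and guides its trials by their gap, which the sketch does not reproduce.

```python
import numpy as np

def goal_belief_backup(b, V, T, Z, cost, goal_mask):
    """One Bellman backup for a Goal-POMDP belief (minimize total cost).

    b: belief over states, shape (S,); V: callable V(belief) -> float;
    T[a][s, s'] = P(s' | s, a); Z[a][s', o] = P(o | s', a);
    cost[a]: per-state cost of action a, shape (S,);
    goal_mask: boolean mask of target states.
    """
    active = ~goal_mask           # goal states are absorbed at zero cost
    best = np.inf
    for a in range(len(T)):
        q = b[active] @ cost[a][active]   # expected immediate cost
        pred = (b * active) @ T[a]        # continuing next-state mass only
        for o in range(Z[a].shape[1]):
            joint = pred * Z[a][:, o]     # unnormalized posterior over s'
            p_o = joint.sum()
            if p_o > 0:
                q += p_o * V(joint / p_o) # expected cost-to-go
        best = min(best, q)
    return best
```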


Author(s): Sebastian Junges, Nils Jansen, Sanjit A. Seshia

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs, for instance by restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
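To give a feel for what computing a winning region involves, here is an explicit-state sketch of the safety half of the problem: the greatest fixed point over belief supports from which some action avoids bad states forever. It is not the authors' SAT or decision-diagram encoding, and full almost-sure reach-avoid additionally requires a reachability argument; the exponential enumeration below is exactly what motivates symbolic approaches. `trans`, `obs_fn`, and `bad` are hypothetical model accessors.

```python
from itertools import combinations

def succ_support(S, a, o, trans, obs_fn):
    """Belief support after taking action a in support S and observing o.
    trans(s, a): set of states reachable from s under a;
    obs_fn(s2, a): set of observations possible in s2 after a."""
    return frozenset(s2 for s in S for s2 in trans(s, a) if o in obs_fn(s2, a))

def safe_supports(states, actions, observations, trans, obs_fn, bad):
    """Greatest fixed point of the safety operator over belief supports:
    keep a support iff some action sends every possible successor support
    back into the kept set."""
    supports = [frozenset(c) for r in range(1, len(states) + 1)
                for c in combinations(states, r)]
    safe = {S for S in supports if not (S & bad)}
    changed = True
    while changed:
        changed = False
        for S in list(safe):
            ok = False
            for a in actions:
                succs = [succ_support(S, a, o, trans, obs_fn)
                         for o in observations]
                if all(s2 in safe for s2 in succs if s2):  # empty = o impossible
                    ok = True
                    break
            if not ok:
                safe.discard(S)
                changed = True
    return safe
```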


Author(s): Mahsa Ghasemi, Ufuk Topcu

In conventional partially observable Markov decision processes, the observations that the agent receives originate from fixed, known distributions. However, in a variety of real-world scenarios the agent plays an active role in its perception by selecting which observations to receive. To avoid the combinatorial expansion of the action space that comes from integrating planning and perception decisions, we use a greedy strategy for observation selection that minimizes an information-theoretic measure of state uncertainty. We develop a novel point-based value iteration algorithm that incorporates this greedy strategy to pick perception actions for each sampled belief point in each iteration. As a result, not only does the solver require fewer belief points to approximate the reachable subspace of the belief simplex, but it also requires less computation per iteration. Further, we prove that the proposed algorithm achieves a near-optimal guarantee on the value function with respect to an optimal perception strategy, and we demonstrate its performance empirically.
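A minimal sketch of the greedy perception step described above: among candidate observation channels, pick the one whose reading is expected to minimize the posterior entropy of the belief. The `channels` structure is a hypothetical stand-in for the paper's observation-selection model.

```python
import numpy as np

def entropy(p):
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def greedy_perception_action(b, channels):
    """Pick the observation channel minimizing expected posterior entropy.

    b: current belief over states, shape (S,);
    channels: dict name -> observation matrix Z with Z[s, o] = P(o | s).
    Returns (best channel name, its expected posterior entropy).
    """
    best, best_h = None, np.inf
    for name, Z in channels.items():
        h = 0.0
        for o in range(Z.shape[1]):
            joint = b * Z[:, o]          # unnormalized posterior P(s, o)
            p_o = joint.sum()            # P(o) under this channel
            if p_o > 0:
                h += p_o * entropy(joint / p_o)  # weighted posterior entropy
        if h < best_h:
            best, best_h = name, h
    return best, best_h
```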

