Strong representation theorems for bitone sequential decision processes

2003, Vol 18 (4), pp. 475-489
Author(s): Yukihiro Maruyama
2007, Vol 24 (02), pp. 181-202
Author(s): Yukihiro Maruyama

In this paper, we introduce a new subclass of bitone sequential decision processes (bsdp) and give a representation theorem for this subclass, called positively/negatively bsdp (p/n bsdp for short); that is, a necessary and sufficient condition for a p/n bsdp to strongly represent a given discrete decision process (ddp).


Author(s): Sebastian Junges, Nils Jansen, Sanjit A. Seshia

Abstract: Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.

