Strong representation theorems for bitone sequential decision processes

In this paper, we introduce a new subclass of bitone sequential decision processes (bsdp), called positively/negatively bsdp (p/n bsdp for short), and give a representation theorem for it: a necessary and sufficient condition for a p/n bsdp to strongly represent a given discrete decision process (ddp).

Dynamic programming is optimal for certain sequential decision processes

Journal of Mathematical Analysis and Applications ◽

10.1016/0022-247x(80)90025-6 ◽

1980 ◽

Vol 73 (1) ◽

pp. 134-137 ◽

Cited By ~ 3

Author(s):

Arnon Rosenthal

Keyword(s):

Dynamic Programming ◽

Decision Processes ◽

Sequential Decision

Improved dynamic programming algorithms for sequential decision processes with applications to economic dispatches of power systems

1993 (25th) Southeastern Symposium on System Theory ◽

10.1109/ssst.1993.522757 ◽

2002 ◽

Cited By ~ 2

Author(s):

M. Sun

Keyword(s):

Dynamic Programming ◽

Power Systems ◽

Decision Processes ◽

Sequential Decision ◽

Programming Algorithms ◽

Improved Dynamic Programming

Enforcing Almost-Sure Reachability in POMDPs

Computer Aided Verification - Lecture Notes in Computer Science ◽

10.1007/978-3-030-81688-9_28 ◽

2021 ◽

pp. 602-625

Author(s):

Sebastian Junges ◽

Nils Jansen ◽

Sanjit A. Seshia

Keyword(s):

Markov Decision Processes ◽

Empirical Evaluation ◽

Decision Processes ◽

Limited Information ◽

Sequential Decision ◽

Goal State ◽

Learning Agent ◽

Markov Decision ◽

System Configurations ◽

Partially Observable

Abstract: Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
