Constrained Markov Decision Processes
Recently Published Documents

Total documents: 46 (five years: 10)
H-index: 11 (five years: 1)

Author(s): Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai

In many real-world reinforcement learning (RL) problems, the learning agent must maintain certain safety constraints in addition to maximizing its objective. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy: (i) the GM-CRL algorithm, which has access to a generative model, and (ii) the UC-CRL algorithm, which learns the model using an upper-confidence-style online exploration method. We characterize the sample complexity of these algorithms, i.e., the number of samples needed to achieve a desired level of accuracy with high probability, with respect to both objective maximization and constraint satisfaction.
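The abstract does not give the algorithmic details of GM-CRL or UC-CRL. As a rough illustration of the underlying CMDP formulation only, the sketch below solves a small tabular, infinite-horizon discounted CMDP with a known model via the classical occupancy-measure linear program; this is a standard textbook technique, not the paper's method, and all sizes, data, and the cost budget are illustrative assumptions.

```python
# Minimal sketch, not the paper's GM-CRL/UC-CRL algorithms: a standard
# occupancy-measure linear program for a small tabular, infinite-horizon
# discounted CMDP whose model is assumed known. All sizes, data, and the
# cost budget below are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# P[s, a, s'] = transition probability; r[s, a] = reward; cost[s, a] = safety cost.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
cost_budget = 4.0                         # expected discounted cumulative cost <= budget
mu0 = np.full(n_states, 1.0 / n_states)   # initial state distribution

# Decision variable: discounted occupancy measure d(s, a) >= 0, flattened.
# Bellman-flow constraints:
#   sum_a d(s', a) = mu0(s') + gamma * sum_{s, a} P(s' | s, a) d(s, a)  for all s'
d_dim = n_states * n_actions
A_eq = np.zeros((n_states, d_dim))
for sp in range(n_states):
    for s in range(n_states):
        for a in range(n_actions):
            A_eq[sp, s * n_actions + a] = float(sp == s) - gamma * P[s, a, sp]
b_eq = mu0

# Maximize the expected discounted return sum_{s,a} d(s,a) r(s,a)
# subject to the safety constraint sum_{s,a} d(s,a) cost(s,a) <= cost_budget.
res = linprog(
    c=-r.reshape(-1),                       # linprog minimizes, so negate the reward
    A_ub=cost.reshape(1, -1), b_ub=[cost_budget],
    A_eq=A_eq, b_eq=b_eq,
    bounds=[(0, None)] * d_dim,
)
assert res.success, "LP infeasible: the cost budget cannot be met for this toy model"

d = res.x.reshape(n_states, n_actions)
policy = d / d.sum(axis=1, keepdims=True)   # pi(a | s) proportional to occupancy
print("constrained optimal return:", -res.fun)
print("policy:\n", policy)
```

With a known model the constrained problem is a linear program over occupancy measures; the sample-complexity question studied in the paper arises precisely because the transition matrix P is unknown and must be estimated from samples.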


IEEE Access, 2019, Vol. 7, pp. 165007-165017
Author(s): Yangyang Ge, Fei Zhu, Xinghong Ling, Quan Liu
