Policy Learning for Time-Bounded Reachability in Continuous-Time Markov Decision Processes via Doubly-Stochastic Gradient Ascent
1984, Vol 31 (2), pp. 265-274