On Thompson Sampling and Asymptotic Optimality
We discuss recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically, its value converges in mean to the optimal value, and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
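To make the idea of Thompson sampling over a class of environments concrete, here is a minimal sketch under strong simplifying assumptions: the class is finite rather than countable, and each environment is a stateless Bernoulli bandit rather than a general non-Markovian, partially observable process. The function name `thompson_sampling` and all of its parameters are hypothetical and chosen only for illustration. The agent samples one environment from its posterior at every step, acts optimally for that sample, and updates the posterior by Bayes' rule.

```python
import random


def thompson_sampling(env_class, prior, true_env, steps, seed=0):
    """Thompson sampling over a finite class of Bernoulli bandits.

    env_class: list of tuples; env_class[i][a] is the reward probability
               of arm a under candidate environment i (strictly in (0, 1)).
    prior:     list of prior weights over env_class, summing to 1.
    true_env:  tuple of reward probabilities actually generating rewards.
    Returns the final posterior weights and the total reward collected.
    """
    rng = random.Random(seed)
    posterior = list(prior)
    total_reward = 0
    for _ in range(steps):
        # Sample one candidate environment from the current posterior.
        idx = rng.choices(range(len(env_class)), weights=posterior)[0]
        sampled = env_class[idx]
        # Act optimally for the sampled environment (best arm under it).
        arm = max(range(len(sampled)), key=lambda a: sampled[a])
        # Observe a reward from the true environment.
        reward = 1 if rng.random() < true_env[arm] else 0
        total_reward += reward
        # Bayes update: reweight each candidate by the likelihood it
        # assigns to the observed reward, then renormalize.
        likelihoods = [env[arm] if reward else 1.0 - env[arm] for env in env_class]
        posterior = [w * l for w, l in zip(posterior, likelihoods)]
        norm = sum(posterior)
        posterior = [w / norm for w in posterior]
    return posterior, total_reward


if __name__ == "__main__":
    candidates = [(0.2, 0.8), (0.8, 0.2)]  # hypothetical two-element class
    post, total = thompson_sampling(candidates, [0.5, 0.5], candidates[0], steps=200)
    print(post, total)
```

In this toy setting every action is informative about which candidate is the true environment, so the posterior concentrates quickly. The general setting of the abstract is harder: actions can have irreversible consequences, which is why the sublinear-regret result needs a recoverability assumption.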