Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference

Author(s):  
Nobuhiko Yamaguchi

Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. Peters et al. proposed reward-weighted regression (RWR) as a direct policy search method. The RWR algorithm estimates the policy parameters via the expectation-maximization (EM) algorithm and is therefore prone to overfitting. In this study, we focus on variational Bayesian inference to avoid overfitting and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments involving a mountain car and a ball-batting task. These experiments demonstrate that VBRL yields a higher average return and outperforms RWR.
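The core of the RWR update described above can be sketched as a reward-weighted least-squares fit of a linear-Gaussian policy. This is illustrative only: the linear policy model, the exponential reward transformation, and the parameter `beta` are assumptions for the sketch, not details taken from the papers.

```python
import numpy as np

def rwr_update(states, actions, returns, beta=5.0):
    """One reward-weighted regression (RWR) update (illustrative sketch).

    Each sample is weighted by an exponential transformation of its return
    (a common choice; the exact transformation is an assumption here), and
    the new policy mean is the weighted least-squares fit of actions on
    states -- the M-step of the EM interpretation of RWR.
    """
    # Subtracting the max keeps the exponent numerically stable.
    w = np.exp(beta * (returns - returns.max()))
    sw = np.sqrt(w)[:, None]
    # Weighted least squares: theta = argmin sum_i w_i ||a_i - s_i theta||^2
    theta, *_ = np.linalg.lstsq(sw * states, sw * actions, rcond=None)
    return theta

# Toy usage: 1-D state and action, reward peaks when a = 2*s,
# so the fitted policy slope should move toward 2.
rng = np.random.default_rng(0)
s = rng.normal(size=(200, 1))
a = rng.normal(size=(200, 1))
r = -(a - 2.0 * s).ravel() ** 2
theta = rwr_update(s, a, r)
```

VBRL, as described in the abstract, replaces this point estimate of the policy parameters with a variational posterior, which regularizes the fit and mitigates the overfitting of plain EM.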

2011
Vol 23 (11)
pp. 2798-2832
Author(s):  
Hirotaka Hachiya
Jan Peters
Masashi Sugiyama

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
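The sample-reuse idea above is commonly realized with importance weighting: samples collected under an earlier policy are reweighted by the ratio of the current policy density to the old one before the reward-weighted fit. The sketch below is a minimal illustration under assumed details (a 1-D linear-Gaussian policy with fixed noise `sigma`, an exponential reward transformation); it is not the paper's exact R³ estimator.

```python
import numpy as np

def gauss_pdf(a, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at a."""
    return np.exp(-0.5 * ((a - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def rwr_update_with_reuse(states, actions, returns,
                          theta_old, theta_cur, sigma=0.5, beta=5.0):
    """Reward-weighted regression with importance-weighted sample reuse (sketch).

    `actions` were drawn under the old policy N(s @ theta_old, sigma^2);
    the importance weights correct the reward-weighted fit so it targets
    the current policy N(s @ theta_cur, sigma^2).
    """
    iw = (gauss_pdf(actions, states @ theta_cur, sigma)
          / gauss_pdf(actions, states @ theta_old, sigma))
    # Combine importance weights with the exponential reward weights.
    w = iw.ravel() * np.exp(beta * (returns - returns.max()))
    sw = np.sqrt(w)[:, None]
    theta, *_ = np.linalg.lstsq(sw * states, sw * actions, rcond=None)
    return theta

# Toy usage: samples from an old policy with slope 0.5, reward peaks at a = 2*s.
rng = np.random.default_rng(1)
s = rng.normal(size=(300, 1))
theta_old = np.array([[0.5]])
a = s @ theta_old + 0.5 * rng.normal(size=(300, 1))
r = -(a - 2.0 * s).ravel() ** 2
theta_new = rwr_update_with_reuse(s, a, r, theta_old, theta_old)
```

With `theta_cur = theta_old` the importance weights are all one and the update reduces to plain RWR on the reused batch; in an iterative scheme the weights correct for the growing mismatch between the sampling policy and the current one.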


2021
Vol 150 (4)
pp. A154-A154
Author(s):  
Yongsung Park ◽  
Florian Meyer ◽  
Peter Gerstoft
