trust region methods
Recently Published Documents


TOTAL DOCUMENTS

107
(FIVE YEARS 8)

H-INDEX

20
(FIVE YEARS 1)

2020 ◽ Vol 76 (3) ◽ pp. 701-736
Author(s): Stefania Bellavia ◽ Nataša Krejić ◽ Benedetta Morini

2020 ◽ Vol 34 (04) ◽ pp. 5668-5675
Author(s): Lior Shani ◽ Yonathan Efroni ◽ Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) that iteratively solves a surrogate problem restricting consecutive policies to be ‘close’ to one another. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling mechanism used in TRPO is in fact the natural “RL version” of traditional trust-region methods from convex analysis. We first analyze TRPO in the planning setting, in which we have access to the model and the entire state space. Then, we consider sample-based TRPO and establish an Õ(1/√N) convergence rate to the global optimum. Importantly, the adaptive scaling mechanism allows us to analyze TRPO in regularized MDPs, for which we prove fast rates of Õ(1/N), much like results in convex optimization. This is the first result in RL showing better rates when the instantaneous cost or reward is regularized.
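
To make the idea of a surrogate problem that keeps consecutive policies close more concrete, here is a minimal sketch for a single state with three actions. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names, the cost vector, and the step sizes are all hypothetical. It uses the standard closed-form solution of a KL-regularized linear problem over the probability simplex, in which the new policy is proportional to the old policy times exp(-step_size * cost).

```python
import math


def kl_proximal_update(policy, costs, step_size):
    """Return argmin_p <costs, p> + (1/step_size) * KL(p || policy) over the simplex.

    Closed form: p(a) is proportional to policy(a) * exp(-step_size * costs(a)).
    """
    # Shift by the minimum cost for numerical stability; the shift cancels
    # out after normalization.
    c_min = min(costs)
    weights = [p * math.exp(-step_size * (c - c_min)) for p, c in zip(policy, costs)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical single-state example: lower cost is better.
old_policy = [0.25, 0.25, 0.5]
costs = [1.0, 0.0, 2.0]

small = kl_proximal_update(old_policy, costs, step_size=0.1)
large = kl_proximal_update(old_policy, costs, step_size=5.0)

print("small step:", small, "KL to old:", kl_divergence(small, old_policy))
print("large step:", large, "KL to old:", kl_divergence(large, old_policy))
```

A small step size keeps the new policy near the old one, while a large step size moves most of the probability mass to the cheapest action. In this sketch the step size plays the role that the trust-region radius plays in classical methods.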


2019 ◽ Vol 29 (3) ◽ pp. 1988-2025
Author(s): Marie-Ange Dahito ◽ Dominique Orban

2019 ◽ Vol 29 (4) ◽ pp. 3012-3035
Author(s): Giampaolo Liuzzi ◽ Stefano Lucidi ◽ Francesco Rinaldi ◽ Luis Nunes Vicente

Sensors ◽ 2018 ◽ Vol 18 (10) ◽ pp. 3515
Author(s): Yuyun Xu ◽ Xuekun Zhuang ◽ Guangyu Hu ◽ Hongqing Pan ◽ Feng Shuang

An improved hybrid homotopy method is proposed to decouple the multi-input model of tactile sensors. The time-embedded homotopy algorithm is shown to be well suited to the problem. Three tracking factors that control the efficiency of the algorithm are studied: tracking operator, stepsize, and accuracy. Trust region methods are applied to track the zero paths instead of the traditional differential algorithm, and a periodic sampling method is proposed to improve the efficiency of the algorithm. Numerical experiments show that applying the hybrid algorithm substantially improves both robustness and accuracy.
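
As a rough illustration of tracking a zero path with trust-region steps, the sketch below follows the zero of a scalar convex homotopy H(x, t) = t*f(x) + (1 - t)*(x - x0) from t = 0 to t = 1. Everything here is an assumption made for illustration: the homotopy form, the test function, the radius update constants, and the function names are hypothetical. The paper's time-embedded homotopy, multi-input sensor model, and periodic sampling are not reproduced.

```python
def trust_region_correct(H, dH, x, tol=1e-10, radius=1.0, max_iter=50):
    """Drive H(x) toward zero with 1-D trust-region Gauss-Newton steps."""
    for _ in range(max_iter):
        h = H(x)
        if abs(h) < tol:
            break
        g = dH(x)
        if g == 0.0:
            break  # flat model: this sketch has no fallback direction
        # Minimizer of the model 0.5*(h + g*s)**2 subject to |s| <= radius:
        # the Newton step, clipped to the trust region.
        step = max(-radius, min(radius, -h / g))
        predicted = 0.5 * h * h - 0.5 * (h + g * step) ** 2
        actual = 0.5 * h * h - 0.5 * H(x + step) ** 2
        rho = actual / predicted  # agreement between model and function
        if rho < 0.25:
            radius *= 0.5  # poor agreement: shrink the region
        elif rho > 0.75 and abs(step) >= radius:
            radius *= 2.0  # good agreement at the boundary: expand
        if rho > 0.1:
            x += step  # accept the step only if it reduces |H| enough
    return x


def track_zero_path(f, df, x0, steps=20):
    """Follow the zero of H(x, t) = t*f(x) + (1 - t)*(x - x0) from t=0 to t=1."""
    x = x0
    for k in range(1, steps + 1):
        t = k / steps
        H = lambda z: t * f(z) + (1.0 - t) * (z - x0)
        dH = lambda z: t * df(z) + (1.0 - t)
        # Use the previous point on the path as the starting guess.
        x = trust_region_correct(H, dH, x)
    return x


# Hypothetical test problem: a real root of x^3 - 2x - 5.
f = lambda x: x ** 3 - 2.0 * x - 5.0
df = lambda x: 3.0 * x ** 2 - 2.0
root = track_zero_path(f, df, x0=2.0)
print(root, f(root))
```

The ratio test is what separates this from a plain Newton corrector: a step is accepted only when the actual reduction in 0.5*H^2 is a reasonable fraction of what the linear model predicted, and the radius adapts to how trustworthy the model has been.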

