Experience Reuse
Recently Published Documents

Total documents: 16 (five years: 3)
H-index: 4 (five years: 1)

Author(s): Mike Gimelfarb, Scott Sanner, Chi-Guhn Lee

Learning from Demonstrations (LfD) is a powerful approach for incorporating advice from experts in the form of demonstrations. However, demonstrations often come from multiple sub-optimal experts with conflicting goals, rendering them difficult to incorporate effectively in online settings. To address this, we formulate a quadratic program whose solution yields an adaptive weighting over experts that can be used to sample experts with relevant goals. In order to compare different source and target task goals safely, we model their uncertainty using normal-inverse-gamma priors, whose posteriors are learned from demonstrations using Bayesian neural networks with a shared encoder. Our resulting approach, which we call Bayesian Experience Reuse, can be applied to LfD in static and dynamic decision-making settings. We demonstrate its effectiveness for minimizing multi-modal functions and for optimizing a high-dimensional supply chain with cost uncertainty, where it is also shown to improve upon the performance of the demonstrators' policies.
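The core idea above — modeling each expert's return with a normal-inverse-gamma (NIG) prior and converting the posteriors into sampling weights — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the conjugate NIG update is standard, but the paper's quadratic program is replaced here by a simple softmax over posterior mean returns, and the expert names, prior hyperparameters, and demonstration data are invented for the example.

```python
import math

def nig_update(prior, data):
    """Conjugate normal-inverse-gamma update for a Gaussian with
    unknown mean and variance. prior = (mu0, kappa0, alpha0, beta0)."""
    mu0, k0, a0, b0 = prior
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    kn = k0 + n
    mun = (k0 * mu0 + n * xbar) / kn
    an = a0 + n / 2
    bn = b0 + 0.5 * ss + k0 * n * (xbar - mu0) ** 2 / (2 * kn)
    return mun, kn, an, bn

def expert_weights(posteriors, temp=1.0):
    """Softmax over posterior mean returns (a simple stand-in for the
    paper's quadratic-program weighting)."""
    means = [p[0] for p in posteriors]
    m = max(means)  # shift for numerical stability
    exps = [math.exp((mu - m) / temp) for mu in means]
    z = sum(exps)
    return [e / z for e in exps]

# Illustrative demonstration returns from two hypothetical experts.
prior = (0.0, 1.0, 2.0, 1.0)
demos = {"expert_a": [1.0, 1.2, 0.9], "expert_b": [0.1, -0.2, 0.3]}
posts = [nig_update(prior, d) for d in demos.values()]
weights = expert_weights(posts)
```

An agent would then sample a demonstrator in proportion to `weights`, so experts whose inferred goals match the target task are reused more often.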


Author(s): WenJi Zhou, Yang Yu, Yingfeng Chen, Kai Guan, Tangjie Lv, ...

Experience reuse is key to sample-efficient reinforcement learning. A critical issue is how experience is represented and stored. Previously, experience has been stored in the form of features, individual models, or an average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where a module in each level corresponds to a subset of the tasks. The PRR network therefore represents experience in a spectrum-like way. When training on a new task, the PRR network can provide different levels of experience to accelerate learning. We experiment with the PRR network on a set of grid-world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.
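The multi-level idea in the abstract — one module shared by all tasks, one per task group, one per individual task, with the policy output formed as the sum of module outputs — can be sketched in a toy form. This is an illustrative sketch, not the paper's architecture: the modules here are plain linear maps with a constant initialization, and the task and group names are invented for the example.

```python
def linear(weights, x):
    """Apply a weight matrix (list of rows) to feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

class PRRPolicy:
    """Toy policy-residual representation: action preferences for a task
    are the sum of module outputs across three levels of sharing."""

    def __init__(self, n_feat, n_act, groups, init=0.1):
        def make():
            return [[init] * n_feat for _ in range(n_act)]
        self.shared = make()                              # level 0: all tasks
        self.group = {g: make() for g in groups}          # level 1: per group
        self.task = {t: make() for ts in groups.values() for t in ts}
        self.group_of = {t: g for g, ts in groups.items() for t in ts}

    def logits(self, task, x):
        g = self.group_of[task]
        parts = [linear(self.shared, x),
                 linear(self.group[g], x),
                 linear(self.task[task], x)]
        # Residual composition: sum the per-level outputs per action.
        return [sum(vals) for vals in zip(*parts)]

# Hypothetical task grouping: two navigation tasks, one fighting task.
groups = {"navigation": ["nav_1", "nav_2"], "combat": ["fight_1"]}
policy = PRRPolicy(n_feat=2, n_act=2, groups=groups)
prefs = policy.logits("nav_1", [1.0, 2.0])
```

When transferring to a new task, coarser levels (shared and group modules) can be reused as-is while only the finest, task-specific module is learned from scratch.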


10.5772/60073 · 2015 · Vol 12 (4) · pp. 39
Author(s): Tekin Mericli, Manuela Veloso, Levent Akin

2014 · Vol 68 · pp. 1-3
Author(s): Eric Bonjour, Laurent Geneste, Ralph Bergmann

Author(s): Ying Du, Liming Chen, Bo Hu, David Patterson, Hui Wang
