SHIV: Reducing supervisor burden in DAgger using support vectors for efficient learning from demonstrations in high dimensional state spaces

Author(s):  
Michael Laskey ◽  
Sam Staszak ◽  
Wesley Yu-Shu Hsieh ◽  
Jeffrey Mahler ◽  
Florian T. Pokorny ◽  
...  
Author(s):  
BAO WANG ◽  
STAN J. OSHER

Improving the accuracy and robustness of deep neural nets (DNNs) and adapting them to small training data are primary tasks in deep learning (DL) research. In this paper, we replace the output activation function of DNNs, typically the data-agnostic softmax function, with a graph Laplacian-based high-dimensional interpolating function which, in the continuum limit, converges to the solution of a Laplace–Beltrami equation on a high-dimensional manifold. Furthermore, we propose end-to-end training and testing algorithms for this new architecture. The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning. Compared to the conventional DNNs with the softmax function as output activation, the new framework demonstrates the following major advantages: First, it is better applicable to data-efficient learning in which we train high capacity DNNs without using a large number of training data. Second, it remarkably improves both natural accuracy on the clean images and robust accuracy on the adversarial images crafted by both white-box and black-box adversarial attacks. Third, it is a natural choice for semi-supervised learning. This paper is a significant extension of our earlier work published in NeurIPS, 2018. For reproducibility, the code is available at https://github.com/BaoWangMath/DNN-DataDependentActivation.


Author(s):  
Mike Gimelfarb ◽  
Scott Sanner ◽  
Chi-Guhn Lee

Learning from Demonstrations (LfD) is a powerful approach for incorporating advice from experts in the form of demonstrations. However, demonstrations often come from multiple sub-optimal experts with conflicting goals, rendering them difficult to incorporate effectively in online settings. To address this, we formulate a quadratic program whose solution yields an adaptive weighting over experts, that can be used to sample experts with relevant goals. In order to compare different source and target task goals safely, we model their uncertainty using normal-inverse-gamma priors, whose posteriors are learned from demonstrations using Bayesian neural networks with a shared encoder. Our resulting approach, which we call Bayesian Experience Reuse, can be applied for LfD in static and dynamic decision-making settings. We demonstrate its effectiveness for minimizing multi-modal functions, and optimizing a high-dimensional supply chain with cost uncertainty, where it is also shown to improve upon the performance of the demonstrators' policies.


Sign in / Sign up

Export Citation Format

Share Document