Stochastic Gradient Langevin Dynamics with Variance Reduction

Author(s):  
Zhishen Huang ◽  
Stephen Becker
2020 ◽  
Vol 34 (06) ◽  
pp. 10126-10135
Author(s):  
Artyom Gadetsky ◽  
Kirill Struminsky ◽  
Christopher Robinson ◽  
Novi Quadrianto ◽  
Dmitry Vetrov

Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.
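As a rough illustration of the score-function approach this abstract builds on, the sketch below optimizes a toy objective over permutations with a Plackett-Luce model, a REINFORCE-style gradient, and a simple moving-average baseline as the control variate. This is not the paper's estimator; the sampler, log-probability helper, and toy objective (sample_permutation, pl_log_prob, objective) are illustrative assumptions.

```python
# Minimal sketch: REINFORCE-style gradient for a Plackett-Luce model with a
# moving-average baseline as a (crude) control variate. Not the paper's method.
import torch

def sample_permutation(log_scores):
    # Gumbel trick: argsort of perturbed log-scores yields a Plackett-Luce sample.
    u = torch.rand_like(log_scores).clamp_min(1e-9)
    gumbel = -torch.log(-torch.log(u))
    return torch.argsort(log_scores + gumbel, descending=True)

def pl_log_prob(log_scores, perm):
    # log p(perm) = sum_i [ log s_{perm_i} - log sum_{j >= i} s_{perm_j} ]
    s = log_scores[perm]
    denom = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return (s - denom).sum()

def objective(perm):
    # Toy black-box score: reward permutations close to the identity ordering.
    target = torch.arange(perm.numel(), dtype=torch.float32)
    return -torch.abs(perm.float() - target).sum()

log_scores = torch.zeros(6, requires_grad=True)
opt = torch.optim.Adam([log_scores], lr=0.1)
baseline = 0.0
for step in range(500):
    perm = sample_permutation(log_scores.detach())
    f = objective(perm).item()
    # Subtracting a baseline keeps the gradient unbiased (E[grad log p] = 0)
    # while lowering its variance.
    loss = -(f - baseline) * pl_log_prob(log_scores, perm)
    opt.zero_grad()
    loss.backward()
    opt.step()
    baseline = 0.9 * baseline + 0.1 * f

print("learned ordering:", torch.argsort(log_scores.detach(), descending=True).tolist())
```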


2018 ◽  
Vol 62 (1) ◽  
Author(s):  
Changyou Chen ◽  
Wenlin Wang ◽  
Yizhe Zhang ◽  
Qinliang Su ◽  
Lawrence Carin

Author(s):  
Derek Driggs ◽  
Matthias J. Ehrhardt ◽  
Carola-Bibiane Schönlieb

Abstract Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov’s acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on “negative momentum”, a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show for the first time that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions n, scale with the mean-squared-error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using our framework significantly outperform non-accelerated versions and compare favourably with algorithms using negative momentum.
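For context on the estimators the abstract refers to, here is a minimal SVRG loop on a least-squares problem. It shows only the standard variance-reduced gradient, not the accelerated framework proposed in the paper; the problem setup and all names are illustrative.

```python
# Minimal SVRG sketch for least squares: the estimator is unbiased and its
# variance shrinks as the iterate approaches the snapshot point.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def grad_i(x, i):        # gradient of the i-th component f_i
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):        # batch gradient (1/n) sum_i grad f_i
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
lr, epochs, m = 0.01, 30, n
for _ in range(epochs):
    x_snap = x.copy()               # snapshot point
    g_snap = full_grad(x_snap)      # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        # SVRG estimator: grad f_i(x) - grad f_i(x_snap) + full gradient.
        v = grad_i(x, i) - grad_i(x_snap, i) + g_snap
        x -= lr * v

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```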


2018 ◽  
Vol 281 ◽  
pp. 27-36 ◽  
Author(s):  
Yuewei Ming ◽  
Yawei Zhao ◽  
Chengkun Wu ◽  
Kuan Li ◽  
Jianping Yin

2019 ◽  
Vol 108 (8-9) ◽  
pp. 1701-1727
Author(s):  
Zhize Li ◽  
Tianyi Zhang ◽  
Shuyu Cheng ◽  
Jun Zhu ◽  
Jian Li

2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Ruilin Li ◽  
Xin Wang ◽  
Hongyuan Zha ◽  
Molei Tao

Many Markov Chain Monte Carlo (MCMC) methods leverage gradient information of the potential function of the target distribution to explore the sample space efficiently. However, computing gradients can be expensive for large-scale applications, such as those in contemporary machine learning. Stochastic Gradient (SG-)MCMC methods approximate gradients by stochastic ones, commonly via uniformly subsampled data points, and achieve improved computational efficiency, though at the price of introducing sampling error. We propose a non-uniform subsampling scheme to improve the sampling accuracy. The proposed exponentially weighted stochastic gradient (EWSG) is designed so that a non-uniform-SG-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method, and hence the inaccuracy due to the SG approximation is reduced. EWSG differs from classical variance reduction (VR) techniques as it focuses on the entire distribution instead of just the variance; nevertheless, its reduced local variance is also proved. EWSG can also be viewed as an extension of the importance sampling idea, successful for stochastic-gradient-based optimization, to sampling tasks. In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index, which is coupled to the MCMC algorithm. Numerical experiments are provided, not only to demonstrate EWSG's effectiveness, but also to guide hyperparameter choices and to validate our non-asymptotic global error bound despite approximations in the implementation. Notably, while statistical accuracy is improved, convergence speed can be comparable to the uniform version, which renders EWSG a practical alternative to VR (though EWSG and VR can also be combined).
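For reference, a minimal SGLD loop with uniform subsampling is sketched below; EWSG, as described in the abstract, would replace the uniform index draw with a Metropolis-Hastings chain over data indices, which is not implemented here. The Gaussian-mean model and all names are illustrative assumptions.

```python
# Minimal SGLD sketch with uniform subsampling (the baseline that EWSG improves
# on by choosing indices non-uniformly). Toy model: posterior over the mean of a
# Gaussian with known unit variance and a Gaussian prior.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
data = rng.normal(2.0, 1.0, size=n)          # observations
prior_var = 10.0

def grad_log_post_i(theta, i):
    # Per-datum stochastic gradient of the log-posterior:
    # n * grad log p(x_i | theta) + grad log p(theta)
    return n * (data[i] - theta) - theta / prior_var

theta, step, batch = 0.0, 1e-4, 10
samples = []
for t in range(5000):
    idx = rng.integers(n, size=batch)        # uniform subsampling of data points
    g = np.mean([grad_log_post_i(theta, i) for i in idx])
    # Langevin update: half-step along the stochastic gradient plus matched noise.
    theta += 0.5 * step * g + np.sqrt(step) * rng.standard_normal()
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[1000:]))
```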

