Improving sampling accuracy of stochastic gradient MCMC methods via non-uniform subsampling of gradients

2021, Vol. 0(0), pp. 0
Author(s): Ruilin Li, Xin Wang, Hongyuan Zha, Molei Tao

Many Markov Chain Monte Carlo (MCMC) methods leverage gradient information of the potential function of the target distribution to explore the sample space efficiently. However, computing gradients can often be computationally expensive for large-scale applications, such as those in contemporary machine learning. Stochastic Gradient (SG-)MCMC methods approximate gradients by stochastic ones, commonly via uniformly subsampled data points, and achieve improved computational efficiency, albeit at the price of introducing sampling error. We propose a non-uniform subsampling scheme to improve the sampling accuracy. The proposed exponentially weighted stochastic gradient (EWSG) method is designed so that a non-uniform-SG-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method, and hence the inaccuracy due to the SG approximation is reduced. EWSG differs from classical variance reduction (VR) techniques in that it targets the entire distribution rather than just the variance; nevertheless, its reduced local variance is also proved. EWSG can also be viewed as an extension of the importance sampling idea, successful for stochastic-gradient-based optimization, to sampling tasks. In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index, which is coupled to the MCMC algorithm. Numerical experiments are provided not only to demonstrate EWSG's effectiveness, but also to guide hyperparameter choices and to validate our non-asymptotic global error bound despite approximations in the implementation. Notably, while statistical accuracy is improved, the convergence speed can be comparable to that of the uniform version, which renders EWSG a practical alternative to VR (though EWSG and VR can also be combined).
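
A minimal sketch of the coupled index chain described above, assuming a Gaussian toy model and a placeholder weight function; `per_example_grad`, `log_w`, and the overdamped-Langevin base sampler are illustrative assumptions rather than the paper's exact EWSG construction:

```python
import numpy as np

# Sketch: SGLD-style sampler whose data index is driven by a Metropolis-Hastings
# chain over indices, in the spirit of EWSG. The weight function `log_w` is a
# placeholder, NOT the paper's exact exponential weights.

def per_example_grad(theta, x_i):
    # Hypothetical per-example gradient of the potential (Gaussian toy model).
    return theta - x_i

def log_w(theta, x_i):
    # Placeholder index weight; the real EWSG weights are exponential in a
    # paper-specific per-example statistic.
    return -np.linalg.norm(per_example_grad(theta, x_i))

def ewsg_like_sampler(data, theta0, step=1e-3, n_iter=1000, index_mh_steps=5, rng=None):
    rng = rng or np.random.default_rng(0)
    n = len(data)
    theta = np.array(theta0, dtype=float)
    idx = int(rng.integers(n))                 # current state of the index chain
    for _ in range(n_iter):
        # Metropolis-Hastings chain on the data index, coupled to the sampler state.
        for _ in range(index_mh_steps):
            prop = int(rng.integers(n))        # uniform proposal over indices
            log_alpha = log_w(theta, data[prop]) - log_w(theta, data[idx])
            if np.log(rng.random()) < log_alpha:
                idx = prop
        # Langevin update with the non-uniformly selected example; scaling by n
        # mimics the full-batch gradient magnitude (weight corrections omitted).
        grad = n * per_example_grad(theta, data[idx])
        theta = theta - step * grad + np.sqrt(2 * step) * rng.standard_normal(theta.shape)
    return theta
```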

Author(s): David Morton, Bruce Letellier, Jeremy Tejada, David Johnson, Zahra Mohaghegh, ...

Output from a high-order simulation model with random inputs may be difficult to fully evaluate absent an understanding of sensitivity to the inputs. We describe, and apply, a sensitivity analysis procedure to a large-scale computer simulation model of the processes associated with Nuclear Regulatory Commission (NRC) Generic Safety Issue (GSI) 191. Our GSI-191 simulation model has a number of distinguishing features: (i) The model is large in scale in that it has a high-dimensional vector of inputs; (ii) some model inputs are governed by probability distributions; (iii) a key model output is the probability of system failure — a rare event; (iv) the model’s outputs require estimation by Monte Carlo sampling, including the use of variance reduction techniques; (v) it is computationally expensive to obtain precise estimates of the failure probability; (vi) we seek to propagate key uncertainties on model inputs to obtain distributional characteristics of the model’s outputs; and, (vii) the overall model involves a loose coupling between a physics-based stochastic simulation sub-model and a logic-based Probabilistic Risk Assessment (PRA) sub-model via multiple initiating events. Our proposal is guided by the need to have a practical approach to sensitivity analysis for a computer simulation model with these characteristics. We use common random numbers to reduce variability and smooth output analysis; we assess differences between two model configurations; and, we properly characterize both sampling error and the effect of uncertainties on input parameters. We show selected results of studies for sensitivities to parameters used in the South Texas Project Electric Generating Station (STP) GSI-191 risk-informed resolution project.
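
A minimal sketch of comparing two model configurations with common random numbers, with a hypothetical `simulate_failure` standing in for the GSI-191 simulation and an illustrative `threshold` input; both configurations consume identical random streams per replication, which reduces the variance of the estimated difference:

```python
import numpy as np

# Hypothetical stand-in for one replication of the physics sub-model:
# returns 1.0 if the replication ends in system failure, else 0.0.
def simulate_failure(config, rng):
    load = rng.lognormal(mean=0.0, sigma=0.5)
    return float(load > config["threshold"])

def crn_difference(config_a, config_b, n_reps=10_000, seed=2024):
    """Estimate the difference in failure probability using common random numbers."""
    diffs = np.empty(n_reps)
    for r in range(n_reps):
        rng_a = np.random.default_rng((seed, r))   # identical stream for both configs
        rng_b = np.random.default_rng((seed, r))
        diffs[r] = simulate_failure(config_a, rng_a) - simulate_failure(config_b, rng_b)
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(n_reps)  # estimate, std. error

# Example: sensitivity of the failure probability to a single input parameter.
estimate, std_err = crn_difference({"threshold": 2.0}, {"threshold": 2.2})
```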


Author(s): Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung

A typical online kernel learning method faces two fundamental issues: the complexity of dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty of learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to the former: the shift-invariant kernel function is approximated via Bochner's theorem, which allows the model to be maintained directly in a random feature space of fixed dimension, so the model size remains constant with respect to the data size. In this paper we further introduce the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both of the aforementioned challenges. Our initial intuition comes from the so-called "reparameterization trick" [Kingma et al., 2014]: lifting the source of randomness of the Fourier components to another space that can be sampled independently, so that the stochastic gradient with respect to the kernel parameters can be derived analytically. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model in an online learning setting. We then conduct extensive experiments on several large-scale datasets and demonstrate that our method achieves state-of-the-art performance in both learning efficacy and efficiency.
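
A minimal sketch of the reparameterization idea for random Fourier features, assuming an RBF kernel, a squared loss, and plain SGD; the names and the update rule are illustrative, not the paper's exact RRF algorithm:

```python
import numpy as np

# The randomness (eps, b) is drawn once in a parameter-free space; the RBF
# bandwidth `sigma` enters only through omega = eps / sigma, so its stochastic
# gradient is available analytically.

rng = np.random.default_rng(0)
D, d = 256, 5                       # number of random features, input dimension
eps = rng.standard_normal((D, d))   # frozen source of randomness
b = rng.uniform(0.0, 2 * np.pi, D)  # random phases

def features_and_dsigma(x, sigma):
    proj = eps @ x / sigma                              # omega^T x with omega = eps / sigma
    phi = np.sqrt(2.0 / D) * np.cos(proj + b)           # random Fourier feature map
    dphi_dsigma = np.sqrt(2.0 / D) * np.sin(proj + b) * proj / sigma  # analytic derivative
    return phi, dphi_dsigma

def online_rrf_regression(stream, lr_w=1e-2, lr_sigma=1e-3, sigma0=1.0):
    w, sigma = np.zeros(D), sigma0
    for x, y in stream:                                 # one pass over the data stream
        phi, dphi = features_and_dsigma(x, sigma)
        err = w @ phi - y                               # squared-loss residual
        w -= lr_w * err * phi                           # SGD on the linear weights
        sigma -= lr_sigma * err * (w @ dphi)            # SGD on the kernel bandwidth
    return w, sigma
```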


2020, Vol. 34(04), pp. 6372-6379
Author(s): Bingzhe Wu, Chaochao Chen, Shiwan Zhao, Cen Chen, Yuan Yao, ...

Bayesian deep learning has recently been regarded as an intrinsic way to characterize the weight uncertainty of deep neural networks (DNNs). Stochastic Gradient Langevin Dynamics (SGLD) is an effective method for enabling Bayesian deep learning on large-scale datasets. Previous theoretical studies have shown various appealing properties of SGLD, ranging from convergence properties to generalization bounds. In this paper, we study the properties of SGLD from the novel perspective of membership privacy protection (i.e., preventing membership attacks). The membership attack, which aims to determine whether a specific sample was used to train a given DNN model, has emerged as a common threat against deep learning algorithms. To this end, we build a theoretical framework to analyze the information leakage (w.r.t. the training dataset) of a model trained using SGLD. Based on this framework, we demonstrate that SGLD can prevent information leakage of the training dataset to a certain extent. Moreover, our theoretical analysis can be naturally extended to other types of Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods. Empirical results on different datasets and models verify our theoretical findings and suggest that the SGLD algorithm can not only reduce information leakage but also improve the generalization ability of DNN models in real-world applications.
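
For concreteness, a minimal sketch of one SGLD epoch, assuming a hypothetical per-example gradient `grad_nll` of the negative log-likelihood and a Gaussian prior; the injected Gaussian noise in each update is what the leakage analysis above is concerned with:

```python
import numpy as np

def sgld_epoch(theta, data, grad_nll, step=1e-4, batch_size=32, prior_prec=1.0, rng=None):
    """One epoch of Stochastic Gradient Langevin Dynamics (Welling-Teh update).

    `grad_nll(theta, x)` is a hypothetical per-example gradient of the negative
    log-likelihood; `data` is assumed to be indexable by an integer array.
    """
    rng = rng or np.random.default_rng(0)
    n = len(data)
    for _ in range(n // batch_size):
        batch = data[rng.integers(n, size=batch_size)]        # with-replacement minibatch
        stoch_grad = prior_prec * theta + (n / batch_size) * sum(
            grad_nll(theta, x) for x in batch)                # unbiased estimate of grad U(theta)
        noise = np.sqrt(step) * rng.standard_normal(theta.shape)
        theta = theta - 0.5 * step * stoch_grad + noise       # Langevin update with injected noise
    return theta
```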


2012, Vol. 38(2), pp. 57-69
Author(s): Abdulghani Hasan, Petter Pilesjö, Andreas Persson

Global change and GHG emission modelling depend on accurate wetness estimates, for example when predicting methane emissions. This study aims to quantify how the slope, drainage area and topographic wetness index (TWI) vary with the resolution of digital elevation models (DEMs) for a flat peatland area. Six DEMs with spatial resolutions from 0.5 to 90 m were interpolated with four different search radii. The relationship between DEM accuracy and slope was tested. The LiDAR elevation data were divided into two sets; the point density made it possible to build an evaluation dataset whose points lie no more than 10 mm from the cell centre points of the interpolation dataset. The DEMs were evaluated using a quantile-quantile test and the normalized median absolute deviation, which showed that accuracy was independent of resolution when the same search radius was used. The accuracy of the estimated elevation for different slopes was tested using the 0.5 m DEM and showed a higher deviation from the evaluation data in steep areas. Slope estimates differed between resolutions by values exceeding 50%. Drainage areas were tested for three resolutions, with coinciding evaluation points. The model's ability to generate drainage area at each resolution was tested by pairwise comparison of three data subsets and showed differences of more than 50% at 25% of the evaluated points. The results show that consideration of DEM resolution is a necessity when slope, drainage area and TWI data are used in large-scale modelling.
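
A minimal sketch of how the topographic wetness index is computed from slope and drainage-area grids (hypothetical arrays and cell sizes), illustrating why both terms, and hence the index, depend on DEM resolution:

```python
import numpy as np

def twi(drainage_area, slope_deg, cell_size, min_slope_deg=0.1):
    """Topographic wetness index, TWI = ln(a / tan(beta)).

    `drainage_area` is the upslope area per cell (m^2), `slope_deg` the local
    slope in degrees, and `cell_size` the DEM resolution in metres; all grids
    are hypothetical inputs derived from a DEM.
    """
    a = drainage_area / cell_size                      # specific catchment area per unit contour width
    tan_beta = np.tan(np.deg2rad(np.maximum(slope_deg, min_slope_deg)))  # floor slope for flat peatland cells
    return np.log(a / tan_beta)

# Example: the same area gridded at two resolutions gives different TWI fields,
# e.g. twi(area_90m, slope_90m, 90.0) versus twi(area_05m, slope_05m, 0.5).
```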


2018, Vol. 41(1), pp. 125-144
Author(s): Rebecca Campbell, Rachael Goodman-Williams, Hannah Feeney, Giannina Fehler-Cabral

The purpose of this study was to develop triangulation coding methods for a large-scale action research and evaluation project and to examine how practitioners and policy makers interpreted both convergent and divergent data. We created a color-coded system that evaluated the extent of triangulation across methodologies (qualitative and quantitative), data collection methods (observations, interviews, and archival records), and stakeholder groups (five distinct disciplines/organizations). Triangulation was assessed for both specific data points (e.g., a piece of historical/contextual information or qualitative theme) and substantive findings that emanated from further analysis of those data points (e.g., a statistical model or a mechanistic qualitative assertion that links themes). We present five case study examples that explore the complexities of interpreting triangulation data and determining whether data are deemed credible and actionable if not convergent.

