The Stochastic Delta Rule: Faster and More Accurate Deep Learning Through Adaptive Weight Noise

2020 ◽  
Vol 32 (5) ◽  
pp. 1018-1032 ◽  
Author(s):  
Noah Frazier-Logue ◽  
Stephen José Hanson

Multilayer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech, and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting and misspecification. One approach to these and related estimation problems (e.g., saddle points, collinearity, feature discovery) is called Dropout. The Dropout algorithm removes hidden units according to a binomial random variable with a fixed probability prior to each update, creating random “shocks” to the network that are averaged over updates (thus creating weight sharing). In this letter, we reestablish an older parameter search method, the stochastic delta rule (SDR), originally published in 1990, and show that Dropout is a special case of this more general model. Unlike Dropout, SDR redefines each weight in the network as a random variable with its own mean and standard deviation. Each weight random variable is sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights (accumulated in the mean values). Both parameters are updated according to prediction error, producing weight-noise injections that reflect a local history of prediction error and implement local model averaging. SDR therefore implements a more sensitive, local, gradient-dependent simulated annealing per weight, converging in the limit to a Bayes-optimal network. We run tests on standard benchmarks (CIFAR and ImageNet) using a modified version of DenseNet and show that SDR outperforms standard Dropout in top-5 validation error by approximately 13% with DenseNet-BC 121 on ImageNet, and we find various validation error improvements in smaller networks. We also show that SDR reaches the same accuracy that Dropout attains in 100 epochs in as few as 40 epochs, and reduces training error by as much as 80%.
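A minimal sketch of the per-weight idea described above, for a single linear unit trained on squared error in pure Python. The update rules are an assumption based on the commonly cited form of the 1990 rule (the mean follows gradient descent, the standard deviation grows with the gradient magnitude and is then decayed multiplicatively); the function name `sdr_train`, the parameters `alpha`, `beta`, `zeta`, `sigma0`, and all constants are hypothetical, not taken from the paper.

```python
import random

def sdr_train(data, n_inputs, epochs=300, alpha=0.05, beta=0.01, zeta=0.98,
              sigma0=0.5, seed=0):
    """Fit a single linear unit y = sum_i w_i * x_i with an SDR-style update.

    Assumed per-weight rules (illustrative, not the paper's code), with
    squared error E = 0.5 * (y - target) ** 2:
      sample  w_i ~ Normal(mu_i, sigma_i) on every forward pass
      mean    mu_i    <- mu_i - alpha * dE/dw_i
      spread  sigma_i <- zeta * (sigma_i + beta * |dE/dw_i|)
    """
    rng = random.Random(seed)
    mu = [0.0] * n_inputs
    sigma = [sigma0] * n_inputs
    for _ in range(epochs):
        for x, target in data:
            # Sample one concrete network from the per-weight distributions.
            w = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
            err = sum(wi * xi for wi, xi in zip(w, x)) - target
            for i in range(n_inputs):
                grad = err * x[i]  # dE/dw_i evaluated at the sampled weights
                mu[i] -= alpha * grad
                # Noise grows with the local error signal, then decays (annealing).
                sigma[i] = zeta * (sigma[i] + beta * abs(grad))
    return mu, sigma


# Hypothetical toy problem: recover y = 2*x1 - x2 on a small grid.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
data = [((a, b), 2.0 * a - 1.0 * b) for a in grid for b in grid]
mu, sigma = sdr_train(data, n_inputs=2)
```

With `sigma0=0.0` and `beta=0.0` every spread stays at zero and the loop reduces to plain gradient descent on the means, which is one way to see ordinary deterministic training as a limiting case of the noisy scheme.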

Author(s):  
Maria Brigida Ferraro

A linear regression model for imprecise random variables is considered. The imprecision of a random element is formalized by means of the LR fuzzy random variable, characterized by a center, a left spread, and a right spread. To avoid the non-negativity constraints on the spreads, the spreads are transformed by means of two invertible functions. To analyze the generalization performance of the model, an appropriate prediction error is introduced and estimated by means of a bootstrap procedure. Furthermore, since the choice of response transformations could affect the inferential procedures, a computational procedure is proposed for choosing, within a family of parametric link functions (the Box-Cox family), the transformation parameters that minimize the prediction error of the model.
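A minimal sketch of two ingredients mentioned above: a Box-Cox transform of a positive spread, so that a linear model can be fitted on an unconstrained scale and back-transformed, and a bootstrap estimate of prediction error used to pick the transformation parameter from a grid. This is a simplified single-response illustration with a hypothetical simple linear regression and an out-of-bag bootstrap error; it is not the paper's LR fuzzy regression model or its exact prediction-error definition, and all function names and the grid are assumptions.

```python
import math
import random

def box_cox(s, lam):
    """Box-Cox transform of a positive value s (lam = 0 gives log)."""
    if s <= 0:
        raise ValueError("Box-Cox needs a positive argument")
    return math.log(s) if lam == 0 else (s ** lam - 1.0) / lam

def inv_box_cox(y, lam, eps=1e-9):
    """Inverse transform; clamps the base so the back-transformed spread stays positive."""
    if lam == 0:
        return math.exp(y)
    base = max(lam * y + 1.0, eps)
    return base ** (1.0 / lam)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def bootstrap_error(xs, spreads, lam, n_boot=200, seed=0):
    """Out-of-bag bootstrap mean squared prediction error, on the original spread scale."""
    rng = random.Random(seed)
    n = len(xs)
    total, count = 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        out = [i for i in range(n) if i not in in_bag]
        a, b = fit_line([xs[i] for i in idx], [box_cox(spreads[i], lam) for i in idx])
        for i in out:
            pred = inv_box_cox(a + b * xs[i], lam)
            total += (spreads[i] - pred) ** 2
            count += 1
    return total / count if count else float("inf")

def choose_lambda(xs, spreads, grid=(0.0, 0.25, 0.5, 1.0)):
    """Pick the Box-Cox parameter with the smallest bootstrap prediction error."""
    return min(grid, key=lambda lam: bootstrap_error(xs, spreads, lam))
```

For spreads that grow exponentially in the covariate, the log member (`lam = 0`) linearizes the relation exactly, so it should win the comparison; the clamp in `inv_box_cox` is a pragmatic choice for the non-log members, whose inverse is only defined when `lam * y + 1 > 0`.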


2005 ◽  
Vol 2005 (5) ◽  
pp. 717-728 ◽  
Author(s):  
K. Neammanee

Let $X_1, X_2, \dots, X_n$ be independent Bernoulli random variables with $P(X_j = 1) = 1 - P(X_j = 0) = p_j$, and let $S_n := X_1 + X_2 + \dots + X_n$. $S_n$ is called a Poisson binomial random variable, and it is well known that the distribution of a suitably standardized Poisson binomial random variable can be approximated by the standard normal distribution. In this paper, we use Taylor's formula to improve the approximation by adding some correction terms. Our result improves on previous results and is of order $1/n$ in the case $p_1 = p_2 = \dots = p_n$.
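To make the approximation concrete, a small sketch that computes the exact distribution of S_n by convolving the Bernoulli terms one at a time, then compares the exact CDF with the plain normal approximation using mean sum(p_j) and variance sum(p_j (1 - p_j)). The continuity correction (evaluating at k + 0.5) is a standard choice assumed here; the paper's Taylor-based correction terms are not reproduced, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def poisson_binomial_pmf(ps):
    """Exact pmf of S_n = X_1 + ... + X_n with X_j ~ Bernoulli(p_j), via dynamic programming."""
    pmf = [1.0]  # distribution of the empty sum
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # X_j = 0
            nxt[k + 1] += mass * p      # X_j = 1
        pmf = nxt
    return pmf

def normal_approx_cdf(ps, k):
    """Normal approximation to P(S_n <= k) with a continuity correction.

    Assumes at least one p_j lies strictly between 0 and 1 (so the variance is positive).
    """
    mu = sum(ps)
    sigma = math.sqrt(sum(p * (1.0 - p) for p in ps))
    return NormalDist(mu, sigma).cdf(k + 0.5)

def max_cdf_error(ps):
    """Largest absolute gap between the exact and approximate CDFs over k = 0..n."""
    cdf, worst = 0.0, 0.0
    for k, mass in enumerate(poisson_binomial_pmf(ps)):
        cdf += mass
        worst = max(worst, abs(cdf - normal_approx_cdf(ps, k)))
    return worst
```

Comparing `max_cdf_error([0.3] * 20)` with `max_cdf_error([0.3] * 200)` shows the gap shrinking as n grows, which is the kind of error the correction terms are meant to reduce further.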


1998 ◽  
Vol 35 (3) ◽  
pp. 589-599
Author(s):  
William L. Cooper

Given a sequence of random variables (rewards), the Haviv–Puterman differential equation relates the expected infinite-horizon λ-discounted reward and the expected total reward up to a random time that is determined by an independent negative binomial random variable with parameters 2 and λ. This paper provides an interpretation of this proven, but previously unexplained, result. Furthermore, the interpretation is formalized into a new proof, which then yields new results for the general case where the rewards are accumulated up to a time determined by an independent negative binomial random variable with parameters k and λ.
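The link between discounting and an independent random horizon can be checked numerically. The sketch assumes one particular parametrization, which is an assumption and not necessarily the paper's: each period independently ends in a "stop" with probability 1 - λ, and T_k is the number of periods played until the k-th stop, so rewards r_0, ..., r_{T_k - 1} are collected. For rewards independent of T_k, E[sum_{t < T_k} r_t] = sum_t E[r_t] P(T_k > t), and for k = 1 this is exactly the λ-discounted reward sum_t λ^t E[r_t]. The Haviv–Puterman differential equation itself is not reproduced here; the function names are illustrative.

```python
import math
import random

def tail_prob(t, k, lam):
    """P(T_k > t): fewer than k stops among the first t periods (stop prob 1 - lam)."""
    q = 1.0 - lam
    return sum(math.comb(t, j) * q ** j * lam ** (t - j) for j in range(min(k, t + 1)))

def expected_reward_exact(rewards, k, lam):
    """sum_t r_t * P(T_k > t), truncated at len(rewards)."""
    return sum(r * tail_prob(t, k, lam) for t, r in enumerate(rewards))

def expected_reward_mc(rewards, k, lam, n_sims=20000, seed=1):
    """Monte Carlo estimate of E[sum_{t < T_k} r_t], truncated at len(rewards)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        stops, acc = 0, 0.0
        for r in rewards:
            acc += r                         # the reward of this period is collected...
            if rng.random() < 1.0 - lam:     # ...and the period may end with a stop
                stops += 1
                if stops == k:
                    break
        total += acc
    return total / n_sims
```

With `k = 1` the exact expression coincides with the discounted sum; with `k = 2` the exact and simulated values agree up to Monte Carlo error, which is the random-time quantity the abstract relates to the discounted reward.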


Author(s):  
Arkady Bur’yanovaty ◽  
Valery Varentsov

Objective: To develop a method for calculating the electrical loads on traction circuit components from train traffic schedule data, taking into account differences in train weight and real-life traction modes under varying track profiles, speed limits, and other parameters of the traction and external power supply systems. Methods: Methods of probability theory and mathematical statistics were used to determine the mean values and dispersion of the current in the power supply line of the traction circuit, by train type and weight, based on experimental runs and data from locomotive movement-parameter recorders. The current consumed by the trains is treated as a random variable. The trains' current loads are reduced to a fixed weight, which makes it possible to obtain the expected value and the correlation function. Results: The derived relations make it possible to estimate general mean and effective values from sampled values within confidence limits. The paper gives the conditions under which the current consumption functions have the ergodic property. It also gives recommendations for train rating based on weight, together with the expected values and mean-square deviations of the currents in the traction power supply components. Practical importance: The load estimates are refined, which allows the parameters of electric railway power circuits to be determined on a more substantiated basis. The resulting equations make it possible to determine the probability of current loads exceeding a given value, which should be considered when choosing the capacity of transformers, rectifiers, compensating devices, inverters, and power supply and earth leads. Thus, it is possible to achieve the required reliability of equipment operation and cost-effective decisions.
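The statistical quantities named above (mean, dispersion, effective value, confidence limits for the mean, and the probability that the current exceeds a given value) can be sketched as follows. The sketch assumes the sampled currents are independent draws and that the current is approximately normally distributed, and it uses a normal rather than a Student t critical value; these are simplifying assumptions for illustration, not the paper's method, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def current_load_summary(samples, threshold, confidence=0.95):
    """Summarize sampled train currents (amperes) under a normal-approximation assumption."""
    n = len(samples)
    mean = fmean(samples)
    sd = stdev(samples)                                   # sample standard deviation
    rms = math.sqrt(fmean([i * i for i in samples]))      # effective (RMS) value
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)      # two-sided critical value
    half_width = z * sd / math.sqrt(n)
    p_exceed = 1.0 - NormalDist(mean, sd).cdf(threshold)  # P(I > threshold)
    return {
        "mean": mean,
        "variance": sd ** 2,
        "rms": rms,
        "mean_ci": (mean - half_width, mean + half_width),
        "p_exceed": p_exceed,
    }


# Hypothetical current samples, not data from the paper.
summary = current_load_summary([100.0, 200.0, 300.0, 400.0, 500.0], threshold=450.0)
```

The effective value is always at least the mean, since RMS² = mean² + (population) variance; `p_exceed` is the kind of quantity that would feed a capacity check for a transformer or rectifier rating.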


1997 ◽  
Vol 34 (03) ◽  
pp. 785-789 ◽  
Author(s):  
Chunsheng Ma

A necessary and sufficient condition is obtained for a Poisson binomial random variable to be stochastically larger (or smaller) than a binomial random variable. It is then applied to stochastic comparisons of order statistics from heterogeneous populations with those from a homogeneous population. The result has direct applications to stochastic comparisons of the lifetimes of k-out-of-n systems with independent components.
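Whether one count is stochastically larger than another can be checked numerically from the definition: A is stochastically larger than B when P(A > t) >= P(B > t) for every t. The sketch below computes both survival functions exactly and checks the ordering by brute force; it does not implement the paper's necessary and sufficient condition, which is not reproduced here. Because a k-out-of-n system with independent components works exactly when at least k components work, P(S_n >= k) is the system's reliability when p_j is the probability that component j is still working at a fixed time. All function names are illustrative.

```python
import math

def poisson_binomial_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_j) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def binomial_pmf(n, p):
    """pmf of Binomial(n, p)."""
    return [math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]

def survival(pmf):
    """sf[t] = P(X > t) for t = 0..n."""
    sf, tail = [], 1.0
    for mass in pmf:
        tail -= mass
        sf.append(tail)
    return sf

def stochastically_larger(pmf_a, pmf_b, tol=1e-12):
    """True if P(A > t) >= P(B > t) for all t (both supported on 0..n)."""
    return all(a >= b - tol for a, b in zip(survival(pmf_a), survival(pmf_b)))

def k_out_of_n_reliability(ps, k):
    """P(at least k of the independent components work)."""
    return sum(poisson_binomial_pmf(ps)[k:])
```

For example, with component probabilities 0.6, 0.7, and 0.8, the Poisson binomial count is stochastically larger than Binomial(3, 0.5) and stochastically smaller than Binomial(3, 0.9), since each Bernoulli term is ordered the same way and the order is preserved under independent sums.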


2020 ◽  
Vol 8 (10) ◽  
pp. 738
Author(s):  
Po Cheng ◽  
Jiang Tao Yi ◽  
Fei Liu ◽  
Jun Jie Dong

This paper conducts coupled Eulerian–Lagrangian (CEL) analysis to characterize the model uncertainty of using the cylindrical shear method (CSM) to predict the pullout capacity of helical anchors in cohesive soils. The model factor M, defined as the measured capacity divided by the estimated capacity, is adopted to represent the model uncertainty. The model factor M_CEL can be considered a random variable with a lognormal distribution, with a mean value of 1.02 and a coefficient of variation (COV) of 0.1. A correction factor η, introduced when comparing CSM with CEL, is found to depend on the input parameters. This dependence is removed by regression analysis, yielding the regression equation f. Substituting f into the original CSM gives the modified CSM (MCSM), whose model factor can also be modeled as a lognormal random variable, with a mean value of 1.02 and a COV of 0.13. Finally, 13 field tests were collected to compare prediction accuracy; the results show that the prediction error of MCSM is mostly within 15%. These findings may help engineers and designers estimate the pullout capacity of helical anchors in cohesive soils with greater confidence.
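The model-factor statistics above follow directly from the definition M = measured / estimated. A minimal sketch, assuming the usual moment matching for a lognormal variable (sigma_ln² = ln(1 + COV²), mu_ln = ln(mean) - sigma_ln²/2); the function names are illustrative, and any capacity pairs passed in would be hypothetical rather than the paper's field data.

```python
import math
from statistics import NormalDist, fmean, stdev

def model_factor_stats(measured, estimated):
    """Mean and COV of the model factor M = measured / estimated."""
    factors = [q_m / q_e for q_m, q_e in zip(measured, estimated)]
    mean = fmean(factors)
    return {"factors": factors, "mean": mean, "cov": stdev(factors) / mean}

def lognormal_params(mean, cov):
    """Parameters (mu_ln, sigma_ln) of ln M for a lognormal M with the given mean and COV."""
    sigma_ln = math.sqrt(math.log(1.0 + cov ** 2))
    mu_ln = math.log(mean) - 0.5 * sigma_ln ** 2
    return mu_ln, sigma_ln

def prob_overprediction(mu_ln, sigma_ln):
    """P(M < 1): the method overestimates capacity, under the lognormal model."""
    return NormalDist(mu_ln, sigma_ln).cdf(0.0)  # M < 1 exactly when ln M < 0
```

Plugging in the reported mean of 1.02 and COV of 0.1 via `lognormal_params(1.02, 0.1)` gives a probability of overprediction a little below one half, consistent with a mean model factor just above 1.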

