The Stochastic Delta Rule: Faster and More Accurate Deep Learning Through Adaptive Weight Noise

Multilayer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech, and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting and misspecification. One approach to these estimation problems and related ones (e.g., saddle points, collinearity, feature discovery) is Dropout. The Dropout algorithm removes hidden units according to a binomial random variable with probability [Formula: see text] prior to each update, creating random “shocks” to the network that are averaged over updates (thus creating weight sharing). In this letter, we reestablish an older parameter search method and show that Dropout is a special case of this more general model, the stochastic delta rule (SDR), originally published in 1990. Unlike Dropout, SDR redefines each weight in the network as a random variable with mean [Formula: see text] and standard deviation [Formula: see text]. Each weight random variable is sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights (accumulated in the mean values). Both parameters are updated according to prediction error, resulting in weight noise injections that reflect a local history of prediction error and local model averaging. SDR therefore implements a more sensitive, local, gradient-dependent simulated annealing per weight, converging in the limit to a Bayes optimal network. We run tests on standard benchmarks (CIFAR and ImageNet) using a modified version of DenseNet and show that SDR outperforms standard Dropout in top-5 validation error by approximately 13% with DenseNet-BC 121 on ImageNet, and we find various validation error improvements in smaller networks. We also show that SDR reaches the same accuracy that Dropout attains in 100 epochs in as few as 40 epochs, and that it improves training error by as much as 80%.
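As a rough illustration of the mechanism this abstract describes, the NumPy sketch below treats each weight of a single linear unit as a Gaussian random variable, samples the weights on every forward pass, and updates both the mean and the standard deviation from the gradient of the prediction error. The specific update rules, the hyperparameter names `alpha`, `beta`, and `zeta`, the function name `sdr_train`, and the toy regression setting are all assumptions made for illustration; the abstract does not state them, and this is not the authors' DenseNet implementation.

```python
import numpy as np

def sdr_train(X, y, epochs=200, alpha=0.05, beta=0.02, zeta=0.9, seed=0):
    """Toy SDR-style training of one linear unit.

    Assumed (illustrative) per-weight update rules:
        mu    <- mu - alpha * grad
        sigma <- zeta * (sigma + beta * |grad|)
    where grad is the squared-error gradient at the sampled weights.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    mu = np.zeros(n_features)          # weight means
    sigma = np.full(n_features, 0.5)   # weight standard deviations
    for _ in range(epochs):
        # Sample a concrete weight vector for this forward pass.
        w = rng.normal(mu, sigma)
        err = X @ w - y
        grad = X.T @ err / len(y)      # gradient of 0.5 * mean squared error
        mu -= alpha * grad             # move the mean downhill
        # Noise grows with the local gradient size and decays by zeta < 1.
        sigma = zeta * (sigma + beta * np.abs(grad))
    return mu, sigma
```

In this sketch the injected noise shrinks as the gradient shrinks, which is one way to read the "gradient-dependent simulated annealing per weight" described above.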

On the Generalization Performance of a Regression Model with Imprecise Elements

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488517500313 ◽

2017 ◽

Vol 25 (05) ◽

pp. 723-740 ◽

Cited By ~ 3

Author(s):

Maria Brigida Ferraro

Keyword(s):

Linear Regression ◽

Regression Model ◽

Prediction Error ◽

Random Element ◽

Fuzzy Random Variable ◽

Random Variable ◽

Generalization Performance ◽

Bootstrap Procedure ◽

Link Functions ◽

Fuzzy Random

A linear regression model for imprecise random variables is considered. The imprecision of a random element is formalized by means of the LR fuzzy random variable, characterized by a center, a left spread, and a right spread. To avoid the non-negativity conditions, the spreads are transformed by means of two invertible functions. To analyze the generalization performance of the model, an appropriate prediction error is introduced and estimated by means of a bootstrap procedure. Furthermore, since the choice of response transformations could affect the inferential procedures, a computational proposal is introduced for choosing, from a family of parametric link functions (the Box-Cox family), the transformation parameters that minimize the prediction error of the model.
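The selection step described in this abstract can be sketched for ordinary (crisp) data as follows. The code uses the standard Box-Cox transform, fits a straight line to the transformed response on bootstrap samples, and scores each candidate parameter by its out-of-bag squared error on the original scale. The function names, the grid of candidate values, and the use of out-of-bag error are assumptions for illustration; the paper's procedure for LR fuzzy random variables is not reproduced here.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of positive data y with parameter lam."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Inverse of box_cox; the base is clipped so the result stays real."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return np.maximum(lam * z + 1.0, 1e-12) ** (1.0 / lam)

def bootstrap_prediction_error(x, y, lam, n_boot=200, seed=0):
    """Average out-of-bag squared error, on the original scale, of a
    straight-line fit to the Box-Cox-transformed response."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), x])
    z = box_cox(y, lam)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)   # points left out of it
        if oob.size == 0:
            continue
        coef, *_ = np.linalg.lstsq(design[idx], z[idx], rcond=None)
        pred = inv_box_cox(design[oob] @ coef, lam)
        errors.append(np.mean((y[oob] - pred) ** 2))
    return float(np.mean(errors))

def choose_lambda(x, y, grid=(0.0, 0.5, 1.0)):
    """Return the grid value of lam with the smallest bootstrap error."""
    return min(grid, key=lambda lam: bootstrap_prediction_error(x, y, lam))
```

Measuring the error after inverting the transform keeps the candidates comparable, since errors on differently transformed scales are not.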

A refinement of normal approximation to Poisson binomial

International Journal of Mathematics and Mathematical Sciences ◽

10.1155/ijmms.2005.717 ◽

2005 ◽

Vol 2005 (5) ◽

pp. 717-728 ◽

Cited By ~ 4

Author(s):

K. Neammanee

Keyword(s):

Normal Distribution ◽

Normal Approximation ◽

Random Variables ◽

Random Variable ◽

Standard Normal Distribution ◽

Bernoulli Random Variables ◽

Binomial Random Variable ◽

Standard Normal ◽

Correction Terms ◽

Better Than

Let $X_1, X_2, \ldots, X_n$ be independent Bernoulli random variables with $P(X_j = 1) = 1 - P(X_j = 0) = p_j$, and let $S_n := X_1 + X_2 + \cdots + X_n$. $S_n$ is called a Poisson binomial random variable, and it is well known that the distribution of a Poisson binomial random variable can be approximated by the standard normal distribution. In this paper, we use Taylor's formula to improve the approximation by adding some correction terms. Our result is better than before and is of order $1/n$ in the case $p_1 = p_2 = \cdots = p_n$.
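To make the baseline approximation concrete, the following sketch computes the exact distribution of a Poisson binomial sum by repeated convolution and compares its CDF with a normal CDF that has the same mean and variance. It shows only the plain normal approximation, not the paper's Taylor-based correction terms, and the function names are hypothetical.

```python
import math
import numpy as np

def poisson_binomial_pmf(p):
    """Exact PMF of S_n = X_1 + ... + X_n for independent Bernoulli(p_j)."""
    pmf = np.array([1.0])
    for pj in p:
        # Adding one more Bernoulli variable convolves the PMF with [1 - pj, pj].
        pmf = np.convolve(pmf, [1.0 - pj, pj])
    return pmf

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_cdf_gap(p):
    """Largest absolute gap between P(S_n <= k) and the normal CDF with
    matching mean and variance, taken over k = 0, ..., n."""
    p = np.asarray(p, dtype=float)
    mean = p.sum()
    sd = math.sqrt((p * (1.0 - p)).sum())
    exact = np.cumsum(poisson_binomial_pmf(p))
    approx = np.array([normal_cdf((k - mean) / sd) for k in range(len(exact))])
    return float(np.max(np.abs(exact - approx)))
```

Calling `max_cdf_gap` on longer and longer probability vectors shows how the error of the uncorrected approximation shrinks as $n$ grows.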

Negative binomial sums of random variables and discounted reward processes

Journal of Applied Probability ◽

10.1239/jap/1032265207 ◽

1998 ◽

Vol 35 (3) ◽

pp. 589-599

Author(s):

William L. Cooper

Keyword(s):

Differential Equation ◽

Negative Binomial ◽

Infinite Horizon ◽

Random Variables ◽

Random Variable ◽

Random Time ◽

Sums Of Random Variables ◽

Total Reward ◽

Binomial Random Variable ◽

Binomial Sums

Given a sequence of random variables (rewards), the Haviv–Puterman differential equation relates the expected infinite-horizon λ-discounted reward and the expected total reward up to a random time that is determined by an independent negative binomial random variable with parameters 2 and λ. This paper provides an interpretation of this proven, but previously unexplained, result. Furthermore, the interpretation is formalized into a new proof, which then yields new results for the general case where the rewards are accumulated up to a time determined by an independent negative binomial random variable with parameters k and λ.

The Arcsine Transformation of a Binomial Random Variable

Wolfram Demonstrations Project ◽

10.3840/002516 ◽

2007 ◽

Keyword(s):

Random Variable ◽

Binomial Random Variable ◽

Arcsine Transformation

Normal Approximation to a Binomial Random Variable

Wolfram Demonstrations Project ◽

10.3840/002121 ◽

2007 ◽

Keyword(s):

Normal Approximation ◽

Random Variable ◽

Binomial Random Variable

Using of probabilistic methods for current loads estimation in traction circuit components

Bulletin of scientific research results ◽

10.20295/2223-9987-2016-1-30-36 ◽

2016 ◽

pp. 30-36

Author(s):

Arkady Bur’yanovaty ◽

Valery Varentsov

Keyword(s):

Power Supply ◽

Practical Importance ◽

Real Life ◽

Random Variable ◽

Probabilistic Methods ◽

Mean Values ◽

Electric Power Supply ◽

Fixed Weight ◽

Circuit Components ◽

Compensating Devices

Objective: To develop a method for calculating the electrical loads on traction circuit components using train traffic schedule data, taking into account differences in train weight and real-life traction modes under varying track profiles, speed limits, and other parameters of the traction and external electric power supply systems. Methods: Methods of probability theory and mathematical statistics were used to determine the mean values and dispersion of the current in the power supply line of the traction circuit, by train type and weight, based on experimental runs and data from locomotive movement parameter recorders. The current consumed by the trains is treated as a random variable. Train current loads are normalized to a fixed weight, which makes it possible to obtain the statistical expectation and the correlation function. Results: The ratios obtained make it possible to estimate the general mean and effective values from sampled values within confidence limits. The paper provides the conditions under which the current consumption functions have the ergodic property. It also gives recommendations for train rating based on train weight. The statistical expectations and mean square deviations of the currents of traction power supply components are given. Practical importance: The load estimates are refined, which makes it possible to determine the parameters of the power circuits of electric railways in a better substantiated way. The equations obtained make it possible to determine the probability of current loads exceeding a given value, which should be considered when choosing the capacity of transformers, rectifiers, compensating devices, inverters, power supply leads, and earth leads. It is thus possible to achieve the required reliability of equipment operation and cost-effective decisions.

A new prediction interval for binomial random variable based on inferential models

Journal of Statistical Planning and Inference ◽

10.1016/j.jspi.2019.07.001 ◽

2020 ◽

Vol 205 ◽

pp. 156-174

Author(s):

Hezhi Lu ◽

Hua Jin

Keyword(s):

Prediction Interval ◽

Random Variable ◽

Binomial Random Variable

A note on stochastic ordering of order statistics

Journal of Applied Probability ◽

10.1017/s0021900200101433 ◽

1997 ◽

Vol 34 (03) ◽

pp. 785-789 ◽

Cited By ~ 1

Author(s):

Chunsheng Ma

Keyword(s):

Order Statistics ◽

Stochastic Ordering ◽

Random Variable ◽

Sufficient Condition ◽

Necessary And Sufficient Condition ◽

Stochastic Comparisons ◽

Homogeneous Population ◽

Independent Components ◽

Binomial Random Variable ◽

Necessary And Sufficient

A necessary and sufficient condition is obtained for a Poisson binomial random variable to be stochastically larger (or smaller) than a binomial random variable. It is then used to deal with the stochastic comparisons of order statistics from heterogeneous populations with those from a homogeneous population. The result has obvious applications in the stochastic comparisons of lifetimes of k-out-of-n systems having independent components.
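The ordering discussed in this abstract can be checked numerically from its definition: $X$ is stochastically larger than $Y$ when $P(X > k) \ge P(Y > k)$ for every $k$. The sketch below performs that brute-force check for a Poisson binomial and a binomial variable. It does not implement the paper's necessary and sufficient condition, which the abstract does not state, and the function names are hypothetical.

```python
import numpy as np

def poisson_binomial_pmf(p):
    """Exact PMF of a sum of independent Bernoulli(p_j) variables."""
    pmf = np.array([1.0])
    for pj in p:
        pmf = np.convolve(pmf, [1.0 - pj, pj])
    return pmf

def binomial_pmf(n, p):
    """Binomial(n, p) PMF, obtained as a Poisson binomial with equal p_j."""
    return poisson_binomial_pmf([p] * n)

def stochastically_larger(pmf_x, pmf_y, tol=1e-12):
    """True if P(X > k) >= P(Y > k) for every k, that is, if the CDF of X
    lies on or below the CDF of Y everywhere."""
    size = max(len(pmf_x), len(pmf_y))
    cdf_x = np.cumsum(np.pad(pmf_x, (0, size - len(pmf_x))))
    cdf_y = np.cumsum(np.pad(pmf_y, (0, size - len(pmf_y))))
    return bool(np.all(cdf_x <= cdf_y + tol))
```

For instance, a Poisson binomial variable whose success probabilities are all at least $p$ should come out stochastically larger than a Binomial($n$, $p$) variable with the same $n$.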

A lower bound on the probability that a binomial random variable is exceeding its mean

Statistics & Probability Letters ◽

10.1016/j.spl.2016.08.016 ◽

2016 ◽

Vol 119 ◽

pp. 305-309 ◽

Cited By ~ 2

Author(s):

Christos Pelekis ◽

Jan Ramon

Keyword(s):

Lower Bound ◽

Random Variable ◽

Binomial Random Variable

Characterization of Model Uncertainty for the Vertical Pullout Capacity of Helical Anchors in Cohesive Soils

Journal of Marine Science and Engineering ◽

10.3390/jmse8100738 ◽

2020 ◽

Vol 8 (10) ◽

pp. 738

Author(s):

Po Cheng ◽

Jiang Tao Yi ◽

Fei Liu ◽

Jun Jie Dong

Keyword(s):

Model Uncertainty ◽

Lognormal Distribution ◽

Prediction Error ◽

Random Variable ◽

Mean Value ◽

Regression Equation ◽

Cohesive Soils ◽

Pullout Capacity ◽

Input Parameters

This paper conducts coupled Eulerian–Lagrangian (CEL) analysis to characterize the model uncertainty of using the cylindrical shear method (CSM) to predict the pullout capacity of helical anchors in cohesive soils. The model factor M, defined as the measured capacity divided by the estimated solution, is adopted to represent the model uncertainty. The model factor Mcel can be considered a random variable with a lognormal distribution; its mean value and coefficient of variation (COV) are 1.02 and 0.1, respectively. A correction factor η is introduced when comparing CSM and CEL, and it is found to be influenced by the input parameters. The dependence on input parameters is removed by performing regression analysis, which yields the regression equation f. Substituting the regression equation f into the original CSM constitutes the modified CSM (MCSM). The model factor of MCSM can be modeled as a random variable with a lognormal distribution; its mean value and COV are 1.02 and 0.13, respectively. Finally, 13 field tests are collected to compare the prediction accuracy; the results show that the prediction error of MCSM is mostly within 15%. The present findings might help engineers and designers estimate the pullout capacity of helical anchors in cohesive soils more confidently.
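A lognormal model factor specified by its mean and COV, as in this abstract, can be converted to the parameters of the underlying normal distribution with the standard moment relations $\sigma_{\ln}^2 = \ln(1 + \mathrm{COV}^2)$ and $\mu_{\ln} = \ln(\text{mean}) - \sigma_{\ln}^2 / 2$. The sketch below applies them and evaluates the probability that the factor falls below a threshold. The function names are hypothetical and the calculation is an illustration, not a result reported by the paper.

```python
import math

def lognormal_params(mean, cov):
    """Mean and standard deviation of ln(M) for a lognormal M
    with the given mean and coefficient of variation."""
    sigma_ln = math.sqrt(math.log(1.0 + cov**2))
    mu_ln = math.log(mean) - 0.5 * sigma_ln**2
    return mu_ln, sigma_ln

def prob_below(threshold, mean, cov):
    """P(M < threshold) for a lognormal M with the given mean and COV."""
    mu_ln, sigma_ln = lognormal_params(mean, cov)
    z = (math.log(threshold) - mu_ln) / sigma_ln
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With the MCSM values from the abstract, `prob_below(1.0, 1.02, 0.13)` gives the probability under this model that the measured capacity is lower than the estimate.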
