Pruning from Adaptive Regularization

1994 ◽  
Vol 6 (6) ◽  
pp. 1223-1232 ◽  
Author(s):  
Lars Kai Hansen ◽  
Carl Edward Rasmussen

Inspired by the recent upsurge of interest in Bayesian methods, we consider adaptive regularization. A generalization-based scheme for adapting regularization parameters is introduced and compared to Bayesian regularization. We show that pruning arises naturally within both adaptive regularization schemes. As a model example we choose the simplest possible one: estimating the mean of a random variable with known variance. Marked similarities are found between the two methods, in that both involve a "noise limit" below which they regularize with infinite weight decay, i.e., they prune. However, pruning is not always beneficial: we show explicitly that both methods may in some cases increase the generalization error. This corresponds to situations where the underlying assumptions of the regularizer are poorly matched to the environment.
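
The abstract's toy problem admits a compact illustration. Below is a minimal sketch, assuming a shrinkage estimator x̄/(1 + λ) for the mean and an empirical-Bayes style adaptive choice of the weight decay λ; the exact adaptation rules the paper compares are not reproduced here, but the sketch shows the same qualitative "noise limit" mechanism: when the signal estimate falls below the noise level, the decay becomes infinite and the estimate is pruned to zero.

```python
import numpy as np

def adaptive_mean_estimate(x, sigma2):
    """Shrinkage estimate of a mean with data-adaptive weight decay.

    Hedged sketch: the adaptive rule below is one plausible instance of the
    schemes the abstract compares, not the authors' exact formulas.
    """
    n = len(x)
    xbar = x.mean()
    noise = sigma2 / n              # variance of the sample mean
    signal = xbar ** 2 - noise      # crude estimate of mu^2
    if signal <= 0:
        return 0.0                  # below the noise limit: infinite decay, prune
    lam = noise / signal            # adaptive weight decay
    return xbar / (1.0 + lam)       # shrunken mean estimate

rng = np.random.default_rng(0)
print(adaptive_mean_estimate(rng.normal(0.2, 1.0, size=25), sigma2=1.0))
```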

Author(s):  
Cao Xuan Phuong

Consider the model Y = X + Z, where Y is an observable random variable, X is an unobservable random variable with unknown density f, and Z is a random noise independent of X. The density g of Z is known exactly and assumed to be compactly supported. We are interested in estimating the m-fold convolution f_m = f * ... * f on the basis of independent and identically distributed (i.i.d.) observations Y_1, ..., Y_n drawn from the distribution of Y. Based on the observations as well as the ridge-parameter regularization method, we propose an estimator for the function f_m depending on two regularization parameters, one of which is given and the other of which must be chosen. The proposed estimator is shown to be consistent with respect to the mean integrated squared error under some conditions on the parameters. We then derive a convergence rate of the estimator under some additional regularity assumptions on the density f.
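
The general recipe behind such ridge-parameter estimators can be sketched as follows (a hypothetical implementation, not the paper's exact estimator): form the empirical characteristic function of Y, deconvolve by the known characteristic function of Z with a ridge δ in the denominator, raise to the m-th power, and invert the Fourier transform over a truncated frequency grid. Here δ and the frequency cutoff play the role of the two regularization parameters.

```python
import numpy as np

def ridge_mfold_density(y, phi_z, m, delta, grid, t_grid):
    """Ridge-regularized estimate of the m-fold self-convolution f_m = f*...*f.

    Sketch of the general recipe only: phi_z is the (known) characteristic
    function of the noise Z, delta is the ridge parameter, and the cutoff of
    t_grid acts as the second regularization parameter.
    """
    # empirical characteristic function of Y on the frequency grid
    phi_y = np.exp(1j * np.outer(t_grid, y)).mean(axis=1)
    pz = phi_z(t_grid)
    # ridge-regularized deconvolution of f, then the m-th power gives f_m
    phi_f = phi_y * np.conj(pz) / np.maximum(np.abs(pz) ** 2, delta)
    phi_fm = phi_f ** m
    # Fourier inversion on the evaluation grid
    dt = t_grid[1] - t_grid[0]
    est = np.real(np.exp(-1j * np.outer(grid, t_grid)) @ phi_fm) * dt / (2 * np.pi)
    return np.clip(est, 0.0, None)

# toy usage: X ~ N(0,1), Z ~ Uniform(-1,1) (compactly supported), m = 2
rng = np.random.default_rng(0)
y = rng.normal(size=500) + rng.uniform(-1, 1, size=500)
phi_z = lambda t: np.sinc(t / np.pi)          # sin(t)/t for Uniform(-1,1)
fm_hat = ridge_mfold_density(y, phi_z, m=2, delta=0.05,
                             grid=np.linspace(-6, 6, 121),
                             t_grid=np.linspace(-15, 15, 601))
```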


2021 ◽  
Vol 58 (2) ◽  
pp. 335-346
Author(s):  
Mackenzie Simper

Consider an urn containing balls labeled with integer values. Define a discrete-time random process by drawing two balls, one at a time and with replacement, and noting the labels. Add a new ball labeled with the sum of the two drawn labels. This model was introduced by Siegmund and Yakir (2005), Ann. Prob. 33, 2036, for labels taking values in a finite group, in which case the distribution defined by the urn converges to the uniform distribution on the group. For the urn of integers, the main result of this paper is an exponential limit law. The mean of the exponential is a random variable with distribution depending on the starting configuration. This is a novel urn model, combining multiple drawing with infinitely many ball types. The proof of convergence uses the contraction method for recursive distributional equations.
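
The urn dynamics are easy to simulate. A minimal sketch follows; the exponential limit law concerns the suitably normalized label distribution, while the simulation only illustrates the process itself:

```python
import random

def sum_urn(start, steps, seed=1):
    """Simulate the integer-labeled urn: repeatedly draw two balls, one at a
    time with replacement, and add a new ball labeled with their sum."""
    rng = random.Random(seed)
    urn = list(start)
    for _ in range(steps):
        a, b = rng.choice(urn), rng.choice(urn)
        urn.append(a + b)
    return urn

# start with balls labeled 0 and 1; labels grow rapidly as sums compound
urn = sum_urn([0, 1], steps=5000)
print(len(urn), max(urn))
```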


2011 ◽  
Vol 18 (01) ◽  
pp. 71-85
Author(s):  
Fabrizio Cacciafesta

We provide a simple way to visualize the variance and the mean absolute error of a random variable with finite mean. Some applications to option theory and to second-order stochastic dominance are given: we show, among other things, that the "call-put parity" may be seen as a Taylor formula.
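
For context, the expectation form of call-put parity mentioned here follows from a pointwise identity; the paper's Taylor-formula viewpoint is its own contribution and is not reproduced in this note:

```latex
% Pointwise, (x-K)^+ - (K-x)^+ = x - K for every real x.
% Taking expectations for a random variable X with finite mean gives
% call-put parity in expectation form:
\[
  \mathbb{E}\bigl[(X-K)^+\bigr] - \mathbb{E}\bigl[(K-X)^+\bigr]
  = \mathbb{E}[X] - K .
\]
```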


2012 ◽  
DMTCS Proceedings, vol. AQ,... (Proceedings) ◽  
Author(s):  
Patrick Bindjeme ◽  
James Allen Fill

In a continuous-time setting, Fill (2012) proved, for a large class of probabilistic sources, that the number of symbol comparisons used by $\texttt{QuickSort}$, when centered by subtracting the mean and scaled by dividing by time, has a limiting distribution, but proved little about that limiting random variable $Y$—not even that it is nondegenerate. We establish the nondegeneracy of $Y$. The proof is perhaps surprisingly difficult.


2016 ◽  
Vol 48 (3) ◽  
pp. 744-767
Author(s):  
Clifford Hurvich ◽  
Josh Reed

We study random walks whose increments follow α-stable distributions with shape parameter 1<α<2. Specifically, assuming a mean increment size which is negative, we provide series expansions in terms of the mean increment size for the probability that the all-time maximum of an α-stable random walk is equal to 0 and, in the totally skewed-to-the-left case of skewness parameter β=-1, for the expected value of the all-time maximum of an α-stable random walk. Our series expansions generalize previous results for Gaussian random walks. Key ingredients in our proofs are Spitzer's identity for random walks, the stability property of α-stable random variables, and Zolotarev's integral representation for the cumulative distribution function of an α-stable random variable. We also discuss an application of our results to a problem arising in queueing theory.
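
The probability studied here can be approximated by Monte Carlo. A sketch assuming scipy's levy_stable distribution (S1 parametrization, in which, for α > 1, the loc parameter is the mean increment); the all-time maximum is truncated at n_steps steps, so the estimate carries truncation bias:

```python
import numpy as np
from scipy.stats import levy_stable

def prob_all_time_max_zero(alpha, beta, drift, n_steps=1000, n_paths=500, seed=0):
    """Monte Carlo estimate of P(all-time maximum of the random walk = 0).

    Sketch only: 'all-time' is truncated at n_steps, giving an upper-biased
    approximation; drift (< 0) is the mean increment.
    """
    inc = levy_stable.rvs(alpha, beta, loc=drift, scale=1.0,
                          size=(n_paths, n_steps), random_state=seed)
    walks = np.cumsum(inc, axis=1)
    # the maximum equals 0 exactly when the walk never rises above its start
    return float(np.mean(walks.max(axis=1) <= 0.0))

# totally skewed-to-the-left case from the abstract: beta = -1
print(prob_all_time_max_zero(alpha=1.5, beta=-1.0, drift=-0.5))
```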


2002 ◽  
Vol 21 (10) ◽  
pp. 1443-1459 ◽  
Author(s):  
Douglas J. Taylor ◽  
Lawrence L. Kupper ◽  
Keith E. Muller

1997 ◽  
Vol 9 (1) ◽  
pp. 1-42 ◽  
Author(s):  
Sepp Hochreiter ◽  
Jürgen Schmidhuber

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function: a large connected region in weight space where the error remains approximately constant. An MDL-based Bayesian argument suggests that flat minima correspond to "simple" networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting the generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a "good" weight prior. Instead, we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and "optimal brain surgeon/optimal brain damage."
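
The notion of flatness the abstract relies on can be probed numerically. A toy sketch using random-perturbation probing, not the authors' second-order flat-minimum search: measure how far the weights can move from a minimum before the loss rises by more than a tolerance.

```python
import numpy as np

def flatness_probe(loss, w, tol=1e-3, n_dirs=100, seed=0):
    """Crude flatness measure: the typical perturbation radius that keeps
    the loss within tol of loss(w). Larger values indicate a flatter minimum.

    Illustration of the flat-minimum idea only; the paper's algorithm uses
    second-order information rather than random probing.
    """
    rng = np.random.default_rng(seed)
    base = loss(w)
    radii = []
    for _ in range(n_dirs):
        d = rng.normal(size=w.shape)
        d /= np.linalg.norm(d)
        r = 1e-4
        while loss(w + r * d) - base < tol and r < 1e2:
            r *= 2.0                      # expand until the error budget is spent
        radii.append(r / 2.0)             # last radius that stayed within tol
    return float(np.mean(radii))

# toy quadratic bowl: much flatter along the second coordinate
loss = lambda w: 10 * w[0] ** 2 + 0.1 * w[1] ** 2
print(flatness_probe(loss, np.zeros(2)))
```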


2010 ◽  
Vol 14 (02) ◽  
pp. 225-238 ◽  
Author(s):  
Obinna O. Duru ◽  
Roland N. Horne

Summary Current downhole measuring technologies provide a means of acquiring downhole measurements of pressure, temperature, and sometimes flow-rate data. Jointly interpreting all three measurements provides a way to overcome data limitations associated with interpreting only two measurements—pressure and flow-rate data—as is currently done in pressure-transient analysis. This work shows how temperature measurements can be used to improve estimation in situations where a lack of sufficient pressure or flow-rate data makes parameter estimation difficult or impossible. The model that describes the temperature distribution in the reservoir lends itself to quasilinear approximations, which makes it a candidate for Bayesian inversion. The model that describes the pressure distribution for a multirate flow system is also linear and a candidate for Bayesian inversion. These two conditions were exploited in this work to present a way to cointerpret pressure and temperature signals from a reservoir. Specifically, the Bayesian methods were applied to the deconvolution of both pressure and temperature measurements. The deconvolution of the temperature measurements yielded a vector that is linearly related to the average flow rate from the reservoir and, hence, could be used for flow-rate estimation, especially in situations in which flow-rate measurements are unavailable or unreliable. This flow rate was shown to be sufficient for a first estimate of the pressure kernel in the pressure-deconvolution problem. When the appropriate regularization parameters are chosen, the Bayesian methods can suppress fluctuations and noise in the measurements while maintaining sufficient resolution in the estimates; this motivates applying the method to data denoising. In addition, because Bayesian statistics represent a state of knowledge, it is easy to incorporate information, such as breakpoints, that may improve the structure of the estimates. The methods also lend themselves to formulations that make possible the estimation of initial properties, such as initial pressures.
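
The regularized linear inversion invoked here has a standard generic form: with a Gaussian prior on the unknown vector and Gaussian observation noise, the MAP estimate is the Tikhonov (ridge) solution. A sketch with a hypothetical smoothing kernel G, not the paper's pressure or temperature kernels:

```python
import numpy as np

def bayesian_deconvolve(G, y, lam):
    """Ridge (Tikhonov) deconvolution: solves (G^T G + lam I) x = G^T y,
    the MAP estimate under a Gaussian prior x ~ N(0, I/lam) and Gaussian
    noise. lam is the regularization parameter that trades noise
    suppression against resolution."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ y)

# toy example with a hypothetical convolution kernel and noisy data
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 80)
G = np.exp(-5 * np.abs(t[:, None] - t[None, :]))
x_true = np.sin(2 * np.pi * t)
y = G @ x_true + 0.05 * rng.normal(size=t.size)
x_hat = bayesian_deconvolve(G, y, lam=1e-1)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```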


2007 ◽  
Vol 21 (4) ◽  
pp. 611-621 ◽  
Author(s):  
Karthik Natarajan ◽  
Zhou Linyi

In this article, we derive a tight closed-form upper bound on the expected value of the three-piece linear convex function E[max(0, X, mX − z)], given the mean μ and the variance σ² of the random variable X. The bound is an extension of the well-known mean–variance bound for E[max(0, X)]. An application of the bound to pricing the strangle option in finance is provided.
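
For reference, the well-known two-piece bound that the paper extends: over all distributions of X with mean μ and variance σ², the tight upper bound is

```latex
\[
  \sup_{X:\ \mathbb{E}[X]=\mu,\ \operatorname{Var}(X)=\sigma^2}
  \mathbb{E}\bigl[\max(0, X)\bigr]
  = \frac{\mu + \sqrt{\mu^2 + \sigma^2}}{2}.
\]
```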

