Gradient Regularization as Approximate Variational Inference

We developed Variational Laplace for Bayesian neural networks (BNNs), which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate, as it is the log-likelihood plus weight-decay, plus a squared-gradient regularizer. Variational Laplace gave better test performance and expected calibration errors than maximum a posteriori inference and standard sampling-based variational inference, despite using the same variational approximate posterior. Finally, we emphasize the care needed in benchmarking standard VI, as there is a risk of stopping before the variance parameters have converged. We show that early-stopping can be avoided by increasing the learning rate for the variance parameters.

Download Full-text

Noise injection for training artificial neural networks: A comparison with weight decay and early stopping

Medical Physics ◽

10.1118/1.3213517 ◽

2009 ◽

Vol 36 (10) ◽

pp. 4810-4818 ◽

Cited By ~ 62

Author(s):

Richard M. Zur ◽

Yulei Jiang ◽

Lorenzo L. Pesce ◽

Karen Drukker

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Early Stopping ◽

Weight Decay ◽

Noise Injection ◽

Artificial Neural

Download Full-text

Residual-based a posteriori error analysis for the coupling of the Navier-Stokes and Darcy-Forchheimer equations

ESAIM Mathematical Modelling and Numerical Analysis ◽

10.1051/m2an/2021005 ◽

2021 ◽

Author(s):

Sergio Caucao ◽

Gabriel Gatica ◽

Ricardo Oyarzúa ◽

Felipe Sandoval

Keyword(s):

Adaptive Mesh Refinement ◽

Adaptive Mesh ◽

Posteriori Error ◽

Local Approximation ◽

Navier Stokes ◽

Error Estimator ◽

A Posteriori ◽

Approximation Properties ◽

A Posteriori Error ◽

Posteriori Error Analysis

In this paper we consider a mixed variational formulation that have been recently proposed for the coupling of the Navier--Stokes and Darcy--Forchheimer equations, and derive, though in a non-standard sense, a reliable and efficient residual-based a posteriori error estimator suitable for an adaptive mesh-refinement method. For the reliability estimate, which holds with respect to the square root of the error estimator, we make use of the inf-sup condition and the strict monotonicity of the operators involved, a suitable Helmholtz decomposition in non-standard Banach spaces in the porous medium, local approximation properties of the Cl\'ement interpolant and Raviart--Thomas operator, and a smallness assumption on the data. In turn, inverse inequalities, the localization technique based on triangle-bubble and edge-bubble functions in local $\L^\rp$ spaces, are the main tools for developing the effi\-ciency analysis, which is valid for the error estimator itself up to a suitable additional error term. Finally, several numerical results confirming the properties of the estimator and illustrating the performance of the associated adaptive algorithm are reported.

Download Full-text

Statistical mechanics approach to early stopping and weight decay

Physical Review E ◽

10.1103/physreve.58.833 ◽

1998 ◽

Vol 58 (1) ◽

pp. 833-844 ◽

Cited By ~ 6

Author(s):

Siegfried Bös

Keyword(s):

Statistical Mechanics ◽

Early Stopping ◽

Weight Decay

Download Full-text

Fast probabilistic nonlinear petrophysical inversion

Geophysics ◽

10.1190/1.3540628 ◽

2011 ◽

Vol 76 (2) ◽

pp. E45-E58 ◽

Cited By ~ 23

Author(s):

Mohammad S. Shahraeeni ◽

Andrew Curtis

Keyword(s):

Neural Network ◽

Inverse Problems ◽

Probabilistic Method ◽

Clay Content ◽

P Wave ◽

Model Parameters ◽

A Posteriori ◽

Nonlinear Inverse Problems ◽

Mixture Density ◽

The Neural Network

We have developed an extension of the mixture-density neural network as a computationally efficient probabilistic method to solve nonlinear inverse problems. In this method, any postinversion (a posteriori) joint probability density function (PDF) over the model parameters is represented by a weighted sum of multivariate Gaussian PDFs. A mixture-density neural network estimates the weights, mean vector, and covariance matrix of the Gaussians given any measured data set. In one study, we have jointly inverted compressional- and shear-wave velocity for the joint PDF of porosity, clay content, and water saturation in a synthetic, fluid-saturated, dispersed sand-shale system. Results show that if the method is applied appropriately, the joint PDF estimated by the neural network is comparable to the Monte Carlo sampled a posteriori solution of the inverse problem. However, the computational cost of training and using the neural network is much lower than inversion by sampling (more than a factor of 104 in this case and potentially a much larger factor for 3D seismic inversion). To analyze the performance of the method on real exploration geophysical data, we have jointly inverted P-wave impedance and Poisson’s ratio logs for the joint PDF of porosity and clay content. Results show that the posterior model PDF of porosity and clay content is a good estimate of actual porosity and clay-content log values. Although the results may vary from one field to another, this fast, probabilistic method of solving nonlinear inverse problems can be applied to invert well logs and large seismic data sets for petrophysical parameters in any field.

Download Full-text

Measuring the Uncertainty of Predictions in Deep Neural Networks with Variational Inference

Sensors ◽

10.3390/s20216011 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6011 ◽

Cited By ~ 1

Author(s):

Jan Steinbrener ◽

Konstantin Posch ◽

Jürgen Pilz

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Network Architecture ◽

Deep Neural Networks ◽

Variational Inference ◽

Model Parameters ◽

A Posteriori ◽

Novel Approach ◽

Credible Intervals ◽

Posteriori Distribution

We present a novel approach for training deep neural networks in a Bayesian way. Compared to other Bayesian deep learning formulations, our approach allows for quantifying the uncertainty in model parameters while only adding very few additional parameters to be optimized. The proposed approach uses variational inference to approximate the intractable a posteriori distribution on basis of a normal prior. By representing the a posteriori uncertainty of the network parameters per network layer and depending on the estimated parameter expectation values, only very few additional parameters need to be optimized compared to a non-Bayesian network. We compare our approach to classical deep learning, Bernoulli dropout and Bayes by Backprop using the MNIST dataset. Compared to classical deep learning, the test error is reduced by 15%. We also show that the uncertainty information obtained can be used to calculate credible intervals for the network prediction and to optimize network architecture for the dataset at hand. To illustrate that our approach also scales to large networks and input vector sizes, we apply it to the GoogLeNet architecture on a custom dataset, achieving an average accuracy of 0.92. Using 95% credible intervals, all but one wrong classification result can be detected.

Download Full-text

Neural Networks and Equilibria, Synchronization, and Time Lags

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch178 ◽

2011 ◽

pp. 1219-1225 ◽

Cited By ~ 3

Author(s):

Daniela Danciu ◽

Vladimir Rasvan

Keyword(s):

Neural Network ◽

Dynamical System ◽

Neural Networks ◽

Point Of View ◽

Forced Oscillations ◽

Time Lags ◽

A Posteriori ◽

Time Dynamics ◽

The Neural Network ◽

Spatio Temporal

All neural networks, both natural and artificial, are characterized by two kinds of dynamics. The first one is concerned with what we would call “learning dynamics”, in fact the sequential (discrete time) dynamics of the choice of synaptic weights. The second one is the intrinsic dynamics of the neural network viewed as a dynamical system after the weights have been established via learning. Regarding the second dynamics, the emergent computational capabilities of a recurrent neural network can be achieved provided it has many equilibria. The network task is achieved provided it approaches these equilibria. But the dynamical system has a dynamics induced a posteriori by the learning process that had established the synaptic weights. It is not compulsory that this a posteriori dynamics should have the required properties, hence they have to be checked separately. The standard stability properties (Lyapunov, asymptotic and exponential stability) are defined for a single equilibrium. Their counterpart for several equilibria are: mutability, global asymptotics, gradient behavior. For the definitions of these general concepts the reader is sent to Gelig et. al., (1978), Leonov et. al., (1992). In the last decades, the number of recurrent neural networks’ applications increased, they being designed for classification, identification and complex image, visual and spatio-temporal processing in fields as engineering, chemistry, biology and medicine (see, for instance: Fortuna et. al., 2001; Fink, 2004; Atencia et. al., 2004; Iwahori et. al., 2005; Maurer et. al., 2005; Guirguis & Ghoneimy, 2007). All these applications are mainly based on the existence of several equilibria for such networks, requiring them the “good behavior” properties above discussed. Another aspect of the qualitative analysis is the so-called synchronization problem, when an external stimulus, in most cases periodic or almost periodic has to be tracked (Gelig, 1982; Danciu, 2002). This problem is, from the mathematical point of view, nothing more but existence, uniqueness and global stability of forced oscillations.

Download Full-text

VOVU: A Method for Predicting Generalization in Deep Neural Networks

Mathematical Problems in Engineering ◽

10.1155/2021/6170662 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Juan Wang ◽

Liangzhu Ge ◽

Guorui Liu ◽

Guoyan Li

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Main Idea ◽

Training Data ◽

Early Stopping ◽

Training Set ◽

The Neural Network ◽

Validation Set ◽

Hidden Layer ◽

Fitting Ability

During the development of deep neural networks (DNNs), it is difficult to trade off the performance of fitting ability and generalization ability in training set and unknown data (such as test set). The current solution is to reduce the complexity of the objective function, using regularization methods. In this paper, we propose a method called VOVU (Variance Of Variance of Units in the last hidden layer) to maximize the optimization of the balance between fitting power and generalization during monitoring the training process. The main idea is to give full play to the predictability of the variance of the hidden layer units in the complexity of the neural network model and use it as a generalization evaluation index. In particular, we take full advantage of the last layer of hidden layers since it has the greatest impact. The algorithm was tested on Fashion-MNIST and CIFAR-10. The experimental results demonstrate that VOVU and test loss are highly positively correlated. This implies that a smaller VOVU indicates that the network has better generalization. VOVU can serve as an alternative method for early stopping and a good predictor of the generalization performance in DNNs. Specially, when the sample size is limited, VOVU will be a better choice because it does not require dividing training data as validation set.

Download Full-text

Executive and Language Control in the Multilingual Brain

Behavioural Neurology ◽

10.1155/2014/527951 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 16

Author(s):

Anthony Pak-Hin Kong ◽

Jubin Abutalebi ◽

Karen Sze-Yan Lam ◽

Brendan Weekes

Keyword(s):

Executive Functions ◽

Executive Control ◽

Test Performance ◽

Wisconsin Card Sorting Test ◽

Control Mechanisms ◽

Card Sorting ◽

Left Frontal Lobe ◽

Discourse Production ◽

The Neural Network ◽

Language Control

Neuroimaging studies suggest that the neural network involved in language control may not be specific to bi-/multilingualism but is part of a domain-general executive control system. We report a trilingual case of a Cantonese (L1), English (L2), and Mandarin (L3) speaker, Dr. T, who sustained a brain injury at the age of 77 causing lesions in the left frontal lobe and in the left temporo-parietal areas resulting in fluent aphasia. Dr. T’s executive functions were impaired according to a modified version of the Stroop color-word test and the Wisconsin Card Sorting Test performance was characterized by frequent perseveration errors. Dr. T demonstrated pathological language switching and mixing across her three languages. Code switching in Cantonese was more prominent in discourse production than confrontation naming. Our case suggests that voluntary control of spoken word production in trilingual speakers shares neural substrata in the frontobasal ganglia system with domain-general executive control mechanisms. One prediction is that lesions to such a system would give rise to both pathological switching and impairments of executive functions in trilingual speakers.

Download Full-text

Online Neural Architecture Search (ONAS): Adapting neural network architecture search in a continuously evolving domain. [Proposal]

10.31219/osf.io/suqxr ◽

2021 ◽

Author(s):

Nathan Buskulic ◽

Edward Bergman ◽

Joeran Beel

Keyword(s):

Neural Network ◽

Network Architecture ◽

Early Stopping ◽

Neural Network Architecture ◽

Initial State ◽

Warm Up ◽

Neural Architecture ◽

The Neural Network ◽

Evolving Data ◽

Minimisation Problem

Neural Architecture Search research has been limited to fixed datasets and as such does not provide the flexibility needed to deal with real-world, constantly evolving data. This is why we propose the basis of Online Neural Architecture Search (ONAS) to deal with complex, evolving, data distributions. We formalise ONAS as a minimisation problem upon which both the weights and the architecture of the neural network needs to be optimised for the data up until a time $t_i$. To solve this problem, we adapt a DARTS optimisation process, associated with an early stopping scheme, by using the supernet optimised on previous data as a warm-up initial state. This allows the architecture of the neural network to evolve as the data distribution evolves while limiting the computational burden. This work aims at building the initial mathematical formalism of the problem as well as the development of a framework where NAS methods could be used to solve this problem. Finally, several possible next steps are presented to show the potential of this field of Online Neural Architecture Search.

Download Full-text

Practical Correction of Three Fold Astigmatism in the Philips CM TEM

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100164556 ◽

1996 ◽

Vol 54 ◽

pp. 418-419

Author(s):

Arno J. Bleeker ◽

Mark H.F. Overwijk ◽

Max T. Otten

Keyword(s):

Optical Properties ◽

Phase Contrast ◽

The Other ◽

Objective Lens ◽

Symmetric Part ◽

A Posteriori ◽

Contrast Transfer Function ◽

High Brightness ◽

Rotationally Symmetric ◽

Brightness Field

With the improvement of the optical properties of the modern TEM objective lenses the point resolution is pushed beyond 0.2 nm. The objective lens of the CM300 UltraTwin combines a Cs of 0. 65 mm with a Cc of 1.4 mm. At 300 kV this results in a point resolution of 0.17 nm. Together with a high-brightness field-emission gun with an energy spread of 0.8 eV the information limit is pushed down to 0.1 nm. The rotationally symmetric part of the phase contrast transfer function (pctf), whose first zero at Scherzer focus determines the point resolution, is mainly determined by the Cs and defocus. Apart from the rotationally symmetric part there is also the non-rotationally symmetric part of the pctf. Here the main contributors are not only two-fold astigmatism and beam tilt but also three-fold astigmatism. The two-fold astigmatism together with the beam tilt can be corrected in a straight-forward way using the coma-free alignment and the objective stigmator. However, this only works well when the coefficient of three-fold astigmatism is negligible compared to the other aberration coefficients. Unfortunately this is not generally the case with the modern high-resolution objective lenses. Measurements done at a CM300 SuperTwin FEG showed a three fold-astigmatism of 1100 nm which is consistent with measurements done by others. A three-fold astigmatism of 1000 nm already sinificantly influences the image at a spatial frequency corresponding to 0.2 nm which is even above the point resolution of the objective lens. In principle it is possible to correct for the three-fold astigmatism a posteriori when through-focus series are taken or when off-axis holography is employed. This is, however not possible for single images. The only possibility is then to correct for the three-fold astigmatism in the microscope by the addition of a hexapole corrector near the objective lens.

Download Full-text