Multilevel Initialization for Layer-Parallel Deep Neural Network Training

This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on a continuous interpretation of the training problem as an optimal control problem, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a uniform refinement strategy across the time domain, which is equivalent to refining in the layer dimension. This refinement algorithm builds good initializations for deep networks, with network parameters coming from the coarser trained networks. The effectiveness of multilevel strategies (called nested iteration) for training is investigated using the Peaks and Indian Pines classification data sets. In both cases, the validation accuracy achieved by nested iteration is higher than with non-nested training. Moreover, the run time needed to achieve the same validation accuracy is reduced. For instance, the Indian Pines example takes around 25% less time to train with the nested iteration algorithm. Finally, using the Peaks problem, we present preliminary anecdotal evidence that the initialization strategy provides a regularizing effect on the training process, reducing sensitivity to hyperparameters and to randomness in the initial network parameters.
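As a rough illustration of the refinement step described above, the sketch below (not the authors' code) doubles the layer count of a forward-Euler residual network by repeating each coarse layer's parameters on the two finer time sub-intervals and halving the ODE step size; the refined parameters then serve as the initialization for training the deeper network. The layer width, the tanh activation, and the piecewise-constant prolongation are illustrative assumptions.

```python
import numpy as np

def refine_layers(weights, biases):
    """Uniformly refine the layer (time) grid: double the number of layers by
    repeating each coarse layer's parameters on the two fine sub-intervals."""
    fine_w = [w.copy() for w in weights for _ in range(2)]
    fine_b = [b.copy() for b in biases for _ in range(2)]
    return fine_w, fine_b

def forward(x, weights, biases, T=1.0):
    """Forward-Euler residual network viewed as an ODE discretization."""
    h = T / len(weights)                       # step size for this layer grid
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(W @ x + b)
    return x

# Coarse network: 4 residual layers of width 8 (imagined as already trained).
rng = np.random.default_rng(0)
width, n_coarse = 8, 4
W = [0.1 * rng.standard_normal((width, width)) for _ in range(n_coarse)]
b = [np.zeros(width) for _ in range(n_coarse)]

W_fine, b_fine = refine_layers(W, b)           # 8-layer initialization
x = rng.standard_normal(width)
# The coarse and refined networks give nearly the same output, so the refined
# parameters are a sensible starting point for training the deeper network.
print(np.linalg.norm(forward(x, W, b) - forward(x, W_fine, b_fine)))
```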

Geophysics
2001
Vol 66 (3)
pp. 845-860
Author(s):
François Clément
Guy Chavent
Susana Gómez

Migration‐based traveltime (MBTT) formulation provides algorithms for automatically determining background velocities from full‐waveform surface seismic reflection data using local optimization methods. In particular, it addresses the difficulty of the nonconvexity of the least‐squares data misfit function. The method consists of parameterizing the reflectivity in the time domain through a migration step and providing a multiscale representation for the smooth background velocity. We present an implementation of the MBTT approach for a 2-D finite‐difference (FD) full‐wave acoustic model. Numerical analysis on a 2-D synthetic example shows the ability of the method to find much more reliable estimates of both long and short wavelengths of the velocity than the classical least‐squares approach, even when starting from very poor initial guesses. This enlargement of the domain of attraction for the global minima of the least‐squares misfit has a price: each evaluation of the new objective function requires, besides the usual FD full‐wave forward modeling, an additional full‐wave prestack migration. Hence, the FD implementation of the MBTT approach presented in this paper is expected to provide a useful tool for the inversion of data sets of moderate size.


Author(s):  
L Mohana Tirumala
S. Srinivasa Rao

Privacy preservation in data mining and publishing plays a major role in today's networked world. It is important to preserve the privacy of the vital information contained in a data set. This can be achieved through a k-anonymization solution for classification. Along with privacy preservation through anonymization, producing optimized data sets in a cost-effective manner is of equal importance. In this paper, a Top-Down Refinement algorithm is proposed that yields optimal results in a cost-effective manner. Bayesian classification is also employed to predict class membership probabilities for a data tuple whose class label is unknown.
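For readers unfamiliar with the k-anonymity property underlying the approach, the following minimal sketch (not tied to the paper's Top-Down Refinement implementation) checks whether a toy table is k-anonymous with respect to a chosen set of quasi-identifiers; the attribute names and records are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values is shared
    by at least k records, i.e. the table is k-anonymous."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy table: age range and generalized zip code are the
# quasi-identifiers; "diagnosis" is the sensitive attribute to protect.
records = [
    {"age": "30-39", "zip": "500*", "diagnosis": "flu"},
    {"age": "30-39", "zip": "500*", "diagnosis": "cold"},
    {"age": "40-49", "zip": "501*", "diagnosis": "flu"},
    {"age": "40-49", "zip": "501*", "diagnosis": "asthma"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))   # True
```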


2022
pp. 202-226
Author(s):
Leema N.
Khanna H. Nehemiah
Elgin Christo V. R.
Kannan A.

Artificial neural networks (ANN) are widely used for classification, and the training algorithm most commonly used is the backpropagation (BP) algorithm. The major bottleneck in backpropagation neural network training is fixing appropriate values for the network parameters: the initial weights and biases, the activation function, the number of hidden layers and the number of neurons per hidden layer, the number of training epochs, the learning rate, the minimum error, and the momentum term for the classification task. The objective of this work is to investigate the performance of 12 different BP algorithms and the impact of variations in network parameter values on neural network training. The algorithms were evaluated with different training and testing samples taken from three benchmark clinical datasets, namely the Pima Indian Diabetes (PID), Hepatitis, and Wisconsin Breast Cancer (WBC) datasets obtained from the University of California Irvine (UCI) machine learning repository.
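A hedged sketch of the kind of parameter sweep the study describes, using scikit-learn's MLPClassifier as a stand-in for the BP training variants and a synthetic data set in place of the UCI clinical data; the particular grid values, library, and data are illustrative assumptions, not the study's setup.

```python
from itertools import product
from sklearn.datasets import make_classification      # stand-in for PID/Hepatitis/WBC
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Grid over a few of the network parameters the study varies.
hidden_layers = [(5,), (10,), (10, 5)]
learning_rates = [0.001, 0.01, 0.1]
momenta = [0.5, 0.9]

for h, lr, m in product(hidden_layers, learning_rates, momenta):
    clf = MLPClassifier(hidden_layer_sizes=h, solver="sgd",
                        learning_rate_init=lr, momentum=m,
                        max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"layers={h}, lr={lr}, momentum={m}: "
          f"test acc={clf.score(X_te, y_te):.3f}")
```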


Author(s):  
Jack J. Matthews
Alexander G. Liu
Chuan Yang
Duncan McIlroy
Bruce Levell
...  

The Conception and St. John’s Groups of southeastern Newfoundland contain some of the oldest known fossils of the Ediacaran macrobiota. The Mistaken Point Ecological Reserve UNESCO World Heritage Site is an internationally recognized locality for such fossils and hosts early evidence for both total group metazoan body fossils and metazoan-style locomotion. The Mistaken Point Ecological Reserve sedimentary succession includes ∼1500 m of fossil-bearing strata containing numerous dateable volcanogenic horizons, and therefore offers a crucial window into the rise and diversification of early animals. Here we present six stratigraphically coherent radioisotopic ages derived from zircons from volcanic tuffites of the Conception and St. John’s Groups at Mistaken Point Ecological Reserve. The oldest architecturally complex macrofossils, from the upper Drook Formation, have an age of 574.17 ± 0.66 Ma (including tracer calibration and decay constant uncertainties). The youngest rangeomorph fossils from Mistaken Point Ecological Reserve, in the Fermeuse Formation, have a maximum age of 564.13 ± 0.65 Ma. Fossils of the famous “E” Surface are confirmed to be 565.00 ± 0.64 Ma, while exceptionally preserved specimens on the “Brasier” Surface in the Briscal Formation are dated at 567.63 ± 0.66 Ma. We use our new ages to construct an age-depth model for the sedimentary succession, constrain sedimentary accumulation rates, and convert stratigraphic fossil ranges into the time domain to facilitate integration with time-calibrated data from other successions. Combining this age model with compiled stratigraphic ranges for all named macrofossils within the Mistaken Point Ecological Reserve succession, spanning 76 discrete fossil-bearing horizons, enables recognition and interrogation of potential evolutionary signals. Peak taxonomic diversity is recognized within the Mistaken Point and Trepassey Formations, and uniterminal rangeomorphs with undisplayed branching architecture appear several million years before multiterminal, displayed forms. Together, our combined stratigraphic, paleontological, and geochronological approach offers a holistic, time-calibrated record of evolution during the mid−late Ediacaran Period and a framework within which to consider other geochemical, environmental, and evolutionary data sets.
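A minimal sketch of the age-depth idea described above: piecewise-linear interpolation between dated horizons converts stratigraphic heights into ages and yields interval accumulation rates. The ages are those quoted in the abstract, but the stratigraphic heights and the fossil-range heights are placeholders, not the measured Mistaken Point section.

```python
import numpy as np

# Published tuffite ages (Ma) from the abstract, paired with hypothetical
# stratigraphic heights (m above the base of the measured section).
heights_dated = np.array([0.0, 400.0, 900.0, 1500.0])        # placeholders
ages_dated    = np.array([574.17, 567.63, 565.00, 564.13])   # Ma, oldest at base

def height_to_age(h):
    """Piecewise-linear age-depth model: interpolate age between dated horizons."""
    return np.interp(h, heights_dated, ages_dated)

# Convert a fossil's stratigraphic range (hypothetical heights) into time.
first_occurrence_m, last_occurrence_m = 250.0, 1100.0
print(height_to_age(first_occurrence_m), height_to_age(last_occurrence_m))

# Implied sediment accumulation rates (m/Myr) between successive dated horizons.
rates = np.diff(heights_dated) / -np.diff(ages_dated)
print(rates)
```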


2005
Vol 9 (4)
pp. 313-321
Author(s):
R. R. Shrestha
S. Theobald
F. Nestmann

Abstract. Artificial neural networks (ANNs) provide a quick and flexible means of developing flood flow simulation models. An important criterion for the wider applicability of ANNs is the ability to generalise to events outside the range of the training data sets. With respect to flood flow simulation, the ability to extrapolate beyond the range of calibrated data sets is of crucial importance. This study explores methods for improving the generalisation of ANNs using three different flood event data sets from the Neckar River in Germany. An ANN-based model is formulated to simulate flows at certain locations in the river reach, based on the flows at upstream locations. The network training data sets consist of time series of flows from observation stations. Simulated flows from a one-dimensional hydrodynamic numerical model are integrated for network training and validation at a river section where no measurements are available. Network structures with different activation functions are considered for improving generalisation. Training used backpropagation with the Levenberg-Marquardt approximation. The ability of the trained networks to extrapolate is assessed using flow data beyond the range of the training data sets. The results of this study indicate that an ANN in a suitable configuration can extend forecasting capability to a certain extent beyond the range of the calibrated data sets.
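As a rough sketch of the training setup described (a small network fitted with the Levenberg-Marquardt approximation), the example below fits a one-hidden-layer tanh network to synthetic upstream/downstream flow series using SciPy's Levenberg-Marquardt least-squares solver. The data, network size, and use of SciPy are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic stand-in for upstream/downstream flow series (the study uses
# observed and 1-D hydrodynamic model flows from the Neckar River).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
X = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t)])   # two upstream stations
y = 0.7*X[:, 0] + 0.3*X[:, 1]**2                              # downstream response

n_in, n_hid = X.shape[1], 4

def unpack(p):
    i = 0
    W1 = p[i:i+n_hid*n_in].reshape(n_hid, n_in); i += n_hid*n_in
    b1 = p[i:i+n_hid]; i += n_hid
    w2 = p[i:i+n_hid]; i += n_hid
    return W1, b1, w2, p[i]

def model(p, X):
    W1, b1, w2, b2 = unpack(p)
    h = np.tanh(X @ W1.T + b1)            # one hidden layer, tanh activation
    return h @ w2 + b2

def residuals(p):
    return model(p, X) - y

p0 = 0.1 * rng.standard_normal(n_hid*n_in + 2*n_hid + 1)
fit = least_squares(residuals, p0, method="lm")   # Levenberg-Marquardt
print("RMSE:", np.sqrt(np.mean(fit.fun**2)))
```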


2007
Vol 62 (5)
pp. 696-704
Author(s):
Diana Förster
Armin Wagner
Christian B. Hübschle
Carsten Paulmann
Peter Luger

Abstract. The charge density of the tripeptide L-alanyl-glycyl-L-alanine was determined from three X-ray data sets measured with different experimental setups and under different conditions. Two of the data sets were measured with synchrotron radiation (beamline F1 of Hasylab/DESY, Germany, and beamline X10SA of the SLS, Paul Scherrer Institute, Switzerland) at temperatures around 100 K, while a third data set was measured under home laboratory conditions (MoKα radiation) at a low temperature of 20 K. The multipole refinement strategy used to derive the experimental charge density was the same in all cases, so that the obtained charge density properties could be compared directly. While the general analysis of the three data sets suggested a small preference for one of the synchrotron data sets (Hasylab F1), a comparison of topological and atomic properties gave no indication of a preference for any of the three data sets. It follows that even the 4 h data set measured at the SLS performed equally well compared to the data sets with substantially longer exposure times.


Author(s):  
Changhua Yu
Michael T. Manry
Jiang Li

In the neural network literature, many preprocessing techniques, such as feature de-correlation, input unbiasing and normalization, are suggested to accelerate multilayer perceptron training. In this paper, we show that a network trained with an original data set and one trained with a linear transformation of the original data will go through the same training dynamics, as long as they start from equivalent states. Thus preprocessing techniques may not be helpful and are merely equivalent to using a different weight set to initialize the network. Theoretical analyses of such preprocessing approaches are given for conjugate gradient, back propagation and the Newton method. In addition, an efficient Newton-like training algorithm is proposed for hidden layer training. Experiments on various data sets confirm the theoretical analyses and verify the improvement of the new algorithm.
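The equivalence claim can be illustrated for the Newton method, for which it holds exactly in the sense described: training on the linearly transformed inputs X·A from the initial weights A⁻¹·w₀ reproduces the same predictions at every iteration as training on X from w₀. The logistic model and data below are illustrative assumptions; the paper itself treats multilayer perceptrons and also covers conjugate gradient and backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = (X @ np.array([1.0, -2.0, 0.5]) + 0.3*rng.standard_normal(n) > 0).astype(float)

A = rng.standard_normal((d, d))          # invertible linear "preprocessing"
Xp = X @ A                               # transformed inputs

def newton_logistic(X, y, w0, iters=5):
    """Newton's method for logistic regression; records predictions (logits)."""
    w, traj = w0.copy(), []
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        S = p * (1 - p)
        H = X.T @ (X * S[:, None])       # Hessian
        g = X.T @ (p - y)                # gradient
        w -= np.linalg.solve(H, g)
        traj.append(X @ w)
    return np.array(traj)

w0 = 0.1 * rng.standard_normal(d)
t_orig = newton_logistic(X,  y, w0)
t_prep = newton_logistic(Xp, y, np.linalg.solve(A, w0))   # equivalent initial state
print(np.max(np.abs(t_orig - t_prep)))   # ~1e-12: identical training dynamics
```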


Geophysics
1994
Vol 59 (5)
pp. 712-721
Author(s):  
Umberto Spagnolini

The spectral analysis of magnetotelluric (MT) data for impedance tensor estimation requires the stationarity of the measured magnetic (H) and electric (E) fields. However, it is well known that noise biases time‐domain tensor estimates obtained via an iterative search by a descent algorithm that minimizes the least‐mean‐square residual between the measured E data and the E data estimated from the H data. To limit the noise that slows down, or even prevents, convergence, the steepest descent step size is based upon the statistics of the residual (Bayes' estimation). With respect to uncorrelated noise, the time‐domain technique is more robust than frequency‐domain techniques. Furthermore, the technique requires only short‐time stationarity. The time‐domain technique is applied to data sets (Lincoln Line sites) from the EMSLAB Juan de Fuca project (Electromagnetic Sounding of the Lithosphere and Asthenosphere Beneath the Juan de Fuca Plate), as well as to data from a southern Italian site. The results of the EMSLAB data analysis are comparable to those obtained by robust remote reference processing, where larger data sets were used.


Author(s):  
Hossam Eldin Ali
Yacoub M. Najjar

A backpropagation artificial neural network (ANN) algorithm with one hidden layer was used as a new numerical approach to characterize soil liquefaction potential. For this purpose, 61 field data sets representing various earthquake sites from around the world were used. To develop the most accurate prediction model for liquefaction potential, alternating combinations of input parameters were used during the training and testing phases of the developed network. The accuracy of the designed network was validated against an additional 44 records not used previously in either the network training or testing stages. The prediction accuracy of the neural-network-based model is compared with predictions obtained using fuzzy logic and statistically based approaches. Overall, the ANN model outperformed all other investigated approaches.


Author(s):  
Alexander Matei
Stefan Ulbrich

Abstract. Dynamic processes have always been of profound interest for scientists and engineers alike. Often, the mathematical models used to describe and predict time-variant phenomena are uncertain in the sense that governing relations between model parameters, state variables and the time domain are incomplete. In this paper we adopt a recently proposed algorithm for the detection of model uncertainty and apply it to dynamic models. This algorithm combines parameter estimation, optimum experimental design and classical hypothesis testing within a probabilistic frequentist framework. The best setup of an experiment is defined by optimal sensor positions and optimal input configurations which both are the solution of a PDE-constrained optimization problem. The data collected by this optimized experiment then leads to variance-minimal parameter estimates. We develop efficient adjoint-based methods to solve this optimization problem with SQP-type solvers. The crucial test which a model has to pass is conducted over the claimed true values of the model parameters which are estimated from pairwise distinct data sets. For this hypothesis test, we divide the data into k equally-sized parts and follow a k-fold cross-validation procedure. We demonstrate the usefulness of our approach in simulated experiments with a vibrating linear-elastic truss.
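A simplified stand-in for the k-fold consistency idea: estimate the model parameters on each of k disjoint parts of the data and test pairwise whether the estimates agree within their covariances; large normalized differences would point to model uncertainty. The toy quadratic model, the least-squares estimator, and the chi-square-style statistic are assumptions for illustration, not the paper's PDE-constrained setup.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Toy stand-in for the dynamic model: y = theta1 * x + theta2 * x**2 + noise.
theta_true = np.array([2.0, -0.5])
x = rng.uniform(0, 1, 300)
y = theta_true[0]*x + theta_true[1]*x**2 + 0.05*rng.standard_normal(x.size)

k = 5
folds = np.array_split(rng.permutation(x.size), k)

estimates, covariances = [], []
for idx in folds:
    Phi = np.column_stack([x[idx], x[idx]**2])          # design matrix
    theta, res, *_ = np.linalg.lstsq(Phi, y[idx], rcond=None)
    sigma2 = res[0] / (idx.size - 2)                    # residual variance
    estimates.append(theta)
    covariances.append(sigma2 * np.linalg.inv(Phi.T @ Phi))

# Pairwise consistency check: estimates from distinct data sets should agree
# within their covariances if the model is adequate.
for i, j in combinations(range(k), 2):
    d = estimates[i] - estimates[j]
    C = covariances[i] + covariances[j]
    stat = d @ np.linalg.solve(C, d)                    # ~ chi^2 (2 dof) if consistent
    print(f"folds {i},{j}: test statistic {stat:.2f}")
```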

