Neural Network Training By Gradient Descent Algorithms: Application on the Solar Cell

Author(s):  
Fayrouz Dkhichi ◽  
Benyounes Oukarfi
2018 ◽  
Vol 5 (2) ◽  
pp. 145-156 ◽  
Author(s):  
Taposh Kumar Neogy ◽  
Naresh Babu Bynagari

In machine learning, the transition from hand-designed features to learned features has been a huge success. Optimization algorithms, however, are still designed by hand. In this study, we show how the design of an optimization algorithm can be recast as a learning problem, allowing the algorithm to learn automatically to exploit structure in the problems of interest. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and they also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
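The core idea of recasting optimizer design as learning can be illustrated with a deliberately tiny sketch. In place of the paper's LSTM optimizer, the "learned optimizer" below has a single learnable parameter `phi` (a step size), meta-trained by gradient descent on the loss obtained after applying the update; the quadratic task, the function name `meta_train`, and all parameter values are illustrative assumptions, not the authors' setup.

```python
import numpy as np

# Toy version of "learning to optimize": instead of an LSTM, the
# learned optimizer is a single scalar step size `phi` applied as
#   theta' = theta - phi * grad.
# We meta-train phi by gradient descent on the loss obtained
# *after* the update, on the quadratic f(theta) = 0.5 * theta**2
# (so grad = theta).

def meta_train(phi=0.1, meta_lr=0.05, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        theta = rng.normal()                  # sample a problem instance
        # post-update loss: 0.5 * theta**2 * (1 - phi)**2
        # its derivative w.r.t. phi: -theta**2 * (1 - phi)
        meta_grad = -theta**2 * (1.0 - phi)
        phi -= meta_lr * meta_grad
    return phi

phi = meta_train()
# for this quadratic, the optimal one-step rule is phi = 1
```

Meta-training drives `phi` toward 1, the step size that solves this quadratic in a single update; the LSTM in the paper plays the same role but can express far richer, state-dependent update rules.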


Buildings ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 13
Author(s):  
Jee-Heon Kim ◽  
Nam-Chul Seong ◽  
Won-Chang Choi

The performance of various multilayer neural network algorithms for predicting the energy consumption of an absorption chiller in an air-conditioning system was compared and evaluated under identical conditions in this study. Each prediction model was created using one of 12 representative multilayer shallow neural network training algorithms. About a month of actual operational data recorded during the heating period was used as training data, and the predictive performance of the 12 algorithms was evaluated as a function of training size. The prediction results indicate error rates relative to the measured values of 0.09% minimum, 5.76% maximum, and 1.94 standard deviation (SD) for the Levenberg–Marquardt backpropagation model and 0.41% minimum, 5.05% maximum, and 1.68 SD for the Bayesian regularization backpropagation model. The conjugate gradient with Polak–Ribiére updates backpropagation model yielded lower values than the other two models, with 0.31% minimum, 5.73% maximum, and 1.76 SD. Based on the predictive performance evaluation index CvRMSE, all other models (conjugate gradient with Fletcher–Reeves updates backpropagation, one-step secant backpropagation, gradient descent with momentum and adaptive learning rate backpropagation, gradient descent with momentum backpropagation) except the gradient descent backpropagation model yielded results that satisfy ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) Guideline 14. These results confirm that prediction performance can differ across multilayer neural network training algorithms; selecting a model suited to the characteristics of a specific project is therefore essential.
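Several of the algorithms compared above are variants of gradient descent; the effect of adding momentum, for example, can be sketched on a toy problem. The ill-conditioned quadratic, learning rate, and step counts below are illustrative assumptions, not the study's chiller model:

```python
import numpy as np

def quad_loss(w):
    # ill-conditioned quadratic (condition number 100): plain
    # gradient descent crawls along the flat direction
    return 0.5 * (w[0]**2 + 100.0 * w[1]**2)

def quad_grad(w):
    return np.array([w[0], 100.0 * w[1]])

def train(momentum=0.0, lr=0.01, steps=300):
    """Gradient descent with a (heavy-ball) momentum term;
    momentum=0.0 recovers plain gradient descent."""
    w = np.array([1.0, 1.0])
    v = np.zeros_like(w)
    for _ in range(steps):
        v = momentum * v - lr * quad_grad(w)
        w = w + v
    return quad_loss(w)

plain = train(momentum=0.0)
heavy = train(momentum=0.9)
# heavy converges to a far lower loss in the same step budget
```

The momentum term accumulates progress along the shallow direction, which is one reason the momentum-based variants in the comparison above can behave very differently from plain gradient descent backpropagation on the same data.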


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stephen Whitelam ◽  
Viktor Selin ◽  
Sang-Won Park ◽  
Isaac Tamblyn

We show analytically that training a neural network by conditioned stochastic mutation, or neuroevolution, of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence holds for finite mutations, for both shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
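The correspondence can be probed with a minimal simulation: accept a Gaussian weight mutation only when it does not increase the loss, and compare against plain gradient descent on the same objective. The quadratic loss, mutation scale, and step counts below are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

def loss(w):
    return float(np.sum(w**2))

def neuroevolution(steps=2000, sigma=0.05, seed=0):
    # conditioned stochastic mutation: keep a Gaussian weight
    # mutation only when it does not increase the loss
    rng = np.random.default_rng(seed)
    w = np.ones(5)
    for _ in range(steps):
        cand = w + sigma * rng.normal(size=w.shape)
        if loss(cand) <= loss(w):
            w = cand
    return loss(w)

def gradient_descent(steps=2000, lr=0.01):
    w = np.ones(5)
    for _ in range(steps):
        w -= lr * 2.0 * w          # gradient of sum(w**2) is 2w
    return w @ w

evo_loss = neuroevolution()
gd_loss = gradient_descent()
# both descend from the initial loss of 5 toward the minimum at 0
```

Both procedures drive the loss toward zero; in the small-`sigma` limit, the paper shows the accepted-mutation dynamics average out to exactly this noisy-gradient-descent behavior.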


2022 ◽  
Vol 9 ◽  
Author(s):  
Yingjie Shi ◽  
Enlai Guo ◽  
Lianfa Bai ◽  
Jing Han

Atmospheric scattering caused by particles suspended in the air severely degrades the scene radiance. This paper proposes a method that removes haze using a neural network combined with scene polarization information. The network is self-supervised, and online global optimization is achieved using the atmospheric transmission model and gradient descent. The proposed method therefore requires no haze-free images as constraints for neural network training. The approach substantially outperforms supervised algorithms in dehazing performance and is highly robust across scenes. Experiments show that the method significantly improves the contrast of the original image and effectively enhances detailed scene information.
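The atmospheric transmission model that supplies the self-supervision signal is commonly written as I = J·t + A·(1 − t), where I is the observed hazy radiance, J the scene radiance, t the transmission map, and A the airlight. The sketch below inverts this model on synthetic data with t and A assumed known; in the paper both are estimated via the network and polarization cues, so the closed-form inversion and all names here are illustrative assumptions:

```python
import numpy as np

def add_haze(J, t, A):
    # forward atmospheric model: I = J*t + A*(1 - t)
    return J * t + A * (1.0 - t)

def dehaze(I, t, A, eps=1e-3):
    # invert the model for J; eps guards against division
    # by near-zero transmission in dense haze
    return (I - A * (1.0 - t)) / np.maximum(t, eps)

rng = np.random.default_rng(1)
J = rng.uniform(0.0, 1.0, size=(4, 4))   # clean scene radiance
t = rng.uniform(0.3, 0.9, size=(4, 4))   # transmission map
A = 0.8                                   # airlight
I = add_haze(J, t, A)
J_hat = dehaze(I, t, A)                   # recovers J exactly here
```

With perfect t and A the inversion is exact; the hard part, which the self-supervised network addresses, is estimating t and A from the hazy, polarized observations alone.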


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 711
Author(s):  
Mina Basirat ◽  
Bernhard C. Geiger ◽  
Peter M. Roth

Information plane analysis, which tracks the mutual information between the input and a hidden layer and between a hidden layer and the target over the course of training, has recently been proposed as a way to analyze the training of neural networks. Since the activations of a hidden layer are typically continuous-valued, this mutual information cannot be computed analytically and must be estimated, which has led to apparently inconsistent or even contradictory results in the literature. The goal of this paper is to demonstrate how information plane analysis can still be a valuable tool for analyzing neural network training. To this end, we complement the prevailing binning estimator of mutual information with a geometric interpretation. With this geometric interpretation in mind, we evaluate the impact of regularization and interpret phenomena such as underfitting and overfitting. In addition, we investigate neural network learning in the presence of noisy data and noisy labels.
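The binning estimator referred to above discretizes the continuous activations and computes mutual information from the resulting joint histogram. A minimal numpy sketch (the bin count, label handling, and function name are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def binned_mi(x, y, n_bins=10):
    """Plug-in MI estimate between continuous values x (e.g. a
    hidden unit's activations) and discrete labels y, obtained by
    binning x and forming the joint histogram."""
    edges = np.linspace(x.min(), x.max(), n_bins)
    x_binned = np.digitize(x, edges)        # bin indices in 0..n_bins
    joint = np.zeros((n_bins + 1, int(y.max()) + 1))
    for xi, yi in zip(x_binned, y):
        joint[xi, yi] += 1
    p = joint / joint.sum()                 # empirical joint distribution
    px = p.sum(axis=1, keepdims=True)       # marginal over bins
    py = p.sum(axis=0, keepdims=True)       # marginal over labels
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)
y = (x > 0).astype(int)       # label fully determined by x
mi = binned_mi(x, y)          # close to H(Y) = 1 bit
```

Note the estimator's sensitivity to its discretization: the bin straddling zero mixes both labels, pulling the estimate slightly below 1 bit, which is exactly the kind of estimator artifact behind the inconsistent information plane results the paper discusses.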

