Power Function Error Initialization Can Improve Convergence of Backpropagation Learning in Neural Networks for Classification

2021 ◽  
pp. 1-33
Author(s):  
Andreas Knoblauch

Abstract Supervised learning corresponds to minimizing a loss or cost function expressing the differences between model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y_n - t_n, which is optimal for several commonly used loss functions like cross-entropy or sum of squared errors. Here I evaluate a more general error initialization method using power functions |y_n - t_n|^q for q > 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
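
A minimal NumPy sketch of the idea, assuming the error signal keeps the sign of y_n - t_n; the toy two-layer network, the variable names, and the choice q = 0.5 are illustrative placeholders, not taken from the paper:

```python
import numpy as np

def power_error_signal(y, t, q=1.0):
    """Output-layer error signal delta = sign(y - t) * |y - t|**q.

    q = 1 recovers the standard initialization y - t (optimal for
    cross-entropy / sum-of-squared-error losses); other q > 0 values
    correspond to the generalized power-function losses.
    """
    diff = y - t
    return np.sign(diff) * np.abs(diff) ** q

# Toy single-hidden-layer network: backprop with the modified output delta.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(0, 0.1, (20, 4)), rng.normal(0, 0.1, (3, 20))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

x = rng.normal(size=(4, 8))                  # 8 samples, 4 features
t = np.eye(3)[:, rng.integers(0, 3, 8)]      # one-hot targets, 3 classes
h = np.tanh(W1 @ x)
y = softmax(W2 @ h)

delta_out = power_error_signal(y, t, q=0.5)      # modified initialization
grad_W2 = delta_out @ h.T / x.shape[1]
delta_hid = (W2.T @ delta_out) * (1 - h ** 2)    # backprop through tanh
grad_W1 = delta_hid @ x.T / x.shape[1]
```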

2019 ◽  
Vol 141 (12) ◽  
Author(s):  
Dehao Liu ◽  
Yan Wang

Abstract Training machine learning tools such as neural networks requires the availability of sizable data, which can be difficult for engineering and scientific applications where experiments or simulations are expensive. In this work, a novel multi-fidelity physics-constrained neural network is proposed to reduce the required amount of training data, where physical knowledge is applied to constrain neural networks, and multi-fidelity networks are constructed to improve training efficiency. A low-cost low-fidelity physics-constrained neural network is used as the baseline model, whereas a limited amount of data from a high-fidelity physics-constrained neural network is used to train a second neural network to predict the difference between the two models. The proposed framework is demonstrated with two-dimensional heat transfer, phase transition, and dendritic growth problems, which are fundamental in materials modeling. Physics is described by partial differential equations. With the same set of training data, the prediction error of the physics-constrained neural network can be one order of magnitude lower than that of a classical artificial neural network without physical constraints. The accuracy of the prediction is comparable to that of direct numerical solutions of the equations.
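
A rough PyTorch sketch of the physics-constraint aspect only, assuming the steady 2-D heat (Laplace) equation u_xx + u_yy = 0 as the governing PDE; the network size, collocation points, and equal loss weighting are illustrative assumptions rather than the authors' setup:

```python
import torch
import torch.nn as nn

# Small fully connected network mapping (x, y) -> temperature u(x, y).
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def pde_residual(xy):
    """Residual of the steady 2-D heat (Laplace) equation u_xx + u_yy = 0."""
    xy = xy.requires_grad_(True)
    u = net(xy)
    grad_u = torch.autograd.grad(u, xy, torch.ones_like(u), create_graph=True)[0]
    u_x, u_y = grad_u[:, 0:1], grad_u[:, 1:2]
    u_xx = torch.autograd.grad(u_x, xy, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    u_yy = torch.autograd.grad(u_y, xy, torch.ones_like(u_y), create_graph=True)[0][:, 1:2]
    return u_xx + u_yy

# Loss = data mismatch on a few labeled points + physics penalty on collocation points.
xy_data, u_data = torch.rand(16, 2), torch.rand(16, 1)   # placeholder training data
xy_coll = torch.rand(256, 2)                             # collocation points
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ((net(xy_data) - u_data) ** 2).mean() + (pde_residual(xy_coll) ** 2).mean()
    loss.backward()
    opt.step()
```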


2021 ◽  
Author(s):  
Bhasker Sri Harsha Suri ◽  
Manish Srivastava ◽  
Kalidas Yeturu

Neural networks suffer from the catastrophic forgetting problem when deployed in a continual learning scenario where new batches of data arrive over time and these batches are of different distributions from the previous data used for training the neural network. For assessing the performance of a model in a continual learning scenario, two aspects are important: (i) computing the difference in data distribution between a new and an old batch of data, and (ii) understanding the retention and learning behavior of the deployed neural network. Current techniques indicate the novelty of a new data batch by comparing its statistical properties with those of the old batch in the input space. However, it is still an open area of research to consider the perspective of a deployed neural network's ability to generalize to unseen data samples. In this work, we report a dataset distance measuring technique that indicates the novelty of a new batch of data while considering the deployed neural network's perspective. We propose the construction of perspective histograms, which are a vector representation of the data batches based on the correctness and confidence of the predictions of the deployed model. We have successfully tested the hypothesis empirically on image data from MNIST Digits, MNIST Fashion, and CIFAR10 for its ability to detect data perturbations of type rotation, Gaussian blur, and translation. Given a model, its training data, and new data, we have proposed and evaluated four new scoring schemes, the retention score (R), the learning score (L), the O-score, and the SP-score, which respectively measure how much the model retains its performance on past data, how much it learns from new data, the combined magnitude of retention and learning, and its stability-plasticity characteristics. The scoring schemes have been evaluated on the MNIST Digits and MNIST Fashion data sets with different neural network architectures varying in the number of parameters, activation functions, and learning loss functions, and an instance of a typical analysis report is presented. Machine learning model maintenance is a reality in production systems in industry, and we hope our proposed methodology offers a solution to this pressing need.
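
A minimal sketch of how such perspective histograms could be built, assuming the deployed model exposes class probabilities; the confidence binning, the L1 distance, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def perspective_histogram(probs, labels, n_bins=10):
    """Histogram of a data batch from the deployed model's perspective.

    probs  : (N, C) predicted class probabilities of the deployed model
    labels : (N,)   true labels of the batch
    Returns a 2 * n_bins vector: confidence histograms of correctly and
    incorrectly predicted samples, normalized by batch size.
    """
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = pred == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    h_correct, _ = np.histogram(conf[correct], bins=edges)
    h_wrong, _ = np.histogram(conf[~correct], bins=edges)
    return np.concatenate([h_correct, h_wrong]) / len(labels)

def batch_novelty(probs_old, labels_old, probs_new, labels_new):
    """L1 distance between perspective histograms of an old and a new batch."""
    h_old = perspective_histogram(probs_old, labels_old)
    h_new = perspective_histogram(probs_new, labels_new)
    return np.abs(h_old - h_new).sum()
```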


Author(s):  
Dehao Liu ◽  
Yan Wang

Abstract Training machine learning tools such as neural networks requires the availability of sizable data, which can be difficult for engineering and scientific applications where experiments or simulations are expensive. In this work, a novel multi-fidelity physics-constrained neural network is proposed to reduce the required amount of training data, where physical knowledge is applied to constrain neural networks, and multi-fidelity networks are constructed to improve training efficiency. A low-cost low-fidelity physics-constrained neural network is used as the baseline model, whereas a limited amount of data from a high-fidelity simulation is used to train a second neural network to predict the difference between the two models. The proposed framework is demonstrated with two-dimensional heat transfer and phase transition problems, which are fundamental in materials modeling. Physics is described by partial differential equations. With the same set of training data, the prediction error of the physics-constrained neural network can be one order of magnitude lower than that of a classical artificial neural network without physical constraints. The accuracy of the prediction is comparable to that of direct numerical solutions of the equations.
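
Complementing the physics-constraint sketch above, a minimal PyTorch sketch of the multi-fidelity structure described here: a frozen low-fidelity model corrected by a discrepancy network trained on a few high-fidelity samples. The architectures, data, and training loop are placeholders, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Frozen low-fidelity model (e.g., a previously trained physics-constrained net).
lf_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
for p in lf_net.parameters():
    p.requires_grad_(False)

# Discrepancy network: learns delta(x) = u_HF(x) - u_LF(x) from few HF samples.
delta_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

x_hf = torch.rand(8, 2)          # small high-fidelity training set (placeholder)
u_hf = torch.rand(8, 1)

opt = torch.optim.Adam(delta_net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    pred_hf = lf_net(x_hf) + delta_net(x_hf)    # multi-fidelity prediction
    loss = ((pred_hf - u_hf) ** 2).mean()
    loss.backward()
    opt.step()

def predict(x):
    """High-fidelity estimate: low-fidelity baseline plus learned correction."""
    with torch.no_grad():
        return lf_net(x) + delta_net(x)
```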


Author(s):  
D. Clermont ◽  
M. Dorozynski ◽  
D. Wittich ◽  
F. Rottensteiner

Abstract. This paper proposes several methods for training a Convolutional Neural Network (CNN) for learning the similarity between images of silk fabrics based on multiple semantic properties of the fabrics. In the context of the EU H2020 project SILKNOW (http://silknow.eu/), two variants of training were developed, one based on a Siamese CNN and one based on a triplet architecture. We propose different definitions of similarity and different loss functions for both training strategies, some of them also allowing the use of incomplete information about the training data. We assess the quality of the trained model by using the learned image features in a k-NN classification. We achieve overall accuracies of 93–95% and average F1-scores of 87–92%.
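
A minimal PyTorch sketch of the triplet variant, assuming the anchor and positive images share a semantic property (e.g., the same material) while the negative does not; the backbone, embedding size, and margin are illustrative assumptions rather than the SILKNOW setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small embedding CNN (placeholder backbone for silk-fabric images).
class EmbeddingNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, dim)

    def forward(self, x):
        z = self.fc(self.features(x).flatten(1))
        return F.normalize(z, dim=1)          # unit-length embeddings

net = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)

# Anchor/positive share a semantic property; the negative does not.
anchor, positive, negative = (torch.rand(4, 3, 64, 64) for _ in range(3))
loss = triplet_loss(net(anchor), net(positive), net(negative))
loss.backward()
```

The learned embeddings would then be fed to a k-NN classifier, as in the evaluation described above.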


2020 ◽  
Vol 9 (1) ◽  
pp. 41-49
Author(s):  
Johanes Roisa Prabowo ◽  
Rukun Santoso ◽  
Hasbi Yasin

Housing is one aspect of social welfare that must be met, because a house is a primary human need alongside clothing and food. The condition of a house as adequate shelter can be assessed from the structure and facilities of the building. This research aims to classify house conditions as livable or not livable. The method used is an artificial neural network (ANN). An ANN is an information processing system with characteristics similar to biological neural networks. In this research the optimization method used is the conjugate gradient algorithm. The data used are from the Survei Sosial Ekonomi Nasional (Susenas) March 2018 Kor Keterangan Perumahan (core housing information module) for Cilacap Regency. The data are divided into training and testing data, with the proportion that gives the highest average accuracy being 90% for training data and 10% for testing data. The best architecture obtained is a model consisting of 8 neurons in the input layer, 10 neurons in the hidden layer, and 1 neuron in the output layer. The activation functions used are the bipolar sigmoid in the hidden layer and the binary sigmoid in the output layer. The results of the analysis show that the ANN works very well for classifying house conditions in Cilacap Regency, with an average accuracy of 98.96% at the training stage and 97.58% at the testing stage. Keywords: House, Classification, Artificial Neural Networks, Conjugate Gradient
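
A minimal NumPy sketch of the reported 8-10-1 architecture's forward pass with a bipolar sigmoid in the hidden layer and a binary sigmoid in the output layer; the random weights and the 0.5 decision threshold are placeholders, and the conjugate-gradient fitting of the weights is omitted:

```python
import numpy as np

def bipolar_sigmoid(x):
    """Bipolar sigmoid: maps to (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def binary_sigmoid(x):
    """Binary (logistic) sigmoid: maps to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (10, 8)), np.zeros(10)   # 8 inputs -> 10 hidden
W2, b2 = rng.normal(0, 0.5, (1, 10)), np.zeros(1)    # 10 hidden -> 1 output

def forward(x):
    """Forward pass of the 8-10-1 network; output > 0.5 -> 'livable'."""
    h = bipolar_sigmoid(W1 @ x + b1)
    return binary_sigmoid(W2 @ h + b2)

x = rng.uniform(size=8)              # one housing record with 8 features
livable = forward(x)[0] > 0.5
```

In practice the weights would be fitted with a conjugate-gradient optimizer (for example, scipy.optimize.minimize with method="CG") rather than left random as in this sketch.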


Author(s):  
Lei Feng ◽  
Senlin Shu ◽  
Zhuoyi Lin ◽  
Fengmao Lv ◽  
Li Li ◽  
...  

Trained with the standard cross entropy loss, deep neural networks can achieve great performance on correctly labeled data. However, if the training data is corrupted with label noise, deep models tend to overfit the noisy labels, thereby achieving poor generalization performance. To remedy this issue, several loss functions have been proposed and demonstrated to be robust to label noise. Although most of the robust loss functions stem from the Categorical Cross Entropy (CCE) loss, they fail to embody the intrinsic relationships between CCE and other loss functions. In this paper, we propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise. Specifically, our framework allows weighting the extent of fitting the training labels by controlling the order of the Taylor series expansion of CCE, hence it can be robust to label noise. In addition, our framework clearly reveals the intrinsic relationships between CCE and other loss functions, such as Mean Absolute Error (MAE) and Mean Squared Error (MSE). Moreover, we present a detailed theoretical analysis to certify the robustness of this framework. Extensive experimental results on benchmark datasets demonstrate that our proposed approach significantly outperforms the state-of-the-art counterparts.
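
Since -log(p) = sum_{k>=1} (1 - p)^k / k for 0 < p <= 1, truncating this Taylor series at a finite order yields a family of losses ranging from an MAE-like loss (order 1, equal to MAE up to a constant factor) toward CCE as the order grows. A small PyTorch sketch of such a truncated loss; the function name and interface are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F

def taylor_cross_entropy(logits, targets, order=2):
    """Truncated Taylor series of -log(p_y) around p_y = 1:

        -log(p) = sum_{k>=1} (1 - p)^k / k

    order=1 behaves like an MAE-style loss (more robust to label noise);
    increasing the order approaches the standard cross entropy.
    """
    p = F.softmax(logits, dim=1)
    p_y = p.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of true class
    loss = torch.zeros_like(p_y)
    for k in range(1, order + 1):
        loss = loss + (1.0 - p_y) ** k / k
    return loss.mean()

# Usage: drop-in replacement for F.cross_entropy.
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
taylor_cross_entropy(logits, targets, order=2).backward()
```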


2021 ◽  
Author(s):  
Gabriel Jonas Duarte ◽  
Tamara Arruda Pereira ◽  
Erik Jhones Nascimento ◽  
Diego Mesquita ◽  
Amauri Holanda Souza Junior

Graph neural networks (GNNs) have become the de facto approach for supervised learning on graph data. To train these networks, most practitioners employ the categorical cross-entropy (CE) loss. We can attribute this largely to the probabilistic interpretation of CE, since it corresponds to the negative log of the categorical/softmax likelihood. Nonetheless, loss functions are a modeling choice, and other training criteria can be employed, e.g., the hinge loss and the mean absolute error (MAE). Indeed, recent works have shown that deep learning models can benefit from adopting other loss functions; for instance, neural networks trained with symmetric losses (such as MAE) are robust to label noise. Perhaps surprisingly, the effect of using different losses on GNNs has not been explored. In this preliminary work, we gauge the impact of different loss functions on the performance of GNNs for node classification under (i) noisy labels and (ii) different sample sizes. In contrast to findings on Euclidean domains, our results for GNNs show no significant difference between models trained with CE and with other classical loss functions in either scenario.
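
A small sketch of the kind of experiment described above, assuming symmetric (uniform) label noise on the node labels and an MAE loss on the softmax output; the GNN itself is replaced by a plain linear model for brevity, and all names and hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F

def add_symmetric_noise(labels, num_classes, noise_rate):
    """Flip each label to a uniformly random class with probability noise_rate."""
    flip = torch.rand(labels.shape) < noise_rate
    random_labels = torch.randint(0, num_classes, labels.shape)
    return torch.where(flip, random_labels, labels)

def mae_loss(logits, targets, num_classes):
    """MAE between softmax output and one-hot targets (a symmetric loss)."""
    p = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    return (p - one_hot).abs().sum(dim=1).mean()

# Stand-in for a GNN: node features -> class logits (graph convolutions omitted).
num_nodes, num_feats, num_classes = 100, 16, 4
x = torch.randn(num_nodes, num_feats)
y_clean = torch.randint(0, num_classes, (num_nodes,))
y_noisy = add_symmetric_noise(y_clean, num_classes, noise_rate=0.3)

model = torch.nn.Linear(num_feats, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    logits = model(x)
    loss = mae_loss(logits, y_noisy, num_classes)   # swap for F.cross_entropy(logits, y_noisy)
    loss.backward()
    opt.step()
```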

