Power Function Error Initialization Can Improve Convergence of Backpropagation Learning in Neural Networks for Classification

2021 ◽  
pp. 1-33
Author(s):  
Andreas Knoblauch

Abstract Supervised learning corresponds to minimizing a loss or cost function expressing the differences between model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y_n - t_n, which is optimal for several commonly used loss functions like cross-entropy or sum of squared errors. Here I evaluate a more general error initialization method using power functions |y_n - t_n|^q for q > 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
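
A minimal NumPy sketch of the idea, assuming the error signal keeps the sign of y_n - t_n; the toy two-layer network, the variable names, and the choice q = 0.5 are illustrative placeholders, not taken from the paper:

```python
import numpy as np

def power_error_signal(y, t, q=1.0):
    """Output-layer error signal delta = sign(y - t) * |y - t|**q.

    q = 1 recovers the standard initialization y - t (optimal for
    cross-entropy / sum-of-squared-error losses); other q > 0 values
    correspond to the generalized power-function losses.
    """
    diff = y - t
    return np.sign(diff) * np.abs(diff) ** q

# Toy single-hidden-layer network: backprop with the modified output delta.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(0, 0.1, (20, 4)), rng.normal(0, 0.1, (3, 20))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

x = rng.normal(size=(4, 8))                  # 8 samples, 4 features
t = np.eye(3)[:, rng.integers(0, 3, 8)]      # one-hot targets, 3 classes
h = np.tanh(W1 @ x)
y = softmax(W2 @ h)

delta_out = power_error_signal(y, t, q=0.5)      # modified initialization
grad_W2 = delta_out @ h.T / x.shape[1]
delta_hid = (W2.T @ delta_out) * (1 - h ** 2)    # backprop through tanh
grad_W1 = delta_hid @ x.T / x.shape[1]
```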

2019 ◽  
Vol 141 (12) ◽  
Author(s):  
Dehao Liu ◽  
Yan Wang

Abstract Training machine learning tools such as neural networks requires the availability of sizable data, which can be difficult for engineering and scientific applications where experiments or simulations are expensive. In this work, a novel multi-fidelity physics-constrained neural network is proposed to reduce the required amount of training data, where physical knowledge is applied to constrain neural networks, and multi-fidelity networks are constructed to improve training efficiency. A low-cost low-fidelity physics-constrained neural network is used as the baseline model, whereas a limited amount of data from a high-fidelity physics-constrained neural network is used to train a second neural network to predict the difference between the two models. The proposed framework is demonstrated with two-dimensional heat transfer, phase transition, and dendritic growth problems, which are fundamental in materials modeling. Physics is described by partial differential equations. With the same set of training data, the prediction error of the physics-constrained neural network can be one order of magnitude lower than that of a classical artificial neural network without physical constraints. The accuracy of the prediction is comparable to that of direct numerical solutions of the equations.
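
A rough PyTorch sketch of the physics-constraint aspect only, assuming the steady 2-D heat (Laplace) equation u_xx + u_yy = 0 as the governing PDE; the network size, collocation points, and equal loss weighting are illustrative assumptions rather than the authors' setup:

```python
import torch
import torch.nn as nn

# Small fully connected network mapping (x, y) -> temperature u(x, y).
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def pde_residual(xy):
    """Residual of the steady 2-D heat (Laplace) equation u_xx + u_yy = 0."""
    xy = xy.requires_grad_(True)
    u = net(xy)
    grad_u = torch.autograd.grad(u, xy, torch.ones_like(u), create_graph=True)[0]
    u_x, u_y = grad_u[:, 0:1], grad_u[:, 1:2]
    u_xx = torch.autograd.grad(u_x, xy, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    u_yy = torch.autograd.grad(u_y, xy, torch.ones_like(u_y), create_graph=True)[0][:, 1:2]
    return u_xx + u_yy

# Loss = data mismatch on a few labeled points + physics penalty on collocation points.
xy_data, u_data = torch.rand(16, 2), torch.rand(16, 1)   # placeholder training data
xy_coll = torch.rand(256, 2)                             # collocation points
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ((net(xy_data) - u_data) ** 2).mean() + (pde_residual(xy_coll) ** 2).mean()
    loss.backward()
    opt.step()
```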


2021 ◽  
Author(s):  
Bhasker Sri Harsha Suri ◽  
Manish Srivastava ◽  
Kalidas Yeturu

Neural networks suffer from the catastrophic forgetting problem when deployed in a continual learning scenario where new batches of data arrive over time and these batches are of different distributions from the previous data used for training the neural network. For assessing the performance of a model in a continual learning scenario, two aspects are important: (i) computing the difference in data distribution between a new and an old batch of data, and (ii) understanding the retention and learning behavior of the deployed neural network. Current techniques indicate the novelty of a new data batch by comparing its statistical properties with those of the old batch in the input space. However, it is still an open area of research to consider the perspective of a deployed neural network's ability to generalize to unseen data samples. In this work, we report a dataset distance measuring technique that indicates the novelty of a new batch of data while considering the deployed neural network's perspective. We propose the construction of perspective histograms, which are a vector representation of the data batches based on the correctness and confidence of the predictions of the deployed model. We have successfully tested the hypothesis empirically on image data from MNIST Digits, MNIST Fashion, and CIFAR10 for its ability to detect data perturbations of type rotation, Gaussian blur, and translation. Given a model, its training data, and new data, we have proposed and evaluated four new scoring schemes, the retention score (R), the learning score (L), the O-score, and the SP-score, which respectively measure how much the model retains its performance on past data, how much it learns from new data, the combined magnitude of retention and learning, and its stability-plasticity characteristics. The scoring schemes have been evaluated on the MNIST Digits and MNIST Fashion data sets with different neural network architectures varying in the number of parameters, activation functions, and learning loss functions, and an instance of a typical analysis report is presented. Machine learning model maintenance is a reality in production systems in industry, and we hope our proposed methodology offers a solution to this pressing need.
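
A minimal sketch of how such perspective histograms could be built, assuming the deployed model exposes class probabilities; the confidence binning, the L1 distance, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def perspective_histogram(probs, labels, n_bins=10):
    """Histogram of a data batch from the deployed model's perspective.

    probs  : (N, C) predicted class probabilities of the deployed model
    labels : (N,)   true labels of the batch
    Returns a 2 * n_bins vector: confidence histograms of correctly and
    incorrectly predicted samples, normalized by batch size.
    """
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    correct = pred == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    h_correct, _ = np.histogram(conf[correct], bins=edges)
    h_wrong, _ = np.histogram(conf[~correct], bins=edges)
    return np.concatenate([h_correct, h_wrong]) / len(labels)

def batch_novelty(probs_old, labels_old, probs_new, labels_new):
    """L1 distance between perspective histograms of an old and a new batch."""
    h_old = perspective_histogram(probs_old, labels_old)
    h_new = perspective_histogram(probs_new, labels_new)
    return np.abs(h_old - h_new).sum()
```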


Author(s):  
Dehao Liu ◽  
Yan Wang

Abstract Training machine learning tools such as neural networks requires the availability of sizable data, which can be difficult for engineering and scientific applications where experiments or simulations are expensive. In this work, a novel multi-fidelity physics-constrained neural network is proposed to reduce the required amount of training data, where physical knowledge is applied to constrain neural networks, and multi-fidelity networks are constructed to improve training efficiency. A low-cost low-fidelity physics-constrained neural network is used as the baseline model, whereas a limited amount of data from a high-fidelity simulation is used to train a second neural network to predict the difference between the two models. The proposed framework is demonstrated with two-dimensional heat transfer and phase transition problems, which are fundamental in materials modeling. Physics is described by partial differential equations. With the same set of training data, the prediction error of the physics-constrained neural network can be one order of magnitude lower than that of a classical artificial neural network without physical constraints. The accuracy of the prediction is comparable to that of direct numerical solutions of the equations.
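
Complementing the physics-constraint sketch above, a minimal PyTorch sketch of the multi-fidelity structure described here: a frozen low-fidelity model corrected by a discrepancy network trained on a few high-fidelity samples. The architectures, data, and training loop are placeholders, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Frozen low-fidelity model (e.g., a previously trained physics-constrained net).
lf_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
for p in lf_net.parameters():
    p.requires_grad_(False)

# Discrepancy network: learns delta(x) = u_HF(x) - u_LF(x) from few HF samples.
delta_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

x_hf = torch.rand(8, 2)          # small high-fidelity training set (placeholder)
u_hf = torch.rand(8, 1)

opt = torch.optim.Adam(delta_net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    pred_hf = lf_net(x_hf) + delta_net(x_hf)    # multi-fidelity prediction
    loss = ((pred_hf - u_hf) ** 2).mean()
    loss.backward()
    opt.step()

def predict(x):
    """High-fidelity estimate: low-fidelity baseline plus learned correction."""
    with torch.no_grad():
        return lf_net(x) + delta_net(x)
```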


Author(s):  
D. Clermont ◽  
M. Dorozynski ◽  
D. Wittich ◽  
F. Rottensteiner

Abstract. This paper proposes several methods for training a Convolutional Neural Network (CNN) for learning the similarity between images of silk fabrics based on multiple semantic properties of the fabrics. In the context of the EU H2020 project SILKNOW (http://silknow.eu/), two variants of training were developed, one based on a Siamese CNN and one based on a triplet architecture. We propose different definitions of similarity and different loss functions for both training strategies, some of them also allowing the use of incomplete information about the training data. We assess the quality of the trained model by using the learned image features in a k-NN classification. We achieve overall accuracies of 93–95% and average F1-scores of 87–92%.
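
A minimal PyTorch sketch of the triplet variant, assuming the anchor and positive images share a semantic property (e.g., the same material) while the negative does not; the backbone, embedding size, and margin are illustrative assumptions rather than the SILKNOW setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small embedding CNN (placeholder backbone for silk-fabric images).
class EmbeddingNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, dim)

    def forward(self, x):
        z = self.fc(self.features(x).flatten(1))
        return F.normalize(z, dim=1)          # unit-length embeddings

net = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)

# Anchor/positive share a semantic property; the negative does not.
anchor, positive, negative = (torch.rand(4, 3, 64, 64) for _ in range(3))
loss = triplet_loss(net(anchor), net(positive), net(negative))
loss.backward()
```

The learned embeddings would then be fed to a k-NN classifier, as in the evaluation described above.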


2020 ◽  
Vol 9 (1) ◽  
pp. 41-49
Author(s):  
Johanes Roisa Prabowo ◽  
Rukun Santoso ◽  
Hasbi Yasin

Housing is one aspect of social welfare that must be met, because a house is a primary human need alongside clothing and food. The condition of a house as adequate shelter can be assessed from the structure and facilities of the building. This research aims to classify house conditions as livable or not livable. The method used is an artificial neural network (ANN). An ANN is an information processing system with characteristics similar to biological neural networks. In this research the optimization method used is the conjugate gradient algorithm. The data used are from the Survei Sosial Ekonomi Nasional (Susenas) March 2018 Kor Keterangan Perumahan (core housing information module) for Cilacap Regency. The data are divided into training and testing data, with the proportion that gives the highest average accuracy being 90% for training data and 10% for testing data. The best architecture obtained is a model consisting of 8 neurons in the input layer, 10 neurons in the hidden layer, and 1 neuron in the output layer. The activation functions used are the bipolar sigmoid in the hidden layer and the binary sigmoid in the output layer. The results of the analysis show that the ANN works very well for classifying house conditions in Cilacap Regency, with an average accuracy of 98.96% at the training stage and 97.58% at the testing stage. Keywords: House, Classification, Artificial Neural Networks, Conjugate Gradient
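
A minimal NumPy sketch of the reported 8-10-1 architecture's forward pass with a bipolar sigmoid in the hidden layer and a binary sigmoid in the output layer; the random weights and the 0.5 decision threshold are placeholders, and the conjugate-gradient fitting of the weights is omitted:

```python
import numpy as np

def bipolar_sigmoid(x):
    """Bipolar sigmoid: maps to (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def binary_sigmoid(x):
    """Binary (logistic) sigmoid: maps to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (10, 8)), np.zeros(10)   # 8 inputs -> 10 hidden
W2, b2 = rng.normal(0, 0.5, (1, 10)), np.zeros(1)    # 10 hidden -> 1 output

def forward(x):
    """Forward pass of the 8-10-1 network; output > 0.5 -> 'livable'."""
    h = bipolar_sigmoid(W1 @ x + b1)
    return binary_sigmoid(W2 @ h + b2)

x = rng.uniform(size=8)              # one housing record with 8 features
livable = forward(x)[0] > 0.5
```

In practice the weights would be fitted with a conjugate-gradient optimizer (for example, scipy.optimize.minimize with method="CG") rather than left random as in this sketch.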


Author(s):  
Lei Feng ◽  
Senlin Shu ◽  
Zhuoyi Lin ◽  
Fengmao Lv ◽  
Li Li ◽  
...  

Trained with the standard cross entropy loss, deep neural networks can achieve great performance on correctly labeled data. However, if the training data is corrupted with label noise, deep models tend to overfit the noisy labels, thereby achieving poor generalization performance. To remedy this issue, several loss functions have been proposed and demonstrated to be robust to label noise. Although most of the robust loss functions stem from the Categorical Cross Entropy (CCE) loss, they fail to embody the intrinsic relationships between CCE and other loss functions. In this paper, we propose a general framework dubbed Taylor cross entropy loss to train deep models in the presence of label noise. Specifically, our framework allows weighting the extent of fitting the training labels by controlling the order of the Taylor series expansion of CCE, hence it can be robust to label noise. In addition, our framework clearly reveals the intrinsic relationships between CCE and other loss functions, such as Mean Absolute Error (MAE) and Mean Squared Error (MSE). Moreover, we present a detailed theoretical analysis to certify the robustness of this framework. Extensive experimental results on benchmark datasets demonstrate that our proposed approach significantly outperforms the state-of-the-art counterparts.
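
Since -log(p) = sum_{k>=1} (1 - p)^k / k for 0 < p <= 1, truncating this Taylor series at a finite order yields a family of losses ranging from an MAE-like loss (order 1, equal to MAE up to a constant factor) toward CCE as the order grows. A small PyTorch sketch of such a truncated loss; the function name and interface are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F

def taylor_cross_entropy(logits, targets, order=2):
    """Truncated Taylor series of -log(p_y) around p_y = 1:

        -log(p) = sum_{k>=1} (1 - p)^k / k

    order=1 behaves like an MAE-style loss (more robust to label noise);
    increasing the order approaches the standard cross entropy.
    """
    p = F.softmax(logits, dim=1)
    p_y = p.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of true class
    loss = torch.zeros_like(p_y)
    for k in range(1, order + 1):
        loss = loss + (1.0 - p_y) ** k / k
    return loss.mean()

# Usage: drop-in replacement for F.cross_entropy.
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
taylor_cross_entropy(logits, targets, order=2).backward()
```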


2021 ◽  
Author(s):  
Gabriel Jonas Duarte ◽  
Tamara Arruda Pereira ◽  
Erik Jhones Nascimento ◽  
Diego Mesquita ◽  
Amauri Holanda Souza Junior

Graph neural networks (GNNs) have become the de facto approach for supervised learning on graph data. To train these networks, most practitioners employ the categorical cross-entropy (CE) loss. We can attribute this largely to the probabilistic interpretation of CE, since it corresponds to the negative log of the categorical/softmax likelihood. Nonetheless, loss functions are a modeling choice, and other training criteria can be employed, e.g., the hinge loss and the mean absolute error (MAE). Indeed, recent works have shown that deep learning models can benefit from adopting other loss functions; for instance, neural networks trained with symmetric losses (such as MAE) are robust to label noise. Perhaps surprisingly, the effect of using different losses on GNNs has not been explored. In this preliminary work, we gauge the impact of different loss functions on the performance of GNNs for node classification under (i) noisy labels and (ii) different sample sizes. In contrast to findings on Euclidean domains, our results for GNNs show no significant difference between models trained with CE and with other classical loss functions in either scenario.
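
A small sketch of the kind of experiment described above, assuming symmetric (uniform) label noise on the node labels and an MAE loss on the softmax output; the GNN itself is replaced by a plain linear model for brevity, and all names and hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F

def add_symmetric_noise(labels, num_classes, noise_rate):
    """Flip each label to a uniformly random class with probability noise_rate."""
    flip = torch.rand(labels.shape) < noise_rate
    random_labels = torch.randint(0, num_classes, labels.shape)
    return torch.where(flip, random_labels, labels)

def mae_loss(logits, targets, num_classes):
    """MAE between softmax output and one-hot targets (a symmetric loss)."""
    p = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    return (p - one_hot).abs().sum(dim=1).mean()

# Stand-in for a GNN: node features -> class logits (graph convolutions omitted).
num_nodes, num_feats, num_classes = 100, 16, 4
x = torch.randn(num_nodes, num_feats)
y_clean = torch.randint(0, num_classes, (num_nodes,))
y_noisy = add_symmetric_noise(y_clean, num_classes, noise_rate=0.3)

model = torch.nn.Linear(num_feats, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    logits = model(x)
    loss = mae_loss(logits, y_noisy, num_classes)   # swap for F.cross_entropy(logits, y_noisy)
    loss.backward()
    opt.step()
```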

