Methods for maintenance of neural networks in continual learning scenarios

Author(s):  
Bhasker Sri Harsha Suri ◽  
Manish Srivastava ◽  
Kalidas Yeturu

Neural networks suffer from the catastrophic forgetting problem when deployed in a continual learning scenario, where new batches of data arrive over time and may follow distributions different from those of the data previously used to train the network. For assessing the performance of a model in a continual learning scenario, two aspects are important: (i) computing the difference in data distribution between a new and an old batch of data, and (ii) understanding the retention and learning behavior of the deployed neural network. Current techniques indicate the novelty of a new data batch by comparing its statistical properties with those of the old batch in the input space. However, it is still an open area of research to take into account the deployed neural network's own ability to generalize to unseen data samples. In this work, we report a dataset distance measuring technique that indicates the novelty of a new batch of data while considering the deployed neural network's perspective. We propose the construction of perspective histograms, which are vector representations of the data batches based on the correctness and confidence of the deployed model's predictions. We have successfully tested the hypothesis empirically on image data from MNIST Digits, MNIST Fashion, and CIFAR10, for its ability to detect data perturbations of type rotation, Gaussian blur, and translation. Given a model, its training data, and a new batch of data, we have proposed and evaluated four new scoring schemes: retention score (R), learning score (L), O-score, and SP-score, which respectively measure how much the model retains its performance on past data, how much it learns from the new data, the combined magnitude of retention and learning, and the model's stability-plasticity characteristics.
The scoring schemes have been evaluated on the MNIST Digits and MNIST Fashion data sets over neural network architectures that differ in number of parameters, activation functions, and learning loss functions, and an instance of a typical analysis report is presented. Machine learning model maintenance is a reality in industrial production systems, and we hope our proposed methodology offers a solution to this pressing need.
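The perspective-histogram idea above can be illustrated in a few lines: bin the deployed model's prediction confidences separately for correctly and incorrectly predicted samples, and compare the resulting vectors across batches. This is a minimal numpy sketch of one plausible formulation; the function names and the L1 batch distance are illustrative assumptions, not the authors' exact definitions.

```python
import numpy as np

def perspective_histogram(probs, labels, n_bins=10):
    """Summarize a data batch from the deployed model's perspective.

    probs:  (N, C) softmax outputs of the deployed model
    labels: (N,) ground-truth class indices
    Returns a 2*n_bins vector: confidence histograms of the correctly
    and incorrectly predicted samples, normalized by the batch size.
    (Illustrative reconstruction, not the paper's exact formulation.)
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)          # confidence in the predicted class
    correct = preds == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    h_correct, _ = np.histogram(conf[correct], bins=edges)
    h_wrong, _ = np.histogram(conf[~correct], bins=edges)
    return np.concatenate([h_correct, h_wrong]) / len(labels)

def batch_distance(hist_old, hist_new):
    # L1 distance between perspective histograms as a novelty signal
    return float(np.abs(hist_old - hist_new).sum())
```

A large distance between the histogram of the training batch and that of an incoming batch then flags the new data as novel from the model's point of view.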

2021 ◽  


2019 ◽  
Vol 141 (12) ◽  
Author(s):  
Dehao Liu ◽  
Yan Wang

Abstract Training machine learning tools such as neural networks requires the availability of sizable data, which can be difficult for engineering and scientific applications where experiments or simulations are expensive. In this work, a novel multi-fidelity physics-constrained neural network is proposed to reduce the required amount of training data: physical knowledge is applied to constrain the neural networks, and multi-fidelity networks are constructed to improve training efficiency. A low-cost low-fidelity physics-constrained neural network is used as the baseline model, whereas a limited amount of data from a high-fidelity physics-constrained neural network is used to train a second neural network to predict the difference between the two models. The proposed framework is demonstrated with two-dimensional heat transfer, phase transition, and dendritic growth problems, which are fundamental in materials modeling. The physics is described by partial differential equations. With the same set of training data, the prediction error of the physics-constrained neural network can be one order of magnitude lower than that of the classical artificial neural network without physical constraints. The accuracy of the prediction is comparable to that of direct numerical solutions of the equations.


Author(s):  
Dehao Liu ◽  
Yan Wang

Abstract Training machine learning tools such as neural networks requires the availability of sizable data, which can be difficult for engineering and scientific applications where experiments or simulations are expensive. In this work, a novel multi-fidelity physics-constrained neural network is proposed to reduce the required amount of training data: physical knowledge is applied to constrain the neural networks, and multi-fidelity networks are constructed to improve training efficiency. A low-cost low-fidelity physics-constrained neural network is used as the baseline model, whereas a limited amount of data from a high-fidelity simulation is used to train a second neural network to predict the difference between the two models. The proposed framework is demonstrated with two-dimensional heat transfer and phase transition problems, which are fundamental in materials modeling. The physics is described by partial differential equations. With the same set of training data, the prediction error of the physics-constrained neural network can be one order of magnitude lower than that of a classical artificial neural network without physical constraints. The accuracy of the prediction is comparable to that of direct numerical solutions of the equations.
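The core multi-fidelity trick described in both abstracts above - correct a cheap low-fidelity model with a second model trained only on the few available high-fidelity residuals - can be sketched in one dimension. Plain polynomial fits stand in for the physics-constrained networks of the paper; all functions here are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def f_low(x):            # cheap, biased low-fidelity approximation
    return np.sin(x)

def f_high(x):           # expensive "ground truth" (assumed for illustration)
    return np.sin(x) + 0.3 * x

# Only a handful of expensive high-fidelity samples are available.
x_hf = np.linspace(0.0, 3.0, 5)
residual = f_high(x_hf) - f_low(x_hf)        # what the second model learns
coef = np.polyfit(x_hf, residual, deg=1)     # tiny surrogate for the diff net

def f_multifidelity(x):
    # low-fidelity baseline plus the learned correction
    return f_low(x) + np.polyval(coef, x)

x_test = np.linspace(0.0, 3.0, 50)
max_err = np.max(np.abs(f_multifidelity(x_test) - f_high(x_test)))
```

Because the correction (here an exactly linear residual) is much simpler than the full high-fidelity response, it can be learned from far fewer expensive samples - the training-efficiency argument the abstracts make.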


Author(s):  
Kate A. Smith

Neural networks are simple computational tools for examining data and developing models that help to identify interesting patterns or structures. The data used to develop these models is known as training data. Once a neural network has been exposed to the training data, and has learnt the patterns that exist in that data, it can be applied to new data, thereby achieving a variety of outcomes. Neural networks can be used to:
• learn to predict future events based on the patterns that have been observed in the historical training data;
• learn to classify unseen data into pre-defined groups based on characteristics observed in the training data;
• learn to cluster the training data into natural groups based on the similarity of characteristics in the training data.


1992 ◽  
Vol 26 (9-11) ◽  
pp. 2461-2464 ◽  
Author(s):  
R. D. Tyagi ◽  
Y. G. Du

A steady-state mathematical model of an activated sludge process with a secondary settler was developed. With a limited number of training data samples obtained from the simulation at steady state, a feedforward neural network was established that exhibits an excellent capability for operational prediction and determination.


Author(s):  
Ramesh Adhikari ◽  
Suresh Pokharel

Data augmentation is widely used in image processing and pattern recognition problems in order to increase the diversity of the available data. It is commonly used to improve the classification accuracy of images when the available datasets are limited. Deep learning approaches have demonstrated immense breakthroughs in medical diagnostics over the last decade, but a significant amount of data is needed for the effective training of deep neural networks. The appropriate use of data augmentation techniques prevents the model from over-fitting and thus increases its generalization capability when it is later tested on unseen data. However, obtaining such large datasets for rare diseases remains a huge challenge in the medical field. This study presents a synthetic data augmentation technique using Generative Adversarial Networks to make more effective use of existing data when evaluating the generalization capability of neural networks. In this research, a convolutional neural network (CNN) model is used to classify X-ray images of the human chest into normal and pneumonia classes; then, synthetic X-ray images are generated from the available dataset using a deep convolutional generative adversarial network (DCGAN) model. Finally, the CNN model is trained again on the original dataset together with the augmented data generated by the DCGAN model. The classification performance of the CNN model improved by 3.2% when the augmented data were used along with the originally available dataset.
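The final retraining step described above - merging the original X-ray images with the DCGAN-generated ones - amounts to concatenating and shuffling the two labeled sets. A minimal sketch, assuming images and labels arrive as numpy arrays (the function name is illustrative, not from the paper):

```python
import numpy as np

def augment_training_set(x_real, y_real, x_synth, y_synth, seed=0):
    """Merge original images with GAN-generated ones and shuffle,
    so the CNN sees both real and synthetic samples during retraining.
    (Illustrative helper; not the authors' code.)"""
    x = np.concatenate([x_real, x_synth], axis=0)
    y = np.concatenate([y_real, y_synth], axis=0)
    # One permutation applied to both arrays keeps images and labels paired.
    idx = np.random.default_rng(seed).permutation(len(y))
    return x[idx], y[idx]
```

Shuffling after concatenation matters: without it, mini-batch training would see long runs of purely real and then purely synthetic samples.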


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. The possibility of using a limited learning set was achieved by developing a detailed scenario of the task, which strictly defined the conditions of detector operation for the considered case of a convolutional neural network. The described solution utilizes known deep neural network architectures for learning and object detection. The article compares detection results from the most popular deep neural networks trained on a limited set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines, and the object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, the number of input samples influenced its results significantly less than in the case of the other CNN models, which, in the authors' assessment, is a desired feature for a limited training set.


2012 ◽  
Vol 16 (4) ◽  
pp. 1151-1169 ◽  
Author(s):  
A. El-Shafie ◽  
A. Noureldin ◽  
M. Taha ◽  
A. Hussain ◽  
M. Mukhlisin

Abstract. Rainfall is considered one of the major components of the hydrological process; it plays a significant part in evaluating drought and flooding events. Therefore, it is important to have an accurate model for rainfall forecasting. Recently, several data-driven modeling approaches, such as multi-layer perceptron neural networks (MLP-NN), have been investigated for such forecasting tasks. Rainfall time series modeling involves an important temporal dimension, whereas the classical MLP-NN is a static, memoryless network architecture that is effective for complex nonlinear static mapping. This research focuses on investigating the potential of a neural network that could address the temporal relationships of the rainfall series. Two static neural networks and one dynamic neural network, namely the multi-layer perceptron neural network (MLP-NN), the radial basis function neural network (RBFNN), and the input delay neural network (IDNN), respectively, have been examined in this study. These models were developed for two time horizons, monthly and weekly rainfall forecasting, at the Klang River, Malaysia. Data collected over 12 yr (1997–2008) on a weekly basis and 22 yr (1987–2008) on a monthly basis were used to develop and examine the performance of the proposed models. Comprehensive comparison analyses were carried out to evaluate the performance of the proposed static and dynamic neural networks. Results showed that the MLP-NN model is able to follow the trends of the actual rainfall, although not very accurately. The RBFNN model achieved better accuracy than the MLP-NN model. Moreover, the forecasting accuracy of the IDNN model was better than that of the static networks during both the training and testing stages, demonstrating a consistent level of accuracy on seen and unseen data.
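The tapped delay line that gives the IDNN its temporal memory can be sketched as a windowing step: each training input holds the previous n_lags rainfall observations, and the target is the next value. A minimal numpy sketch under that assumption (the function name is illustrative):

```python
import numpy as np

def delay_embed(series, n_lags):
    """Turn a rainfall series into (input, target) pairs where each
    input row holds the n_lags previous observations - the delay
    line that lets a feedforward network see temporal context.
    (Illustrative preprocessing, not the study's exact pipeline.)"""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags]
                  for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y
```

The resulting (X, y) pairs can then be fed to any static regressor; the delay embedding, not the network itself, supplies the temporal dimension.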


2020 ◽  
Vol 9 (1) ◽  
pp. 7-10
Author(s):  
Hendry Fonda

ABSTRACT Riau batik has been known since the 18th century and was used by royal families. Riau batik is made using a stamp mixed with dye and then printed on fabric, usually silk. As it developed, Riau batik was accepted by the public much more slowly than Javanese batik. Convolutional Neural Networks (CNN) combine artificial neural networks with deep learning methods. A CNN consists of one or more convolutional layers, often with a subsampling layer, followed by one or more fully connected layers as in a standard neural network. In this work, the CNN is trained and tested on Riau batik so that a collection of batik models classified by the characteristic features of Riau batik can be obtained, allowing images to be labeled as Riau batik or non-Riau batik. Classification using the CNN distinguishes Riau batik from non-Riau batik with an accuracy of 65%. The accuracy of 65% is largely because many motifs are shared between Riau batik and other batik, with the difference lying mainly in the dye colors of Riau batik. Keywords: Batik; Batik Riau; CNN; Image; Deep Learning


Author(s):  
Silviani E Rumagit ◽  
Azhari SN

Abstract: Background: this research was motivated by the growing demand for electricity in each tariff group, namely the social, household, business, industrial, and government tariff groups. Prediction is an important requirement for electricity providers when making decisions about the availability of electrical energy. Predictions can be made with statistical methods or with artificial intelligence. ARIMA is a widely used statistical prediction method that follows an autoregressive (AR) moving-average (MA) model. ARIMA requires stationary data; non-stationary data must be made stationary by differencing.
In addition to the statistical method, prediction can also be done with artificial intelligence techniques; in this study, a backpropagation neural network was chosen for the prediction. The tests performed show that the differences in MSE between ARIMA, the artificial neural network, and the combined ARIMA-neural-network model are not statistically significant. Keywords: ARIMA, neural network, tariff groups
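The comparison criterion in the study above is the mean squared error of each forecaster on the test period. A minimal helper for computing it from two forecast series:

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error, the criterion used to compare the ARIMA,
    neural network, and combined forecasts in the study."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))
```

The study's conclusion corresponds to the mse values of the three models lying within each other's sampling variability.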

