VECA: A Method for Detecting Overfitting in Neural Networks (Student Abstract)

2020 ◽  
Vol 34 (10) ◽  
pp. 13791-13792
Author(s):  
Liangzhu Ge ◽  
Yuexian Hou ◽  
Yaju Jiang ◽  
Shuai Yao ◽  
Chao Yang

Despite their widespread applications, deep neural networks often tend to overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of the Covariance matrix of the Activation matrix) and demonstrate that VECA is a good predictor of a network's generalization performance during the training process. Experiments performed on fully-connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA, which suggests that VECA can be calculated to estimate generalization performance without sacrificing training data for use as a validation set.
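A minimal sketch of how the statistic itself could be computed, assuming activations are collected from a hidden layer over a batch of training inputs (the abstract does not specify which layer or any preprocessing, so those details are assumptions):

```python
import numpy as np

def veca(activations: np.ndarray) -> float:
    """Variance of Eigenvalues of the Covariance matrix of an Activation matrix.

    activations: shape (n_samples, n_units), e.g. hidden-layer outputs
    collected over a batch of training inputs.
    """
    cov = np.cov(activations, rowvar=False)   # unit-by-unit covariance: (n_units, n_units)
    eigenvalues = np.linalg.eigvalsh(cov)     # covariance is symmetric, so eigvalsh is stable
    return float(np.var(eigenvalues))

# Track VECA across training epochs and watch for the drift that the
# authors report correlates with rising test loss.
print(veca(np.random.randn(256, 128)))        # placeholder activations
```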

2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Henning Petzka ◽  
Martin Trimmel ◽  
Cristian Sminchisescu

Symmetries in neural networks allow different weight configurations to lead to the same network function. For odd activation functions, the set of transformations mapping between such configurations has been studied extensively, but less is known for neural networks with ReLU activation functions. We give a complete characterization for fully-connected networks with two layers. Apart from two well-known transformations, only degenerate situations allow additional transformations that leave the network function unchanged. Reduction steps can remove only part of the degenerate cases. Finally, we present a non-degenerate situation for deep neural networks that leads to new transformations leaving the network function intact.
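The two well-known transformations referred to here are positive rescaling of a hidden unit (scale its incoming weights and bias by c > 0 and its outgoing weights by 1/c, using ReLU(c·z) = c·ReLU(z) for c > 0) and permutation of hidden units. A minimal numerical check for a two-layer fully-connected ReLU network, with all dimensions chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, o = 5, 8, 3
W1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
W2, b2 = rng.normal(size=(o, h)), rng.normal(size=o)
x = rng.normal(size=d)

def net(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# 1) Positive rescaling: scaling a unit's incoming weights by c and its
#    outgoing weights by 1/c cancels out, since ReLU is positively homogeneous.
c = rng.uniform(0.5, 2.0, size=h)
out_scaled = net(x, W1 * c[:, None], b1 * c, W2 / c[None, :], b2)

# 2) Permutation: relabeling the hidden units leaves the output sum unchanged.
perm = rng.permutation(h)
out_perm = net(x, W1[perm], b1[perm], W2[:, perm], b2)

original = net(x, W1, b1, W2, b2)
assert np.allclose(original, out_scaled) and np.allclose(original, out_perm)
```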


2020 ◽  
Vol 34 (04) ◽  
pp. 3349-3356
Author(s):  
Yuan Cao ◽  
Quanquan Gu

Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. Very recently, a line of work has explained theoretically that, with over-parameterization and proper random initialization, gradient-based methods can find the global minima of the training loss for DNNs. However, existing generalization error bounds are unable to explain the good generalization performance of over-parameterized DNNs. The major limitation of most existing generalization bounds is that they are based on uniform convergence and are independent of the training algorithm. In this work, we derive an algorithm-dependent generalization error bound for deep ReLU networks, and show that under certain assumptions on the data distribution, gradient descent (GD) with proper random initialization is able to train a sufficiently over-parameterized DNN to achieve arbitrarily small generalization error. Our work sheds light on the good generalization performance of over-parameterized deep neural networks.
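The over-parameterization regime described above is easy to reproduce in miniature. A hedged sketch (the width, learning rate, and step count are arbitrary choices, and this illustrates only the random-label fitting claim, not the paper's generalization bound):

```python
import torch

# A wide two-layer ReLU network trained by full-batch gradient descent can
# drive the training loss on randomly labeled data toward zero.
torch.manual_seed(0)
n, d, width = 64, 10, 4096                      # width >> n: over-parameterized
X = torch.randn(n, d)
y = torch.randint(0, 2, (n,)).float() * 2 - 1   # random +/-1 labels, no signal

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)  # full batch, i.e. plain GD
for step in range(2000):
    loss = ((model(X).squeeze() - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())  # approaches zero despite the labels carrying no signal
```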


Author(s):  
Satoru Watanabe ◽  
Hayato Yamana

The inner representation of deep neural networks (DNNs) is indecipherable, which makes it difficult to tune DNN models, control their training process, and interpret their outputs. In this paper, we propose a novel approach to investigating the inner representation of DNNs through topological data analysis (TDA). Persistent homology (PH), one of the outstanding methods in TDA, was employed to investigate the complexities of trained DNNs. We constructed clique complexes on trained DNNs and calculated their one-dimensional PH. The PH reveals the combined effects of multiple neurons in DNNs at different resolutions, which are difficult to capture without PH. Evaluations were conducted using fully connected networks (FCNs) and networks combining FCNs and convolutional neural networks (CNNs), trained on the MNIST and CIFAR-10 data sets. The evaluation results demonstrate that the PH of DNNs reflects both the excess of neurons and problem difficulty, making PH one of the prominent methods for investigating the inner representation of DNNs.
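A sketch of how one-dimensional PH could be computed from trained weights using the ripser package. The abstract does not spell out the clique-complex construction, so the conversion of weight magnitudes into a dissimilarity below is an illustrative assumption (the Vietoris-Rips complex ripser builds is the clique complex of the thresholded distance graph):

```python
import numpy as np
from ripser import ripser  # pip install ripser

# Treat the neurons of two adjacent layers as vertices and derive a
# dissimilarity from trained weight magnitudes: strongly connected
# neurons are "close", so large-|w| edges enter the filtration early.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))            # stand-in for a trained weight matrix
m, n = W.shape[0], W.shape[0] + W.shape[1]

dist = np.full((n, n), 2.0)              # same-layer pairs: beyond all edges
np.fill_diagonal(dist, 0.0)
strength = np.abs(W) / np.abs(W).max()   # normalize magnitudes to [0, 1]
dist[:m, m:] = 1.0 - strength            # bipartite inter-layer distances
dist[m:, :m] = (1.0 - strength).T

# One-dimensional persistent homology of the resulting filtration.
diagrams = ripser(dist, maxdim=1, distance_matrix=True)["dgms"]
print(diagrams[1])                       # (birth, death) pairs of 1-D holes
```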


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Juan Wang ◽  
Liangzhu Ge ◽  
Guorui Liu ◽  
Guoyan Li

During the development of deep neural networks (DNNs), it is difficult to balance fitting ability on the training set against generalization ability on unknown data (such as a test set). The current solution is to reduce the complexity of the objective function using regularization methods. In this paper, we propose a method called VOVU (Variance Of Variance of Units in the last hidden layer) to balance fitting power and generalization by monitoring the training process. The main idea is to exploit the predictive power of the variance of the hidden-layer units with respect to the complexity of the neural network model, and to use it as a generalization evaluation index. In particular, we use the last hidden layer, since it has the greatest impact. The algorithm was tested on Fashion-MNIST and CIFAR-10. The experimental results demonstrate that VOVU and test loss are highly positively correlated, which implies that a smaller VOVU indicates better generalization. VOVU can thus serve as an alternative to early stopping and as a good predictor of generalization performance in DNNs. In particular, when the sample size is limited, VOVU is the better choice because it does not require setting aside training data as a validation set.
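A minimal sketch of the statistic, assuming activations of the last hidden layer are collected over a batch of training inputs (the abstract does not specify the monitoring schedule):

```python
import numpy as np

def vovu(last_hidden_activations: np.ndarray) -> float:
    """Variance Of Variance of Units in the last hidden layer.

    last_hidden_activations: shape (n_samples, n_units), collected by a
    forward pass over (a batch of) the training data.
    """
    unit_variances = last_hidden_activations.var(axis=0)  # one variance per unit
    return float(unit_variances.var())                    # variance across units

# Sketched use as an early-stopping signal: evaluate vovu() periodically
# during training and stop when it stops decreasing.
print(vovu(np.random.randn(256, 128)))                    # placeholder activations
```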


Symmetry ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 892 ◽  
Author(s):  
Hyun Kwon ◽  
Hyunsoo Yoon ◽  
Ki-Woong Park

Studies related to pattern recognition and visualization using computer technology have been widely introduced. In particular, deep neural networks (DNNs) provide good performance for image, speech, and pattern recognition. However, a poisoning attack is a serious threat to a DNN's security. A poisoning attack reduces the accuracy of a DNN by adding malicious training data during the training process. In some situations, it may be necessary to degrade the model's accuracy on one specifically chosen class. For example, an attacker may wish to intentionally prevent unmanned aerial vehicles from correctly recognizing nuclear-related facilities. In this paper, we propose a selective poisoning attack that reduces the accuracy of only the chosen class in the model. The proposed method achieves this by training with malicious data corresponding to only the chosen class while maintaining the accuracy of the remaining classes. For the experiments, we used TensorFlow as the machine-learning library and MNIST, Fashion-MNIST, and CIFAR-10 as the datasets. The experimental results show that the proposed method can reduce the accuracy of the chosen class by 43.2%, 41.7%, and 55.3% on MNIST, Fashion-MNIST, and CIFAR-10, respectively, while maintaining the accuracy of the remaining classes.
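The abstract does not describe how the malicious training data are constructed, so the following is only a simplified illustration of the idea, using label corruption restricted to the chosen class:

```python
import numpy as np

def selective_poison(X, y, target_class, rate, num_classes, rng):
    """Simplified illustration of a selective poisoning attack: relabel a
    fraction of the target class's samples so that training degrades
    accuracy on that class only (other classes' data are untouched)."""
    idx = np.flatnonzero(y == target_class)
    poisoned = rng.choice(idx, size=int(rate * idx.size), replace=False)
    y = y.copy()
    # Shift each selected label by a nonzero offset, guaranteeing a wrong class.
    y[poisoned] = (y[poisoned] + rng.integers(1, num_classes, poisoned.size)) % num_classes
    return X, y

# Hypothetical usage with integer class labels, e.g. on MNIST:
# rng = np.random.default_rng(0)
# X_p, y_p = selective_poison(X_train, y_train, target_class=3,
#                             rate=0.3, num_classes=10, rng=rng)
```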


Author(s):  
Wael H. Awad ◽  
Bruce N. Janson

Three different modeling approaches were applied to explain truck accidents at interchanges in Washington State during a 27-month period. Three models were developed for each ramp type: linear regression, neural networks, and a hybrid system using fuzzy logic and neural networks. The study showed that linear regression was able to predict accident frequencies that fell within one standard deviation of the overall mean of the dependent variable. However, the coefficient of determination was very low in all cases. The two artificial intelligence (AI) approaches showed a high level of performance in identifying different patterns of accidents in the training data and presented a better fit than the regression model. However, the ability of these AI models to predict test data that were not included in the training process was unsatisfactory.
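A hedged sketch of the first two approaches on synthetic data (the study's actual ramp features and the fuzzy-logic hybrid are not reproduced here); comparing training and held-out R² exposes the kind of generalization gap the study reports:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic accident-frequency data with a mildly nonlinear signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # stand-ins for geometry/traffic features
y = np.maximum(0, X @ rng.normal(size=6) + 0.5 * np.sin(X[:, 0])
               + rng.normal(size=500))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (LinearRegression(),
              MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                           random_state=0)):
    model.fit(X_tr, y_tr)
    # R^2 on training vs. held-out data: a large gap means poor generalization.
    print(type(model).__name__, model.score(X_tr, y_tr), model.score(X_te, y_te))
```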


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Tiago Pereira ◽  
Maryam Abbasi ◽  
Bernardete Ribeiro ◽  
Joel P. Arrais

In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES string notation, and the Predictor, which evaluates the newly generated compounds by predicting their affinity for the desired target. The Generator is then optimized through reinforcement learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process, which seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model, which remains fixed, and a copy of it, which is updated during training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules based on the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized partition coefficient and high inhibitory power against the adenosine $$A_{2A}$$ and $$\kappa$$-opioid receptors. The results reveal that the model can effectively adjust the newly generated molecules towards the wanted direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
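A sketch of the two-generator token-sampling idea; the exact rule linking the reward evolution to the generator choice is not given in the abstract, so the schedule and the probabilities below are assumptions:

```python
import random

def sample_smiles(frozen_gen, updated_gen, reward_history, max_len=100):
    """Sample one SMILES string token by token from two generators.

    frozen_gen / updated_gen: callables mapping a token prefix to the next
    token (stand-ins for the pretrained model and its updated copy). The
    probability of querying the updated copy follows the recent reward
    trend assigned by the Predictor.
    """
    if len(reward_history) >= 2 and reward_history[-1] > reward_history[-2]:
        p_updated = 0.8  # rewards improving: exploit the updated generator
    else:
        p_updated = 0.5  # otherwise keep exploring with the fixed prior
    tokens = ["<start>"]
    while tokens[-1] != "<end>" and len(tokens) < max_len:
        gen = updated_gen if random.random() < p_updated else frozen_gen
        tokens.append(gen(tokens))
    return "".join(tokens[1:-1])  # drop the <start>/<end> markers
```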


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. The possibility of using a limited learning set was achieved by developing a detailed scenario of the task, which strictly defined the operating conditions of the detector in the considered case of a convolutional neural network. The described solution utilizes known deep neural network architectures for learning and object detection. The article compares detection results from the most popular deep neural networks while maintaining a limited training set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines, and the object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The best results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, the number of input samples had a significantly weaker influence on its results than on those of the other CNN models, which, in the authors' assessment, is a desirable feature in the case of a limited training set.
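As an illustration of the kind of setup behind such results, a hedged sketch of fine-tuning a pretrained Faster R-CNN for a single object class with torchvision (the training loop and the 60-frame dataset are omitted; the class count and weight choice are assumptions):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained Faster R-CNN; the pretrained backbone is what
# makes a 60-frame training set viable in a low-variability outdoor scenario.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

num_classes = 2  # background + power insulator
in_features = model.roi_heads.box_predictor.cls_score.in_features
# Replace the box predictor head so it outputs scores for our classes only.
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# model is now ready to be fine-tuned on the limited set of annotated frames.
```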


Inventions ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 70
Author(s):  
Elena Solovyeva ◽  
Ali Abdullah

In this paper, the structure of a separable convolutional neural network consisting of an embedding layer, separable convolutional layers, a convolutional layer, and global average pooling is presented for binary and multiclass text classification. The advantage of the proposed structure is the absence of multiple fully connected layers, which are often used to increase classification accuracy but raise the computational cost. The combination of low-cost separable convolutional layers and a convolutional layer is proposed to achieve high accuracy and, simultaneously, to reduce the complexity of neural classifiers. The advantages are demonstrated on binary and multiclass classification of written texts by means of the proposed networks with sigmoid and softmax activation functions in the convolutional layer. For both binary and multiclass classification, the accuracy obtained by the separable convolutional neural networks is higher than that of several investigated types of recurrent neural networks and fully connected networks.
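A minimal Keras sketch of the described structure, with all layer sizes as illustrative guesses:

```python
import tensorflow as tf
from tensorflow.keras import layers

# embedding -> separable convolutions -> convolution whose filters act as
# class scores with softmax (sigmoid and a single filter in the binary case)
# -> global average pooling instead of fully connected layers.
num_classes, vocab_size, seq_len = 4, 20000, 256
model = tf.keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 128),
    layers.SeparableConv1D(64, 5, activation="relu", padding="same"),
    layers.SeparableConv1D(64, 5, activation="relu", padding="same"),
    layers.Conv1D(num_classes, 3, activation="softmax", padding="same"),
    layers.GlobalAveragePooling1D(),  # replaces the fully connected layers
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```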

