Convergence Behavior of DNNs with Mutual-Information-Based Regularization

Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 727 ◽  
Author(s):  
Hlynur Jónsson ◽  
Giovanni Cherubini ◽  
Evangelos Eleftheriou

Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of neural networks describes the behavior during training of the mutual information at various depths between input/output and hidden-layer variables. Previous analyses revealed that, in networks where finiteness of the mutual information can be established, most of the training epochs are spent on compressing the input. However, the estimation of mutual information is nontrivial for high-dimensional continuous random variables. Therefore, the computation of the mutual information for DNNs and its visualization on the information plane has mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adopting mutual information estimates as additional terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
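
As a rough illustration of how such a regularizer could be wired into training, the sketch below adds a MINE-style Donsker-Varadhan estimate of the mutual information between the input and a hidden representation to a cross-entropy loss. The statistics network `StatisticsNetwork`, the choice of hidden layer, the sign of the term and the weight `beta` are illustrative assumptions rather than the authors' exact configuration.

```python
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """Small critic T(x, z) for the Donsker-Varadhan bound (illustrative sizes)."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1)).squeeze(1)

def mine_lower_bound(T, x, z):
    """MINE estimate of I(X;Z): E[T(x,z)] - log E[exp(T(x, z_perm))]."""
    joint = T(x, z).mean()
    z_perm = z[torch.randperm(z.size(0))]          # break the pairing -> product of marginals
    scores = T(x, z_perm)
    marginal = torch.logsumexp(scores, dim=0) - math.log(scores.size(0))
    return joint - marginal

# Hypothetical training step: cross-entropy plus a weighted MI term.
# `model` is assumed to return both logits and a flattened hidden-layer activation.
def training_step(model, T, x, y, beta=0.01):
    logits, features = model(x)
    ce = nn.functional.cross_entropy(logits, y)
    mi = mine_lower_bound(T, x.flatten(1), features)
    return ce + beta * mi                          # sign/weight of the MI term is an assumption
```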

Inventions ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 70
Author(s):  
Elena Solovyeva ◽  
Ali Abdullah

In this paper, the structure of a separable convolutional neural network, consisting of an embedding layer, separable convolutional layers, a convolutional layer and global average pooling, is presented for binary and multiclass text classification. The advantage of the proposed structure is the absence of multiple fully connected layers, which are commonly used to increase classification accuracy but raise the computational cost. The combination of low-cost separable convolutional layers and a convolutional layer is proposed to gain high accuracy and, simultaneously, to reduce the complexity of neural classifiers. These advantages are demonstrated on binary and multiclass classification of written texts using the proposed networks with sigmoid and softmax activation functions in the convolutional layer. For both binary and multiclass classification, the accuracy obtained by the separable convolutional neural networks is higher than that of the investigated types of recurrent neural networks and fully connected networks.
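
A minimal Keras-style sketch of the described layer ordering (embedding, separable convolutions, a convolutional layer with sigmoid or softmax activation, then global average pooling). The vocabulary size, sequence length, filter counts and kernel sizes are placeholder assumptions, not the values used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_text_classifier(vocab_size=20000, seq_len=200, num_classes=2):
    """Embedding -> separable convolutions -> convolution -> global average pooling."""
    inputs = layers.Input(shape=(seq_len,))
    x = layers.Embedding(vocab_size, 128)(inputs)
    x = layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    x = layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    # Final (ordinary) convolution: sigmoid for binary, softmax for multiclass,
    # matching the activations discussed in the abstract.
    if num_classes == 2:
        x = layers.Conv1D(1, 3, activation="sigmoid", padding="same")(x)
        outputs = layers.GlobalAveragePooling1D()(x)
        loss = "binary_crossentropy"
    else:
        x = layers.Conv1D(num_classes, 3, activation="softmax", padding="same")(x)
        outputs = layers.GlobalAveragePooling1D()(x)
        loss = "categorical_crossentropy"
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model
```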


Author(s):  
Stanislav Fort ◽  
Adam Scherlis

We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, H, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of H, and 2) a large value of Tr(H)/||H|| at a well-defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on the MNIST and CIFAR-10 datasets with the ReLU and tanh non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of a network initialization. We show that the high and stable accuracy reached when optimizing on random, low-dimensional hypersurfaces is directly related to the overlap between the hypersurface and the Goldilocks zone, and as a corollary demonstrate that the notion of intrinsic dimension is initialization-dependent. We note that common initialization techniques initialize neural networks in this particular region of unusually high convexity/prevalence of positive curvature, and offer a geometric intuition for their success. Furthermore, we demonstrate that initializing a neural network at a number of points and selecting for high measures of local convexity such as Tr(H)/||H||, the number of positive eigenvalues of H, or low initial loss, leads to statistically significantly faster training on MNIST. Based on our observations, we hypothesize that the Goldilocks zone contains an unusually high density of suitable initialization configurations.
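
The quantity Tr(H)/||H|| can be estimated without forming the full Hessian. Below is a hedged PyTorch sketch using Hutchinson-style estimates built from Hessian-vector products; reading ||H|| as the Frobenius norm and the number of probe vectors are assumptions. Here `loss` is a scalar loss evaluated on a batch and `params = [p for p in model.parameters() if p.requires_grad]`.

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product via double backpropagation."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def goldilocks_statistics(loss, params, num_samples=10):
    """Hutchinson estimates from Rademacher probes v:
       Tr(H)      ~ E[v^T H v]
       ||H||_F^2  ~ E[||H v||^2]   (Frobenius norm is an assumed reading of ||H||)"""
    n = sum(p.numel() for p in params)
    device = params[0].device                       # assumes all params on one device
    tr, fro2 = 0.0, 0.0
    for _ in range(num_samples):
        v = torch.randint(0, 2, (n,), device=device).float() * 2 - 1
        hv = hvp(loss, params, v)
        tr += (v @ hv).item() / num_samples
        fro2 += (hv @ hv).item() / num_samples
    return tr, tr / (fro2 ** 0.5)                   # Tr(H) and Tr(H)/||H||_F
```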


2020 ◽  
pp. 105971232092291
Author(s):  
Guido Schillaci ◽  
Antonio Pico Villalpando ◽  
Verena V Hafner ◽  
Peter Hanappe ◽  
David Colliaux ◽  
...  

This work presents an architecture that generates curiosity-driven, goal-directed exploration behaviours for the image sensor of a microfarming robot. It combines deep neural networks, used offline for unsupervised learning of low-dimensional features from images, with shallow neural networks, trained online, that represent the inverse and forward kinematics of the system. The artificial curiosity system assigns interest values to a set of pre-defined goals and drives the exploration towards those that are expected to maximise the learning progress. We propose the integration of an episodic memory into intrinsic motivation systems to address the catastrophic forgetting issues typically experienced when performing online updates of artificial neural networks. Our results show that adopting an episodic memory system not only prevents the computational models from quickly forgetting previously acquired knowledge but also provides new avenues for modulating the balance between plasticity and stability of the models.
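
A minimal sketch of the replay idea: each new experience is stored in a fixed-size episodic memory, and every online update mixes the new sample with a handful of recalled ones. The reservoir-sampling storage policy, buffer size, replay size and the `model.fit` call are hypothetical placeholders, not the authors' implementation.

```python
import random

class EpisodicMemory:
    """Fixed-size episodic memory with reservoir sampling (one possible storage policy)."""
    def __init__(self, capacity=500):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def store(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def recall(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

# Hypothetical online update: each new (state, action, outcome) experience is
# replayed together with memory samples to mitigate catastrophic forgetting.
def online_update(model, memory, new_sample, replay_size=16):
    memory.store(new_sample)
    batch = [new_sample] + memory.recall(replay_size)
    model.fit(batch)    # stands in for one gradient step on the mini-batch (assumed API)
```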


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1508
Author(s):  
Kun Zhang ◽  
Yuanjie Zheng ◽  
Xiaobo Deng ◽  
Weikuan Jia ◽  
Jian Lian ◽  
...  

The goal of few-shot learning is to learn quickly from a low-data regime. Structured output tasks such as segmentation are challenging for few-shot learning because their outputs are high-dimensional and statistically dependent. For this problem, we propose improved guided networks and combine them with a fully connected conditional random field (CRF). The guided network extracts task representations from annotated support images through feature fusion to perform fast, accurate inference on new, unannotated query images. By bringing together few-shot learning methods and fully connected CRFs, our method can perform accurate object segmentation by overcoming the poor localization properties of deep convolutional neural networks, and can quickly adapt to new tasks, without further optimization, when faced with new data. Our guided network achieves leading accuracy for a given annotation volume and annotation time.
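
One common way to realize the guidance step, sketched below under assumptions: the task representation is obtained by masked average pooling of support-image features, and query locations are scored by cosine similarity to that prototype. The subsequent fully connected CRF refinement is omitted; function names and tensor shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def guide_from_support(support_feats, support_masks, eps=1e-6):
    """Task representation via masked average pooling of support features.
    support_feats: (S, C, H, W) backbone features; support_masks: (S, 1, H, W) in {0, 1}."""
    masked = (support_feats * support_masks).sum(dim=(0, 2, 3))
    area = support_masks.sum(dim=(0, 2, 3)) + eps
    return masked / area                                    # (C,) prototype for the object class

def segment_query(query_feats, prototype):
    """Cosine similarity between the prototype and every query location."""
    proto = prototype.view(1, -1, 1, 1)
    sim = F.cosine_similarity(query_feats, proto, dim=1)    # (B, H, W) similarity map
    return sim                                              # thresholded or refined by a CRF downstream
```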


1996 ◽  
Vol 06 (11) ◽  
pp. 2055-2067 ◽  
Author(s):  
THOMAS WENNEKERS ◽  
FRANK PASEMANN

The relationship between certain types of high-dimensional neural networks and low-dimensional prototypical equations (neuromodules) is investigated. The high-dimensional systems consist of finitely many pools containing identical, dissipative, nonlinear single units operating in discrete time. Under the assumption of random connections inside and between pools, the system can be reduced to a set of only a few equations, which, asymptotically in time and system size, describe the behavior of every single unit arbitrarily well. This result can be viewed as synchronization of the single units in each pool. It is stated as a theorem on systems of nonlinear coupled maps, which gives explicit conditions on the single-unit dynamics and the nature of the random connections. As an application, we compare a 2-pool network with the corresponding two-dimensional dynamics. The bifurcation diagrams of both systems become very similar even for moderate system size (N=50) and large disorder in the connection strengths (50% of the mean), despite the fact that the systems exhibit fairly complex behavior (quasiperiodicity, chaos, coexisting attractors).
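
A toy NumPy sketch of the comparison described above: a 2-pool network of sigmoid units with randomly disordered connections next to the reduced two-dimensional mean-field map. The sigmoid nonlinearity, coupling means, biases and disorder level are illustrative assumptions chosen only to show the construction, not the parameters studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))      # assumed single-unit nonlinearity

N, T = 50, 200                                  # units per pool, iterations
c = np.array([[-2.0, 1.5], [1.0, -1.2]])        # illustrative mean coupling (pool l -> pool k)
theta = np.array([0.5, -0.5])                   # illustrative bias per pool

def full_network(N, c, theta, T, disorder=0.5):
    pools = np.array([0] * N + [1] * N)         # pool index of every unit
    W = np.zeros((2 * N, 2 * N))
    for i, k in enumerate(pools):
        for j, l in enumerate(pools):
            W[i, j] = (c[k, l] / N) * (1 + disorder * rng.uniform(-1, 1))
    a = rng.uniform(-1, 1, 2 * N)
    traj = []
    for _ in range(T):
        a = theta[pools] + W @ sigma(a)
        traj.append([a[:N].mean(), a[N:].mean()])
    return np.array(traj)

def reduced_map(c, theta, T):
    a = rng.uniform(-1, 1, 2)
    traj = []
    for _ in range(T):
        a = theta + c @ sigma(a)
        traj.append(a.copy())
    return np.array(traj)

# The pool averages of the 100-unit network should follow dynamics close to the 2-D module
# (for chaotic parameter regimes the attractors, not individual trajectories, will match).
print(full_network(N, c, theta, T)[-5:])
print(reduced_map(c, theta, T)[-5:])
```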


Author(s):  
Shuqin Gu ◽  
Yuexian Hou ◽  
Lipeng Zhang ◽  
Yazhou Zhang

Although Deep Neural Networks (DNNs) have achieved excellent performance in many tasks, improving the generalization capacity of DNNs still remains a challenge. In this work, we propose a novel regularizer named Ensemble-based Decorrelation Method (EDM), motivated by the idea of ensemble learning, to improve the generalization capacity of DNNs. EDM can be applied to hidden layers in fully connected neural networks or convolutional neural networks. We treat each hidden layer as an ensemble of several base learners by dividing all the hidden units into several non-overlapping groups, and each group is viewed as a base learner. EDM encourages DNNs to learn more diverse representations by minimizing the covariance between all base learners during training. Experimental results on the MNIST and CIFAR datasets demonstrate that EDM can effectively reduce overfitting and improve the generalization capacity of DNNs.
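
A hedged sketch of what such a penalty could look like: the layer's units are split into non-overlapping groups, each group is summarized by its mean activation as the "base learner" output, and the squared off-diagonal entries of the covariance matrix across the batch are penalized. Treating the group mean as the base-learner output and the exact penalty form are assumptions, not necessarily the authors' definition.

```python
import torch

def edm_penalty(hidden, num_groups=4):
    """Decorrelation penalty over groups of hidden units (one possible reading of EDM).
    hidden: (batch, units) activations of one hidden layer; units must divide by num_groups."""
    batch, units = hidden.shape
    groups = hidden.reshape(batch, num_groups, units // num_groups)
    learners = groups.mean(dim=2)                     # (batch, num_groups): one output per base learner
    centered = learners - learners.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (batch - 1)         # (num_groups, num_groups) covariance matrix
    off_diag = cov - torch.diag(torch.diagonal(cov))
    return (off_diag ** 2).sum()                      # penalize pairwise covariance only

# Hypothetical use inside a training loop:
# loss = cross_entropy + lambda_edm * edm_penalty(hidden_activations)
```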


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Henning Petzka ◽  
Martin Trimmel ◽  
Cristian Sminchisescu

Symmetries in neural networks allow different weight configurations to lead to the same network function. For odd activation functions, the set of transformations mapping between such configurations has been studied extensively, but less is known for neural networks with ReLU activation functions. We give a complete characterization for fully-connected networks with two layers. Apart from two well-known transformations, only degenerate situations allow additional transformations that leave the network function unchanged. Reduction steps can remove only part of the degenerate cases. Finally, we present a non-degenerate situation for deep neural networks leading to new transformations that leave the network function intact.
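
For context, the two well-known transformations for ReLU networks are permutations of hidden units and positive rescalings, where the incoming weights and bias of a hidden unit are multiplied by c > 0 and its outgoing weights by 1/c. The NumPy sketch below checks the rescaling invariance numerically for a small two-layer network with assumed, randomly drawn weights.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0)
rng = np.random.default_rng(1)

# Two-layer fully connected ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
d_in, d_hidden, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def net(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

# Positive rescaling of hidden unit j: scale its incoming weights and bias by c > 0
# and its outgoing weights by 1/c. Positive homogeneity of ReLU leaves f unchanged.
c, j = 3.7, 2
W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
W1s[j] *= c
b1s[j] *= c
W2s[:, j] /= c

x = rng.normal(size=d_in)
print(np.allclose(net(x, W1, b1, W2, b2), net(x, W1s, b1s, W2s, b2)))  # True
```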


2019 ◽  
Vol 213 ◽  
pp. 371-391 ◽  
Author(s):  
Louis P. Romero ◽  
Stefano Ambrogio ◽  
Massimo Giordano ◽  
Giorgio Cristiano ◽  
Martina Bodini ◽  
...  

This paper explores the impact of device failures, i.e., NVM conductances that may still contribute read current but can no longer be programmed, on DNN training and test accuracy.
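
A small NumPy sketch of how such failures might be emulated in simulation: a random fraction of the weights is stuck at a fixed conductance-like value and excluded from further programming. The failure rate, the stuck-value range and the masking scheme are assumptions for illustration only.

```python
import numpy as np

def apply_device_failures(W, failure_rate=0.05, g_min=-1.0, g_max=1.0, seed=0):
    """Simulate failed NVM devices: a random fraction of the weights is stuck at a
    fixed value (it still contributes to the read current but can no longer be
    programmed). The rate and value range are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    stuck = rng.random(W.shape) < failure_rate
    stuck_values = rng.uniform(g_min, g_max, W.shape)
    W_failed = np.where(stuck, stuck_values, W)
    return W_failed, stuck   # `stuck` can be used to freeze these entries during training
```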


2020 ◽  
Vol 34 (10) ◽  
pp. 13791-13792
Author(s):  
Liangzhu Ge ◽  
Yuexian Hou ◽  
Yaju Jiang ◽  
Shuai Yao ◽  
Chao Yang

Despite their widespread applications, deep neural networks often tend to overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of Covariance matrix of Activation matrix) and demonstrate that VECA is a good predictor of a network's generalization performance during the training process. Experiments performed on fully-connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA, which suggests that we can calculate VECA to estimate generalization performance without sacrificing training data to be used as a validation set.
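
The measure itself is straightforward to compute from a layer's activation matrix; a minimal NumPy sketch follows. Which layer and which batch of training samples to use are left as assumptions.

```python
import numpy as np

def veca(activations):
    """VECA: variance of the eigenvalues of the covariance matrix of the
    activation matrix (rows = samples, columns = hidden units)."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)          # covariance matrix is symmetric PSD
    return eigvals.var()

# Hypothetical monitoring loop: compute VECA on a chosen layer's activations for a
# batch each epoch and track it alongside the test loss.
```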

