Convergence Behavior of DNNs with Mutual-Information-Based Regularization

Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 727 ◽  
Author(s):  
Hlynur Jónsson ◽  
Giovanni Cherubini ◽  
Evangelos Eleftheriou

Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of neural networks describes the behavior during training of the mutual information at various depths between input/output and hidden-layer variables. Previous analyses revealed that, in networks where finiteness of the mutual information can be established, most of the training epochs are spent on compressing the input. However, the estimation of mutual information is nontrivial for high-dimensional continuous random variables. Therefore, the computation of the mutual information for DNNs and its visualization on the information plane has mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adopting mutual information estimates as additional terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
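
As a rough illustration of how such a regularizer could be wired into training, the sketch below adds a MINE-style Donsker-Varadhan estimate of the mutual information between the input and a hidden representation to a cross-entropy loss. The statistics network `StatisticsNetwork`, the choice of hidden layer, the sign of the term and the weight `beta` are illustrative assumptions rather than the authors' exact configuration.

```python
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """Small critic T(x, z) for the Donsker-Varadhan bound (illustrative sizes)."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1)).squeeze(1)

def mine_lower_bound(T, x, z):
    """MINE estimate of I(X;Z): E[T(x,z)] - log E[exp(T(x, z_perm))]."""
    joint = T(x, z).mean()
    z_perm = z[torch.randperm(z.size(0))]          # break the pairing -> product of marginals
    scores = T(x, z_perm)
    marginal = torch.logsumexp(scores, dim=0) - math.log(scores.size(0))
    return joint - marginal

# Hypothetical training step: cross-entropy plus a weighted MI term.
# `model` is assumed to return both logits and a flattened hidden-layer activation.
def training_step(model, T, x, y, beta=0.01):
    logits, features = model(x)
    ce = nn.functional.cross_entropy(logits, y)
    mi = mine_lower_bound(T, x.flatten(1), features)
    return ce + beta * mi                          # sign/weight of the MI term is an assumption
```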

Inventions ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 70
Author(s):  
Elena Solovyeva ◽  
Ali Abdullah

In this paper, the structure of a separable convolutional neural network, consisting of an embedding layer, separable convolutional layers, a convolutional layer and global average pooling, is presented for binary and multiclass text classification. The advantage of the proposed structure is the absence of multiple fully connected layers, which are commonly used to increase classification accuracy but raise the computational cost. The combination of low-cost separable convolutional layers and a convolutional layer is proposed to gain high accuracy and, simultaneously, to reduce the complexity of neural classifiers. These advantages are demonstrated on binary and multiclass classification of written texts using the proposed networks with sigmoid and softmax activation functions in the convolutional layer. For both binary and multiclass classification, the accuracy obtained by the separable convolutional neural networks is higher than that of the investigated types of recurrent neural networks and fully connected networks.
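
A minimal Keras-style sketch of the described layer ordering (embedding, separable convolutions, a convolutional layer with sigmoid or softmax activation, then global average pooling). The vocabulary size, sequence length, filter counts and kernel sizes are placeholder assumptions, not the values used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_text_classifier(vocab_size=20000, seq_len=200, num_classes=2):
    """Embedding -> separable convolutions -> convolution -> global average pooling."""
    inputs = layers.Input(shape=(seq_len,))
    x = layers.Embedding(vocab_size, 128)(inputs)
    x = layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    x = layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    # Final (ordinary) convolution: sigmoid for binary, softmax for multiclass,
    # matching the activations discussed in the abstract.
    if num_classes == 2:
        x = layers.Conv1D(1, 3, activation="sigmoid", padding="same")(x)
        outputs = layers.GlobalAveragePooling1D()(x)
        loss = "binary_crossentropy"
    else:
        x = layers.Conv1D(num_classes, 3, activation="softmax", padding="same")(x)
        outputs = layers.GlobalAveragePooling1D()(x)
        loss = "categorical_crossentropy"
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model
```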


Author(s):  
Stanislav Fort ◽  
Adam Scherlis

We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, H, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of H, and 2) a large value of Tr(H)/||H|| at a well-defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on the MNIST and CIFAR-10 datasets with the ReLU and tanh non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of a network initialization. We show that the high and stable accuracy reached when optimizing on random, low-dimensional hypersurfaces is directly related to the overlap between the hypersurface and the Goldilocks zone, and as a corollary demonstrate that the notion of intrinsic dimension is initialization-dependent. We note that common initialization techniques initialize neural networks in this particular region of unusually high convexity/prevalence of positive curvature, and offer a geometric intuition for their success. Furthermore, we demonstrate that initializing a neural network at a number of points and selecting for high measures of local convexity such as Tr(H)/||H||, the number of positive eigenvalues of H, or low initial loss, leads to statistically significantly faster training on MNIST. Based on our observations, we hypothesize that the Goldilocks zone contains an unusually high density of suitable initialization configurations.
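
The quantity Tr(H)/||H|| can be estimated without forming the full Hessian. Below is a hedged PyTorch sketch using Hutchinson-style estimates built from Hessian-vector products; reading ||H|| as the Frobenius norm and the number of probe vectors are assumptions. Here `loss` is a scalar loss evaluated on a batch and `params = [p for p in model.parameters() if p.requires_grad]`.

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product via double backpropagation."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def goldilocks_statistics(loss, params, num_samples=10):
    """Hutchinson estimates from Rademacher probes v:
       Tr(H)      ~ E[v^T H v]
       ||H||_F^2  ~ E[||H v||^2]   (Frobenius norm is an assumed reading of ||H||)"""
    n = sum(p.numel() for p in params)
    device = params[0].device                       # assumes all params on one device
    tr, fro2 = 0.0, 0.0
    for _ in range(num_samples):
        v = torch.randint(0, 2, (n,), device=device).float() * 2 - 1
        hv = hvp(loss, params, v)
        tr += (v @ hv).item() / num_samples
        fro2 += (hv @ hv).item() / num_samples
    return tr, tr / (fro2 ** 0.5)                   # Tr(H) and Tr(H)/||H||_F
```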


2020 ◽  
pp. 105971232092291
Author(s):  
Guido Schillaci ◽  
Antonio Pico Villalpando ◽  
Verena V Hafner ◽  
Peter Hanappe ◽  
David Colliaux ◽  
...  

This work presents an architecture that generates curiosity-driven, goal-directed exploration behaviours for the image sensor of a microfarming robot. It combines deep neural networks, used offline for unsupervised learning of low-dimensional features from images, with shallow neural networks, trained online, that represent the inverse and forward kinematics of the system. The artificial curiosity system assigns interest values to a set of pre-defined goals and drives the exploration towards those that are expected to maximise the learning progress. We propose the integration of an episodic memory into intrinsic motivation systems to address the catastrophic forgetting issues typically experienced when performing online updates of artificial neural networks. Our results show that adopting an episodic memory system not only prevents the computational models from quickly forgetting previously acquired knowledge but also provides new avenues for modulating the balance between plasticity and stability of the models.
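
A minimal sketch of the replay idea: each new experience is stored in a fixed-size episodic memory, and every online update mixes the new sample with a handful of recalled ones. The reservoir-sampling storage policy, buffer size, replay size and the `model.fit` call are hypothetical placeholders, not the authors' implementation.

```python
import random

class EpisodicMemory:
    """Fixed-size episodic memory with reservoir sampling (one possible storage policy)."""
    def __init__(self, capacity=500):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def store(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def recall(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

# Hypothetical online update: each new (state, action, outcome) experience is
# replayed together with memory samples to mitigate catastrophic forgetting.
def online_update(model, memory, new_sample, replay_size=16):
    memory.store(new_sample)
    batch = [new_sample] + memory.recall(replay_size)
    model.fit(batch)    # stands in for one gradient step on the mini-batch (assumed API)
```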


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1508
Author(s):  
Kun Zhang ◽  
Yuanjie Zheng ◽  
Xiaobo Deng ◽  
Weikuan Jia ◽  
Jian Lian ◽  
...  

The goal of few-shot learning is to learn quickly from a low-data regime. Structured output tasks such as segmentation are challenging for few-shot learning because their outputs are high-dimensional and statistically dependent. For this problem, we propose improved guided networks and combine them with a fully connected conditional random field (CRF). The guided network extracts task representations from annotated support images through feature fusion to perform fast, accurate inference on new, unannotated query images. By bringing together few-shot learning methods and fully connected CRFs, our method can perform accurate object segmentation by overcoming the poor localization properties of deep convolutional neural networks, and can quickly adapt to new tasks, without further optimization, when faced with new data. Our guided network achieves leading accuracy for a given annotation volume and annotation time.
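
One common way to realize the guidance step, sketched below under assumptions: the task representation is obtained by masked average pooling of support-image features, and query locations are scored by cosine similarity to that prototype. The subsequent fully connected CRF refinement is omitted; function names and tensor shapes are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def guide_from_support(support_feats, support_masks, eps=1e-6):
    """Task representation via masked average pooling of support features.
    support_feats: (S, C, H, W) backbone features; support_masks: (S, 1, H, W) in {0, 1}."""
    masked = (support_feats * support_masks).sum(dim=(0, 2, 3))
    area = support_masks.sum(dim=(0, 2, 3)) + eps
    return masked / area                                    # (C,) prototype for the object class

def segment_query(query_feats, prototype):
    """Cosine similarity between the prototype and every query location."""
    proto = prototype.view(1, -1, 1, 1)
    sim = F.cosine_similarity(query_feats, proto, dim=1)    # (B, H, W) similarity map
    return sim                                              # thresholded or refined by a CRF downstream
```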


1996 ◽  
Vol 06 (11) ◽  
pp. 2055-2067 ◽  
Author(s):  
THOMAS WENNEKERS ◽  
FRANK PASEMANN

The relationship between certain types of high-dimensional neural networks and low-dimensional prototypical equations (neuromodules) is investigated. The high-dimensional systems consist of finitely many pools containing identical, dissipative, nonlinear single units operating in discrete time. Under the assumption of random connections inside and between pools, the system can be reduced to a set of only a few equations, which, asymptotically in time and system size, describe the behavior of every single unit arbitrarily well. This result can be viewed as synchronization of the single units in each pool. It is stated as a theorem on systems of nonlinear coupled maps, which gives explicit conditions on the single-unit dynamics and the nature of the random connections. As an application, we compare a 2-pool network with the corresponding two-dimensional dynamics. The bifurcation diagrams of both systems become very similar even for moderate system size (N=50) and large disorder in the connection strengths (50% of the mean), despite the fact that the systems exhibit fairly complex behavior (quasiperiodicity, chaos, coexisting attractors).
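
A toy NumPy sketch of the comparison described above: a 2-pool network of sigmoid units with randomly disordered connections next to the reduced two-dimensional mean-field map. The sigmoid nonlinearity, coupling means, biases and disorder level are illustrative assumptions chosen only to show the construction, not the parameters studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))      # assumed single-unit nonlinearity

N, T = 50, 200                                  # units per pool, iterations
c = np.array([[-2.0, 1.5], [1.0, -1.2]])        # illustrative mean coupling (pool l -> pool k)
theta = np.array([0.5, -0.5])                   # illustrative bias per pool

def full_network(N, c, theta, T, disorder=0.5):
    pools = np.array([0] * N + [1] * N)         # pool index of every unit
    W = np.zeros((2 * N, 2 * N))
    for i, k in enumerate(pools):
        for j, l in enumerate(pools):
            W[i, j] = (c[k, l] / N) * (1 + disorder * rng.uniform(-1, 1))
    a = rng.uniform(-1, 1, 2 * N)
    traj = []
    for _ in range(T):
        a = theta[pools] + W @ sigma(a)
        traj.append([a[:N].mean(), a[N:].mean()])
    return np.array(traj)

def reduced_map(c, theta, T):
    a = rng.uniform(-1, 1, 2)
    traj = []
    for _ in range(T):
        a = theta + c @ sigma(a)
        traj.append(a.copy())
    return np.array(traj)

# The pool averages of the 100-unit network should follow dynamics close to the 2-D module
# (for chaotic parameter regimes the attractors, not individual trajectories, will match).
print(full_network(N, c, theta, T)[-5:])
print(reduced_map(c, theta, T)[-5:])
```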


Author(s):  
Shuqin Gu ◽  
Yuexian Hou ◽  
Lipeng Zhang ◽  
Yazhou Zhang

Although Deep Neural Networks (DNNs) have achieved excellent performance in many tasks, improving the generalization capacity of DNNs still remains a challenge. In this work, we propose a novel regularizer named Ensemble-based Decorrelation Method (EDM), motivated by the idea of ensemble learning, to improve the generalization capacity of DNNs. EDM can be applied to hidden layers in fully connected neural networks or convolutional neural networks. We treat each hidden layer as an ensemble of several base learners by dividing all the hidden units into several non-overlapping groups, and each group is viewed as a base learner. EDM encourages DNNs to learn more diverse representations by minimizing the covariance between all base learners during training. Experimental results on the MNIST and CIFAR datasets demonstrate that EDM can effectively reduce overfitting and improve the generalization capacity of DNNs.
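
A hedged sketch of what such a penalty could look like: the layer's units are split into non-overlapping groups, each group is summarized by its mean activation as the "base learner" output, and the squared off-diagonal entries of the covariance matrix across the batch are penalized. Treating the group mean as the base-learner output and the exact penalty form are assumptions, not necessarily the authors' definition.

```python
import torch

def edm_penalty(hidden, num_groups=4):
    """Decorrelation penalty over groups of hidden units (one possible reading of EDM).
    hidden: (batch, units) activations of one hidden layer; units must divide by num_groups."""
    batch, units = hidden.shape
    groups = hidden.reshape(batch, num_groups, units // num_groups)
    learners = groups.mean(dim=2)                     # (batch, num_groups): one output per base learner
    centered = learners - learners.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (batch - 1)         # (num_groups, num_groups) covariance matrix
    off_diag = cov - torch.diag(torch.diagonal(cov))
    return (off_diag ** 2).sum()                      # penalize pairwise covariance only

# Hypothetical use inside a training loop:
# loss = cross_entropy + lambda_edm * edm_penalty(hidden_activations)
```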


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Henning Petzka ◽  
Martin Trimmel ◽  
Cristian Sminchisescu

Symmetries in neural networks allow different weight configurations to lead to the same network function. For odd activation functions, the set of transformations mapping between such configurations has been studied extensively, but less is known for neural networks with ReLU activation functions. We give a complete characterization for fully-connected networks with two layers. Apart from two well-known transformations, only degenerate situations allow additional transformations that leave the network function unchanged. Reduction steps can remove only part of the degenerate cases. Finally, we present a non-degenerate situation for deep neural networks leading to new transformations that leave the network function intact.
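
For context, the two well-known transformations for ReLU networks are permutations of hidden units and positive rescalings, where the incoming weights and bias of a hidden unit are multiplied by c > 0 and its outgoing weights by 1/c. The NumPy sketch below checks the rescaling invariance numerically for a small two-layer network with assumed, randomly drawn weights.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0)
rng = np.random.default_rng(1)

# Two-layer fully connected ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
d_in, d_hidden, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def net(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

# Positive rescaling of hidden unit j: scale its incoming weights and bias by c > 0
# and its outgoing weights by 1/c. Positive homogeneity of ReLU leaves f unchanged.
c, j = 3.7, 2
W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
W1s[j] *= c
b1s[j] *= c
W2s[:, j] /= c

x = rng.normal(size=d_in)
print(np.allclose(net(x, W1, b1, W2, b2), net(x, W1s, b1s, W2s, b2)))  # True
```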


2019 ◽  
Vol 213 ◽  
pp. 371-391 ◽  
Author(s):  
Louis P. Romero ◽  
Stefano Ambrogio ◽  
Massimo Giordano ◽  
Giorgio Cristiano ◽  
Martina Bodini ◽  
...  

This paper explores the impact of device failures, i.e., NVM conductances that may still contribute read current but can no longer be programmed, on DNN training and test accuracy.
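
A small NumPy sketch of how such failures might be emulated in simulation: a random fraction of the weights is stuck at a fixed conductance-like value and excluded from further programming. The failure rate, the stuck-value range and the masking scheme are assumptions for illustration only.

```python
import numpy as np

def apply_device_failures(W, failure_rate=0.05, g_min=-1.0, g_max=1.0, seed=0):
    """Simulate failed NVM devices: a random fraction of the weights is stuck at a
    fixed value (it still contributes to the read current but can no longer be
    programmed). The rate and value range are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    stuck = rng.random(W.shape) < failure_rate
    stuck_values = rng.uniform(g_min, g_max, W.shape)
    W_failed = np.where(stuck, stuck_values, W)
    return W_failed, stuck   # `stuck` can be used to freeze these entries during training
```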


2020 ◽  
Vol 34 (10) ◽  
pp. 13791-13792
Author(s):  
Liangzhu Ge ◽  
Yuexian Hou ◽  
Yaju Jiang ◽  
Shuai Yao ◽  
Chao Yang

Despite their widespread applications, deep neural networks often tend to overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of Covariance matrix of Activation matrix) and demonstrate that VECA is a good predictor of a network's generalization performance during the training process. Experiments performed on fully-connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA, which suggests that we can calculate VECA to estimate generalization performance without sacrificing training data to be used as a validation set.
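
The measure itself is straightforward to compute from a layer's activation matrix; a minimal NumPy sketch follows. Which layer and which batch of training samples to use are left as assumptions.

```python
import numpy as np

def veca(activations):
    """VECA: variance of the eigenvalues of the covariance matrix of the
    activation matrix (rows = samples, columns = hidden units)."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)          # covariance matrix is symmetric PSD
    return eigvals.var()

# Hypothetical monitoring loop: compute VECA on a chosen layer's activations for a
# batch each epoch and track it alongside the test loss.
```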

