Learning and memory properties in fully connected networks

Author(s):  
A. D. Bruce ◽  
A. Canning ◽  
B. Forrest ◽  
E. Gardner ◽  
D. J. Wallace


2016 ◽
Vol 26 (01) ◽  
pp. 1650004 ◽  
Author(s):  
Benny Applebaum ◽  
Dariusz R. Kowalski ◽  
Boaz Patt-Shamir ◽  
Adi Rosén

We consider a message passing model with n nodes, each connected to all other nodes by a link that can deliver a message of B bits in a time unit (typically, B = O(log n)). We assume that each node has an input of size L bits (typically, L = O(n log n)) and that the nodes cooperate in order to compute some function (i.e., perform a distributed task). We are interested in the number of rounds required to compute the function. We give two results regarding this model. First, we show that most boolean functions require ⌈L/B⌉ − 1 rounds to compute deterministically, and that even if we consider randomized protocols that are allowed to err, the expected running time remains Ω(L/B) for most boolean functions. Second, trying to find explicit functions that require superconstant time, we consider the pointer chasing problem. In this problem, each node i is given an array A_i of length n whose entries are in [n], and the task is to find, for any i ∈ [n], the value of A_n[A_{n-1}[...A_1[i]...]]. We give a deterministic O(log n / log log n)-round protocol for this function using message size B = O(log n), a slight but non-trivial improvement over the O(log n) bound provided by standard "pointer doubling." The question of an explicit function (or functionality) that requires a superconstant number of rounds in this setting remains, however, open.
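
As a point of reference for the baseline the authors improve on, here is a minimal, centralized Python sketch of standard pointer doubling: each step composes adjacent maps, so the number of maps halves per round and the full composition is reached in O(log n) rounds. This is only an illustration of the repeated-squaring idea; it does not model the distributed message passing or the B-bit bandwidth constraint.

```python
import random

def pointer_doubling(arrays):
    """Compose the maps A_1, ..., A_n by repeated pairwise composition.

    arrays: list of n lists, each a map [n] -> [n] (0-indexed here).
    Returns the composed map A_n[A_{n-1}[...A_1[i]...]] for every i,
    together with the number of doubling rounds used (ceil(log2 n)).
    """
    n = len(arrays)
    # maps starts as the individual arrays; after r rounds each entry is the
    # composition of up to 2^r consecutive original arrays.
    maps = [list(a) for a in arrays]
    rounds = 0
    while len(maps) > 1:
        nxt = []
        for j in range(0, len(maps) - 1, 2):
            f, g = maps[j], maps[j + 1]          # compose adjacent blocks: g after f
            nxt.append([g[f[i]] for i in range(n)])
        if len(maps) % 2 == 1:                   # odd leftover carried to the next round
            nxt.append(maps[-1])
        maps = nxt
        rounds += 1
    return maps[0], rounds

if __name__ == "__main__":
    n = 16
    arrays = [[random.randrange(n) for _ in range(n)] for _ in range(n)]
    composed, rounds = pointer_doubling(arrays)
    print(rounds, composed[:5])   # rounds == 4 == ceil(log2 16)
```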


Inventions ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 70
Author(s):  
Elena Solovyeva ◽  
Ali Abdullah

In this paper, a separable convolutional neural network consisting of an embedding layer, separable convolutional layers, a convolutional layer and global average pooling is presented for binary and multiclass text classification. The advantage of the proposed structure is the absence of multiple fully connected layers, which are commonly used to increase classification accuracy but raise the computational cost. The combination of low-cost separable convolutional layers with a convolutional layer is proposed to achieve high accuracy while reducing the complexity of the neural classifiers. These advantages are demonstrated on binary and multiclass classification of written texts using the proposed networks with sigmoid and Softmax activation functions in the convolutional layer. For both binary and multiclass classification, the accuracy obtained by the separable convolutional neural networks is higher than that of some of the investigated types of recurrent neural networks and fully connected networks.
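
The abstract does not give layer sizes or training details, so the Keras sketch below only illustrates the kind of architecture described: an embedding layer, separable convolutions, a single standard convolution carrying the sigmoid or Softmax activation, and global average pooling instead of stacked fully connected layers. All dimensions (vocabulary size, sequence length, filter counts, kernel sizes) are placeholder assumptions, not the authors' settings.

```python
import tensorflow as tf

def build_text_classifier(vocab_size=20000, seq_len=200, num_classes=1):
    """Separable-convolution text classifier without stacked dense layers.

    num_classes == 1 -> binary head with sigmoid,
    num_classes  > 1 -> multiclass head with softmax,
    mirroring the two activation choices discussed in the abstract.
    """
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, 128)(inputs)
    x = tf.keras.layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    x = tf.keras.layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    # One standard convolution whose channel count equals the output size;
    # its activation (sigmoid or softmax) acts as the classification head.
    activation = "sigmoid" if num_classes == 1 else "softmax"
    x = tf.keras.layers.Conv1D(num_classes, 3, activation=activation, padding="same")(x)
    # Global average pooling collapses the sequence dimension, so no dense layers are needed.
    outputs = tf.keras.layers.GlobalAveragePooling1D()(x)
    loss = "binary_crossentropy" if num_classes == 1 else "categorical_crossentropy"
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model

model = build_text_classifier(num_classes=1)
model.summary()
```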


Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 727 ◽  
Author(s):  
Hlynur Jónsson ◽  
Giovanni Cherubini ◽  
Evangelos Eleftheriou

Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of neural networks describes the behavior, during training, of the mutual information at various depths between input/output and hidden-layer variables. Previous analyses revealed that, in networks for which finiteness of the mutual information can be established, most of the training epochs are spent on compressing the input. However, the estimation of mutual information is nontrivial for high-dimensional continuous random variables. Therefore, the computation of the mutual information for DNNs and its visualization on the information plane have mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adopting mutual information estimates as additional terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
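
For context, MINE estimates mutual information via the Donsker-Varadhan variational representation: a statistics network T_θ is trained to tighten a lower bound, which can then be evaluated for the variable pairs plotted on the information plane. The statement below is the standard form of that bound, not a formula taken from the paper.

```latex
% Donsker-Varadhan lower bound underlying MINE:
% the supremum is over a family of statistics networks T_theta.
I(X;Z) \;\ge\; \sup_{\theta}\;
  \mathbb{E}_{p(x,z)}\!\left[T_\theta(x,z)\right]
  \;-\; \log \mathbb{E}_{p(x)\,p(z)}\!\left[e^{T_\theta(x,z)}\right]
```

Adding a scaled estimate of such a bound for selected layers to the training objective is the kind of mutual-information regularization term the abstract refers to.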


2009 ◽  
Vol 95 (4) ◽  
pp. 999-1004
Author(s):  
P. E. Kornilovitch ◽  
R. N. Bicknell ◽  
J. S. Yeo

2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Henning Petzka ◽  
Martin Trimmel ◽  
Cristian Sminchisescu

Symmetries in neural networks allow different weight configurations to lead to the same network function. For odd activation functions, the set of transformations mapping between such configurations has been studied extensively, but less is known for neural networks with ReLU activation functions. We give a complete characterization for fully-connected networks with two layers. Apart from two well-known transformations, only degenerate situations allow additional transformations that leave the network function unchanged. Reduction steps can remove only part of the degenerate cases. Finally, we present a non-degenerate situation for deep neural networks that leads to new transformations leaving the network function intact.
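
The two well-known transformations for ReLU networks are generally taken to be permutations of the hidden units and positive rescalings that multiply a hidden unit's incoming weights and bias by λ > 0 and its outgoing weights by 1/λ (using ReLU(λa) = λ·ReLU(a) for λ > 0). The numpy sketch below, with arbitrary random dimensions, simply checks numerically that both leave a two-layer fully connected ReLU network's function unchanged; it is an illustration, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-layer fully connected ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2.
d_in, d_hidden, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def f(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.normal(size=d_in)

# Transformation 1: positive rescaling of each hidden unit.
# relu(lam * a) = lam * relu(a) for lam > 0, so scaling incoming weights/bias by lam
# and outgoing weights by 1/lam leaves the network function unchanged.
lam = rng.uniform(0.1, 10.0, size=d_hidden)
W1s, b1s = W1 * lam[:, None], b1 * lam
W2s = W2 / lam[None, :]

# Transformation 2: permutation of the hidden units.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1s[perm], b1s[perm], W2s[:, perm]

print(np.allclose(f(x, W1, b1, W2, b2), f(x, W1p, b1p, W2p, b2)))  # True
```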


2019 ◽  
Vol 213 ◽  
pp. 371-391 ◽  
Author(s):  
Louis P. Romero ◽  
Stefano Ambrogio ◽  
Massimo Giordano ◽  
Giorgio Cristiano ◽  
Martina Bodini ◽  
...  

This paper explores the impact of device failures (NVM conductances that may contribute read current but cannot be programmed) on DNN training and test accuracy.
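
As a rough illustration of this failure mode, the numpy sketch below freezes a random subset of a weight matrix at fixed values that are always read back during the forward pass but never respond to programming updates. The 5% failure rate, the value ranges, and the single-matrix abstraction are assumptions for illustration only, not the paper's device or training model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy crossbar layer: 64 x 32 weights, 5% of devices stuck.
# Stuck devices keep contributing a fixed read current but ignore programming pulses.
W = rng.normal(scale=0.1, size=(64, 32))
stuck_mask = rng.random(W.shape) < 0.05
stuck_values = rng.uniform(-0.3, 0.3, size=W.shape)

def read_weights(W):
    """Forward-pass view of the array: stuck positions read back their fixed value."""
    return np.where(stuck_mask, stuck_values, W)

def program_weights(W, update):
    """Programming pulses only move healthy devices; stuck devices ignore them."""
    return np.where(stuck_mask, W, W - update)

for _ in range(100):                               # stand-in for a training loop
    grad = rng.normal(scale=0.01, size=W.shape)    # placeholder for a real gradient
    W = program_weights(W, 0.1 * grad)

print("stuck fraction:", stuck_mask.mean())
print("effective weights sample:", read_weights(W)[0, :4])
```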


2014 ◽  
Vol 16 ◽  
pp. 05005
Author(s):  
Diego Paolo Ferruzzo Correa ◽  
José Roberto Castilho Piqueira

2020 ◽  
Vol 34 (10) ◽  
pp. 13791-13792
Author(s):  
Liangzhu Ge ◽  
Yuexian Hou ◽  
Yaju Jiang ◽  
Shuai Yao ◽  
Chao Yang

Despite their widespread applications, deep neural networks tend to overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of the Covariance matrix of the Activation matrix) and demonstrate that VECA is a good predictor of a network's generalization performance during the training process. Experiments performed on fully-connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA, which suggests that VECA can be used to estimate generalization performance without sacrificing training data for use as a validation set.
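
Since the abstract spells out the definition, VECA can be computed directly: form the covariance matrix of an activation matrix (samples by units), take its eigenvalues, and report their variance. The numpy sketch below does exactly that for a placeholder activation matrix; the matrix shape and the use of np.cov's default unbiased estimate are assumptions.

```python
import numpy as np

def veca(activations):
    """Variance of the Eigenvalues of the Covariance matrix of an Activation matrix.

    activations: array of shape (num_samples, num_units), e.g. the outputs of one
    layer collected over a batch or over the training set.
    """
    cov = np.cov(activations, rowvar=False)   # (num_units, num_units) covariance
    eigvals = np.linalg.eigvalsh(cov)         # real eigenvalues of a symmetric matrix
    return np.var(eigvals)

# Hypothetical usage: track VECA for one layer across training epochs.
rng = np.random.default_rng(0)
acts = rng.normal(size=(512, 128))            # placeholder layer activations
print("VECA:", veca(acts))
```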

