Learning and memory properties in fully connected networks

Author(s):  
A. D. Bruce ◽  
A. Canning ◽  
B. Forrest ◽  
E. Gardner ◽  
D. J. Wallace


2016 ◽
Vol 26 (01) ◽  
pp. 1650004 ◽  
Author(s):  
Benny Applebaum ◽  
Dariusz R. Kowalski ◽  
Boaz Patt-Shamir ◽  
Adi Rosén

We consider a message passing model with n nodes, each connected to all other nodes by a link that can deliver a message of B bits in a time unit (typically, B = O(log n)). We assume that each node has an input of size L bits (typically, L = O(n log n)) and that the nodes cooperate in order to compute some function (i.e., perform a distributed task). We are interested in the number of rounds required to compute the function. We give two results regarding this model. First, we show that most boolean functions require ⌈L/B⌉ − 1 rounds to compute deterministically, and that even if we consider randomized protocols that are allowed to err, the expected running time remains Ω(L/B) for most boolean functions. Second, trying to find explicit functions that require superconstant time, we consider the pointer chasing problem. In this problem, each node i is given an array A_i of length n whose entries are in [n], and the task is to find, for any i ∈ [n], the value of A_n[A_{n-1}[...A_1[i]...]]. We give a deterministic O(log n / log log n)-round protocol for this function using message size B = O(log n), a slight but non-trivial improvement over the O(log n) bound provided by standard "pointer doubling." The question of an explicit function (or functionality) that requires a superconstant number of rounds in this setting remains, however, open.
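
As a point of reference for the baseline the authors improve on, here is a minimal, centralized Python sketch of standard pointer doubling: each step composes adjacent maps, so the number of maps halves per round and the full composition is reached in O(log n) rounds. This is only an illustration of the repeated-squaring idea; it does not model the distributed message passing or the B-bit bandwidth constraint.

```python
import random

def pointer_doubling(arrays):
    """Compose the maps A_1, ..., A_n by repeated pairwise composition.

    arrays: list of n lists, each a map [n] -> [n] (0-indexed here).
    Returns the composed map A_n[A_{n-1}[...A_1[i]...]] for every i,
    together with the number of doubling rounds used (ceil(log2 n)).
    """
    n = len(arrays)
    # maps starts as the individual arrays; after r rounds each entry is the
    # composition of up to 2^r consecutive original arrays.
    maps = [list(a) for a in arrays]
    rounds = 0
    while len(maps) > 1:
        nxt = []
        for j in range(0, len(maps) - 1, 2):
            f, g = maps[j], maps[j + 1]          # compose adjacent blocks: g after f
            nxt.append([g[f[i]] for i in range(n)])
        if len(maps) % 2 == 1:                   # odd leftover carried to the next round
            nxt.append(maps[-1])
        maps = nxt
        rounds += 1
    return maps[0], rounds

if __name__ == "__main__":
    n = 16
    arrays = [[random.randrange(n) for _ in range(n)] for _ in range(n)]
    composed, rounds = pointer_doubling(arrays)
    print(rounds, composed[:5])   # rounds == 4 == ceil(log2 16)
```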


Inventions ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 70
Author(s):  
Elena Solovyeva ◽  
Ali Abdullah

In this paper, a separable convolutional neural network consisting of an embedding layer, separable convolutional layers, a convolutional layer and global average pooling is presented for binary and multiclass text classification. The advantage of the proposed structure is the absence of multiple fully connected layers, which are commonly used to increase classification accuracy but raise the computational cost. The combination of low-cost separable convolutional layers with a convolutional layer is proposed to achieve high accuracy while reducing the complexity of the neural classifiers. These advantages are demonstrated on binary and multiclass classification of written texts using the proposed networks with sigmoid and Softmax activation functions in the convolutional layer. For both binary and multiclass classification, the accuracy obtained by the separable convolutional neural networks is higher than that of some of the investigated types of recurrent neural networks and fully connected networks.
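
The abstract does not give layer sizes or training details, so the Keras sketch below only illustrates the kind of architecture described: an embedding layer, separable convolutions, a single standard convolution carrying the sigmoid or Softmax activation, and global average pooling instead of stacked fully connected layers. All dimensions (vocabulary size, sequence length, filter counts, kernel sizes) are placeholder assumptions, not the authors' settings.

```python
import tensorflow as tf

def build_text_classifier(vocab_size=20000, seq_len=200, num_classes=1):
    """Separable-convolution text classifier without stacked dense layers.

    num_classes == 1 -> binary head with sigmoid,
    num_classes  > 1 -> multiclass head with softmax,
    mirroring the two activation choices discussed in the abstract.
    """
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, 128)(inputs)
    x = tf.keras.layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    x = tf.keras.layers.SeparableConv1D(64, 5, activation="relu", padding="same")(x)
    # One standard convolution whose channel count equals the output size;
    # its activation (sigmoid or softmax) acts as the classification head.
    activation = "sigmoid" if num_classes == 1 else "softmax"
    x = tf.keras.layers.Conv1D(num_classes, 3, activation=activation, padding="same")(x)
    # Global average pooling collapses the sequence dimension, so no dense layers are needed.
    outputs = tf.keras.layers.GlobalAveragePooling1D()(x)
    loss = "binary_crossentropy" if num_classes == 1 else "categorical_crossentropy"
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model

model = build_text_classifier(num_classes=1)
model.summary()
```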


Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 727 ◽  
Author(s):  
Hlynur Jónsson ◽  
Giovanni Cherubini ◽  
Evangelos Eleftheriou

Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of neural networks describes the behavior, during training, of the mutual information at various depths between input/output and hidden-layer variables. Previous analyses revealed that, in networks for which finiteness of the mutual information can be established, most of the training epochs are spent on compressing the input. However, the estimation of mutual information is nontrivial for high-dimensional continuous random variables. Therefore, the computation of the mutual information for DNNs and its visualization on the information plane have mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adopting mutual information estimates as additional terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
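
For context, MINE estimates mutual information via the Donsker-Varadhan variational representation: a statistics network T_θ is trained to tighten a lower bound, which can then be evaluated for the variable pairs plotted on the information plane. The statement below is the standard form of that bound, not a formula taken from the paper.

```latex
% Donsker-Varadhan lower bound underlying MINE:
% the supremum is over a family of statistics networks T_theta.
I(X;Z) \;\ge\; \sup_{\theta}\;
  \mathbb{E}_{p(x,z)}\!\left[T_\theta(x,z)\right]
  \;-\; \log \mathbb{E}_{p(x)\,p(z)}\!\left[e^{T_\theta(x,z)}\right]
```

Adding a scaled estimate of such a bound for selected layers to the training objective is the kind of mutual-information regularization term the abstract refers to.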


2009 ◽  
Vol 95 (4) ◽  
pp. 999-1004
Author(s):  
P. E. Kornilovitch ◽  
R. N. Bicknell ◽  
J. S. Yeo

2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Henning Petzka ◽  
Martin Trimmel ◽  
Cristian Sminchisescu

Symmetries in neural networks allow different weight configurations to lead to the same network function. For odd activation functions, the set of transformations mapping between such configurations has been studied extensively, but less is known for neural networks with ReLU activation functions. We give a complete characterization for fully-connected networks with two layers. Apart from two well-known transformations, only degenerate situations allow additional transformations that leave the network function unchanged. Reduction steps can remove only part of the degenerate cases. Finally, we present a non-degenerate situation for deep neural networks that leads to new transformations leaving the network function intact.
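
The two well-known transformations for ReLU networks are generally taken to be permutations of the hidden units and positive rescalings that multiply a hidden unit's incoming weights and bias by λ > 0 and its outgoing weights by 1/λ (using ReLU(λa) = λ·ReLU(a) for λ > 0). The numpy sketch below, with arbitrary random dimensions, simply checks numerically that both leave a two-layer fully connected ReLU network's function unchanged; it is an illustration, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-layer fully connected ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2.
d_in, d_hidden, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def f(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.normal(size=d_in)

# Transformation 1: positive rescaling of each hidden unit.
# relu(lam * a) = lam * relu(a) for lam > 0, so scaling incoming weights/bias by lam
# and outgoing weights by 1/lam leaves the network function unchanged.
lam = rng.uniform(0.1, 10.0, size=d_hidden)
W1s, b1s = W1 * lam[:, None], b1 * lam
W2s = W2 / lam[None, :]

# Transformation 2: permutation of the hidden units.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1s[perm], b1s[perm], W2s[:, perm]

print(np.allclose(f(x, W1, b1, W2, b2), f(x, W1p, b1p, W2p, b2)))  # True
```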


2019 ◽  
Vol 213 ◽  
pp. 371-391 ◽  
Author(s):  
Louis P. Romero ◽  
Stefano Ambrogio ◽  
Massimo Giordano ◽  
Giorgio Cristiano ◽  
Martina Bodini ◽  
...  

This paper explores the impact of device failures (NVM conductances that may contribute read current but cannot be programmed) on DNN training and test accuracy.
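
As a rough illustration of this failure mode, the numpy sketch below freezes a random subset of a weight matrix at fixed values that are always read back during the forward pass but never respond to programming updates. The 5% failure rate, the value ranges, and the single-matrix abstraction are assumptions for illustration only, not the paper's device or training model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy crossbar layer: 64 x 32 weights, 5% of devices stuck.
# Stuck devices keep contributing a fixed read current but ignore programming pulses.
W = rng.normal(scale=0.1, size=(64, 32))
stuck_mask = rng.random(W.shape) < 0.05
stuck_values = rng.uniform(-0.3, 0.3, size=W.shape)

def read_weights(W):
    """Forward-pass view of the array: stuck positions read back their fixed value."""
    return np.where(stuck_mask, stuck_values, W)

def program_weights(W, update):
    """Programming pulses only move healthy devices; stuck devices ignore them."""
    return np.where(stuck_mask, W, W - update)

for _ in range(100):                               # stand-in for a training loop
    grad = rng.normal(scale=0.01, size=W.shape)    # placeholder for a real gradient
    W = program_weights(W, 0.1 * grad)

print("stuck fraction:", stuck_mask.mean())
print("effective weights sample:", read_weights(W)[0, :4])
```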


2014 ◽  
Vol 16 ◽  
pp. 05005
Author(s):  
Diego Paolo Ferruzzo Correa ◽  
José Roberto Castilho Piqueira

2020 ◽  
Vol 34 (10) ◽  
pp. 13791-13792
Author(s):  
Liangzhu Ge ◽  
Yuexian Hou ◽  
Yaju Jiang ◽  
Shuai Yao ◽  
Chao Yang

Despite their widespread applications, deep neural networks tend to overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of the Covariance matrix of the Activation matrix) and demonstrate that VECA is a good predictor of a network's generalization performance during the training process. Experiments performed on fully-connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA, which suggests that VECA can be used to estimate generalization performance without sacrificing training data for use as a validation set.
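
Since the abstract spells out the definition, VECA can be computed directly: form the covariance matrix of an activation matrix (samples by units), take its eigenvalues, and report their variance. The numpy sketch below does exactly that for a placeholder activation matrix; the matrix shape and the use of np.cov's default unbiased estimate are assumptions.

```python
import numpy as np

def veca(activations):
    """Variance of the Eigenvalues of the Covariance matrix of an Activation matrix.

    activations: array of shape (num_samples, num_units), e.g. the outputs of one
    layer collected over a batch or over the training set.
    """
    cov = np.cov(activations, rowvar=False)   # (num_units, num_units) covariance
    eigvals = np.linalg.eigvalsh(cov)         # real eigenvalues of a symmetric matrix
    return np.var(eigvals)

# Hypothetical usage: track VECA for one layer across training epochs.
rng = np.random.default_rng(0)
acts = rng.normal(size=(512, 128))            # placeholder layer activations
print("VECA:", veca(acts))
```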

