Individual differences among deep neural network models

Deep neural networks (DNNs) excel at visual recognition tasks and are increasingly used as a modelling framework for neural computations in the primate brain. However, each DNN instance, just like each individual brain, has a unique connectivity and representational profile. Here, we investigate individual differences among DNN instances that arise from varying only the random initialization of the network weights. Using representational similarity analysis, we demonstrate that this minimal change in initial conditions prior to training leads to substantial differences in intermediate and higher-level network representations, despite achieving indistinguishable network-level classification performance. We locate the origins of the effects in an under-constrained alignment of category exemplars, rather than a misalignment of category centroids. Furthermore, while network regularization can increase the consistency of learned representations, considerable differences remain. These results suggest that computational neuroscientists working with DNNs should base their inferences on multiple network instances instead of single off-the-shelf networks.
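As a rough illustration of the representational similarity analysis mentioned above, the sketch below builds a representational dissimilarity matrix (RDM) for each set of activations and correlates the RDMs of two network instances. The random activations, the correlation-distance metric, and the function names `pearson`, `rdm`, and `rdm_similarity` are assumptions chosen for illustration; this is not the authors' pipeline.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rdm(activations):
    """Upper triangle of the RDM: 1 - Pearson r for every pair of stimuli."""
    n = len(activations)
    return [1.0 - pearson(activations[i], activations[j])
            for i in range(n) for j in range(i + 1, n)]

def rdm_similarity(acts_a, acts_b):
    """Correlate the RDMs of two networks that saw the same stimuli."""
    return pearson(rdm(acts_a), rdm(acts_b))

rng = random.Random(0)
# Hypothetical activations: 12 stimuli x 30 units per network instance.
net_a = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(12)]
net_b = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(12)]

# The same representation with its units shuffled: the RDM is unchanged,
# so the comparison does not depend on which unit encodes what.
order = list(range(30))
rng.shuffle(order)
net_a_shuffled = [[row[k] for k in order] for row in net_a]

same_geometry = rdm_similarity(net_a, net_a_shuffled)
unrelated = rdm_similarity(net_a, net_b)
print(f"shuffled units: {same_geometry:.3f}, unrelated network: {unrelated:.3f}")
```

Comparing RDMs rather than raw activations is what makes two instances comparable at all, since their units have no one-to-one correspondence.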


Individual differences among deep neural network models

Nature Communications ◽

10.1038/s41467-020-19632-w ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Johannes Mehrer ◽

Courtney J. Spoerer ◽

Nikolaus Kriegeskorte ◽

Tim C. Kietzmann

Keyword(s):

Individual Differences ◽

Visual Recognition ◽

Initial Conditions ◽

Network Models ◽

Classification Performance ◽

Neural Network Models ◽

Modeling Framework ◽

Neural Computations ◽

Neural Information ◽

Multiple Network

Deep neural networks (DNNs) excel at visual recognition tasks and are increasingly used as a modeling framework for neural computations in the primate brain. Just like individual brains, each DNN has a unique connectivity and representational profile. Here, we investigate individual differences among DNN instances that arise from varying only the random initialization of the network weights. Using tools typically employed in systems neuroscience, we show that this minimal change in initial conditions prior to training leads to substantial differences in intermediate and higher-level network representations despite similar network-level classification performance. We locate the origins of the effects in an under-constrained alignment of category exemplars, rather than misaligned category centroids. These results call into question the common practice of using single networks to derive insights into neural information processing and rather suggest that computational neuroscientists working with DNNs may need to base their inferences on groups of multiple network instances.


Evaluation of Pre-Trained Convolutional Neural Network Models for Object Recognition

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.15.17509 ◽

2018 ◽

Vol 7 (3.15) ◽

pp. 95 ◽

Cited By ~ 1

Author(s):

M Zabir ◽

N Fazira ◽

Zaidah Ibrahim ◽

Nurbaity Sabri

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Large Scale ◽

Visual Recognition ◽

Error Function ◽

Network Models ◽

Neural Network Models ◽

California Institute Of Technology ◽

Institute Of Technology ◽

Similar Accuracy

This paper evaluates the accuracy of two pre-trained Convolutional Neural Network (CNN) models, AlexNet and GoogLeNet, alongside one custom CNN. AlexNet and GoogLeNet have proven capabilities: both were entered in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and produced relatively good results. The evaluation is based on the accuracy, loss, and time taken in the training and validation processes. The dataset used is Caltech101 from the California Institute of Technology (Caltech), which contains 101 object categories. The results show that the custom CNN architecture reaches 91.05% accuracy, whereas AlexNet and GoogLeNet achieve a similar accuracy of 99.65%. GoogLeNet becomes consistent at an early training stage and yields the lowest error function value of the three models.


Hybrid Model Structure for Diabetic Retinopathy Classification

Journal of Healthcare Engineering ◽

10.1155/2020/8840174 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Hao Liu ◽

Keqiang Yue ◽

Siyi Cheng ◽

Chengming Pan ◽

Jie Sun ◽

...

Keyword(s):

Diabetic Retinopathy ◽

Hybrid Model ◽

Network Models ◽

Classification Performance ◽

Cross Entropy ◽

Model Structure ◽

Training Process ◽

Neural Network Models ◽

Entropy Loss ◽

Model Structures

Diabetic retinopathy (DR) is one of the most common complications of diabetes and the main cause of blindness. The progression of the disease can be prevented by early diagnosis of DR. Because of differences in the distribution of medical conditions and low labor efficiency, the best time for diagnosis and treatment can be missed, which results in impaired vision. Using neural network models to classify and diagnose DR can improve efficiency and reduce costs. In this work, an improved loss function and three hybrid model structures (Hybrid-a, Hybrid-f, and Hybrid-c) were proposed to improve the performance of DR classification models. EfficientNetB4, EfficientNetB5, NASNetLarge, Xception, and InceptionResNetV2 CNNs were chosen as the basic models. These basic models were trained using enhance cross-entropy loss and cross-entropy loss, respectively. The output of the basic models was used to train the hybrid model structures. Experiments showed that enhance cross-entropy loss can effectively accelerate the training process of the basic models and improve their performance under various evaluation metrics. The proposed hybrid model structures can also improve DR classification performance. Compared with the best-performing results of the basic models, the hybrid model structures improved the accuracy of DR classification from 85.44% to 86.34%, the sensitivity from 98.48% to 98.77%, the specificity from 71.82% to 74.76%, the precision from 90.27% to 91.37%, and the F1 score from 93.62% to 93.9%.
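For readers unfamiliar with the evaluation metrics listed above, the following sketch shows how accuracy, sensitivity, specificity, precision, and F1 score are conventionally derived from binary confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical, and the abstract does not state how the authors aggregated their metrics, so this should be read as the textbook definitions only.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts, not taken from the paper.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A high sensitivity paired with a much lower specificity, as in the figures reported above, indicates that false positives are more frequent than missed cases.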


Exactly satisfying initial conditions neural network models for numerical treatment of first Painlevé equation

Applied Soft Computing ◽

10.1016/j.asoc.2014.10.009 ◽

2015 ◽

Vol 26 ◽

pp. 244-256 ◽

Cited By ~ 36

Author(s):

Muhammad Asif Zahoor Raja ◽

Junaid Ali Khan ◽

A.M. Siddiqui ◽

D. Behloul ◽

T. Haroon ◽

...

Keyword(s):

Neural Network ◽

Initial Conditions ◽

Network Models ◽

Painlevé Equation ◽

Numerical Treatment ◽

Neural Network Models ◽

Painleve Equation


Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision

10.1101/677237 ◽

2019 ◽

Cited By ~ 6

Author(s):

Courtney J Spoerer ◽

Tim C Kietzmann ◽

Johannes Mehrer ◽

Ian Charest ◽

Nikolaus Kriegeskorte

Keyword(s):

Neural Network ◽

Neural Networks ◽

Computer Vision ◽

Visual Recognition ◽

Network Models ◽

Neural Network Models ◽

Biological Vision ◽

Visual Systems ◽

Confidence Threshold ◽

Recurrent Processing

Deep feedforward neural network models of vision dominate in both computational neuroscience and engineering. The primate visual system, by contrast, contains abundant recurrent connections. Recurrent signal flow enables recycling of limited computational resources over time, and so might boost the performance of a physically finite brain or model. Here we show: (1) Recurrent convolutional neural network models outperform feedforward convolutional models matched in their number of parameters in large-scale visual recognition tasks on natural images. (2) Setting a confidence threshold, at which recurrent computations terminate and a decision is made, enables flexible trading of speed for accuracy. At a given confidence threshold, the model expends more time and energy on images that are harder to recognise, without requiring additional parameters for deeper computations. (3) The recurrent model’s reaction time for an image predicts the human reaction time for the same image better than several parameter-matched and state-of-the-art feedforward models. (4) Across confidence thresholds, the recurrent model emulates the behaviour of feedforward control models in that it achieves the same accuracy at approximately the same computational cost (mean number of floating-point operations). However, the recurrent model can be run longer (higher confidence threshold) and then outperforms parameter-matched feedforward comparison models. These results suggest that recurrent connectivity, a hallmark of biological visual systems, may be essential for understanding the accuracy, flexibility, and dynamics of human visual recognition.

Author summary:

Deep neural networks provide the best current models of biological vision and achieve the highest performance in computer vision. Inspired by the primate brain, these models transform the image signals through a sequence of stages, leading to recognition. Unlike brains, in which outputs of a given computation are fed back into the same computation, these models do not process signals recurrently. The ability to recycle limited neural resources by processing information recurrently could explain the accuracy and flexibility of biological visual systems, which computer vision systems cannot yet match. Here we report that recurrent processing can improve recognition performance compared to similarly complex feedforward networks. Recurrent processing also enabled models to behave more flexibly and trade off speed for accuracy. Like humans, the recurrent network models can compute longer when an object is hard to recognise, which boosts their accuracy. The model’s recognition times predicted human recognition times for the same images. The performance and flexibility of recurrent neural network models illustrate that modeling biological vision can help us improve computer vision.
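The confidence-threshold mechanism described in point (2) can be sketched as an early-stopping rule over the model's per-time-step outputs. The sketch below uses the maximum class probability as a stand-in for confidence, which is an assumption: the abstract does not say how confidence is measured. The function name `decide` and the probability values are likewise hypothetical.

```python
def decide(probs_per_step, threshold):
    """Stop at the first time step whose confidence reaches the threshold.

    probs_per_step: one list of class probabilities per recurrent time step.
    Returns (step_index, predicted_class). If the threshold is never reached,
    the decision falls back to the final time step.
    """
    for step, probs in enumerate(probs_per_step):
        confidence = max(probs)  # assumed confidence measure
        if confidence >= threshold:
            return step, probs.index(confidence)
    last = probs_per_step[-1]
    return len(probs_per_step) - 1, last.index(max(last))

# Hypothetical outputs of a recurrent model over three time steps.
probs = [
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.90, 0.05, 0.05],
]

fast = decide(probs, threshold=0.5)     # lenient threshold: stops early
careful = decide(probs, threshold=0.8)  # strict threshold: runs longer
print(fast, careful)
```

Raising the threshold makes the model run for more time steps before committing, which is the speed-for-accuracy trade the abstract describes.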


Generation of scale-invariant sequential activity in linear recurrent networks

10.1101/580522 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yue Liu ◽

Marc W. Howard

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Initial Conditions ◽

A Priori ◽

Network Models ◽

Neural Mechanism ◽

Natural World ◽

Neural Network Models ◽

Scale Invariant ◽

Wide Range

Sequential neural activity has been observed in many parts of the brain and has been proposed as a neural mechanism for memory. The natural world expresses temporal relationships at a wide range of scales. Because we cannot know the relevant scales a priori, it is desirable that memory, and thus the generated sequences, are scale-invariant. Although recurrent neural network models have been proposed as a mechanism for generating sequences, the requirements for scale-invariant sequences are not known. This paper reports the constraints that enable a linear recurrent neural network model to generate scale-invariant sequential activity. A straightforward eigendecomposition analysis results in two independent conditions that are required for scale invariance for connectivity matrices with real, distinct eigenvalues. First, the eigenvalues of the network must be geometrically spaced. Second, the eigenvectors must be related to one another via translation. These constraints are easily generalizable for matrices that have complex and distinct eigenvalues. Analogous albeit less compact constraints hold for matrices with degenerate eigenvalues. These constraints, along with considerations on initial conditions, provide a general recipe to build linear recurrent neural networks that support scale-invariant sequential activity.
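To give some intuition for the first condition (geometrically spaced eigenvalues), the sketch below considers a bank of uncoupled decaying modes exp(-s_i t) whose decay rates s_i form a geometric series. This is a simplified stand-in for a network viewed in its eigenbasis, not the authors' model, and it does not address the second condition on eigenvectors. The names `ratio`, `rates`, and `modes` are hypothetical.

```python
import math

ratio = 2.0                                   # common ratio of the geometric series
rates = [0.1 * ratio ** i for i in range(6)]  # s_i = s_0 * ratio**i

def modes(t):
    """Activity of each decaying mode at time t: exp(-s_i * t)."""
    return [math.exp(-s * t) for s in rates]

t = 1.7
rescaled = modes(ratio * t)   # the same bank observed at a rescaled time
shifted = modes(t)[1:]        # the original bank shifted by one mode

# Because s_i * ratio == s_(i+1), rescaling time only relabels the modes.
max_gap = max(abs(a - b) for a, b in zip(rescaled, shifted))
print(f"largest mismatch after rescaling: {max_gap:.2e}")
```

With arithmetic rather than geometric spacing, no such relabelling exists, so a change of time scale would alter the shape of the activity pattern instead of merely shifting it.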


Neural classifiers with limited connectivity and recurrent readouts

10.1101/157289 ◽

2017 ◽

Author(s):

Lyudmila Kushnir ◽

Stefano Fusi

Keyword(s):

Neural Network ◽

Long Range ◽

Network Models ◽

Classification Performance ◽

Network Architectures ◽

Huge Number ◽

Neural Network Models ◽

Random Patterns ◽

Scalable Network ◽

The Brain

For many neural network models in which neurons are trained to classify inputs like perceptrons, the number of inputs that can be classified is limited by the connectivity of each neuron, even when the total number of neurons is very large. This poses the problem of how the biological brain can take advantage of its huge number of neurons given that the connectivity is sparse. One solution is to combine multiple perceptrons together, as in committee machines. The number of classifiable random patterns would then grow linearly with the number of perceptrons, even when each perceptron has limited connectivity. However, the problem is moved to the downstream readout neurons, which would need a number of connections that is as large as the number of perceptrons. Here we propose a different approach in which the readout is implemented by connecting multiple perceptrons in a recurrent attractor neural network. We prove analytically that the number of classifiable random patterns can grow unboundedly with the number of perceptrons, even when the connectivity of each perceptron remains finite. Most importantly, both the recurrent connectivity and the connectivity of downstream readouts also remain finite. Our study shows that feed-forward neural classifiers with numerous long-range afferent connections can be replaced by recurrent networks with sparse long-range connectivity without sacrificing classification performance. Our strategy could be used to design more general scalable network architectures with limited connectivity, which more closely resemble the brain's neural circuits, themselves dominated by recurrent connectivity.


Hybrid Dense Network with Dual Attention for Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs13234921 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4921

Author(s):

Jinling Zhao ◽

Lei Hu ◽

Yingying Dong ◽

Linsheng Huang

Keyword(s):

Classification Accuracy ◽

Hyperspectral Image ◽

Network Models ◽

Classification Performance ◽

Dense Network ◽

Feature Maps ◽

Neural Network Models ◽

Practical Applications ◽

Training Samples ◽

3D Cnn

Hyperspectral images (HSIs) have been widely used in many fields of application, but it is still extremely challenging to obtain higher classification accuracy, especially when facing a smaller number of training samples in practical applications. It is very time-consuming and laborious to acquire enough labeled samples. To address limited training samples and unsatisfactory classification accuracy, an efficient hybrid dense network based on a dual-attention mechanism was proposed. The stacked autoencoder was first used to reduce the dimensions of HSIs. A hybrid dense network framework with two feature-extraction branches was then established in order to extract abundant spectral–spatial features from HSIs, based on the 3D and 2D convolutional neural network models. In addition, spatial attention and channel attention were jointly introduced in order to achieve selective learning of features derived from HSIs. The feature maps were further refined, and more important features could be retained. To improve computational efficiency and prevent overfitting, the batch normalization layer and the dropout layer were adopted. The Indian Pines, Pavia University, and Salinas datasets were selected to evaluate the classification performance; 5%, 1%, and 1% of classes were randomly selected as training samples, respectively. In comparison with the REF-SVM, 3D-CNN, HybridSN, SSRN, and R-HybridSN, the overall accuracy of our proposed method could still reach 96.80%, 98.28%, and 98.85%, respectively. Our results show that this method can achieve a satisfactory classification performance even in the case of fewer training samples.


Generation of Scale-Invariant Sequential Activity in Linear Recurrent Networks

Neural Computation ◽

10.1162/neco_a_01288 ◽

2020 ◽

Vol 32 (7) ◽

pp. 1379-1407

Author(s):

Yue Liu ◽

Marc W. Howard

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Initial Conditions ◽

A Priori ◽

Network Models ◽

Neural Mechanism ◽

Natural World ◽

Neural Network Models ◽

Scale Invariant ◽

Wide Range

Sequential neural activity has been observed in many parts of the brain and has been proposed as a neural mechanism for memory. The natural world expresses temporal relationships at a wide range of scales. Because we cannot know the relevant scales a priori, it is desirable that memory, and thus the generated sequences, is scale invariant. Although recurrent neural network models have been proposed as a mechanism for generating sequences, the requirements for scale-invariant sequences are not known. This letter reports the constraints that enable a linear recurrent neural network model to generate scale-invariant sequential activity. A straightforward eigendecomposition analysis results in two independent conditions that are required for scale invariance for connectivity matrices with real, distinct eigenvalues. First, the eigenvalues of the network must be geometrically spaced. Second, the eigenvectors must be related to one another via translation. These constraints are easily generalizable for matrices that have complex and distinct eigenvalues. Analogous albeit less compact constraints hold for matrices with degenerate eigenvalues. These constraints, along with considerations on initial conditions, provide a general recipe to build linear recurrent neural networks that support scale-invariant sequential activity.


Binary Classification from Positive Data with Skewed Confidence

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/460 ◽

2020 ◽

Author(s):

Kazuhiko Shinoda ◽

Hirotaka Kaji ◽

Masashi Sugiyama

Keyword(s):

Linear Models ◽

Negative Impact ◽

Binary Classification ◽

Network Models ◽

Classification Performance ◽

Misclassification Rate ◽

Benchmark Problems ◽

Neural Network Models ◽

Positive Data ◽

Parameterized Model

Positive-confidence (Pconf) classification [Ishida et al., 2018] is a promising weakly supervised learning method that trains a binary classifier only from positive data equipped with confidence. In practice, however, the confidence may be skewed by bias arising in the annotation process. The Pconf classifier cannot be learned properly with skewed confidence, and classification performance may consequently deteriorate. In this paper, we introduce a parameterized model of the skewed confidence and propose a method for selecting the hyperparameter that cancels out the negative impact of the skewed confidence, under the assumption that the misclassification rate of positive samples is available as prior knowledge. We demonstrate the effectiveness of the proposed method through a synthetic experiment with simple linear models and benchmark problems with neural network models. We also apply our method to drivers’ drowsiness prediction to show that it works well on a real-world problem where confidence is obtained from manual annotation.
