Recurrent convolutional neural networks: a better model of biological object recognition

Feedforward neural networks provide the dominant model of how the brain performs visual object recognition. However, these networks lack the lateral and feedback connections, and the resulting recurrent neuronal dynamics, of the ventral visual pathway in the human and nonhuman primate brain. Here we investigate recurrent convolutional neural networks with bottom-up (B), lateral (L), and top-down (T) connections. Combining these types of connections yields four architectures (B, BT, BL, and BLT), which we systematically test and compare. We hypothesized that recurrent dynamics might improve recognition performance in the challenging scenario of partial occlusion. We introduce two novel occluded object recognition tasks to test the efficacy of the models, digit clutter (where multiple target digits occlude one another) and digit debris (where target digits are occluded by digit fragments). We find that recurrent neural networks outperform feedforward control models (approximately matched in parametric complexity) at recognising objects, both in the absence of occlusion and in all occlusion conditions. Recurrent networks were also found to be more robust to the inclusion of additive Gaussian noise. Recurrent neural networks are better in two respects: (1) they are more neurobiologically realistic than their feedforward counterparts; (2) they are better in terms of their ability to recognise objects, especially under challenging conditions. This work shows that computer vision can benefit from using recurrent convolutional architectures and suggests that the ubiquitous recurrent connections in biological brains are essential for task performance.

Download Full-text

Occluded Visual Object Recognition Using Deep Conditional Generative Adversarial Nets and Feedforward Convolutional Neural Networks

2020 International Conference on Machine Vision and Image Processing (MVIP) ◽

10.1109/mvip49855.2020.9116887 ◽

2020 ◽

Author(s):

Vahid Reza Khazaie ◽

Alireza AkhavanPour ◽

Reza Ebrahimpour

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Convolutional Neural Networks ◽

Visual Object ◽

Visual Object Recognition

Download Full-text

Deep Convolutional Neural Networks Based on Image Data Augmentation for Visual Object Recognition

Intelligent Data Engineering and Automated Learning – IDEAL 2019 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-33607-3_51 ◽

2019 ◽

pp. 476-485

Author(s):

Khaoula Jayech

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Convolutional Neural Networks ◽

Data Augmentation ◽

Image Data ◽

Visual Object ◽

Visual Object Recognition ◽

Deep Convolutional Neural Networks

Download Full-text

Faculty Opinions recommendation of Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726413891.793534418 ◽

2017 ◽

Author(s):

Odelia Schwartz

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Deep Neural Networks ◽

Visual Object ◽

Visual Object Recognition ◽

Cortical Dynamics ◽

Spatio Temporal

Download Full-text

Orthogonal Representations of Object Shape and Category in Deep Convolutional Neural Networks and Human Visual Cortex

10.1101/555193 ◽

2019 ◽

Author(s):

Astrid A. Zeman ◽

J. Brendan Ritchie ◽

Stefania Bracci ◽

Hans Op de Beeck

Keyword(s):

Neural Networks ◽

Visual Cortex ◽

Convolutional Neural Networks ◽

Network Performance ◽

Temporal Cortex ◽

Visual Object ◽

Visual Object Recognition ◽

Deep Convolutional Neural Networks ◽

Shape Information ◽

Category Information

AbstractDeep Convolutional Neural Networks (CNNs) are gaining traction as the benchmark model of visual object recognition, with performance now surpassing humans. While CNNs can accurately assign one image to potentially thousands of categories, network performance could be the result of layers that are tuned to represent the visual shape of objects, rather than object category, since both are often confounded in natural images. Using two stimulus sets that explicitly dissociate shape from category, we correlate these two types of information with each layer of multiple CNNs. We also compare CNN output with fMRI activation along the human visual ventral stream by correlating artificial with biological representations. We find that CNNs encode category information independently from shape, peaking at the final fully connected layer in all tested CNN architectures. Comparing CNNs with fMRI brain data, early visual cortex (V1) and early layers of CNNs encode shape information. Anterior ventral temporal cortex encodes category information, which correlates best with the final layer of CNNs. The interaction between shape and category that is found along the human visual ventral pathway is echoed in multiple deep networks. Our results suggest CNNs represent category information independently from shape, much like the human visual system.

Download Full-text

Cross-Modal Environment Self-Adaptation During Object Recognition in Artificial Cognitive Systems

10.21203/rs.3.rs-754574/v1 ◽

2021 ◽

Author(s):

David Miralles ◽

Guillem Garrofé ◽

Calota Parés ◽

Alejandro González ◽

Gerard Serra ◽

...

Keyword(s):

Object Recognition ◽

Large Scale ◽

Recognition Performance ◽

Visual Object ◽

Cognitive Systems ◽

Visual Object Recognition ◽

Haptic Information ◽

Visual Channel ◽

Object Exploration ◽

The Creation

Abstract The cognitive connection between the senses of touch and vision is probably the best-known case of cross-modality. Recent discoveries suggest that the mapping between both senses is learned rather than innate. These evidences open the door to a dynamic cross-modality that allows individuals to adaptively develop within their environment. Mimicking this aspect of human learning, we propose a new cross-modal mechanism that allows artificial cognitive systems (ACS) to adapt quickly to unforeseen perceptual anomalies generated by the environment or by the system itself. In this context, visual recognition systems have advanced remarkably in recent years thanks to the creation of large-scale datasets together with the advent of deep learning algorithms. However, such advances have not occurred on the haptic mode, mainly due to the lack of two-handed dexterous datasets that allow learning systems to process the tactile information of human object exploration. This data imbalance limits the creation of synchronized multimodal datasets that would enable the development of cross-modality in ACS during object exploration. In this work, we use a multimodal dataset recently generated from tactile sensors placed on a collection of objects that capture haptic data from human manipulation, together with the corresponding visual counterpart. Using this data, we create a cross-modal learning transfer mechanism capable of detecting both sudden and permanent anomalies in the visual channel and still maintain visual object recognition performance by retraining the visual mode for a few minutes using haptic information. Here we show the importance of cross-modality in perceptual awareness and its ecological capabilities to self-adapt to different environments.

Download Full-text

Minimal Recognizable Configurations Elicit Category-selective Responses in Higher Order Visual Cortex

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01420 ◽

2019 ◽

Vol 31 (9) ◽

pp. 1354-1367

Author(s):

Yael Holzinger ◽

Shimon Ullman ◽

Daniel Harari ◽

Marlene Behrmann ◽

Galia Avidan

Keyword(s):

Visual Cortex ◽

Object Recognition ◽

Visual Recognition ◽

Recognition Performance ◽

Occipital Cortex ◽

Building Blocks ◽

Visual Object ◽

Visual Object Recognition ◽

Face Area ◽

Early Visual Cortex

Visual object recognition is performed effortlessly by humans notwithstanding the fact that it requires a series of complex computations, which are, as yet, not well understood. Here, we tested a novel account of the representations used for visual recognition and their neural correlates using fMRI. The rationale is based on previous research showing that a set of representations, termed “minimal recognizable configurations” (MIRCs), which are computationally derived and have unique psychophysical characteristics, serve as the building blocks of object recognition. We contrasted the BOLD responses elicited by MIRC images, derived from different categories (faces, objects, and places), sub-MIRCs, which are visually similar to MIRCs, but, instead, result in poor recognition and scrambled, unrecognizable images. Stimuli were presented in blocks, and participants indicated yes/no recognition for each image. We confirmed that MIRCs elicited higher recognition performance compared to sub-MIRCs for all three categories. Whereas fMRI activation in early visual cortex for both MIRCs and sub-MIRCs of each category did not differ from that elicited by scrambled images, high-level visual regions exhibited overall greater activation for MIRCs compared to sub-MIRCs or scrambled images. Moreover, MIRCs and sub-MIRCs from each category elicited enhanced activation in corresponding category-selective regions including fusiform face area and occipital face area (faces), lateral occipital cortex (objects), and parahippocampal place area and transverse occipital sulcus (places). These findings reveal the psychological and neural relevance of MIRCs and enable us to make progress in developing a more complete account of object recognition.

Download Full-text

Fusing bottom-up and top-down pathways in neural networks for visual object recognition

The 2010 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2010.5596497 ◽

2010 ◽

Cited By ~ 3

Author(s):

Yuhua Zheng ◽

Yan Meng ◽

Yaochu Jin

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Visual Object ◽

Visual Object Recognition ◽

Top Down ◽

Bottom Up

Download Full-text

Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence

Scientific Reports ◽

10.1038/srep27755 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 233

Author(s):

Radoslaw Martin Cichy ◽

Aditya Khosla ◽

Dimitrios Pantazis ◽

Antonio Torralba ◽

Aude Oliva

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Deep Neural Networks ◽

Visual Object ◽

Visual Object Recognition ◽

Cortical Dynamics ◽

Spatio Temporal

Download Full-text

Optimization of FireNet for Liver Lesion Classification

Electronics ◽

10.3390/electronics9081237 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1237

Author(s):

Gedeon Kashala Kabe ◽

Yuqing Song ◽

Zhe Liu

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Liver Lesion ◽

Superior Performance ◽

Visual Object ◽

Visual Object Recognition ◽

Residual Function ◽

Learning Techniques ◽

Model Size

In recent years, deep learning techniques, and in particular convolutional neural networks (CNNs) methods have demonstrated a superior performance in image classification and visual object recognition. In this work, we propose a classification of four types of liver lesions, namely, hepatocellular carcinoma, metastases, hemangiomas, and healthy tissues using convolutional neural networks with a succinct model called FireNet. We improved speed for quick classification and decreased the model size and the number of parameters by using fire modules from SqueezeNet. We have used bypass connection by adding it around Fire modules for learning a residual function between input and output, and to solve the vanishing gradient problem. We have proposed a new Particle Swarm Optimization (NPSO) to optimize the network parameters in order to further boost the performance of the proposed FireNet. The experimental results show that the parameters of FireNet are 9.5 times smaller than GoogLeNet, 51.6 times smaller than AlexNet, and 75.8 smaller than ResNet. The size of FireNet is reduced 16.6 times smaller than GoogLeNet, 75 times smaller than AlexNet and 76.6 times smaller than ResNet. The final accuracy of our proposed FireNet model was 89.2%.

Download Full-text

Co-Training for Visual Object Recognition Based on Self-Supervised Models Using a Cross-Entropy Regularization

Entropy ◽

10.3390/e23040423 ◽

2021 ◽

Vol 23 (4) ◽

pp. 423

Author(s):

Gabriel Díaz ◽

Billy Peralta ◽

Luis Caro ◽

Orietta Nicolis

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Object Recognition ◽

Training Model ◽

Cross Entropy ◽

Visual Object ◽

Visual Object Recognition ◽

Visual Objects ◽

Learning Techniques ◽

Proposed Model

Automatic recognition of visual objects using a deep learning approach has been successfully applied to multiple areas. However, deep learning techniques require a large amount of labeled data, which is usually expensive to obtain. An alternative is to use semi-supervised models, such as co-training, where multiple complementary views are combined using a small amount of labeled data. A simple way to associate views to visual objects is through the application of a degree of rotation or a type of filter. In this work, we propose a co-training model for visual object recognition using deep neural networks by adding layers of self-supervised neural networks as intermediate inputs to the views, where the views are diversified through the cross-entropy regularization of their outputs. Since the model merges the concepts of co-training and self-supervised learning by considering the differentiation of outputs, we called it Differential Self-Supervised Co-Training (DSSCo-Training). This paper presents some experiments using the DSSCo-Training model to well-known image datasets such as MNIST, CIFAR-100, and SVHN. The results indicate that the proposed model is competitive with the state-of-art models and shows an average relative improvement of 5% in accuracy for several datasets, despite its greater simplicity with respect to more recent approaches.

Download Full-text