Modeling Attention Control Using A Convolutional Neural Network Designed After The Ventral Visual Pathway

Abstract: Recently we proposed that people represent object categories using category-consistent features (CCFs), those features that occur both frequently and consistently across a category’s exemplars [70]. Here we designed a Convolutional Neural Network (CNN) after the primate ventral stream (VsNet) and used it to extract CCFs from 68 categories of objects spanning a three-level category hierarchy. We evaluated VsNet against people searching for the same targets from the same 68 categories. Not only did VsNet replicate our previous report of stronger attention guidance to subordinate-level targets, but with its more powerful CNN-CCFs it was also able to predict attention control to individual target categories: the more CNN-CCFs extracted for a category, the faster gaze was directed to the target. We also probed VsNet to determine where in its network of layers these attention control signals originate. We found that CCFs extracted from VsNet’s V1 layer contributed most to guiding attention to targets cued at the subordinate (e.g., police car) and basic (e.g., car) levels, but that guidance to superordinate-cued (e.g., vehicle) targets was strongest using CCFs from the CIT+AIT layer. We also identified the image patches eliciting the strongest filter responses from areas V4 and higher and found that they depicted representative parts of an object category (e.g., advertisements appearing on top of taxi cabs). Finally, we found that VsNet better predicted attention control than comparable CNN models, despite having fewer convolutional filters. This work shows that a brain-inspired CNN can predict goal-directed attention control by extracting and using category-consistent features.
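The idea of selecting features that respond "both frequently and consistently" across a category's exemplars can be illustrated with a minimal sketch. The abstract does not say how frequency and consistency are quantified, so the mean-to-standard-deviation ratio used here, the threshold value, and the function name `category_consistent_features` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def category_consistent_features(responses, snr_threshold=2.0):
    """Return indices of features that respond strongly and stably.

    responses: array of shape (n_exemplars, n_features) holding one
    non-negative feature response per exemplar of a single category.
    Assumption: "frequent" is approximated by a high mean response and
    "consistent" by a low standard deviation, combined as mean / std.
    """
    responses = np.asarray(responses, dtype=float)
    mean = responses.mean(axis=0)
    std = responses.std(axis=0)
    snr = mean / (std + 1e-8)  # small constant avoids division by zero
    return np.flatnonzero(snr > snr_threshold)

# Toy data: 4 exemplars, 3 hypothetical features.
toy_responses = [
    [1.0, 0.0, 0.9],
    [1.1, 2.0, 0.1],
    [0.9, 0.0, 0.8],
    [1.0, 0.0, 0.0],
]
ccf_indices = category_consistent_features(toy_responses)
print(ccf_indices)
```

In this toy example only the first feature is both high on average and stable across exemplars, so it is the only one selected; the second responds to a single exemplar and the third varies too much.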

Modelling attention control using a convolutional neural network designed after the ventral visual pathway

Visual Cognition ◽

10.1080/13506285.2019.1661927 ◽

2019 ◽

Vol 27 (5-8) ◽

pp. 416-434

Author(s):

Chen-Ping Yu ◽

Huidong Liu ◽

Dimitrios Samaras ◽

Gregory J. Zelinsky

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Visual Pathway ◽

Attention Control ◽

Ventral Visual Pathway

Emergence of a compositional neural code for written words: Recycling of a convolutional neural network for reading

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2104779118 ◽

2021 ◽

Vol 118 (46) ◽

pp. e2104779118

Author(s):

T. Hannagan ◽

A. Agrawal ◽

L. Cohen ◽

S. Dehaene

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Language Processing ◽

Visual Pathway ◽

Neural Code ◽

Letter Recognition ◽

Invariant Representation ◽

Spoken Language Processing ◽

Visual Word Form Area ◽

Ventral Visual Pathway

The visual word form area (VWFA) is a region of human inferotemporal cortex that emerges at a fixed location in the occipitotemporal cortex during reading acquisition and systematically responds to written words in literate individuals. According to the neuronal recycling hypothesis, this region arises through the repurposing, for letter recognition, of a subpart of the ventral visual pathway initially involved in face and object recognition. Furthermore, according to the biased connectivity hypothesis, its reproducible localization is due to preexisting connections from this subregion to areas involved in spoken-language processing. Here, we evaluate those hypotheses in an explicit computational model. We trained a deep convolutional neural network of the ventral visual pathway, first to categorize pictures and then to recognize written words invariantly for case, font, and size. We show that the model can account for many properties of the VWFA, particularly when a subset of units possesses a biased connectivity to word output units. The network develops a sparse, invariant representation of written words, based on a restricted set of reading-selective units. Their activation mimics several properties of the VWFA, and their lesioning causes a reading-specific deficit. The model predicts that, in literate brains, written words are encoded by a compositional neural code with neurons tuned either to individual letters and their ordinal position relative to word start or word ending or to pairs of letters (bigrams).

Modeling categorical search guidance using a convolutional neural network designed after the ventral visual pathway

Journal of Vision ◽

10.1167/17.10.88 ◽

2017 ◽

Vol 17 (10) ◽

pp. 88

Author(s):

Gregory Zelinsky ◽

Chen-Ping Yu

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Visual Pathway ◽

Ventral Visual Pathway ◽

Search Guidance

Simulating the emergence of the Visual Word Form Area: Recycling a convolutional neural network for reading

10.1101/2021.02.15.431235 ◽

2021 ◽

Author(s):

T. Hannagan ◽

A. Agrawal ◽

L. Cohen ◽

S. Dehaene

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Language Processing ◽

Visual Pathway ◽

Visual Word ◽

Letter Recognition ◽

Word Form ◽

Invariant Representation ◽

Visual Word Form Area ◽

Ventral Visual Pathway

Abstract: The visual word form area (VWFA) is a region of human inferotemporal cortex that emerges at a fixed location in occipitotemporal cortex during reading acquisition, and systematically responds to written words in literate individuals. According to the neuronal recycling hypothesis, this region arises through the repurposing, for letter recognition, of a subpart of the ventral visual pathway initially involved in face and object recognition. Furthermore, according to the biased connectivity hypothesis, its universal localization is due to pre-existing connections from this subregion to areas involved in spoken language processing. Here, we evaluate those hypotheses in an explicit computational model. We trained a deep convolutional neural network of the ventral visual pathway, first to categorize pictures, and then to recognize written words invariantly for case, font and size. We show that the model can account for many properties of the VWFA, particularly when a subset of units possesses a biased connectivity to word output units. The network develops a sparse, invariant representation of written words, based on a restricted set of reading-selective units. Their activation mimics several properties of the VWFA, and their lesioning causes a reading-specific deficit. Our simulation fleshes out the neuronal recycling hypothesis, and makes several testable predictions concerning the neural code for written words.

Are Image Patches Beneficial for Initializing Convolutional Neural Network Models?

Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications ◽

10.5220/0010206603460353 ◽

2021 ◽

Author(s):

Daniel Lehmann ◽

Marc Ebner

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Models ◽

Neural Network Models ◽

Image Patches

Reconstructing feedback representations in ventral visual pathway with a generative adversarial autoencoder

10.1101/2020.07.23.218859 ◽

2020 ◽

Author(s):

Haider Al-Tahan ◽

Yalda Mohsenzadeh

Keyword(s):

Neural Network ◽

Visual Information ◽

Visual Pathway ◽

Functional Magnetic Resonance ◽

Low Level ◽

Ventral Visual Pathway ◽

High Level ◽

Feedback Connections ◽

The Brain ◽

Insight Into

Abstract: While vision evokes a dense network of feedforward and feedback neural processes in the brain, visual processes are primarily modeled with feedforward hierarchical neural networks, leaving the computational role of feedback processes poorly understood. Here, we developed a generative autoencoder neural network model and adversarially trained it on a categorically diverse data set of images. We hypothesized that the feedback processes in the ventral visual pathway can be represented by reconstruction of the visual information performed by the generative model. We compared representational similarity of the activity patterns in the proposed model with temporal (magnetoencephalography) and spatial (functional magnetic resonance imaging) visual brain responses. The proposed generative model identified two segregated neural dynamics in the visual brain: a temporal hierarchy of processes transforming low-level visual information into high-level semantics in the feedforward sweep, and a temporally later dynamics of inverse processes reconstructing low-level visual information from a high-level latent representation in the feedback sweep. Our results add to previous studies on neural feedback processes by presenting a new insight into the algorithmic function of, and the information carried by, the feedback processes in the ventral visual pathway.

Author summary: It has been shown that the ventral visual cortex consists of a dense network of regions with feedforward and feedback connections. The feedforward path processes visual inputs along a hierarchy of cortical areas that starts in early visual cortex (an area tuned to low-level features, e.g., edges and corners) and ends in inferior temporal cortex (an area that responds to higher-level categorical content, e.g., faces and objects). In contrast, the feedback connections modulate neuronal responses in this hierarchy by broadcasting information from higher to lower areas.
In recent years, deep neural network models trained on object recognition tasks have achieved human-level performance and shown activation patterns similar to those of the visual brain. In this work, we developed a generative neural network model that consists of encoding and decoding sub-networks. By comparing this computational model with the temporal (magnetoencephalography) and spatial (functional magnetic resonance imaging) response patterns of the human brain, we found that the encoder processes resemble the brain's feedforward processing dynamics and that the decoder shares similarity with the brain's feedback processing dynamics. These results provide an algorithmic insight into the spatiotemporal dynamics of feedforward and feedback processes in biological vision.
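The comparison of "representational similarity" between model activity and brain responses can be sketched as follows. This is a generic illustration of representational similarity analysis, not the authors' pipeline: the dissimilarity measure (1 minus Pearson correlation), the rank-based comparison, the function names, and the synthetic data are all assumptions. Ranks are computed with a double argsort, which breaks ties arbitrarily.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between
    the response patterns of every pair of stimuli.

    patterns: array of shape (n_stimuli, n_units).
    """
    return 1.0 - np.corrcoef(np.asarray(patterns, dtype=float))

def rdm_similarity(rdm_a, rdm_b):
    """Rank correlation between the upper triangles of two RDMs."""
    upper = np.triu_indices_from(rdm_a, k=1)  # skip the diagonal
    a, b = rdm_a[upper], rdm_b[upper]
    ranks_a = np.argsort(np.argsort(a)).astype(float)
    ranks_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Synthetic stand-ins: 6 stimuli, 20 model units, 15 "brain" channels.
rng = np.random.default_rng(0)
model_acts = rng.normal(size=(6, 20))
brain_resp = model_acts @ rng.normal(size=(20, 15))  # hypothetical data

model_rdm = rdm(model_acts)
brain_rdm = rdm(brain_resp)
print(rdm_similarity(model_rdm, brain_rdm))
```

Because both matrices are indexed by the same stimuli, the comparison does not require any mapping between model units and measurement channels, which is what lets one model be compared against both MEG and fMRI data.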

Inferring Emotion Tags from Object Images Using Convolutional Neural Network

Applied Sciences ◽

10.3390/app10155333 ◽

2020 ◽

Vol 10 (15) ◽

pp. 5333

Author(s):

Anam Manzoor ◽

Waqar Ahmad ◽

Muhammad Ehatisham-ul-Haq ◽

Abdul Hannan ◽

Muhammad Asif Khan ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Human Behavior ◽

Real Life ◽

Vital Role ◽

Object Categories ◽

Different Types ◽

Gender Based ◽

High Level ◽

Human Behavior Analysis

Emotions are a fundamental part of human behavior and can be stimulated in numerous ways. In daily life, we come across different types of objects, such as cake, crab, television, and trees, which may excite certain emotions. Likewise, object images that we see and share on different platforms are also capable of expressing or inducing human emotions. Inferring emotion tags from these object images has great significance, as it can play a vital role in recommendation systems, image retrieval, human behavior analysis, and advertisement applications. The existing schemes for emotion tag perception are based on visual features, such as the color and texture of an image, which are adversely affected by lighting conditions. The main objective of our proposed study is to address this problem by introducing a novel idea of inferring emotion tags from images based on object-related features. To this end, we first created an emotion-tagged dataset from the publicly available object detection dataset “Caltech-256” using subjective evaluation from 212 users. Next, we used a convolutional neural network-based model to automatically extract high-level features from object images for recognizing nine emotion categories: amusement, awe, anger, boredom, contentment, disgust, excitement, fear, and sadness. Experimental results on our emotion-tagged dataset support the success of our proposed idea in terms of accuracy, precision, recall, specificity, and F1-score. Overall, the proposed scheme achieved an accuracy rate of approximately 85% and 79% using top-level and bottom-level emotion tagging, respectively. We also performed a gender-based analysis for inferring emotion tags and observed that male and female subjects differ in their emotion perception across object categories.
