A Visual Encoding Model Based on Contrastive Self-Supervised Learning for Human Brain Activity along the Ventral Visual Stream

Visual encoding models are important computational models for understanding how information is processed along the visual stream. Many improved visual encoding models have been developed from the perspective of the model architecture and the learning objective, but these are limited to the supervised learning method. From the view of unsupervised learning mechanisms, this paper utilized a pre-trained neural network to construct a visual encoding model based on contrastive self-supervised learning for the ventral visual stream measured by functional magnetic resonance imaging (fMRI). We first extracted features using the ResNet50 model pre-trained in contrastive self-supervised learning (ResNet50-CSL model), trained a linear regression model for each voxel, and finally calculated the prediction accuracy of different voxels. Compared with the ResNet50 model pre-trained in a supervised classification task, the ResNet50-CSL model achieved an equal or even relatively better encoding performance in multiple visual cortical areas. Moreover, the ResNet50-CSL model performs hierarchical representation of input visual stimuli, which is similar to the human visual cortex in its hierarchical information processing. Our experimental results suggest that the encoding model based on contrastive self-supervised learning is a strong computational model to compete with supervised models, and contrastive self-supervised learning proves an effective learning method to extract human brain-like representations.

Download Full-text

Neural dynamics of real-world object vision that guide behaviour

10.1101/147298 ◽

2017 ◽

Cited By ~ 4

Author(s):

Radoslaw M. Cichy ◽

Nikolaus Kriegeskorte ◽

Kamila M. Jozwik ◽

Jasper J.F. van den Bosch ◽

Ian Charest

Keyword(s):

Human Brain ◽

Real World ◽

Brain Activity ◽

Stimulus Onset ◽

Visual Features ◽

Object Representations ◽

Neuronal Dynamics ◽

Visual Stream ◽

Semantic Categories ◽

Ventral Visual Stream

1AbstractVision involves complex neuronal dynamics that link the sensory stream to behaviour. To capture the richness and complexity of the visual world and the behaviour it entails, we used an ecologically valid task with a rich set of real-world object images. We investigated how human brain activity, resolved in space with functional MRI and in time with magnetoencephalography, links the sensory stream to behavioural responses. We found that behaviour-related brain activity emerged rapidly in the ventral visual pathway within 200ms of stimulus onset. The link between stimuli, brain activity, and behaviour could not be accounted for by either category membership or visual features (as provided by an artificial deep neural network model). Our results identify behaviourally-relevant brain activity during object vision, and suggest that object representations guiding behaviour are complex and can neither be explained by visual features or semantic categories alone. Our findings support the view that visual representations in the ventral visual stream need to be understood in terms of their relevance to behaviour, and highlight the importance of complex behavioural assessment for human brain mapping.

Download Full-text

Siamese Reconstruction Network: Accurate Image Reconstruction from Human Brain Activity by Learning to Compare

Applied Sciences ◽

10.3390/app9224749 ◽

2019 ◽

Vol 9 (22) ◽

pp. 4749

Author(s):

Lingyun Jiang ◽

Kai Qiao ◽

Linyuan Wang ◽

Chi Zhang ◽

Jian Chen ◽

...

Keyword(s):

Deep Learning ◽

Human Brain ◽

Brain Activity ◽

Feature Space ◽

Training Data ◽

Reconstruction Method ◽

Learning Method ◽

Training Samples ◽

Visual Reconstruction ◽

Relationship Of

Decoding human brain activities, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data impose restrictions on satisfactory reconstruction, especially for the reconstruction method with deep learning requiring huge amounts of labelled samples. When compared with the deep learning method, humans can recognize a new image because our human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduced the mechanism of comparison into deep learning method to realize better visual reconstruction by making full use of each sample and the relationship of the sample pair by learning to compare. In this way, we proposed a Siamese reconstruction network (SRN) method. By using the SRN, we improved upon the satisfying results on two fMRI recording datasets, providing 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this manner can increase the training data about from n samples to 2n sample pairs, which takes full advantage of the limited quantity of training samples. The SRN learns to converge sample pairs of the same class or disperse sample pairs of different class in feature space.

Download Full-text

Searching through functional space reveals distributed visual, auditory, and semantic coding in the human brain

10.1101/2020.04.20.052175 ◽

2020 ◽

Author(s):

Sreejan Kumar ◽

Cameron T. Ellis ◽

Thomas O’Connell ◽

Marvin M Chun ◽

Nicholas B. Turk-Browne

Keyword(s):

Human Brain ◽

Language Processing ◽

Brain Function ◽

Computational Models ◽

Brain Activity ◽

Functional Space ◽

Semantic Features ◽

Brain Functions ◽

Auditory Features ◽

The Brain

AbstractThe extent to which brain functions are localized or distributed is a foundational question in neuroscience. In the human brain, common fMRI methods such as cluster correction, atlas parcellation, and anatomical searchlight are biased by design toward finding localized representations. Here we introduce the functional searchlight approach as an alternative to anatomical searchlight analysis, the most commonly used exploratory multivariate fMRI technique. Functional searchlight removes any anatomical bias by grouping voxels based only on functional similarity and ignoring anatomical proximity. We report evidence that visual and auditory features from deep neural networks and semantic features from a natural language processing model are more widely distributed across the brain than previously acknowledged. This approach provides a new way to evaluate and constrain computational models with brain activity and pushes our understanding of human brain function further along the spectrum from strict modularity toward distributed representation.

Download Full-text

Reinforcement learning mechanisms in the human brain: Insights from model-based fMRI

Neuroimaging of Human MemoryLinking cognitive processes to neural systems ◽

10.1093/acprof:oso/9780199217298.003.0004 ◽

2009 ◽

pp. 45-64 ◽

Cited By ~ 1

Author(s):

John P O'Doherty

Keyword(s):

Reinforcement Learning ◽

Human Brain ◽

Learning Mechanisms ◽

Model Based

Download Full-text

Activation Timecourse of Ventral Visual Stream Object-recognition Areas: High Density Electrical Mapping of Perceptual Closure Processes

Journal of Cognitive Neuroscience ◽

10.1162/089892900562372 ◽

2000 ◽

Vol 12 (4) ◽

pp. 615-621 ◽

Cited By ~ 161

Author(s):

Glen M. Doniger ◽

John J. Foxe ◽

Micah M. Murray ◽

Beth A. Higgins ◽

Joan Gay Snodgrass ◽

...

Keyword(s):

Object Recognition ◽

Partial Information ◽

Brain Activity ◽

Event Related Potentials ◽

High Density ◽

Visual Stream ◽

Density Mapping ◽

Related Potentials ◽

Ventral Visual Stream ◽

Perceptual Closure

Object recognition is achieved even in circumstances when only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognitions to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onsets ∼ 230 msec, maximal at ∼290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, which are consistent with generators in the human ventral visual stream, and specifically the lateral-occipital or LO complex as defined by hemodynamic studies of object recognition.

Download Full-text

Study on Representation Invariances of CNNs and Human Visual Information Processing Based on Data Augmentation

Brain Sciences ◽

10.3390/brainsci10090602 ◽

2020 ◽

Vol 10 (9) ◽

pp. 602

Author(s):

Yibo Cui ◽

Chi Zhang ◽

Kai Qiao ◽

Linyuan Wang ◽

Bin Yan ◽

...

Keyword(s):

Information Processing ◽

Visual Information ◽

Data Augmentation ◽

Visual Information Processing ◽

Deep Convolutional Neural Networks ◽

Invariant Representation ◽

Visual Stream ◽

Visual Encoding ◽

Before And After ◽

Ventral Visual Stream

Representation invariance plays a significant role in the performance of deep convolutional neural networks (CNNs) and human visual information processing in various complicated image-based tasks. However, there has been abounding confusion concerning the representation invariance mechanisms of the two sophisticated systems. To investigate their relationship under common conditions, we proposed a representation invariance analysis approach based on data augmentation technology. Firstly, the original image library was expanded by data augmentation. The representation invariances of CNNs and the ventral visual stream were then studied by comparing the similarities of the corresponding layer features of CNNs and the prediction performance of visual encoding models based on functional magnetic resonance imaging (fMRI) before and after data augmentation. Our experimental results suggest that the architecture of CNNs, combinations of convolutional and fully-connected layers, developed representation invariance of CNNs. Remarkably, we found representation invariance belongs to all successive stages of the ventral visual stream. Hence, the internal correlation between CNNs and the human visual system in representation invariance was revealed. Our study promotes the advancement of invariant representation of computer vision and deeper comprehension of the representation invariance mechanism of human visual information processing.

Download Full-text

Searching through functional space reveals distributed visual, auditory, and semantic coding in the human brain

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008457 ◽

2020 ◽

Vol 16 (12) ◽

pp. e1008457

Author(s):

Sreejan Kumar ◽

Cameron T. Ellis ◽

Thomas P. O’Connell ◽

Marvin M. Chun ◽

Nicholas B. Turk-Browne

Keyword(s):

Human Brain ◽

Language Processing ◽

Computational Models ◽

Brain Activity ◽

Functional Space ◽

Semantic Features ◽

Object Representations ◽

Brain Functions ◽

Improve Model ◽

Decoding Accuracy

The extent to which brain functions are localized or distributed is a foundational question in neuroscience. In the human brain, common fMRI methods such as cluster correction, atlas parcellation, and anatomical searchlight are biased by design toward finding localized representations. Here we introduce the functional searchlight approach as an alternative to anatomical searchlight analysis, the most commonly used exploratory multivariate fMRI technique. Functional searchlight removes any anatomical bias by grouping voxels based only on functional similarity and ignoring anatomical proximity. We report evidence that visual and auditory features from deep neural networks and semantic features from a natural language processing model, as well as object representations, are more widely distributed across the brain than previously acknowledged and that functional searchlight can improve model-based similarity and decoding accuracy. This approach provides a new way to evaluate and constrain computational models with brain activity and pushes our understanding of human brain function further along the spectrum from strict modularity toward distributed representation.

Download Full-text

A Robust Neural Fingerprint of Cinematic Shot-Scale

Projections ◽

10.3167/proj.2019.130303 ◽

2019 ◽

Vol 13 (3) ◽

pp. 23-52

Author(s):

András Bálint Kovács ◽

Gal Raz ◽

Giancarlo Valente ◽

Michele Svanera ◽

Sergio Benini

Keyword(s):

Time Series ◽

Apparent Distance ◽

Visual Pathway ◽

Machine Learning Method ◽

Manual Annotation ◽

Learning Method ◽

Visual Stream ◽

Ventral Visual Stream ◽

The One ◽

Movement Monitoring

This article provides evidence for the existence of a robust “brainprint” of cinematic shot-scales that generalizes across movies, genres, and viewers. We applied a machine-learning method on a dataset of 234 fMRI scans taken during the viewing of a movie excerpt. Based on a manual annotation of shot-scales in five movies, we generated a computational model that predicts time series of this feature. The model was then applied on fMRI data obtained from new participants who either watched excerpts from the movies or clips from new movies. The predicted shot-scale time series that were based on our model significantly correlated with the original annotation in all nine cases. The spatial structure of the model indicates that the empirical experience of cinematic close-ups correlates with the activation of the ventral visual stream, the centromedial amygdala, and components of the mentalization network, while the experience of long shots correlates with the activation of the dorsal visual pathway and the parahippocampus. The shot-scale brainprint is also in line with the notion that this feature is informed among other factors by perceived apparent distance. Based on related theoretical and empirical findings we suggest that the empirical experience of close and far shots implicates different mental models: concrete and contextualized perception dominated by recognition and visual and semantic memory on the one hand, and action-related processing supporting orientation and movement monitoring on the other.

Download Full-text

Recurrent computations for visual pattern completion

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1719397115 ◽

2018 ◽

Vol 115 (35) ◽

pp. 8835-8840 ◽

Cited By ~ 54

Author(s):

Hanlin Tang ◽

Martin Schrimpf ◽

William Lotter ◽

Charlotte Moerman ◽

Ana Paredes ◽

...

Keyword(s):

Computational Models ◽

Partial Information ◽

Recognition Performance ◽

Backward Masking ◽

Visual Stream ◽

Strong Argument ◽

Pattern Completion ◽

Occluded Objects ◽

Ventral Visual Stream

Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.

Download Full-text

The functional neuroanatomy of face perception: from brain measurements to deep neural networks

Interface Focus ◽

10.1098/rsfs.2018.0013 ◽

2018 ◽

Vol 8 (4) ◽

pp. 20180013 ◽

Cited By ~ 22

Author(s):

Kalanit Grill-Spector ◽

Kevin S. Weiner ◽

Jesse Gomez ◽

Anthony Stigliani ◽

Vaidehi S. Natu

Keyword(s):

Neural Networks ◽

Computational Models ◽

Human Performance ◽

Deep Neural Networks ◽

Functional Neuroanatomy ◽

Functional Architecture ◽

Visual Stream ◽

Ventral Visual Stream ◽

The Brain ◽

New Framework

A central goal in neuroscience is to understand how processing within the ventral visual stream enables rapid and robust perception and recognition. Recent neuroscientific discoveries have significantly advanced understanding of the function, structure and computations along the ventral visual stream that serve as the infrastructure supporting this behaviour. In parallel, significant advances in computational models, such as hierarchical deep neural networks (DNNs), have brought machine performance to a level that is commensurate with human performance. Here, we propose a new framework using the ventral face network as a model system to illustrate how increasing the neural accuracy of present DNNs may allow researchers to test the computational benefits of the functional architecture of the human brain. Thus, the review (i) considers specific neural implementational features of the ventral face network, (ii) describes similarities and differences between the functional architecture of the brain and DNNs, and (iii) provides a hypothesis for the computational value of implementational features within the brain that may improve DNN performance. Importantly, this new framework promotes the incorporation of neuroscientific findings into DNNs in order to test the computational benefits of fundamental organizational features of the visual system.

Download Full-text