Causal evidence for a double dissociation between object- and scene-selective regions of visual cortex: A pre-registered TMS replication study

2020 ◽  
Author(s):  
Miles Wischnewski ◽  
Marius V. Peelen

Abstract
Natural scenes are characterized by individual objects as well as by global scene properties such as spatial layout. Functional neuroimaging research has shown that this distinction between object and scene processing is one of the main organizing principles of human high-level visual cortex. For example, object-selective regions, including the lateral occipital complex (LOC), were shown to represent object content (but not scene layout), while scene-selective regions, including the occipital place area (OPA), were shown to represent scene layout (but not object content). Causal evidence for a double dissociation between LOC and OPA in representing objects and scenes is currently limited, however. One TMS experiment, conducted in a relatively small sample (N=13), reported an interaction between LOC and OPA stimulation and object and scene recognition performance (Dilks et al., 2013). Here, we present a high-powered pre-registered replication of this study (N=72, including male and female human participants), using group-average fMRI coordinates to target LOC and OPA. Results revealed unambiguous evidence for a double dissociation between LOC and OPA: Relative to vertex stimulation, TMS over LOC selectively impaired the recognition of objects, while TMS over OPA selectively impaired the recognition of scenes. Furthermore, we found that these effects were stable over time and consistent across individual objects and scenes. These results show that LOC and OPA can be reliably and selectively targeted with TMS, even when defined based on group-average fMRI coordinates. More generally, they support the distinction between object and scene processing as an organizing principle of human high-level visual cortex.

Significance Statement
Our daily-life environments are characterized both by individual objects and by global scene properties. The distinction between object and scene processing features prominently in visual cognitive neuroscience, with fMRI studies showing that this distinction is one of the main organizing principles of human high-level visual cortex. However, causal evidence for the selective involvement of object- and scene-selective regions in processing their preferred category is less conclusive. Here, testing a large sample (N=72) using an established paradigm and a pre-registered protocol, we found that TMS over object-selective cortex (LOC) selectively impaired object recognition while TMS over scene-selective cortex (OPA) selectively impaired scene recognition. These results provide conclusive causal evidence for the distinction between object and scene processing in human visual cortex.

2021 ◽  
Vol 11 (9) ◽  
pp. 3730
Author(s):  
Aniqa Dilawari ◽  
Muhammad Usman Ghani Khan ◽  
Yasser D. Al-Otaibi ◽  
Zahoor-ur Rehman ◽  
Atta-ur Rahman ◽  
...  

After the September 11 attacks, security and surveillance measures have changed across the globe. Surveillance cameras are now installed almost everywhere to monitor video footage. Though quite handy, these cameras produce video data in massive volumes. The major challenge faced by security agencies is analyzing the surveillance video data collected and generated daily. Problems related to these videos are twofold: (1) understanding the contents of video streams, and (2) converting the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework for a surveillance dataset. The framework is based on multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base visual geometry group (VGG)-16 model. Tasks include scene recognition, action recognition, object recognition, and recognition of human-face-specific features. Experimental results on the TRECViD, UET Video Surveillance (UETVS), and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods, with METEOR (Metric for Evaluation of Translation with Explicit ORdering) scores of 33.9%, 34.3%, and 31.2%, respectively. Our results show that the framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions.


2015 ◽  
Vol 35 (36) ◽  
pp. 12412-12424 ◽  
Author(s):  
A. Stigliani ◽  
K. S. Weiner ◽  
K. Grill-Spector

2017 ◽  
Vol 8 (1) ◽  
Author(s):  
Ben Deen ◽  
Hilary Richardson ◽  
Daniel D. Dilks ◽  
Atsushi Takahashi ◽  
Boris Keil ◽  
...  

2012 ◽  
Vol 24 (2) ◽  
pp. 521-529 ◽  
Author(s):  
Frank Oppermann ◽  
Uwe Hassler ◽  
Jörg D. Jescheniak ◽  
Thomas Gruber

The human cognitive system is highly efficient in extracting information from our visual environment. This efficiency is based on acquired knowledge that guides our attention toward relevant events and promotes the recognition of individual objects as they appear in visual scenes. The experience-based representation of such knowledge contains not only information about the individual objects but also about relations between them, such as the typical context in which individual objects co-occur. The present EEG study aimed at exploring the availability of such relational knowledge in the time course of visual scene processing, using oscillatory evoked gamma-band responses as a neural correlate for a currently activated cortical stimulus representation. Participants decided whether two simultaneously presented objects were conceptually coherent (e.g., mouse–cheese) or not (e.g., crown–mushroom). We obtained increased evoked gamma-band responses for coherent scenes compared with incoherent scenes beginning as early as 70 msec after stimulus onset within a distributed cortical network, including the right temporal, the right frontal, and the bilateral occipital cortex. This finding provides empirical evidence for the functional importance of evoked oscillatory activity in high-level vision beyond the visual cortex and, thus, gives new insights into the functional relevance of neuronal interactions. It also indicates the very early availability of experience-based knowledge that might be regarded as a fundamental mechanism for the rapid extraction of the gist of a scene.
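The "evoked" gamma-band response described in this abstract is the phase-locked part of the oscillatory activity: averaging the EEG signal across trials first cancels activity whose phase varies from trial to trial, so spectral power of the trial average isolates the phase-locked component. A minimal numerical sketch of that logic, with invented signals and parameters (sampling rate, 30–80 Hz band) that are not taken from the study:

```python
import math
import random

def evoked_gamma_power(trials, fs, band=(30.0, 80.0)):
    """Power of the trial-averaged (phase-locked) signal in the gamma band.

    trials: list of equal-length EEG epochs (one per trial)
    fs:     sampling rate in Hz
    """
    n = len(trials[0])
    # Averaging across trials first keeps only phase-locked ("evoked") activity.
    avg = [sum(t[i] for t in trials) / len(trials) for i in range(n)]
    power = 0.0
    for k in range(1, n // 2):  # naive DFT, positive frequencies only
        freq = k * fs / n
        if band[0] <= freq <= band[1]:
            re = sum(avg[i] * math.cos(-2 * math.pi * k * i / n) for i in range(n))
            im = sum(avg[i] * math.sin(-2 * math.pi * k * i / n) for i in range(n))
            power += (re * re + im * im) / n**2
    return power

# Toy demo: a 40 Hz component that is phase-locked across trials survives
# averaging; a random-phase ("induced") 40 Hz component largely cancels out.
random.seed(0)
fs, n = 500, 500
locked = [[math.sin(2 * math.pi * 40 * i / fs) for i in range(n)]
          for _ in range(20)]
induced = [[math.sin(2 * math.pi * 40 * i / fs + random.uniform(0, 2 * math.pi))
            for i in range(n)] for _ in range(20)]
print(evoked_gamma_power(locked, fs) > evoked_gamma_power(induced, fs))  # True
```

In practice this is done with windowed FFTs or wavelets on real EEG epochs; the sketch only illustrates why trial-averaging before the spectral transform isolates the evoked (as opposed to induced) response.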


2017 ◽  
Vol 117 (1) ◽  
pp. 388-402 ◽  
Author(s):  
Michael A. Cohen ◽  
George A. Alvarez ◽  
Ken Nakayama ◽  
Talia Konkle

Visual search is a ubiquitous visual behavior, and efficient search is essential for survival. Different cognitive models have explained the speed and accuracy of search based either on the dynamics of attention or on similarity of item representations. Here, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Participants performed a visual search task with 28 conditions reflecting different pairs of categories (e.g., searching for a face among cars, body among hammers, etc.). The time it took participants to find the target item varied as a function of category combination. In a separate group of participants, we measured the neural responses to these object categories when items were presented in isolation. Using representational similarity analysis, we then examined whether the similarity of neural responses across different subdivisions of the visual system had the requisite structure needed to predict visual search performance. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways when considering both macroscale sectors as well as smaller mesoscale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system.

NEW & NOTEWORTHY
Here, we ask which neural regions have neural response patterns that correlate with behavioral performance in a visual processing task. We found that the representational structure across all of high-level visual cortex has the requisite structure to predict behavior. Furthermore, when directly comparing different neural regions, we found that they all had highly similar category-level representational structures. These results point to a ubiquitous and uniform representational structure in high-level visual cortex underlying visual object processing.
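The representational similarity analysis (RSA) logic used here can be sketched in a few lines: build a representational dissimilarity matrix (RDM) from per-category neural response patterns, collect the behavioral search times for the same category pairs, and correlate the two across pairs. All patterns and reaction times below are invented for illustration; they are not data from the study:

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - r between condition patterns."""
    k = len(patterns)
    return [[1 - pearson(patterns[i], patterns[j]) for j in range(k)]
            for i in range(k)]

def upper(m):
    """Flatten the upper triangle of a square matrix (excluding the diagonal)."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]

# Hypothetical voxel patterns for four categories (faces, bodies, cars, hammers):
# faces/bodies respond similarly, as do cars/hammers.
neural = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2], [0.1, 1.0, 0.8], [0.2, 0.9, 0.9]]

# Hypothetical search times per category pair (same pair ordering as upper()):
# similar categories are hard to find among each other, so their RTs are long.
search_rt = upper([[0.0, 1.2, 0.5, 0.6],
                   [1.2, 0.0, 0.6, 0.5],
                   [0.5, 0.6, 0.0, 1.1],
                   [0.6, 0.5, 1.1, 0.0]])

# Brain/behavior correlation: more dissimilar target/distractor patterns
# predict faster search, i.e., a negative correlation with search time.
print(round(pearson(upper(rdm(neural)), search_rt), 2))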


2020 ◽  
Vol 34 (07) ◽  
pp. 12862-12869
Author(s):  
Shiwen Zhang ◽  
Sheng Guo ◽  
Limin Wang ◽  
Weilin Huang ◽  
Matthew Scott

In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance for identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode knowledge of humans and scenes for action recognition. We explore two pre-trained models as teacher networks to distill human and scene knowledge for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism, which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate the transferability of KINet by applying the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.


2019 ◽  
Vol 19 (10) ◽  
pp. 34a
Author(s):  
Emily Kubota ◽  
Jason D Yeatman

2018 ◽  
Vol 18 (10) ◽  
pp. 1149
Author(s):  
Jesse Gomez ◽  
Michael Barnett ◽  
Kalanit Grill-Spector

2021 ◽  
Vol 15 ◽  
Author(s):  
Justin L. Balsor ◽  
Keon Arbabi ◽  
Desmond Singh ◽  
Rachel Kwan ◽  
Jonathan Zaslavsky ◽  
...  

Studying the molecular development of the human brain presents unique challenges for selecting a data analysis approach. The rare and valuable nature of human postmortem brain tissue, especially for developmental studies, means that sample sizes are small (n), while high-throughput genomic and proteomic methods measure expression levels for hundreds or thousands of variables [e.g., genes or proteins (p)] per sample. This leads to a high-dimensional data structure (p ≫ n) and introduces the curse of dimensionality, which poses a challenge for traditional statistical approaches. In contrast, high-dimensional analyses, especially cluster analyses developed for sparse data, have worked well for analyzing genomic datasets where p ≫ n. Here we explore applying a lasso-based clustering method developed for high-dimensional genomic data with small sample sizes. Using protein and gene data from the developing human visual cortex, we compared clustering methods. We identified an application of sparse k-means clustering [robust sparse k-means clustering (RSKC)] that partitioned samples into age-related clusters reflecting lifespan stages from birth to aging. RSKC adaptively selects the subset of genes or proteins that contributes to partitioning samples into age-related clusters progressing across the lifespan. This approach addresses a limitation of current studies, which could not identify multiple postnatal clusters. Moreover, the clusters encompassed overlapping age ranges, like a series of waves, illustrating that chronological age and brain age have a complex relationship. In addition, a recently developed workflow to create plasticity phenotypes (Balsor et al., 2020) was applied to the clusters and revealed neurobiologically relevant features that show how the human visual cortex changes across the lifespan. These methods can help address the growing demand for multimodal integration, from molecular machinery to brain imaging signals, to understand the human brain's development.
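The core move in sparse k-means (and RSKC, its robust variant) is to weight each feature by how strongly it separates the clusters, with an L1 (lasso) penalty driving uninformative features' weights to zero. A single weight-update step of that idea on toy data might look like the sketch below; it is a simplification, since the full RSKC algorithm also trims outliers and alternates cluster assignment with weight updates, and the data and threshold are invented:

```python
def feature_weights(data, labels, threshold=0.0):
    """One sparse k-means weight update: weight each feature by its
    between-cluster sum of squares (BCSS), soft-thresholded (the lasso step)
    and scaled to unit L2 norm. Features that do not separate the
    clusters end up with weight 0."""
    n_feat = len(data[0])
    weights = []
    for j in range(n_feat):
        col = [row[j] for row in data]
        mean = sum(col) / len(col)
        total_ss = sum((x - mean) ** 2 for x in col)
        within_ss = 0.0
        for c in set(labels):
            grp = [col[i] for i, lab in enumerate(labels) if lab == c]
            gm = sum(grp) / len(grp)
            within_ss += sum((x - gm) ** 2 for x in grp)
        bcss = total_ss - within_ss
        weights.append(max(bcss - threshold, 0.0))  # soft-threshold (lasso)
    norm = sum(w * w for w in weights) ** 0.5 or 1.0
    return [w / norm for w in weights]

# Toy "expression" data: 6 samples, 3 features; only feature 0 tracks age group.
data = [[0.10, 5.0, 2.1], [0.20, 4.8, 2.0], [0.15, 5.1, 1.9],  # younger
        [2.00, 5.0, 2.0], [2.10, 4.9, 2.1], [1.90, 5.2, 2.0]]  # older
labels = [0, 0, 0, 1, 1, 1]
print(feature_weights(data, labels, threshold=0.05))
```

Here the two noise features fall below the threshold and receive zero weight, which is how the method sidesteps the p ≫ n problem: only features that actually distinguish the age-related clusters influence the partitioning.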

