Image content is more important than Bouma’s Law for scene metamers

2018 ◽  
Author(s):  
Thomas S. A. Wallis ◽  
Christina M. Funke ◽  
Alexander S. Ecker ◽  
Leon A. Gatys ◽  
Felix A. Wichmann ◽  
...  

Abstract
We subjectively perceive our visual field with high fidelity, yet large peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). A recent paper proposed a model of the mid-level ventral visual stream in which neural responses were averaged over an area of space that increased as a function of eccentricity (scaling). Human participants could not discriminate synthesised model images from each other (they were metamers) when scaling was about half the retinal eccentricity. This result implicated ventral visual area V2 and approximated “Bouma’s Law” of crowding. It has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our rich perceptual experience. However, participants in this experiment never saw the original images. We find that participants can easily discriminate real and model-generated images at V2 scaling. Lower scale factors than even V1 receptive fields may be required to generate metamers. Efficiently explaining why scenes look as they do may require incorporating segmentation processes and global organisational constraints in addition to local pooling.
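The scaling regime described above is linear: the diameter of a pooling region grows in proportion to its retinal eccentricity. A minimal sketch of that arithmetic, using round illustrative scale factors (0.5 for a V2-like, Bouma-like regime and 0.25 for a smaller V1-like regime) rather than values fitted in the paper:

# Illustrative sketch (not the authors' code): pooling-region size under
# eccentricity-dependent scaling, i.e. pooling diameter = scale * eccentricity.
# The scale factors 0.5 (V2-like / Bouma-like) and 0.25 (V1-like) are
# assumed round numbers for illustration, not values taken from the paper.

def pooling_diameter(eccentricity_deg: float, scale: float) -> float:
    """Diameter (deg) of a pooling region centred at a given eccentricity."""
    return scale * eccentricity_deg

for ecc in (2.0, 5.0, 10.0, 20.0):
    v2_like = pooling_diameter(ecc, scale=0.5)   # larger, V2-like pooling
    v1_like = pooling_diameter(ecc, scale=0.25)  # smaller, V1-like pooling
    print(f"ecc {ecc:4.1f} deg: V2-like pool {v2_like:.1f} deg, V1-like pool {v1_like:.1f} deg")

At 10 degrees eccentricity, for example, a scale of 0.5 implies averaging image structure over a region roughly 5 degrees across.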

eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Thomas SA Wallis ◽  
Christina M Funke ◽  
Alexander S Ecker ◽  
Leon A Gatys ◽  
Felix A Wichmann ◽  
...  

We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma’s Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.


2017 ◽  
Author(s):  
Jesse Gomez ◽  
Vaidehi Natu ◽  
Brianna Jeska ◽  
Michael Barnett ◽  
Kalanit Grill-Spector

Abstract
Receptive fields (RFs) processing information in restricted parts of the visual field are a key property of neurons in the visual system. However, how RFs develop in humans is unknown. Using fMRI and population receptive field (pRF) modeling in children and adults, we determined where and how pRFs develop across the ventral visual stream. We find that pRF properties in visual field maps, V1 through VO1, are adult-like by age 5. However, pRF properties in face- and word-selective regions develop into adulthood, increasing the foveal representation and the visual field coverage for faces in the right hemisphere and words in the left hemisphere. Eye-tracking indicates that pRF changes are related to changing fixation patterns on words and faces across development. These findings suggest a link between viewing behavior of faces and words and the differential development of pRFs across visual cortex, potentially due to competition for foveal coverage.
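The pRF modeling referred to here is conventionally a 2D Gaussian receptive field fit per voxel, with the predicted (pre-hemodynamic) response given by the overlap between the stimulus aperture and that Gaussian. A minimal sketch of that forward model; the grid resolution, pRF positions, and sizes below are illustrative assumptions, not the study's settings:

# Minimal sketch of a standard 2D Gaussian pRF forward model.
# All parameter values are illustrative, not taken from the study.
import numpy as np

def gaussian_prf(x0, y0, sigma, extent_deg=10.0, n=101):
    """2D Gaussian pRF centred at (x0, y0) deg with size sigma deg."""
    xs = np.linspace(-extent_deg, extent_deg, n)
    X, Y = np.meshgrid(xs, xs)
    g = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def predicted_response(stimulus_aperture, prf):
    """Predicted (pre-HRF) response: overlap of a binary stimulus aperture with the pRF."""
    return float((stimulus_aperture * prf).sum())

# Example: a bar covering the right half of the visual field drives a
# right-of-fixation pRF strongly and a left-of-fixation pRF only weakly.
n = 101
aperture = np.zeros((n, n))
aperture[:, n // 2:] = 1.0
print(predicted_response(aperture, gaussian_prf(3.0, 0.0, 1.5)))
print(predicted_response(aperture, gaussian_prf(-3.0, 0.0, 1.5)))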


2019 ◽  
Author(s):  
David Richter ◽  
Floris P. de Lange

Abstract
Perception and behavior can be guided by predictions, which are often based on learned statistical regularities. Neural responses to expected stimuli are frequently found to be attenuated after statistical learning. However, whether this sensory attenuation following statistical learning occurs automatically or depends on attention remains unknown. In the present fMRI study, we exposed human volunteers to sequentially presented object stimuli, in which the first object predicted the identity of the second object. We observed a strong attenuation of neural activity for expected compared to unexpected stimuli in the ventral visual stream. Crucially, this sensory attenuation was only apparent when stimuli were attended, and vanished when attention was directed away from the predictable objects. These results put important constraints on neurocomputational theories that cast perception as a process of probabilistic integration of prior knowledge and sensory information.


2019 ◽  
Author(s):  
Mariya E. Manahova ◽  
Eelke Spaak ◽  
Floris P. de Lange

Abstract
Familiarity with a stimulus leads to an attenuated neural response to the stimulus. Alongside this attenuation, recent studies have also observed a truncation of stimulus-evoked activity for familiar visual input. One proposed function of this truncation is to rapidly put neurons in a state of readiness to respond to new input. Here, we examined this hypothesis by presenting human participants with target stimuli that were embedded in rapid streams of familiar or novel distractor stimuli at different speeds of presentation, while recording brain activity using magnetoencephalography (MEG) and measuring behavioral performance. We investigated the temporal and spatial dynamics of signal truncation and whether this phenomenon bears any relationship to participants’ ability to categorize target items within a visual stream. Behaviorally, target categorization performance was markedly better when the target was embedded within familiar distractors, and this benefit became more pronounced with increasing speed of presentation. Familiar distractors showed a truncation of neural activity in the visual system, and this truncation was strongest for the fastest presentation speeds. Moreover, neural processing of the target was stronger when it was preceded by familiar distractors. Taken together, these findings suggest that truncation of neural responses for familiar items may result in stronger processing of relevant target information, resulting in superior perceptual performance.

Significance statement
The visual response to familiar input is attenuated more rapidly than for novel input. Here we find that this truncation of the neural response for familiar input is strongest for very fast image presentations. We also find a tentative function for this truncation: the neural response to a target image that is embedded within distractors is much greater when the distractors are familiar than when they are novel. Similarly, target categorization performance is much better when the target is embedded within familiar distractors, and this advantage is most obvious for very fast image presentations. This suggests that neural truncation helps to rapidly put neurons in a state of readiness to respond to new input.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
David Richter ◽  
Floris P de Lange

Perception and behavior can be guided by predictions, which are often based on learned statistical regularities. Neural responses to expected stimuli are frequently found to be attenuated after statistical learning. However, whether this sensory attenuation following statistical learning occurs automatically or depends on attention remains unknown. In the present fMRI study, we exposed human volunteers to sequentially presented object stimuli, in which the first object predicted the identity of the second object. We observed a reliable attenuation of neural activity for expected compared to unexpected stimuli in the ventral visual stream. Crucially, this sensory attenuation was only apparent when stimuli were attended, and vanished when attention was directed away from the predictable objects. These results put important constraints on neurocomputational theories that cast perception as a process of probabilistic integration of prior knowledge and sensory information.


2014 ◽  
Vol 14 (10) ◽  
pp. 717-717
Author(s):  
K. Kay ◽  
K. Weiner ◽  
K. Grill-Spector

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Irina Higgins ◽  
Le Chang ◽  
Victoria Langston ◽  
Demis Hassabis ◽  
Christopher Summerfield ◽  
...  

Abstract
In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together our results imply that optimising the disentangling objective leads to representations that closely resemble those in the IT at the single unit level. This points at disentangling as a plausible learning objective for the visual brain.
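The disentangling objective referred to here is the β-VAE loss: a standard VAE reconstruction term plus a KL term up-weighted by a factor β > 1, which pressures the latent factors to be independent and interpretable. A minimal numpy sketch of that loss under the usual Gaussian assumptions; all array sizes and values are illustrative:

# Minimal numpy sketch of the beta-VAE objective: reconstruction error plus a
# KL term weighted by beta (beta > 1 encourages disentangled latent factors).
# All tensors and values here are illustrative, not the authors' training setup.
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """L = reconstruction term + beta * KL(q(z|x) || N(0, I))."""
    recon = np.sum((x - x_recon) ** 2)  # Gaussian likelihood up to a constant
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + beta * kl

# Toy example with a 10-dimensional latent space.
rng = np.random.default_rng(0)
x = rng.normal(size=64)
x_recon = x + 0.1 * rng.normal(size=64)
mu = rng.normal(scale=0.1, size=10)
log_var = np.full(10, -2.0)
print(beta_vae_loss(x, x_recon, mu, log_var))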


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Elias B Issa ◽  
Charles F Cadieu ◽  
James J DiCarlo

Ventral visual stream neural responses are dynamic, even for static image presentations. However, dynamical neural models of visual cortex are lacking as most progress has been made modeling static, time-averaged responses. Here, we studied population neural dynamics during face detection across three cortical processing stages. Remarkably, ~30 milliseconds after the initially evoked response, we found that neurons in intermediate-level areas decreased their responses to typical configurations of their preferred face parts relative to their response for atypical configurations, even while neurons in higher areas achieved and maintained a preference for typical configurations. These hierarchical neural dynamics were inconsistent with standard feedforward circuits. Rather, recurrent models computing prediction errors between stages captured the observed temporal signatures. This model of neural dynamics, which simply augments the standard feedforward model of online vision, suggests that neural responses to static images may encode top-down prediction errors in addition to bottom-up feature estimates.
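A toy sketch of the kind of recurrent prediction-error dynamic described: an intermediate stage is first driven by feedforward input, then partially suppressed once a delayed prediction fed back from a higher stage begins to cancel that drive. The time constants, feedback weight, and variable names are illustrative assumptions, not the authors' fitted model:

# Toy two-stage prediction-error dynamic: the intermediate stage responds to
# its feedforward drive minus a delayed feedback prediction from the higher
# stage, so its response peaks early and then decays. Parameters are arbitrary.
import numpy as np

steps, dt = 60, 0.005           # 300 ms simulated in 5 ms steps
drive = 1.0                      # constant feedforward drive (e.g., a face part)
inter = np.zeros(steps)          # intermediate-stage response
higher = np.zeros(steps)         # higher-stage response (builds up more slowly)
tau_i, tau_h, fb = 0.02, 0.06, 0.8

for t in range(1, steps):
    prediction = fb * higher[t - 1]           # feedback prediction from above
    error = drive - prediction                # prediction error at the lower stage
    inter[t] = inter[t - 1] + dt / tau_i * (error - inter[t - 1])
    higher[t] = higher[t - 1] + dt / tau_h * (inter[t - 1] - higher[t - 1])

print(f"intermediate response peaks at ~{np.argmax(inter) * 5} ms, then decays as the prediction arrives")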


Author(s):  
Tao He ◽  
David Richter ◽  
Zhiguo Wang ◽  
Floris P. de Lange

Abstract
Both spatial and temporal context play an important role in visual perception and behavior. Humans can extract statistical regularities from both forms of context to help process the present and to construct expectations about the future. Numerous studies have found reduced neural responses to expected stimuli compared to unexpected stimuli, for both spatial and temporal regularities. However, it is largely unclear whether and how these forms of context interact. In the current fMRI study, thirty-three human volunteers were exposed to object stimuli that could be expected or surprising in terms of their spatial and temporal context. We found a reliable independent contribution of both spatial and temporal context in modulating the neural response. Specifically, neural responses to stimuli in expected compared to unexpected contexts were suppressed throughout the ventral visual stream. Interestingly, the modulation by spatial context was stronger in magnitude and more reliable than the modulation by temporal context. These results suggest that while both spatial and temporal context serve as a prior that can modulate sensory processing in a similar fashion, predictions of spatial context may be a more powerful modulator in the visual system.

Significance Statement
Both temporal and spatial context can affect visual perception; however, it is largely unclear if and how these different forms of context interact in modulating sensory processing. When manipulating both temporal and spatial context expectations, we found that they jointly affected sensory processing, evident as a suppression of neural responses for expected compared to unexpected stimuli. Interestingly, the modulation by spatial context was stronger than that by temporal context. Together, our results suggest that spatial context may be a stronger modulator of neural responses than temporal context within the visual system. Thereby, the present study provides new evidence on how different types of predictions jointly modulate perceptual processing.


2021 ◽  
pp. 1-16
Author(s):  
Tao He ◽  
David Richter ◽  
Zhiguo Wang ◽  
Floris P. de Lange

Abstract
Both spatial and temporal context play an important role in visual perception and behavior. Humans can extract statistical regularities from both forms of context to help process the present and to construct expectations about the future. Numerous studies have found reduced neural responses to expected stimuli compared with unexpected stimuli, for both spatial and temporal regularities. However, it is largely unclear whether and how these forms of context interact. In the current fMRI study, 33 human volunteers were exposed to pairs of object stimuli that could be expected or surprising in terms of their spatial and temporal context. We found reliable independent contributions of both spatial and temporal context in modulating the neural response. Specifically, neural responses to stimuli in expected compared with unexpected contexts were suppressed throughout the ventral visual stream. These results suggest that both spatial and temporal context may aid sensory processing in a similar fashion, providing evidence on how different types of context jointly modulate perceptual processing.

