Feature blindness: a challenge for understanding and modelling visual object recognition

10.1101/2021.10.20.465074 ◽

2021 ◽

Author(s):

Gaurav Malhotra ◽

Marin Dujmovic ◽

Jeffrey S Bowers

Keyword(s):

Object Recognition ◽

Statistical Inference ◽

Computational Cost ◽

Fundamental Difference ◽

Human Vision ◽

Visual Object ◽

Visual Object Recognition ◽

Global Features ◽

Diagnostic Features ◽

Inference Models

A central problem in vision sciences is to understand how humans recognise objects under novel viewing conditions. Recently, statistical inference models such as Convolutional Neural Networks (CNNs) seem to have reproduced this ability by incorporating some architectural constraints of biological vision systems into machine learning models. This has led to the proposal that, like CNNs, humans solve the problem of object recognition by performing a statistical inference over their observations. This hypothesis remains difficult to test because models and humans learn in vastly different environments. Accordingly, any differences in performance could be attributed to the training environment rather than to any fundamental difference between statistical inference models and human vision. To overcome these limitations, we conducted a series of experiments and simulations in which humans and models had no prior experience with the stimuli. The stimuli contained multiple features that varied in the extent to which they predicted category membership. We observed that human participants frequently ignored features that were highly predictive and clearly visible. Instead, they learned to rely on global features such as colour or shape, even when these features were not the most predictive. When these global features were absent, participants failed to learn the task at all. By contrast, ideal inference models as well as CNNs always learned to categorise objects based on the most predictive feature. This was the case even when the CNN was pre-trained to have a shape bias and the convolutional backbone was frozen. These results highlight a fundamental difference between statistical inference models and humans: while statistical inference models such as CNNs learn the most diagnostic features with little regard for the computational cost of learning them, humans are highly constrained by their limited cognitive capacities, which results in a qualitatively different approach to object recognition.
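
The contrast described in this abstract, between a learner that always picks the most predictive feature and one that prefers global features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature names, the validity values, and the helper functions are hypothetical and are not taken from the paper's stimuli or models. The sketch only shows what "choosing the most predictive feature" means for an idealised single-cue learner.

```python
import random


def make_stimuli(n, validities, seed=0):
    """Generate n binary-labelled stimuli.

    Feature i agrees with the category label with probability validities[i].
    (Hypothetical generator; not the stimulus set used in the paper.)
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        features = [label if rng.random() < v else 1 - label for v in validities]
        data.append((features, label))
    return data


def feature_accuracy(data, i):
    """Accuracy of a classifier that responds with feature i alone."""
    return sum(features[i] == label for features, label in data) / len(data)


def most_predictive_feature(data):
    """Index of the single feature with the highest accuracy on the data."""
    n_features = len(data[0][0])
    return max(range(n_features), key=lambda i: feature_accuracy(data, i))


# Assumed validities: a perfectly predictive local feature competes with
# less predictive global features (illustrative numbers only).
VALIDITIES = {"global_shape": 0.8, "local_patch": 1.0, "colour": 0.6}
names = list(VALIDITIES)

data = make_stimuli(2000, list(VALIDITIES.values()))
accuracies = {name: feature_accuracy(data, i) for i, name in enumerate(names)}
best = names[most_predictive_feature(data)]

print(accuracies)
print("idealised learner relies on:", best)
```

In this toy setting the idealised learner settles on the perfectly predictive local feature, whereas the abstract reports that human participants tended to rely on global features such as shape or colour even when those were less predictive.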

Characterizing the temporal dynamics of object recognition by deep neural networks : role of depth

10.1101/178541 ◽

2017 ◽

Cited By ~ 1

Author(s):

Kandan Ramakrishnan ◽

Iris I.A. Groen ◽

Arnold W.M. Smeulders ◽

H. Steven Scholte ◽

Sennay Ghebreab

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Visual Processing ◽

Temporal Dynamics ◽

Occipital Cortex ◽

Stimulus Onset ◽

Human Vision ◽

Visual Object ◽

Visual Object Recognition ◽

Brain Responses

Convolutional neural networks (CNNs) have recently emerged as promising models of human vision based on their ability to predict hemodynamic brain responses to visual stimuli measured with functional magnetic resonance imaging (fMRI). However, the degree to which CNNs can predict the temporal dynamics of visual object recognition, as reflected in neural measures with millisecond precision, is less well understood. Additionally, while deeper CNNs with more layers perform better on automated object recognition, it is unclear whether this also results in better correlation with brain responses. Here, we examined 1) to what extent CNN layers predict visual evoked responses in the human brain over time and 2) whether deeper CNNs better model brain responses. Specifically, we tested how well CNN architectures with 7 (CNN-7) and 15 (CNN-15) layers predicted electroencephalography (EEG) responses to several thousand natural images. Our results show that both CNN architectures correspond to EEG responses in a hierarchical spatio-temporal manner, with lower layers explaining responses early in time at electrodes overlying early visual cortex, and higher layers explaining responses later in time at electrodes overlying lateral-occipital cortex. While the variance in neural responses explained by individual layers did not differ between CNN-7 and CNN-15, combining the representations across layers resulted in improved performance of CNN-15 compared to CNN-7, but only from 150 ms after stimulus onset. This suggests that CNN representations reflect both early (feed-forward) and late (feedback) stages of visual processing. Overall, our results show that the depth of CNNs indeed plays a role in explaining time-resolved EEG responses.
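
The idea of asking how much EEG variance a CNN layer explains at each time point can be sketched with synthetic data. This is a deliberately simplified stand-in, not the analysis from the paper: each "layer" is reduced to one number per image, explained variance is the squared Pearson correlation, and the signal is simulated so that the lower layer drives an early time point and the higher layer a later one. All names, sizes and noise levels are assumptions.

```python
import random


def pearson_r2(xs, ys):
    """Squared Pearson correlation, i.e. variance explained by a linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


rng = random.Random(0)
N_IMAGES, N_TIMES = 200, 5  # assumed sizes, chosen only for illustration

# One scalar "activation" per image for a lower and a higher layer.
low_layer = [rng.gauss(0, 1) for _ in range(N_IMAGES)]
high_layer = [rng.gauss(0, 1) for _ in range(N_IMAGES)]

# Simulated EEG amplitude per time point and image: the lower layer drives
# time index 1, the higher layer drives time index 3, the rest is noise.
eeg = []
for t in range(N_TIMES):
    noise = [rng.gauss(0, 0.5) for _ in range(N_IMAGES)]
    if t == 1:
        signal = low_layer
    elif t == 3:
        signal = high_layer
    else:
        signal = [0.0] * N_IMAGES
    eeg.append([s + e for s, e in zip(signal, noise)])


def explained_variance_over_time(feature):
    """Explained variance of the EEG by one layer feature at each time point."""
    return [pearson_r2(feature, eeg[t]) for t in range(N_TIMES)]


low_curve = explained_variance_over_time(low_layer)
high_curve = explained_variance_over_time(high_layer)
low_peak = max(range(N_TIMES), key=low_curve.__getitem__)
high_peak = max(range(N_TIMES), key=high_curve.__getitem__)

print("lower layer peaks at time index", low_peak)
print("higher layer peaks at time index", high_peak)
```

By construction, the lower layer's explained variance peaks before the higher layer's, which is the kind of hierarchical temporal ordering the abstract describes. A real analysis would use high-dimensional layer activations, multiple electrodes and cross-validated regression.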

Diagnostic Features for Visual Object Recognition in Humans

Journal of Vision ◽

10.1167/19.10.30d ◽

2019 ◽

Vol 19 (10) ◽

pp. 30d

Author(s):

Quentin Wohlfarth ◽

Martin Arguin

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition ◽

Diagnostic Features

Developmental Trajectory of Visual Object Recognition Revealed by fMRI

PsycEXTRA Dataset ◽

10.1037/e527342012-229 ◽

2007 ◽

Author(s):

K. Suzanne Scherf ◽

Marlene Behrmann ◽

Kate Humphreys ◽

Beatriz Luna

Keyword(s):

Object Recognition ◽

Developmental Trajectory ◽

Visual Object ◽

Visual Object Recognition

Faculty Opinions recommendation of Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726413891.793534418 ◽

2017 ◽

Author(s):

Odelia Schwartz

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Deep Neural Networks ◽

Visual Object ◽

Visual Object Recognition ◽

Cortical Dynamics ◽

Spatio Temporal

The Functional Architecture Of Visual Object Recognition: Cognitive And Neuropsychological Approaches

10.21236/ada259859 ◽

1992 ◽

Author(s):

Martha J. Farah

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition ◽

Functional Architecture

Haptic object recognition based on shape relates to visual object recognition ability

Psychological Research ◽

10.1007/s00426-021-01560-z ◽

2021 ◽

Author(s):

Jason K. Chow ◽

Thomas J. Palmeri ◽

Isabel Gauthier

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition ◽

Recognition Ability ◽

Haptic Object

Visual Object Recognition: Do We Know More Now Than We Did 20 Years Ago?

Annual Review of Psychology ◽

10.1146/annurev.psych.58.102904.190114 ◽

2007 ◽

Vol 58 (1) ◽

pp. 75-96 ◽

Cited By ~ 79

Author(s):

Jessie J. Peissig ◽

Michael J. Tarr

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition

Mechanisms of visual object recognition studied in monkeys

Spatial Vision ◽

10.1163/156856800741171 ◽

2000 ◽

Vol 13 (2-3) ◽

pp. 147-163 ◽

Cited By ~ 34

Author(s):

Keiji Tanaka

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition

Quantifying the role of context in visual object recognition

Visual Cognition ◽

10.1080/13506285.2013.865694 ◽

2013 ◽

Vol 22 (1) ◽

pp. 30-56 ◽

Cited By ~ 14

Author(s):

Elan Barenholtz

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition

Category-specificity in visual object recognition

Cognition ◽

10.1016/j.cognition.2009.02.005 ◽

2009 ◽

Vol 111 (3) ◽

pp. 281-301 ◽

Cited By ~ 37

Author(s):

Christian Gerlach

Keyword(s):

Object Recognition ◽

Visual Object ◽

Visual Object Recognition ◽

Category Specificity
