Contextual associations represented both in neural networks and human behavior

2022 ◽  
Author(s):  
Elissa M Aminoff ◽  
Shira Baror ◽  
Eric W Roginek ◽  
Daniel D Leeds

Contextual associations facilitate object recognition in human vision. However, the role of context in artificial vision remains elusive, as do the characteristics that humans use to define context. We investigated whether contextually related objects (bicycle-helmet) are represented more similarly in convolutional neural networks (CNNs) used for image understanding than unrelated objects (bicycle-fork). Stimuli depicted objects against a white background and covered a diverse set of contexts (N=73). CNN representations of contextually related objects were more similar to one another than to those of unrelated objects across all CNN layers. Critically, the similarity found in CNNs correlated with human behavior across three experiments assessing contextual relatedness, emerging as significant only in the later layers. The results demonstrate that context is inherently represented in CNNs as a result of object recognition training, and that the representations in the later layers of the network tap into the contextual regularities that predict human behavior.
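The core comparison in this abstract can be sketched in a few lines: compute the similarity between a CNN layer's activation vectors for a contextually related pair versus an unrelated pair. This is a minimal illustration with synthetic activations (the shared "context" component, vector sizes, and object names are assumptions for the sketch, not the authors' data):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Hypothetical layer activations: related objects share a common
# context component, the unrelated object does not.
context = rng.normal(size=128)
bicycle = context + 0.5 * rng.normal(size=128)
helmet  = context + 0.5 * rng.normal(size=128)   # related to bicycle
fork    = rng.normal(size=128)                   # unrelated

sim_related   = cosine_similarity(bicycle, helmet)
sim_unrelated = cosine_similarity(bicycle, fork)

# Contextually related pair should be more similar.
print(sim_related > sim_unrelated)
```

In the study this comparison would be run per layer and then correlated with human relatedness judgments across object pairs.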

Author(s):  
Somak Aditya ◽  
Yezhou Yang ◽  
Chitta Baral

Deep learning based data-driven approaches have been successfully applied in various image understanding applications, ranging from object recognition and semantic segmentation to visual question answering. However, the lack of knowledge integration and of higher-level reasoning capabilities in these methods remains a hindrance. In this work, we present a brief survey of a few representative reasoning mechanisms, knowledge integration methods, and their corresponding image understanding applications developed by various groups of researchers, approaching the problem from a variety of angles. Furthermore, we discuss key efforts on integrating external knowledge with neural networks. Taking cues from these efforts, we conclude by discussing potential pathways to improve reasoning capabilities.


2021 ◽  
Author(s):  
Gaurav Malhotra ◽  
Marin Dujmovic ◽  
John Hummel ◽  
Jeffrey S Bowers

The success of Convolutional Neural Networks (CNNs) in classifying objects has led to a surge of interest in using these systems to understand human vision. Recent studies have argued that when CNNs are trained in the correct learning environment, they can emulate a key property of human vision -- learning to classify objects based on their shape. While showing a shape bias is indeed a desirable property for any model of human object recognition, it is unclear whether the resulting shape representations learned by these networks are human-like. We explored this question in the context of a well-known observation from psychology showing that humans encode the shape of objects in terms of relations between object features. To check whether this is also true for the representations of CNNs, we ran a series of simulations in which we trained CNNs on datasets of novel shapes and tested them on a set of controlled deformations of these shapes. We found that CNNs do not show any enhanced sensitivity to deformations which alter relations between features, even when explicitly trained on such deformations. This behaviour contrasted with that of human participants in previous studies as well as in a new experiment. We argue that these results are a consequence of a fundamental difference between how humans and CNNs learn to recognise objects: while CNNs select features that allow them to optimally classify the proximal stimulus, humans select features that they infer to be properties of the distal stimulus. This makes human representations more generalisable to novel contexts and tasks.
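The relational-versus-metric distinction above can be made concrete with a toy sketch (the shape, the two deformations, and the two codes are illustrative assumptions, not the authors' stimuli): a code built on qualitative relations between features is invariant to a metric stretch but sensitive to a deformation that changes which feature is above or left of which.

```python
import numpy as np

# Toy shape: four feature points (x, y).
shape = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# Metric deformation: stretch distances but keep relations between features.
metric = shape * np.array([1.5, 1.0])

# Relational deformation: move one feature past the others,
# changing above/below and left/right relations.
relational = shape.copy()
relational[2] = [-0.5, -0.5]

def proximal_code(s):
    """Image-like code: raw coordinates (the kind of cue a CNN can exploit)."""
    return s.ravel()

def relational_code(s):
    """Distal code: qualitative relations (signs of pairwise differences)."""
    diffs = s[:, None, :] - s[None, :, :]
    return np.sign(diffs).ravel()

def change(code, a, b):
    """How much a representation changes under a deformation."""
    return float(np.linalg.norm(code(a) - code(b)))

# A relation-based code is unchanged by the metric deformation but
# highly sensitive to the relational one.
print(change(relational_code, shape, metric))      # 0.0
print(change(relational_code, shape, relational))  # > 0
```

The finding reported above is, in effect, that CNN representations behave like `proximal_code` rather than `relational_code`, even when trained on relational deformations.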


2017 ◽  
Author(s):  
Kandan Ramakrishnan ◽  
Iris I.A. Groen ◽  
Arnold W.M. Smeulders ◽  
H. Steven Scholte ◽  
Sennay Ghebreab

Convolutional neural networks (CNNs) have recently emerged as promising models of human vision based on their ability to predict hemodynamic brain responses to visual stimuli measured with functional magnetic resonance imaging (fMRI). However, the degree to which CNNs can predict the temporal dynamics of visual object recognition, reflected in neural measures with millisecond precision, is less understood. Additionally, while deeper CNNs with higher numbers of layers perform better on automated object recognition, it is unclear whether this also results in better correspondence to brain responses. Here, we examined 1) to what extent CNN layers predict visual evoked responses in the human brain over time and 2) whether deeper CNNs better model brain responses. Specifically, we tested how well CNN architectures with 7 (CNN-7) and 15 (CNN-15) layers predicted electroencephalography (EEG) responses to several thousand natural images. Our results show that both CNN architectures correspond to EEG responses in a hierarchical spatio-temporal manner, with lower layers explaining responses early in time at electrodes overlying early visual cortex, and higher layers explaining responses later in time at electrodes overlying lateral-occipital cortex. While the explained variance of neural responses by individual layers did not differ between CNN-7 and CNN-15, combining the representations across layers resulted in improved performance of CNN-15 compared to CNN-7, but only from 150 ms after stimulus onset. This suggests that CNN representations reflect both early (feed-forward) and late (feedback) stages of visual processing. Overall, our results show that the depth of CNNs indeed plays a role in explaining time-resolved EEG responses.
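The time-resolved encoding analysis described above amounts to regressing the EEG response at each time point onto a layer's features and tracking explained variance over time. A minimal numpy sketch with synthetic data (the dimensions, ridge penalty, and in-sample R-squared are simplifying assumptions; the actual study's estimator may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_feat, n_times = 200, 50, 10

# Hypothetical CNN layer features and per-time-point EEG responses;
# the EEG here is synthesized from the features plus noise.
layer = rng.normal(size=(n_images, n_feat))
w = rng.normal(size=(n_feat, n_times))
eeg = layer @ w + 0.5 * rng.normal(size=(n_images, n_times))

def explained_variance_over_time(X, Y, alpha=1.0):
    """Ridge-regress each time point's response on layer features (in-sample R^2)."""
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    beta = np.linalg.solve(XtX, X.T @ Y)   # one weight column per time point
    pred = X @ beta
    ss_res = ((Y - pred) ** 2).sum(axis=0)
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

r2 = explained_variance_over_time(layer, eeg)
print(r2.shape)   # one R^2 value per time point
```

Running this per CNN layer and per electrode yields the layer-by-time explained-variance profiles that reveal the hierarchical spatio-temporal correspondence reported above.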


2021 ◽  
Author(s):  
Hojin Jang ◽  
Frank Tong

Although convolutional neural networks (CNNs) provide a promising model for understanding human vision, most CNNs lack robustness to challenging viewing conditions such as image blur, whereas human vision is much more reliable. Might robustness to blur be attributable to vision during infancy, given that acuity is initially poor but improves considerably over the first several months of life? Here, we evaluated the potential consequences of such early experiences by training CNN models on face and object recognition tasks while gradually reducing the amount of blur applied to the training images. For CNNs trained on blurry-to-clear faces, we observed sustained robustness to blur, consistent with a recent report by Vogelsang and colleagues (2018). By contrast, CNNs trained with blurry-to-clear objects failed to retain robustness to blur. Further analyses revealed that the spatial frequency tuning of the two CNNs was profoundly different. The blurry-to-clear face-trained network successfully retained a preference for low spatial frequencies, whereas the blurry-to-clear object-trained CNN exhibited a progressive shift toward higher spatial frequencies. Our findings provide novel computational evidence showing how face recognition, unlike object recognition, allows for more holistic processing. Moreover, our results suggest that blurry vision during infancy is insufficient to account for the robustness of adult vision to blurry objects.
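The blurry-to-clear training regime can be sketched as a blur schedule applied to the training images: start with a large Gaussian blur and linearly reduce it to zero over training. This numpy-only sketch (the kernel construction, linear schedule, and starting sigma are assumptions for illustration; the study's exact schedule may differ) stands in for a framework-level data-augmentation step:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur applied along rows, then columns."""
    if sigma <= 0:
        return image
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def blur_schedule(epoch, n_epochs, sigma_start=4.0):
    """Linearly reduce blur from sigma_start at epoch 0 to zero at the last epoch."""
    return sigma_start * (1.0 - epoch / (n_epochs - 1))

image = np.random.default_rng(2).random((16, 16))
for epoch in range(5):
    sigma = blur_schedule(epoch, 5)
    training_image = blur(image, sigma)   # fed to the CNN at this epoch
```

The contrast reported above is between networks trained with this schedule (blurry to clear) on faces versus on objects, and what spatial-frequency preference each network retains at the end.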


2018 ◽  
Author(s):  
Hamid Karimi-Rouzbahani

Invariant object recognition, which refers to the ability to recognize objects precisely and rapidly in the presence of variations, has been a central question in human vision research. The general consensus is that the ventral and dorsal visual streams are the major processing pathways which undertake category and variation encoding in entangled layers. This overlooks the mounting evidence which supports the role of peri-frontal areas in category encoding. These recent studies, however, have left open several aspects of visual processing in peri-frontal areas, including whether these areas contribute only in active tasks, and whether they interact with peri-occipital areas or process information independently and differently. To address these concerns, a passive EEG paradigm was designed in which subjects viewed a set of variation-controlled object images. Using multivariate pattern analysis, noticeable category and variation information was observed in occipital, parietal, temporal and prefrontal areas, supporting their contribution to visual processing. Using task specificity indices, phase and Granger causality analyses, three distinct stages of processing were identified which revealed transfer of information between peri-frontal and peri-occipital areas, suggesting their parallel and interactive processing of visual information. A brain-plausible computational model supported the possibility of parallel processing mechanisms in peri-occipital and peri-frontal areas. These findings, while corroborating previous results on the role of prefrontal areas in object recognition, extend their contribution from active recognition, in which peri-frontal to peri-occipital feedback mechanisms are activated, to the general case of object and variation processing, which is an integral part of visual processing and plays a role even during passive viewing.
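Multivariate pattern analysis (MVPA) of the kind described above asks whether category can be decoded from the distributed EEG pattern across channels. A minimal cross-validated sketch with synthetic trials and a nearest-centroid classifier (the trial counts, effect size, and classifier choice are illustrative assumptions; the study may use a different decoder):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical EEG patterns: trials x channels, two categories with a
# small mean difference across channels.
n_per_class, n_channels = 40, 32
class0 = rng.normal(0.0, 1.0, size=(n_per_class, n_channels))
class1 = rng.normal(0.5, 1.0, size=(n_per_class, n_channels))
X = np.vstack([class0, class1])
y = np.array([0] * n_per_class + [1] * n_per_class)

def nearest_centroid_cv(X, y, n_folds=5):
    """Leave-fold-out decoding accuracy with a nearest-centroid classifier."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    correct = 0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        c0 = X[train][y[train] == 0].mean(axis=0)   # centroids from training trials only
        c1 = X[train][y[train] == 1].mean(axis=0)
        for i in fold:
            d0 = np.linalg.norm(X[i] - c0)
            d1 = np.linalg.norm(X[i] - c1)
            correct += (d1 < d0) == (y[i] == 1)
    return correct / len(y)

accuracy = nearest_centroid_cv(X, y)
```

Above-chance accuracy in a region's channels is the evidence that the region carries category (or variation) information; running the same analysis per time window yields the processing stages described above.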


2021 ◽  
Author(s):  
Jiaqi Huang ◽  
Peter Gerhardstein

Multiple theories of human object recognition argue for the importance of semantic parts in the formation of intermediate representations. However, the role of semantic parts in Deep Convolutional Neural Networks (DCNNs), which encapsulate the most recent and successful computer vision models, is poorly examined. We extract DCNN representations corresponding to differential performance on stimuli in which different parts of the same exemplar are deleted, and then compare these representations with those of human observers obtained in a behavioral experiment, using representational similarity analysis (RSA). We find that DCNN representations correlate strongly with those of observers, while acknowledging that these DCNN representations may not be part-based, given an equally high correlation between DCNN output and part size. Additionally, the exemplars incorrectly identified by DCNNs tend to have less “human-like” representations, which demonstrates that RSA is a potential novel method for interpreting error in the intermediate processes of DCNN recognition.
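RSA, as used above, compares two systems by correlating their representational dissimilarity matrices (RDMs) rather than their raw responses. A self-contained sketch with synthetic data (the 1-minus-Pearson dissimilarity, Spearman comparison, and toy "DCNN"/"human" matrices are standard but assumed choices, not necessarily the authors' exact pipeline):

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation between items."""
    return 1.0 - np.corrcoef(features)

def spearman(a, b):
    """Spearman rank correlation between two vectors (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def rsa(features_a, features_b):
    """Compare two representations via the upper triangles of their RDMs."""
    iu = np.triu_indices(features_a.shape[0], k=1)
    return spearman(rdm(features_a)[iu], rdm(features_b)[iu])

rng = np.random.default_rng(4)
base = rng.normal(size=(10, 20))                   # shared representational structure
dcnn  = base + 0.1 * rng.normal(size=base.shape)   # hypothetical DCNN responses
human = base + 0.1 * rng.normal(size=base.shape)   # hypothetical human data
print(rsa(dcnn, human))   # high, since both share the same structure
```

Only the off-diagonal upper triangle is compared because an RDM is symmetric with a zero diagonal; the resulting correlation is the "second-order" similarity between the two systems.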


PsycCRITIQUES ◽  
1995 ◽  
Vol 40 (4) ◽  
Author(s):  
Ed Glenn