Texture-like representation of objects in human visual cortex

2022 ◽  
Author(s):  
Akshay Vivek Jagadeesh ◽  
Justin Gardner

The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of visual cortex. These representations could support object vision by specifically representing objects or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real-world objects, that is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses in category-selective regions, as well as a model of macaque inferotemporal cortex and ImageNet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to insufficient signal-to-noise, as all of these observer models could predict human performance in image categorization tasks. How then might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for object discrimination is available. Thus, our results suggest that the role of human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
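
A minimal illustrative sketch of the kind of observer model at issue here (not the authors' code; the network, layer choice, and distance metric are assumptions): each image is summarized by spatially pooled CNN feature correlations, i.e. texture-like statistics, and the model picks the "odd one out". Because such summary statistics discard spatial arrangement, an observer of this kind can fail to single out the natural image among texture-matched scrambles even when humans find the task easy.

```python
# Illustrative sketch (assumed setup): an "observer model" that must pick the
# natural image out of a set of texture-matched, spatially scrambled synthetic
# images using only spatially pooled (texture-like) CNN statistics.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def gram_stats(img_path, layer_idx=16):
    """Spatially pooled feature correlations (Gram matrix) from one VGG layer.
    These summary statistics discard the spatial arrangement of features."""
    x = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        for i, module in enumerate(vgg):
            x = module(x)
            if i == layer_idx:
                break
    _, c, h, w = x.shape
    f = x.view(c, h * w)
    return (f @ f.t()) / (h * w)

def oddity_choice(image_paths):
    """Pick the image whose texture statistics deviate most from the others.
    If the natural image and its scrambled synths share these statistics,
    this observer performs at chance, which is the point made above."""
    grams = [gram_stats(p) for p in image_paths]
    dists = [sum(torch.norm(g - g2).item() for g2 in grams) for g in grams]
    return max(range(len(dists)), key=lambda i: dists[i])
```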

2021 ◽  
Author(s):  
Moshe Gur

Object recognition models have at their core similar essential characteristics: feature extraction and hierarchical convergence leading to a code that is unique to each object and immune to variations in the object's appearance. To compare computational, biologically feasible models to human performance, subjects viewed objects displayed at a wide range of orientations and sizes, and were able to recognize them almost perfectly. These empirical results, together with consideration of thought experiments and analysis of everyday perceptual performance, lead to the conclusion that biologically plausible object perception models do not even come close to matching our perceptual abilities. We can categorize many thousands of objects, discriminate between enormous numbers of different exemplars within each category, and recognize an object as unique although it may appear in countless variations, most of which have never been seen. This seemingly technical, quantitative failure stems from a fundamental property of our perception: the ability to perceive spatial information instantaneously and in parallel, retain details including their relative properties, and yet integrate those details into a meaningful percept such as an object. I present an alternative view of object perception whereby objects are represented by responses in primary visual cortex (V1), the only cortical area responding to small spatial elements. The rest of the visual cortex is dedicated to scene understanding and interpretation, such as constructing 3D percepts from 2D inputs, coding motion, categorization, and memory. Since our perceptual abilities cannot be explained by convergence onto 'object cells' or by interactions implemented by axonal transmissions, a parallel-to-parallel, field-like process is suggested. In this view, spatial information is not modified by multiple neural interactions but is retained by effecting changes in a 'neural field' which preserves the identity of individual elements while enabling a new holistic percept when these elements change.


2013 ◽  
Vol 31 (2) ◽  
pp. 189-195 ◽  
Author(s):  
Youping Xiao

The short-wavelength-sensitive (S) cones play an important role in color vision of primates, and may also contribute to the coding of other visual features, such as luminance and motion. The color signals carried by the S cones and other cone types are largely separated in the subcortical visual pathway. Studies on nonhuman primates or humans have suggested that these signals are combined in the striate cortex (V1) following a substantial amplification of the S-cone signals in the same area. In addition to reviewing these studies, this review describes the circuitry in V1 that may underlie the processing of the S-cone signals and the dynamics of this processing. It also relates the interaction between various cone signals in V1 to the results of some psychophysical and physiological studies on color perception, which leads to a discussion of a previous model, in which color perception is produced by a multistage processing of the cone signals. Finally, I discuss the processing of the S-cone signals in the extrastriate area V2.


2021 ◽  
Author(s):  
Marek A. Pedziwiatr ◽  
Elisabeth von dem Hagen ◽  
Christoph Teufel

Humans constantly move their eyes to explore the environment and obtain information. Competing theories of gaze guidance consider the factors driving eye movements within a dichotomy between low-level visual features and high-level object representations. However, recent developments in object perception indicate a complex and intricate relationship between features and objects. Specifically, image-independent object-knowledge can generate objecthood by dynamically reconfiguring how feature space is carved up by the visual system. Here, we adopt this emerging perspective of object perception, moving away from the simplifying dichotomy between features and objects in explanations of gaze guidance. We recorded eye movements in response to stimuli that appear as meaningless patches on initial viewing but are experienced as coherent objects once relevant object-knowledge has been acquired. We demonstrate that gaze guidance differs substantially depending on whether observers experienced the same stimuli as meaningless patches or organised them into object representations. In particular, fixations on identical images became object-centred, less dispersed, and more consistent across observers once exposed to relevant prior object-knowledge. Observers' gaze behaviour also indicated a shift from exploratory information-sampling to a strategy of extracting information mainly from selected, object-related image areas. These effects were evident from the first fixations on the image. Importantly, however, eye movements were not fully determined by object representations but were best explained by a simple model that integrates image-computable features and high-level, knowledge-dependent object representations. Overall, the results show how information sampling via eye movements in humans is guided by a dynamic interaction between image-computable features and knowledge-driven perceptual organisation.
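
A hedged sketch of the kind of simple integration model described above (variable names and the grid-search fit are assumptions, not the authors' implementation): fixation priority is modeled as a weighted mixture of an image-computable feature map and a knowledge-dependent object map, with the mixture weight fitted to observed fixations.

```python
# Minimal sketch (assumed names): predict a fixation-priority map as a weighted
# mixture of an image-computable feature map and a knowledge-dependent object
# map, and fit the weight to observed fixation locations.
import numpy as np

def normalize(m):
    m = m - m.min()
    return m / (m.sum() + 1e-12)      # turn a map into a probability density

def mixture_map(feature_map, object_map, w):
    """Combined priority map: w * features + (1 - w) * object knowledge."""
    return normalize(w * normalize(feature_map) + (1 - w) * normalize(object_map))

def fixation_log_likelihood(priority, fixations):
    """Sum of log-probabilities of observed fixation locations (row, col)."""
    return sum(np.log(priority[r, c] + 1e-12) for r, c in fixations)

def fit_weight(feature_map, object_map, fixations, grid=np.linspace(0, 1, 101)):
    """Grid-search the mixture weight that best explains the fixations."""
    scores = [fixation_log_likelihood(mixture_map(feature_map, object_map, w),
                                      fixations) for w in grid]
    return grid[int(np.argmax(scores))]
```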


2017 ◽  
Author(s):  
Daniel Kaiser ◽  
Marius V. Peelen

To optimize processing, the human visual system utilizes regularities present in naturalistic visual input. One of these regularities is the relative position of objects in a scene (e.g., a sofa in front of a television), with behavioral research showing that regularly positioned objects are easier to perceive and to remember. Here we use fMRI to test how positional regularities are encoded in the visual system. Participants viewed pairs of objects that formed minimalistic two-object scenes (e.g., a “living room” consisting of a sofa and television) presented in their regularly experienced spatial arrangement or in an irregular arrangement (with interchanged positions). Additionally, single objects were presented centrally and in isolation. Multi-voxel activity patterns evoked by the object pairs were modeled as the average of the response patterns evoked by the two single objects forming the pair. In two experiments, this approximation in object-selective cortex was significantly less accurate for the regularly than the irregularly positioned pairs, indicating integration of individual object representations. More detailed analysis revealed a transition from independent to integrative coding along the posterior-anterior axis of the visual cortex, with the independent component (but not the integrative component) being almost perfectly predicted by object selectivity across the visual hierarchy. These results reveal a transitional stage between individual object and multi-object coding in visual cortex, providing a possible neural correlate of efficient processing of regularly positioned objects in natural scenes.
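
A minimal sketch of the pattern-averaging analysis described above (array layout and names are assumptions): the pair-evoked multi-voxel pattern is approximated by the mean of the two single-object patterns, and the accuracy of this approximation is compared between regularly and irregularly positioned pairs.

```python
# Illustrative sketch of the pattern-averaging logic: approximate the
# multi-voxel pattern evoked by an object pair with the mean of the two
# single-object patterns, then ask whether the approximation is worse for
# regularly positioned pairs (indicating integration).
import numpy as np
from scipy.stats import pearsonr

def approximation_accuracy(pair_pattern, single_a, single_b):
    """Correlation between the measured pair pattern and the synthetic
    average of its two constituent single-object patterns."""
    predicted = (single_a + single_b) / 2.0
    return pearsonr(pair_pattern, predicted)[0]

def regularity_effect(regular_pairs, irregular_pairs, singles):
    """Mean accuracy difference (irregular minus regular). A positive value
    means regular pairs are less well approximated, i.e. more integration.
    Each pair is (pair_pattern, object_a_id, object_b_id); `singles` maps
    object ids to their single-object voxel patterns."""
    acc_reg = np.mean([approximation_accuracy(p, singles[a], singles[b])
                       for p, a, b in regular_pairs])
    acc_irr = np.mean([approximation_accuracy(p, singles[a], singles[b])
                       for p, a, b in irregular_pairs])
    return acc_irr - acc_reg
```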


Author(s):  
N Seijdel ◽  
N Tsakmakidis ◽  
EHF De Haan ◽  
SM Bohte ◽  
HS Scholte

Feedforward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans, however, suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations (‘routines’) that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to the performance of DCNNs of increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds on which objects appear. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them, by adding noise, manipulating background congruence, and systematically occluding parts of the image. Results indicate that with an increase in network depth, there is an increase in the distinction between object and background information. For shallower networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that, de facto, scene segmentation can be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or “binding” features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
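
A hedged sketch of how such a depth comparison might be set up (the specific networks, conditions, and data loaders are assumptions, not the authors' setup): ImageNet-trained networks of increasing depth are evaluated on the same objects under different background conditions, to ask whether deeper networks suffer less from incongruent or unsegmented backgrounds.

```python
# Hedged sketch (assumed setup): compare ImageNet-trained networks of
# increasing depth on the same objects shown with congruent, incongruent,
# or blank (segmented) backgrounds.
import torch
import torchvision.models as models

networks = {
    "resnet18": models.resnet18(weights=models.ResNet18_Weights.DEFAULT),
    "resnet50": models.resnet50(weights=models.ResNet50_Weights.DEFAULT),
    "resnet152": models.resnet152(weights=models.ResNet152_Weights.DEFAULT),
}

@torch.no_grad()
def accuracy(net, loader):
    """Top-1 accuracy of one network on one background condition."""
    net.eval()
    correct = total = 0
    for images, labels in loader:        # loader yields preprocessed batches
        preds = net(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def depth_by_condition(loaders):
    """loaders: dict mapping a condition name (e.g. 'congruent',
    'incongruent', 'segmented') to a DataLoader. Returns accuracy per
    network per condition, so the object/background distinction can be
    compared across network depths."""
    return {name: {cond: accuracy(net, dl) for cond, dl in loaders.items()}
            for name, net in networks.items()}
```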


2000 ◽  
Vol 84 (4) ◽  
pp. 2048-2062 ◽  
Author(s):  
Mitesh K. Kapadia ◽  
Gerald Westheimer ◽  
Charles D. Gilbert

To examine the role of primary visual cortex in visuospatial integration, we studied the spatial arrangement of contextual interactions in the response properties of neurons in primary visual cortex of alert monkeys and in human perception. We found a spatial segregation of opposing contextual interactions. At the level of cortical neurons, excitatory interactions were located along the ends of receptive fields, while inhibitory interactions were strongest along the orthogonal axis. Parallel psychophysical studies in human observers showed opposing contextual interactions surrounding a target line with a similar spatial distribution. The results suggest that V1 neurons can participate in multiple perceptual processes via spatially segregated and functionally distinct components of their receptive fields.


2011 ◽  
Vol 12 (S1) ◽  
Author(s):  
Alberto Mazzoni ◽  
Christoph Kayser ◽  
Yusuke Murayama ◽  
Juan Martinez ◽  
Rodrigo Quian Quiroga ◽  
...  

2019 ◽  
Author(s):  
Olivia Guest ◽  
Bradley C. Love

Deep convolutional neural networks (DCNNs) rival humans in object recognition. The layers (or levels of representation) in DCNNs have been successfully aligned with processing stages along the ventral stream for visual processing. Here, we propose a model of concept learning that uses visual representations from these networks to build memory representations of novel categories, which may rely on the medial temporal lobe (MTL) and medial prefrontal cortex (mPFC). Our approach opens up two possibilities: a) formal investigations can involve photographic stimuli as opposed to stimuli handcrafted and coded by the experimenter; b) model comparison can determine which level of representation within a DCNN a learner is using during categorization decisions. Pursuing the latter point, DCNNs suggest that the shape bias in children relies on representations at more advanced network layers, whereas a learner that relied on lower network layers would display a color bias. These results confirm the role of natural statistics in the shape bias (i.e., shape is predictive of category membership) while highlighting that the type of statistics matters, i.e., those from lower or higher levels of representation. We use the same approach to provide evidence that pigeons performing seemingly sophisticated categorization of complex imagery may in fact be relying on representations that are very low-level (i.e., retinotopic). Although complex features, such as shape, relatively predominate at more advanced network layers, even simple features, such as spatial frequency and orientation, are better represented at the more advanced layers, contrary to a standard hierarchical view.
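
A minimal sketch of the layer-comparison idea (the backbone, node names, and prototype rule are assumptions, not the authors' model): features are extracted at several depths of a pretrained network, a simple prototype classifier is fitted on each, and the layer whose predictions best match a learner's categorization choices is selected.

```python
# Minimal sketch (assumed names): which depth of a pretrained network best
# reproduces a learner's categorization choices under a prototype rule?
import numpy as np
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

layers = ["layer1", "layer2", "layer3", "layer4"]   # shallow to deep
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
extractor = create_feature_extractor(backbone, return_nodes=layers)

@torch.no_grad()
def layer_features(images):
    """Spatially pooled activations per layer for a batch of images."""
    out = extractor(images)
    return {l: out[l].mean(dim=(2, 3)).numpy() for l in layers}

def prototype_predictions(train_feats, train_labels, test_feats):
    """Assign each test item to the category with the nearest mean feature."""
    protos = {c: train_feats[train_labels == c].mean(axis=0)
              for c in np.unique(train_labels)}
    cats = np.array(sorted(protos))
    dists = np.stack([np.linalg.norm(test_feats - protos[c], axis=1)
                      for c in cats])
    return cats[dists.argmin(axis=0)]

def best_layer(train_images, train_labels, test_images, learner_choices):
    """Layer whose prototype-model predictions agree most with the learner."""
    tr, te = layer_features(train_images), layer_features(test_images)
    agreement = {l: np.mean(prototype_predictions(tr[l], train_labels, te[l])
                            == learner_choices) for l in layers}
    return max(agreement, key=agreement.get)
```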

