Texture-like representation of objects in human visual cortex

2022 ◽  
Author(s):  
Akshay Vivek Jagadeesh ◽  
Justin Gardner

The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of visual cortex. These representations could support object vision by specifically representing objects or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real-world objects, that is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses in category-selective regions, as well as a model of macaque inferotemporal cortex and ImageNet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to insufficient signal-to-noise, as all of these observer models could predict human performance in image categorization tasks. How then might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for object discrimination is available. Thus, our results suggest that the role of human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
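
A minimal illustrative sketch of the kind of observer model at issue here (not the authors' code; the network, layer choice, and distance metric are assumptions): each image is summarized by spatially pooled CNN feature correlations, i.e. texture-like statistics, and the model picks the "odd one out". Because such summary statistics discard spatial arrangement, an observer of this kind can fail to single out the natural image among texture-matched scrambles even when humans find the task easy.

```python
# Illustrative sketch (assumed setup): an "observer model" that must pick the
# natural image out of a set of texture-matched, spatially scrambled synthetic
# images using only spatially pooled (texture-like) CNN statistics.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def gram_stats(img_path, layer_idx=16):
    """Spatially pooled feature correlations (Gram matrix) from one VGG layer.
    These summary statistics discard the spatial arrangement of features."""
    x = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        for i, module in enumerate(vgg):
            x = module(x)
            if i == layer_idx:
                break
    _, c, h, w = x.shape
    f = x.view(c, h * w)
    return (f @ f.t()) / (h * w)

def oddity_choice(image_paths):
    """Pick the image whose texture statistics deviate most from the others.
    If the natural image and its scrambled synths share these statistics,
    this observer performs at chance, which is the point made above."""
    grams = [gram_stats(p) for p in image_paths]
    dists = [sum(torch.norm(g - g2).item() for g2 in grams) for g in grams]
    return max(range(len(dists)), key=lambda i: dists[i])
```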

2021 ◽  
Author(s):  
Moshe Gur

Object recognition models have at their core similar essential characteristics: feature extraction and hierarchical convergence leading to a code that is unique to each object and immune to variations in the object's appearance. To compare computational, biologically feasible models to human performance, subjects viewed objects displayed at a wide range of orientations and sizes, and were able to recognize them almost perfectly. These empirical results, together with consideration of thought experiments and analysis of everyday perceptual performance, lead to the conclusion that biologically plausible object perception models do not even come close to matching our perceptual abilities. We can categorize many thousands of objects, discriminate between enormous numbers of different exemplars within each category, and recognize an object as unique although it may appear in countless variations, most of which have never been seen. This seemingly technical, quantitative failure stems from a fundamental property of our perception: the ability to perceive spatial information instantaneously and in parallel, retain details including their relative properties, and yet integrate those details into a meaningful percept such as an object. I present an alternative view of object perception whereby objects are represented by responses in primary visual cortex (V1), the only cortical area responding to small spatial elements. The rest of the visual cortex is dedicated to scene understanding and interpretation, such as constructing 3D percepts from 2D inputs, coding motion, categorization, and memory. Since our perceptual abilities cannot be explained by convergence onto 'object cells' or by interactions implemented by axonal transmissions, a parallel-to-parallel, field-like process is suggested. In this view, spatial information is not modified by multiple neural interactions but is retained by effecting changes in a 'neural field' which preserves the identity of individual elements while enabling a new holistic percept when these elements change.


2013 ◽  
Vol 31 (2) ◽  
pp. 189-195 ◽  
Author(s):  
Youping Xiao

The short-wavelength-sensitive (S) cones play an important role in color vision of primates, and may also contribute to the coding of other visual features, such as luminance and motion. The color signals carried by the S cones and other cone types are largely separated in the subcortical visual pathway. Studies on nonhuman primates or humans have suggested that these signals are combined in the striate cortex (V1) following a substantial amplification of the S-cone signals in the same area. In addition to reviewing these studies, this review describes the circuitry in V1 that may underlie the processing of the S-cone signals and the dynamics of this processing. It also relates the interaction between various cone signals in V1 to the results of some psychophysical and physiological studies on color perception, which leads to a discussion of a previous model, in which color perception is produced by a multistage processing of the cone signals. Finally, I discuss the processing of the S-cone signals in the extrastriate area V2.


2021 ◽  
Author(s):  
Marek A. Pedziwiatr ◽  
Elisabeth von dem Hagen ◽  
Christoph Teufel

Humans constantly move their eyes to explore the environment and obtain information. Competing theories of gaze guidance consider the factors driving eye movements within a dichotomy between low-level visual features and high-level object representations. However, recent developments in object perception indicate a complex and intricate relationship between features and objects. Specifically, image-independent object-knowledge can generate objecthood by dynamically reconfiguring how feature space is carved up by the visual system. Here, we adopt this emerging perspective of object perception, moving away from the simplifying dichotomy between features and objects in explanations of gaze guidance. We recorded eye movements in response to stimuli that appear as meaningless patches on initial viewing but are experienced as coherent objects once relevant object-knowledge has been acquired. We demonstrate that gaze guidance differs substantially depending on whether observers experienced the same stimuli as meaningless patches or organised them into object representations. In particular, fixations on identical images became object-centred, less dispersed, and more consistent across observers once exposed to relevant prior object-knowledge. Observers' gaze behaviour also indicated a shift from exploratory information-sampling to a strategy of extracting information mainly from selected, object-related image areas. These effects were evident from the first fixations on the image. Importantly, however, eye movements were not fully determined by object representations but were best explained by a simple model that integrates image-computable features and high-level, knowledge-dependent object representations. Overall, the results show how information sampling via eye movements in humans is guided by a dynamic interaction between image-computable features and knowledge-driven perceptual organisation.
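
A hedged sketch of the kind of simple integration model described above (variable names and the grid-search fit are assumptions, not the authors' implementation): fixation priority is modeled as a weighted mixture of an image-computable feature map and a knowledge-dependent object map, with the mixture weight fitted to observed fixations.

```python
# Minimal sketch (assumed names): predict a fixation-priority map as a weighted
# mixture of an image-computable feature map and a knowledge-dependent object
# map, and fit the weight to observed fixation locations.
import numpy as np

def normalize(m):
    m = m - m.min()
    return m / (m.sum() + 1e-12)      # turn a map into a probability density

def mixture_map(feature_map, object_map, w):
    """Combined priority map: w * features + (1 - w) * object knowledge."""
    return normalize(w * normalize(feature_map) + (1 - w) * normalize(object_map))

def fixation_log_likelihood(priority, fixations):
    """Sum of log-probabilities of observed fixation locations (row, col)."""
    return sum(np.log(priority[r, c] + 1e-12) for r, c in fixations)

def fit_weight(feature_map, object_map, fixations, grid=np.linspace(0, 1, 101)):
    """Grid-search the mixture weight that best explains the fixations."""
    scores = [fixation_log_likelihood(mixture_map(feature_map, object_map, w),
                                      fixations) for w in grid]
    return grid[int(np.argmax(scores))]
```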


2017 ◽  
Author(s):  
Daniel Kaiser ◽  
Marius V. Peelen

To optimize processing, the human visual system utilizes regularities present in naturalistic visual input. One of these regularities is the relative position of objects in a scene (e.g., a sofa in front of a television), with behavioral research showing that regularly positioned objects are easier to perceive and to remember. Here we use fMRI to test how positional regularities are encoded in the visual system. Participants viewed pairs of objects that formed minimalistic two-object scenes (e.g., a “living room” consisting of a sofa and television) presented in their regularly experienced spatial arrangement or in an irregular arrangement (with interchanged positions). Additionally, single objects were presented centrally and in isolation. Multi-voxel activity patterns evoked by the object pairs were modeled as the average of the response patterns evoked by the two single objects forming the pair. In two experiments, this approximation in object-selective cortex was significantly less accurate for the regularly than the irregularly positioned pairs, indicating integration of individual object representations. More detailed analysis revealed a transition from independent to integrative coding along the posterior-anterior axis of the visual cortex, with the independent component (but not the integrative component) being almost perfectly predicted by object selectivity across the visual hierarchy. These results reveal a transitional stage between individual object and multi-object coding in visual cortex, providing a possible neural correlate of efficient processing of regularly positioned objects in natural scenes.
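
A minimal sketch of the pattern-averaging analysis described above (array layout and names are assumptions): the pair-evoked multi-voxel pattern is approximated by the mean of the two single-object patterns, and the accuracy of this approximation is compared between regularly and irregularly positioned pairs.

```python
# Illustrative sketch of the pattern-averaging logic: approximate the
# multi-voxel pattern evoked by an object pair with the mean of the two
# single-object patterns, then ask whether the approximation is worse for
# regularly positioned pairs (indicating integration).
import numpy as np
from scipy.stats import pearsonr

def approximation_accuracy(pair_pattern, single_a, single_b):
    """Correlation between the measured pair pattern and the synthetic
    average of its two constituent single-object patterns."""
    predicted = (single_a + single_b) / 2.0
    return pearsonr(pair_pattern, predicted)[0]

def regularity_effect(regular_pairs, irregular_pairs, singles):
    """Mean accuracy difference (irregular minus regular). A positive value
    means regular pairs are less well approximated, i.e. more integration.
    Each pair is (pair_pattern, object_a_id, object_b_id); `singles` maps
    object ids to their single-object voxel patterns."""
    acc_reg = np.mean([approximation_accuracy(p, singles[a], singles[b])
                       for p, a, b in regular_pairs])
    acc_irr = np.mean([approximation_accuracy(p, singles[a], singles[b])
                       for p, a, b in irregular_pairs])
    return acc_irr - acc_reg
```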


Author(s):  
N Seijdel ◽  
N Tsakmakidis ◽  
EHF De Haan ◽  
SM Bohte ◽  
HS Scholte

Feedforward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans, however, suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations (‘routines’) that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to the performance of DCNNs of increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds on which objects appear. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them, by adding noise, manipulating background congruence, and systematically occluding parts of the image. Results indicate that with an increase in network depth, there is an increase in the distinction between object and background information. For shallower networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that, de facto, scene segmentation can be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or “binding” features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
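
A hedged sketch of how such a depth comparison might be set up (the specific networks, conditions, and data loaders are assumptions, not the authors' setup): ImageNet-trained networks of increasing depth are evaluated on the same objects under different background conditions, to ask whether deeper networks suffer less from incongruent or unsegmented backgrounds.

```python
# Hedged sketch (assumed setup): compare ImageNet-trained networks of
# increasing depth on the same objects shown with congruent, incongruent,
# or blank (segmented) backgrounds.
import torch
import torchvision.models as models

networks = {
    "resnet18": models.resnet18(weights=models.ResNet18_Weights.DEFAULT),
    "resnet50": models.resnet50(weights=models.ResNet50_Weights.DEFAULT),
    "resnet152": models.resnet152(weights=models.ResNet152_Weights.DEFAULT),
}

@torch.no_grad()
def accuracy(net, loader):
    """Top-1 accuracy of one network on one background condition."""
    net.eval()
    correct = total = 0
    for images, labels in loader:        # loader yields preprocessed batches
        preds = net(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def depth_by_condition(loaders):
    """loaders: dict mapping a condition name (e.g. 'congruent',
    'incongruent', 'segmented') to a DataLoader. Returns accuracy per
    network per condition, so the object/background distinction can be
    compared across network depths."""
    return {name: {cond: accuracy(net, dl) for cond, dl in loaders.items()}
            for name, net in networks.items()}
```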


2000 ◽  
Vol 84 (4) ◽  
pp. 2048-2062 ◽  
Author(s):  
Mitesh K. Kapadia ◽  
Gerald Westheimer ◽  
Charles D. Gilbert

To examine the role of primary visual cortex in visuospatial integration, we studied the spatial arrangement of contextual interactions in the response properties of neurons in primary visual cortex of alert monkeys and in human perception. We found a spatial segregation of opposing contextual interactions. At the level of cortical neurons, excitatory interactions were located along the ends of receptive fields, while inhibitory interactions were strongest along the orthogonal axis. Parallel psychophysical studies in human observers showed opposing contextual interactions surrounding a target line with a similar spatial distribution. The results suggest that V1 neurons can participate in multiple perceptual processes via spatially segregated and functionally distinct components of their receptive fields.


2011 ◽  
Vol 12 (S1) ◽  
Author(s):  
Alberto Mazzoni ◽  
Christoph Kayser ◽  
Yusuke Murayama ◽  
Juan Martinez ◽  
Rodrigo Quian Quiroga ◽  
...  

2019 ◽  
Author(s):  
Olivia Guest ◽  
Bradley C. Love

Deep convolutional neural networks (DCNNs) rival humans in object recognition. The layers (or levels of representation) in DCNNs have been successfully aligned with processing stages along the ventral stream for visual processing. Here, we propose a model of concept learning that uses visual representations from these networks to build memory representations of novel categories, which may rely on the medial temporal lobe (MTL) and medial prefrontal cortex (mPFC). Our approach opens up two possibilities: a) formal investigations can involve photographic stimuli as opposed to stimuli handcrafted and coded by the experimenter; b) model comparison can determine which level of representation within a DCNN a learner is using during categorization decisions. Pursuing the latter point, DCNNs suggest that the shape bias in children relies on representations at more advanced network layers, whereas a learner that relied on lower network layers would display a color bias. These results confirm the role of natural statistics in the shape bias (i.e., shape is predictive of category membership) while highlighting that the type of statistics matters, i.e., those from lower or higher levels of representation. We use the same approach to provide evidence that pigeons performing seemingly sophisticated categorization of complex imagery may in fact be relying on representations that are very low-level (i.e., retinotopic). Although complex features, such as shape, relatively predominate at more advanced network layers, even simple features, such as spatial frequency and orientation, are better represented at the more advanced layers, contrary to a standard hierarchical view.
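
A minimal sketch of the layer-comparison idea (the backbone, node names, and prototype rule are assumptions, not the authors' model): features are extracted at several depths of a pretrained network, a simple prototype classifier is fitted on each, and the layer whose predictions best match a learner's categorization choices is selected.

```python
# Minimal sketch (assumed names): which depth of a pretrained network best
# reproduces a learner's categorization choices under a prototype rule?
import numpy as np
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

layers = ["layer1", "layer2", "layer3", "layer4"]   # shallow to deep
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
extractor = create_feature_extractor(backbone, return_nodes=layers)

@torch.no_grad()
def layer_features(images):
    """Spatially pooled activations per layer for a batch of images."""
    out = extractor(images)
    return {l: out[l].mean(dim=(2, 3)).numpy() for l in layers}

def prototype_predictions(train_feats, train_labels, test_feats):
    """Assign each test item to the category with the nearest mean feature."""
    protos = {c: train_feats[train_labels == c].mean(axis=0)
              for c in np.unique(train_labels)}
    cats = np.array(sorted(protos))
    dists = np.stack([np.linalg.norm(test_feats - protos[c], axis=1)
                      for c in cats])
    return cats[dists.argmin(axis=0)]

def best_layer(train_images, train_labels, test_images, learner_choices):
    """Layer whose prototype-model predictions agree most with the learner."""
    tr, te = layer_features(train_images), layer_features(test_images)
    agreement = {l: np.mean(prototype_predictions(tr[l], train_labels, te[l])
                            == learner_choices) for l in layers}
    return max(agreement, key=agreement.get)
```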

