object features
Recently Published Documents

Total documents: 325 (past five years: 92)
H-index: 36 (past five years: 4)

2021
Author(s): Gaurav Malhotra, Marin Dujmovic, John Hummel, Jeffrey S Bowers

The success of Convolutional Neural Networks (CNNs) in classifying objects has led to a surge of interest in using these systems to understand human vision. Recent studies have argued that when CNNs are trained in the correct learning environment, they can emulate a key property of human vision -- learning to classify objects based on their shape. While showing a shape-bias is indeed a desirable property for any model of human object recognition, it is unclear whether the resulting shape representations learned by these networks are human-like. We explored this question in the context of a well-known observation from psychology showing that humans encode the shape of objects in terms of relations between object features. To check whether this is also true for the representations of CNNs, we ran a series of simulations where we trained CNNs on datasets of novel shapes and tested them on a set of controlled deformations of these shapes. We found that CNNs do not show any enhanced sensitivity to deformations which alter relations between features, even when explicitly trained on such deformations. This behaviour contrasted with that of human participants in previous studies as well as in a new experiment. We argue that these results are a consequence of a fundamental difference between how humans and CNNs learn to recognise objects: while CNNs select features that allow them to optimally classify the proximal stimulus, humans select features that they infer to be properties of the distal stimulus. This makes human representations more generalisable to novel contexts and tasks.
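
A minimal sketch of the kind of sensitivity test described above, assuming a pretrained ResNet-50 as the CNN and random tensors as stand-ins for the novel-shape stimuli and their deformations (the study's actual networks, training sets, and deformation types are not reproduced here):

```python
# Sketch: compare a CNN's feature-space sensitivity to two kinds of shape
# deformation. Stimulus tensors are random placeholders; in the study the
# inputs were rendered novel shapes and controlled deformations of them.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()
backbone = torch.nn.Sequential(*list(model.children())[:-1])  # drop classifier head

def features(x):
    with torch.no_grad():
        return backbone(x).flatten(1)  # (batch, 2048) penultimate-layer features

# Placeholder stimuli: the original shape, a deformation that alters relations
# between its features, and one that preserves those relations.
original   = torch.rand(1, 3, 224, 224)
relational = torch.rand(1, 3, 224, 224)   # relation-altering deformation (hypothetical)
preserving = torch.rand(1, 3, 224, 224)   # relation-preserving deformation (hypothetical)

f0, f_rel, f_pre = features(original), features(relational), features(preserving)

# A relation-sensitive network would show a larger feature distance for the
# relation-altering deformation than for the relation-preserving one.
d_rel = 1 - F.cosine_similarity(f0, f_rel).item()
d_pre = 1 - F.cosine_similarity(f0, f_pre).item()
print(f"relation-altering distance: {d_rel:.3f}, relation-preserving distance: {d_pre:.3f}")
```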


2021
Vol. 118 (49), e2115772118
Author(s): Aneesha K. Suresh, Charles M. Greenspon, Qinpu He, Joshua M. Rosenow, Lee E. Miller, et al.

Tactile nerve fibers fall into a few classes that can be readily distinguished based on their spatiotemporal response properties. Because nerve fibers reflect local skin deformations, they individually carry ambiguous signals about object features. In contrast, cortical neurons exhibit heterogeneous response properties that reflect computations applied to convergent input from multiple classes of afferents, which confer to them a selectivity for behaviorally relevant features of objects. The conventional view is that these complex response properties arise within the cortex itself, implying that sensory signals are not processed to any significant extent in the two intervening structures—the cuneate nucleus (CN) and the thalamus. To test this hypothesis, we recorded the responses evoked in the CN to a battery of stimuli that have been extensively used to characterize tactile coding in both the periphery and cortex, including skin indentations, vibrations, random dot patterns, and scanned edges. We found that CN responses are more similar to their cortical counterparts than they are to their inputs: CN neurons receive input from multiple classes of nerve fibers, they have spatially complex receptive fields, and they exhibit selectivity for object features. Contrary to consensus, then, the CN plays a key role in processing tactile information.


2021
Author(s): Taicheng Huang, Yiying Song, Jia Liu

Our mind can metaphorically represent various objects from the physical world in an abstract and complex high-dimensional object space, with a finite number of orthogonal axes encoding critical object features. However, little is known about which features serve as axes of this object space and thereby critically affect object recognition. Here we asked whether objects’ real-world size constitutes an axis of object space, using deep convolutional neural networks (DCNNs) and three criteria (sensitivity, independence and necessity) that are impractical to examine together with traditional approaches. A principal component analysis on features extracted by the DCNNs showed that objects’ real-world size was encoded by an independent axis, and the removal of this axis significantly impaired the DCNNs’ performance in recognizing objects. With a mutually inspired paradigm of computational modeling and biological observation, we found that the shape of objects, rather than retinal size, co-occurrence, task demands or texture features, was necessary for both DCNNs and humans to represent the real-world size of objects. In short, our study provided the first evidence supporting objects’ real-world size as an axis of object space, and devised a novel paradigm for future exploration of the structure of object space.
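
A rough sketch of the axis analysis described above, using synthetic feature matrices and size labels as placeholders (the paper's DCNNs, stimuli, and exact criterion tests are not reproduced):

```python
# Sketch: run PCA on DCNN features, find the component most correlated with
# real-world object size, ablate it, and measure the cost to a classifier.
# All arrays below are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))        # stand-in for DCNN unit activations
real_size = rng.normal(size=1000)           # stand-in for log real-world size
labels = rng.integers(0, 10, size=1000)     # stand-in for object categories

pca = PCA(n_components=50).fit(feats)
pcs = pca.transform(feats)

# Sensitivity: correlation of each principal axis with real-world size.
corrs = np.array([np.corrcoef(pcs[:, k], real_size)[0, 1] for k in range(50)])
size_axis = int(np.abs(corrs).argmax())

# Necessity: zero out the putative size axis and reconstruct the features.
pcs_ablated = pcs.copy()
pcs_ablated[:, size_axis] = 0.0

clf = LogisticRegression(max_iter=1000)
acc_full    = cross_val_score(clf, pca.inverse_transform(pcs), labels, cv=5).mean()
acc_ablated = cross_val_score(clf, pca.inverse_transform(pcs_ablated), labels, cv=5).mean()
print(f"accuracy with all axes: {acc_full:.3f}, without size axis: {acc_ablated:.3f}")
```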


Author(s): V. Sambasiva Rao, V. Mounika, N. Raghavendra Sai, G. Sai Chaitanya Kumar

2021
Vol. 12 (5), pp. 1-18
Author(s): Min Wang, Congyan Lang, Liqian Liang, Songhe Feng, Tao Wang, et al.

Semantic image synthesis is an emerging and challenging vision problem that has accompanied recent advances in generative adversarial networks. Existing semantic image synthesis methods only consider the global information provided by the semantic segmentation mask, such as class label, global layout, and location, so the generative models cannot capture the rich local fine-grained information of the images (e.g., object structure, contour, and texture). To address this issue, we adopt a multi-scale feature fusion algorithm to refine the generated images by learning the fine-grained information of the local objects. We propose OA-GAN, a novel object-attention generative adversarial network that allows attention-driven, multi-fusion refinement for fine-grained semantic image synthesis. Specifically, the proposed model first generates multi-scale global image features and local object features; the local object features are then fused into the global image features to improve the correlation between local and global information. In the process of feature fusion, the global image features and the local object features are fused through the channel-spatial-wise fusion block to learn ‘what’ and ‘where’ to attend in the channel and spatial axes, respectively. The fused features are used to construct correlation filters that yield feature response maps determining the locations, contours, and textures of the objects. Extensive quantitative and qualitative experiments on the COCO-Stuff, ADE20K and Cityscapes datasets demonstrate that our OA-GAN significantly outperforms the state-of-the-art methods.
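
A rough sketch of a channel-then-spatial attention fusion in the spirit of the block described above; the layer sizes, kernel choices, and fusion details here are illustrative and may differ from OA-GAN's actual design:

```python
# Sketch of a channel-spatial-wise fusion block: local object features are
# merged into global image features, then reweighted by channel attention
# ('what' to attend) followed by spatial attention ('where' to attend).
import torch
import torch.nn as nn

class ChannelSpatialFusion(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dimensions, weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        # Spatial attention: a 7x7 conv over channel-pooled maps.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, global_feat, local_feat):
        fused = global_feat + local_feat              # merge local into global
        fused = fused * self.channel_mlp(fused)       # channel-wise reweighting
        spatial_in = torch.cat(
            [fused.mean(dim=1, keepdim=True), fused.amax(dim=1, keepdim=True)], dim=1)
        return fused * self.spatial_conv(spatial_in)  # spatial reweighting

# Toy usage with matching global and local feature maps.
block = ChannelSpatialFusion(channels=64)
g, l = torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32)
print(block(g, l).shape)  # torch.Size([2, 64, 32, 32])
```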


2021
Author(s): Kamila M Jozwik, Tim C Kietzmann, Radoslaw M Cichy, Nikolaus Kriegeskorte, Marieke Mur

Deep neural networks (DNNs) are promising models of the cortical computations supporting human object recognition. However, despite their ability to explain a significant portion of variance in neural data, the agreement between models and brain representational dynamics is far from perfect. Here, we address this issue by asking which representational features are currently unaccounted for in neural timeseries data, estimated for multiple areas of the human ventral stream via source-reconstructed magnetoencephalography (MEG) data. In particular, we focus on the ability of visuo-semantic models, consisting of human-generated labels of higher-level object features and categories, to explain variance beyond the explanatory power of DNNs alone. We report a gradual transition in the importance of visuo-semantic features from early to higher-level areas along the ventral stream. While early visual areas are better explained by DNN features, higher-level cortical dynamics are best accounted for by visuo-semantic models. These results suggest that current DNNs fail to fully capture the visuo-semantic features represented in higher-level human visual cortex and indicate a path towards more accurate models of ventral stream computations.
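
A minimal sketch of the variance-partitioning logic described above, with synthetic arrays standing in for DNN features, visuo-semantic labels, and source-level MEG responses (the paper's actual models and estimation pipeline are not reproduced):

```python
# Sketch: how much MEG response variance do visuo-semantic labels explain
# beyond DNN features? Estimated as the gain in cross-validated R^2 when the
# labels are added to the DNN feature set. All arrays are placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n_stim = 200
dnn_feats  = rng.normal(size=(n_stim, 100))   # DNN features per stimulus
sem_feats  = rng.normal(size=(n_stim, 20))    # human-generated visuo-semantic labels
meg_signal = rng.normal(size=n_stim)          # MEG response at one source and time point

def cv_r2(X, y):
    pred = cross_val_predict(RidgeCV(alphas=np.logspace(-3, 3, 13)), X, y, cv=10)
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_dnn  = cv_r2(dnn_feats, meg_signal)
r2_both = cv_r2(np.hstack([dnn_feats, sem_feats]), meg_signal)
print(f"unique visuo-semantic variance: {max(r2_both - r2_dnn, 0):.3f}")
```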


2021
Author(s): Christopher A Henry, Adam Kohn

Visual perception depends strongly on spatial context. A profound example is visual crowding, whereby the presence of nearby stimuli impairs discriminability of object features. Despite extensive work on both perceptual crowding and the spatial integrative properties of visual cortical neurons, the link between these two aspects of visual processing remains unclear. To understand better the neural basis of crowding, we recorded simultaneously from neuronal populations in V1 and V4 of fixating macaque monkeys. We assessed the information about the orientation of a visual target available from the measured responses, both for targets presented in isolation and amid distractors. Both single neuron and population responses had less information about target orientation when distractors were present. Information loss was moderate in V1 and more substantial in V4. Information loss could be traced to systematic divisive and additive changes in neuronal tuning. Tuning changes were more severe in V4; in addition, tuning exhibited greater context-dependent distortions in V4, further restricting the ability of a fixed sensory readout strategy to extract accurate feature information across changing environments. Our results provide a direct test of crowding effects at different stages of the visual hierarchy, reveal how these effects alter the spiking activity of cortical populations by which sensory stimuli are encoded, and connect these changes to established mechanisms of neuronal spatial integration.
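
A small sketch of the tuning-change analysis described above, fitting a divisive gain and additive offset that map a simulated neuron's isolated-target tuning onto its tuning amid distractors (the data and fitting details here are illustrative, not the study's):

```python
# Sketch: model orientation tuning with distractors as a divisively scaled
# and additively shifted version of the isolated-target tuning curve.
import numpy as np
from scipy.optimize import curve_fit

theta = np.linspace(0, 180, 16, endpoint=False)             # target orientations (deg)
tuning_alone = 10 + 30 * np.exp(-((theta - 90) / 25) ** 2)   # simulated isolated-target tuning

# Simulated crowded responses: reduced gain, shifted baseline, plus noise.
rng = np.random.default_rng(2)
tuning_crowd = 0.6 * tuning_alone + 4 + rng.normal(scale=1.5, size=theta.size)

def scaled(r_alone, gain, offset):
    return gain * r_alone + offset

(gain, offset), _ = curve_fit(scaled, tuning_alone, tuning_crowd, p0=(1.0, 0.0))
print(f"fitted divisive gain: {gain:.2f}, additive offset: {offset:.2f}")
```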


2021
Vol. 21 (9), pp. 2288
Author(s): Chen Wei, Duan Ziyi, Li Wenwen, Ding Xiaowei

2021
Author(s): Platon Tikhonenko, Timothy F. Brady, Igor Utochkin

Previous work has shown that semantically meaningful properties of visually presented real-world objects, such as their color, their state (the configuration or pose of their parts), or the features that differentiate them from other exemplars of the same category, are stored with a high degree of independence in long-term memory (e.g., they are frequently swapped or misbound across objects). But is this feature independence due to the visual representation of the objects, or to verbal encoding? Semantically meaningful features can also be labeled by distinct words, which can be recombined to produce independent descriptions of real-world object features. Here, we directly test how much of the pattern of feature independence arises from visual vs. verbal encoding. In two experiments, during the study phase we orthogonally varied the match or mismatch of state (e.g., open/closed) and color information between images of objects and their verbal descriptions (Experiment 1) or between images of two exemplars from the same category (Experiment 2). At test, observers had to choose a previously presented image or description in a 4-AFC task. Whereas in Experiment 1 we found only a small effect of visual-verbal mismatch on memory for images, the effect of mismatch between exemplars in Experiment 2 was dramatic: memory for a feature was reasonably good when it matched between exemplars, but dropped to chance otherwise. Importantly, this effect was observed independently for both color and object state. We conclude that independent, feature-based storage of objects in long-term memory is supported primarily by visual representations, with possible minor influences of verbal encoding.

