Bottom-up processing of curvilinear visual features is sufficient for animate/inanimate object categorization

2018 ◽  
Vol 18 (12) ◽  
pp. 3 ◽  
Author(s):  
Valentinos Zachariou ◽  
Amanda C. Del Giacco ◽  
Leslie G. Ungerleider ◽  
Xiaomin Yue
2018 ◽  
Vol 18 (10) ◽  
pp. 388
Author(s):  
Amanda Del Giacco ◽  
Valentinos Zachariou ◽  
Leslie Ungerleider ◽  
Xiaomin Yue

2020 ◽  
Author(s):  
Vladislav Khvostov ◽  
Yuri Markov ◽  
Timothy F. Brady ◽  
Igor Utochkin

Many studies have shown that people can rapidly and efficiently categorize the animacy of individual objects and scenes, even with few visual features available. Does this necessarily mean that the visual system has an unlimited capacity to process animacy across the entire visual field? We tested this in an ensemble task requiring observers to judge the relative numerosity of animate vs. inanimate items in briefly presented sets of multiple objects. We generated a set of morphed “animacy continua” between pairs of animal and inanimate object silhouettes and tested them in both individual object categorization and ensemble enumeration. For the ensemble task, we manipulated the ratio between animate and inanimate items present in the display, and we also presented two types of animacy distributions: “segmentable” (including only definitely animate and definitely inanimate items) or “non-segmentable” (middle-value, ambiguous morph pictures were shown along with the definite “extremes”). Our results showed that observers failed to integrate animacy information from multiple items: they performed very poorly in the ensemble task and were not sensitive to the distribution type, even though their categorization rate for individual objects was near 100%. A control condition using the same design with color as the category-defining dimension elicited good individual object and ensemble categorization performance, as well as a strong effect of the segmentability type. We conclude that good individual categorization does not necessarily allow people to build ensemble animacy representations, thus showing the limited capacity of animacy perception.


2020 ◽  
Author(s):  
Franziska Pellegrini ◽  
David J Hawellek ◽  
Anna-Antonia Pape ◽  
Joerg F Hipp ◽  
Markus Siegel

Synchronized neuronal population activity in the gamma-frequency range (>30 Hz) correlates with the bottom-up drive of various visual features. It has been hypothesized that gamma-band synchronization enhances the gain of neuronal representations, yet evidence remains sparse. We tested a critical prediction of the gain hypothesis, which is that features that drive synchronized gamma-band activity interact super-linearly. To test this prediction, we employed whole-head magnetoencephalography in human subjects and investigated whether the strength of visual motion (motion coherence) and luminance contrast interact in driving gamma-band activity in visual cortex. We found that gamma-band activity (64–128 Hz) monotonically increased with coherence and contrast, while lower frequency activity (8–32 Hz) decreased with both features. Furthermore, as predicted for a gain mechanism, we found a multiplicative interaction between motion coherence and contrast in their joint drive of gamma-band activity. The lower frequency activity did not show such an interaction. Our findings provide evidence that gamma-band activity acts as a cortical gain mechanism that nonlinearly combines the bottom-up drive of different visual features.


2009 ◽  
Vol 21 (1) ◽  
pp. 239-271 ◽  
Author(s):  
Dashan Gao ◽  
Nuno Vasconcelos

A decision-theoretic formulation of visual saliency, first proposed for top-down processing (object recognition) (Gao & Vasconcelos, 2005a), is extended to the problem of bottom-up saliency. Under this formulation, optimality is defined in the minimum probability of error sense, under a constraint of computational parsimony. The saliency of the visual features at a given location of the visual field is defined as the power of those features to discriminate between the stimulus at the location and a null hypothesis. For bottom-up saliency, this is the set of visual features that surround the location under consideration. Discrimination is defined in an information-theoretic sense, and the optimal saliency detector is derived for a class of stimuli that complies with known statistical properties of natural images. It is shown that under the assumption that saliency is driven by linear filtering, the optimal detector consists of what is usually referred to as the standard architecture of V1: a cascade of linear filtering, divisive normalization, rectification, and spatial pooling. The optimal detector is also shown to replicate the fundamental properties of the psychophysics of saliency: stimulus pop-out, saliency asymmetries for stimulus presence versus absence, disregard of feature conjunctions, and Weber's law. Finally, it is shown that the optimal saliency architecture can be applied to the solution of generic inference problems. In particular, for the class of stimuli studied, it performs the three fundamental operations of statistical inference: assessment of probabilities, implementation of the Bayes decision rule, and feature selection.


2008 ◽  
Vol 46 (7) ◽  
pp. 2033-2042 ◽  
Author(s):  
Annerose Engel ◽  
Michael Burke ◽  
Katja Fiehler ◽  
Siegfried Bien ◽  
Frank Rösler

Author(s):  
Weitao Jiang ◽  
Weixuan Wang ◽  
Haifeng Hu

Image captioning, which automatically describes an image in natural language, is regarded as a fundamental challenge in computer vision. In recent years, significant advances have been made in image captioning by improving attention mechanisms. However, most existing methods construct attention mechanisms based on a single type of visual feature, such as patch features or object features, which limits the accuracy of the generated captions. In this article, we propose a Bidirectional Co-Attention Network (BCAN) that combines multiple visual features to provide information from different aspects. Different features are associated with predicting different words, and there are a priori relations between these multiple visual features. Based on this, we further propose a bottom-up and top-down bi-directional co-attention mechanism to extract discriminative attention information. Furthermore, most existing methods do not exploit an effective multimodal integration strategy, generally using addition or concatenation to combine features. To solve this problem, we adopt the Multivariate Residual Module (MRM) to integrate multimodal attention features. We further propose a Vertical MRM to integrate features of the same category and a Horizontal MRM to combine features of different categories, which can balance the contributions of the bottom-up co-attention and the top-down co-attention. In contrast to existing methods, the BCAN is able to obtain complementary information from multiple visual features via the bi-directional co-attention strategy and to integrate multimodal information via the improved multivariate residual strategy. We conduct a series of experiments on two benchmark datasets (MSCOCO and Flickr30k), and the results indicate that the proposed BCAN achieves superior performance.


2021 ◽  
Author(s):  
Daniel Janini ◽  
Chris Hamblin ◽  
Arturo Deza ◽  
Talia Konkle

After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general visual features previously learned in service of object categorization? To investigate this question, we first measured the visual representational space for letters in two behavioral tasks, visual search and letter categorization. Then, we created models of specialized letter features and general object-based features by training deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, respectively. We found that general object-based features accounted well for the visual similarity of letters measured in both behavioral tasks, while letter-specialized features did not. Further, several approaches to alter object-based features with letter specialization did not improve the match to human behavior. Our findings provide behavioral-computational evidence that the perception of letters depends on general visual features rather than a specialized feature space.


2019 ◽  
Author(s):  
Franziska Pellegrini ◽  
David J Hawellek ◽  
Anna-Antonia Pape ◽  
Joerg F Hipp ◽  
Markus Siegel

Synchronized neuronal population activity in the gamma-frequency range (> 30 Hz) correlates with the bottom-up drive of various visual features. It has been hypothesized that gamma-band synchronization enhances the gain of neuronal representations, yet evidence remains sparse. We tested a critical prediction of the gain hypothesis, which is that features that drive synchronized gamma-band activity interact super-linearly. To test this prediction, we employed whole-head magnetoencephalography (MEG) in human subjects and investigated whether the strength of visual motion (motion coherence) and luminance contrast interact in driving gamma-band activity in visual cortex. We found that gamma-band activity (64 to 128 Hz) monotonically increased with coherence and contrast, while lower frequency activity (8 to 32 Hz) decreased with both features. Furthermore, as predicted for a gain mechanism, we found a multiplicative interaction between motion coherence and contrast in their joint drive of gamma-band activity. The lower frequency activity did not show such an interaction. Our findings provide evidence that gamma-band activity acts as a cortical gain mechanism that nonlinearly combines the bottom-up drive of different visual features in support of visually guided behavior.


2015 ◽  
Vol 6 ◽  
Author(s):  
Omid Kardan ◽  
Emre Demiralp ◽  
Michael C. Hout ◽  
MaryCarol R. Hunter ◽  
Hossein Karimi ◽  
...  