Top-down analysis of low-level object relatedness leading to semantic understanding of medieval image collections

Author(s):  
Pradeep Yarlagadda ◽  
Antonio Monroy ◽  
Bernd Carqué ◽  
Björn Ommer
2021 ◽  
Author(s):  
Ibrahim Mohammad Hussain Rahman

The human visual attention (HVA) system encompasses a set of interconnected neurological modules that are responsible for analyzing visual stimuli by attending to those regions that are salient. Two contrasting biological mechanisms exist in the HVA system: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection. Very few computational models have been proposed to model top-down attention, mainly for three reasons. First, the functionality of the top-down process involves many influential factors. Second, top-down responses differ from task to task. Finally, many biological aspects of the top-down process are not yet well understood. For these reasons, it is difficult to devise a generalized top-down model that could be applied to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step in analyzing images before more complex visual tasks can be performed. It has not been investigated thoroughly in models of top-down saliency and hence constitutes the main application domain for this thesis. The thesis investigates methods to model top-down attention through various kinds of high-level data acquired from images. Furthermore, it investigates different strategies for dynamically combining bottom-up and top-down processes to improve detection accuracy, as well as the computational efficiency of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:

1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to the features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor tunes the weighting of the low-level features to maximize detection accuracy. Incorporating context into the feature weighting mechanism improves the quality of the assigned weights (a minimal illustrative sketch of this idea appears after this abstract).

2. Two modules of target features combined with contextual weighting to improve detection accuracy for the target object. In this model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a substantial increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects.

3. A model for combining top-down and bottom-up attention based on feature interaction. This model combines the two processes dynamically by formulating the problem as feature selection. The feature selection exploits the interaction between features, yielding a robust feature set that maximizes both the detection accuracy and the overall efficiency of the system.

4. A feature map quality score estimation model that accurately predicts the detection accuracy score of any previously unseen feature map without the need for ground-truth data. The model extracts various local, global, geometrical and statistical characteristics from a feature map; these characteristics guide a regression model that estimates the quality of a novel map.

5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model can accurately predict the quality score of any novel feature map, then dynamic feature map integration can be performed on the basis of the estimated value. Two frameworks for feature map integration using the estimation model are proposed. The proposed integration framework achieves higher human fixation prediction accuracy with fewer feature maps than combining all feature maps does.

The work proposed in this thesis provides new directions in modelling top-down saliency for target object detection. In addition, the dynamic approaches to combining top-down and bottom-up processes show considerable improvements over existing approaches in both efficiency and accuracy.
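A minimal sketch of the contextual weighting in contribution 1, under assumptions only: the toy descriptor, the context-to-weight mapping and every name below are hypothetical stand-ins for illustration, not the thesis implementation.

```python
import numpy as np

def contextual_descriptor(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """Toy global descriptor of the scene: a normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / (hist.sum() + 1e-8)

def topdown_saliency(feature_maps: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine K low-level feature maps (K x H x W) with context-derived weights."""
    weights = np.clip(weights, 0.0, None)
    weights = weights / (weights.sum() + 1e-8)           # keep the maps comparable
    return np.tensordot(weights, feature_maps, axes=1)   # H x W top-down saliency map

# Example with random stand-in data: in the thesis a learned model maps the
# contextual descriptor to feature weights; a fixed random projection stands in here.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
feature_maps = rng.random((4, 64, 64))           # e.g. color, intensity, orientation maps
ctx = contextual_descriptor(image)
context_to_weights = rng.random((4, ctx.size))   # hypothetical learned mapping
saliency = topdown_saliency(feature_maps, context_to_weights @ ctx)
```

In practice, a regressor trained on images with known target locations would replace the random projection used above.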


2020 ◽  
Vol 41 (5) ◽  
pp. 1045-1059
Author(s):  
Alan Chi Lun Yu ◽  
Carol Kit Sum To

The ability to take contextual information into account is essential for successful speech processing. This study examines individuals with high-functioning autism and those without in terms of how they adjust their perceptual expectations while discriminating speech sounds in different phonological contexts. Listeners were asked to discriminate pairs of sibilant-vowel monosyllables. Typically, discriminability of sibilants increases when the sibilants are embedded in perceptually enhancing contexts (provided the appropriate context-specific perceptual adjustment is performed) and decreases in perceptually diminishing contexts. This study found a reduction in the differences in perceptual response across enhancing and diminishing contexts among high-functioning autistic individuals relative to the neurotypical controls. The reduced adjustment of perceptual expectation is consistent with an increase in the autonomy of low-level perceptual processing in autism and a reduction in the influence of top-down information from the surrounding context.


Perception ◽  
2016 ◽  
Vol 46 (1) ◽  
pp. 31-49 ◽  
Author(s):  
Mick Zeljko ◽  
Philip M. Grove

The stream-bounce effect refers to a bistable motion stimulus in which two targets are interpreted as either “streaming” past or “bouncing” off one another, and to the manipulations that bias these responses. Directional bias, proposed by Bertenthal et al., is an account of the effect in which low-level motion integration promotes streaming and its disruption leads to bouncing; it is still sometimes invoked, either directly in a bottom-up fashion or indirectly under top-down control, despite Sekuler and Sekuler finding evidence inconsistent with it. We tested two key aspects of the hypothesis: (a) comparable changes in speed should produce comparable disruptions and lead to similar effects; and (b) speed changes alone should disrupt integration without the need for additional, more complex changes of motion. We found that target motion influences stream-bounce perception, but not as directional bias predicts. Our results support Sekuler and Sekuler, argue against low-level motion signals driving perceptual outcomes in stream-bounce displays (directly or indirectly), and point to higher-level inferential processes involving perceptual history and expectation. Directional bias as a mechanism should be abandoned; either another specific bottom-up process must be proposed and tested, or consideration should be given to top-down factors alone driving the effect.


1992 ◽  
Vol 45 (1) ◽  
pp. 1-20 ◽  
Author(s):  
Bruno H. Repp ◽  
Ram Frost ◽  
Elizabeth Zsiga

In two experiments, we investigated whether simultaneous speech reading can influence the detection of speech in envelope-matched noise. Subjects attempted to detect the presence of a disyllabic utterance in noise while watching a speaker articulate a matching or a non-matching utterance. Speech detection was not facilitated by an audio-visual match, which suggests that listeners relied on low-level auditory cues whose perception was immune to cross-modal top-down influences. However, when the stimuli were words (Experiment 1), there was a (predicted) relative shift in bias, suggesting that the masking noise itself was perceived as more speechlike when its envelope corresponded to the visual information. This bias shift was absent, however, with non-word materials (Experiment 2). These results, which resemble earlier findings obtained with orthographic visual input, indicate that the mapping from sight to sound is lexically mediated even when, as in the case of the articulatory-phonetic correspondence, the cross-modal relationship is non-arbitrary.


2002 ◽  
Vol 13 (3) ◽  
pp. 357-361 ◽  
Author(s):  
Elodie Varraine ◽  
Mireille Bonnard ◽  
Jean Pailhous
Keyword(s):  
Top Down ◽  

1998 ◽  
Vol 21 (1) ◽  
pp. 17-18 ◽  
Author(s):  
Hervé Abdi ◽  
Dominique Valentin ◽  
Betty G. Edelman

Eigenfeatures are created by principal component analysis (PCA) applied to objects described by a low-level code (e.g., pixels, Gabor jets). We suggest that eigenfeatures act like the flexible features described by Schyns et al. They are particularly well suited to face processing and give rise to class-specific effects such as the other-race effect. The PCA approach can be modified to accommodate top-down constraints.
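As a rough illustration of the eigenfeature idea (a sketch under assumptions, not the authors' implementation), the principal components of pixel-coded face images can be computed and used as a feature basis:

```python
import numpy as np

def eigenfeatures(faces: np.ndarray, k: int = 10):
    """faces: N x P matrix of N flattened face images; returns the mean face
    and the top-k principal components ('eigenfeatures')."""
    mean_face = faces.mean(axis=0)
    centred = faces - mean_face
    # SVD of the centred data; rows of Vt are the principal components
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return mean_face, Vt[:k]

def project(face: np.ndarray, mean_face: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Describe a face by its coordinates in the eigenfeature space."""
    return components @ (face - mean_face)

# Example with random stand-in data (real input would be aligned face images).
rng = np.random.default_rng(1)
faces = rng.random((50, 32 * 32))            # 50 faces, 32 x 32 pixels each
mean_face, comps = eigenfeatures(faces, k=5)
code = project(faces[0], mean_face, comps)   # 5-dimensional face code
```

Top-down constraints, as the authors note, could then be accommodated by restricting or re-weighting the components retained for a given task.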


2016 ◽  
Vol 29 (6-7) ◽  
pp. 557-583 ◽  
Author(s):  
Emiliano Macaluso ◽  
Uta Noppeney ◽  
Durk Talsma ◽  
Tiziana Vercillo ◽  
Jess Hartcher-O’Brien ◽  
...  

The role attention plays in our experience of a coherent, multisensory world is still controversial. On the one hand, a subset of inputs may be selected for detailed processing and multisensory integration (MSI) in a top-down manner, i.e., guidance of multisensory integration by attention. On the other hand, stimuli may be integrated in a bottom-up fashion according to low-level properties such as spatial coincidence, thereby capturing attention. Moreover, attention itself is multifaceted and can be described via both top-down and bottom-up mechanisms. Thus, the interaction between attention and multisensory integration is complex and situation-dependent. The authors of this opinion paper are researchers who have contributed to this discussion from behavioural, computational and neurophysiological perspectives. We posed a series of questions, the goal of which was to illustrate the interplay between bottom-up and top-down processes in various multisensory scenarios in order to clarify the standpoint taken by each author and with the hope of reaching a consensus. Although divergence of viewpoint emerges in the current responses, there is also considerable overlap: in general, it can be concluded that the amount of influence that attention exerts on MSI depends on the current task as well as the prior knowledge and expectations of the observer. Moreover, stimulus properties such as reliability and salience also determine how open the processing is to influences of attention.


2016 ◽  
Vol 22 (1) ◽  
pp. 49-75 ◽  
Author(s):  
Simon Hickinbotham ◽  
Edward Clark ◽  
Adam Nellis ◽  
Susan Stepney ◽  
Tim Clarke ◽  
...  

Automata chemistries are good vehicles for experimentation in open-ended evolution, but they are by necessity complex systems whose low-level properties require careful design. To aid the process of designing automata chemistries, we develop an abstract model that classifies the features of a chemistry from a physical (bottom-up) perspective and from a biological (top-down) perspective. There are two levels: things that can evolve, and things that cannot. We equate the evolving level with biology and the non-evolving level with physics. We design our initial organisms in the biology, so they can evolve. We design the physics to facilitate evolvable biologies. This architecture leads to a set of design principles that should be observed when creating an instantiation of the architecture. These principles are Everything Evolves, Everything's Soft, and Everything Dies. To evaluate these ideas, we present experiments in the recently developed Stringmol automata chemistry. We examine the properties of Stringmol with respect to the principles, and so demonstrate the usefulness of the principles in designing automata chemistries.
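A toy sketch, under assumptions the abstract does not specify, of the two-level split described above: a fixed, non-evolving "physics" and an evolving "biology" in which program strings mutate ("Everything Evolves", "Everything's Soft") and organisms are removed to keep the population bounded ("Everything Dies"). This is not Stringmol; every name here is illustrative.

```python
import random

# Non-evolving physics: a fixed instruction alphabet that nothing can change.
PHYSICS_OPS = ("A", "B", "C")

def mutate(program: str, rate: float = 0.05) -> str:
    """Biology level: programs are soft strings, mutable at every position."""
    return "".join(
        random.choice(PHYSICS_OPS) if random.random() < rate else op
        for op in program
    )

def step(population: list[str], capacity: int = 100) -> list[str]:
    """One generation: each organism reproduces with mutation; excess organisms die."""
    offspring = [mutate(p) for p in population]
    survivors = population + offspring
    random.shuffle(survivors)      # unbiased death keeps the population bounded
    return survivors[:capacity]

population = ["ABC" * 4] * 10
for _ in range(20):
    population = step(population)
```

The fixed instruction set plays the role of the non-evolving physics; everything built on top of it can change or be removed.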

