Pop-Out: A New Cognitive Model of Visual Attention That Uses Light Level Analysis to Better Mimic the Free-Viewing Task of Static Images

Human gaze is not directed to the same part of an image when lighting conditions change. Current saliency models do not consider light level analysis during their bottom-up processes. In this paper, we introduce a new saliency model which better mimics physiological and psychological processes of our visual attention in case of free-viewing task (bottom-up process). This model analyzes lighting conditions with the aim of giving different weights to color wavelengths. The resulting saliency measure performs better than a lot of popular cognitive approaches.

Download Full-text

Properties of visually-guided saccadic behavior and bottom-up attention in marmoset, macaque, and human

Journal of Neurophysiology ◽

10.1152/jn.00312.2020 ◽

2020 ◽

Author(s):

Chih-Yang Chen ◽

Denis Matrov ◽

Richard Edmund Veale ◽

Hirotaka Onoe ◽

Masatoshi Yoshida ◽

...

Keyword(s):

New World ◽

World Monkey ◽

Gap Effect ◽

Visually Guided ◽

Express Saccades ◽

Bottom Up ◽

Cortical Areas ◽

Saliency Model ◽

Free Viewing ◽

The Brain

Saccades are stereotypic behaviors whose investigation improves our understanding of how primate brains implement precise motor control. Furthermore, saccades offer an important window into the cognitive and attentional state of the brain. Historically, saccade studies have largely relied on macaque. However, the cortical network giving rise to the saccadic command is difficult to study in macaque because relevant cortical areas lie in deep sulci and are difficult to access. Recently, a New World monkey -the marmoset- has garnered attention as an alternative to macaque because of advantages including its smooth cortical surface. However, adoption of marmoset for oculomotor research has been limited due to a lack of in-depth descriptions of marmoset saccade kinematics and their ability to perform psychophysical tasks. Here, we directly compare free-viewing and visually-guided behavior of marmoset, macaque, and human engaged in identical tasks under similar conditions. In video free-viewing task, all species exhibited qualitatively similar saccade kinematics up to 25º in amplitude although with different parameters. Furthermore, the conventional bottom-up saliency model predicted gaze targets at similar rates for all species. We further verified their visually-guided behavior by training them with step and gap saccade tasks. In the step paradigm, marmoset did not show shorter saccade reaction time for upward saccades whereas macaque and human did. In the gap paradigm, all species showed similar gap effect and express saccades. Our results suggest that the marmoset can serve as a model for oculomotor, attentional, and cognitive research while being aware of their difference from macaque or human.

Download Full-text

Separation between top-down and bottom-up control of visual attention

PsycEXTRA Dataset ◽

10.1037/e527352012-134 ◽

2006 ◽

Author(s):

Xingshan Li ◽

Kyle R. Cave

Keyword(s):

Visual Attention ◽

Top Down ◽

Bottom Up

Download Full-text

Bottom-up and Layerwise Domain Adaptation for Pedestrian Detection in Thermal Images

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3418213 ◽

2021 ◽

Vol 17 (1) ◽

pp. 1-19

Author(s):

My Kieu ◽

Andrew D. Bagdanov ◽

Marco Bertini

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Pedestrian Detection ◽

Challenging Problem ◽

Top Down ◽

Bottom Up ◽

Security Applications ◽

Lighting Conditions ◽

Initial Layers ◽

Single Modality

Pedestrian detection is a canonical problem for safety and security applications, and it remains a challenging problem due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conducted an extensive experimental evaluation comparing top-down and bottom-up domain adaptation and also propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches include two steps: first, training an adapter segment corresponding to initial layers of the RGB-trained detector adapts to the new input distribution; then, we reconnect the adapter segment to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.

Download Full-text

Deep Learning Methods for Classification of Certain Abnormalities in Echocardiography

Electronics ◽

10.3390/electronics10040495 ◽

2021 ◽

Vol 10 (4) ◽

pp. 495

Author(s):

Imayanmosha Wahlang ◽

Arnab Kumar Maji ◽

Goutam Saha ◽

Prasun Chakrabarti ◽

Michal Jasinski ◽

...

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Support Vector ◽

Variational Autoencoder ◽

Different Types ◽

Static Images ◽

Long Short Term Memory ◽

2D And 3D ◽

Better Than

This article experiments with deep learning methodologies in echocardiogram (echo), a promising and vigorously researched technique in the preponderance field. This paper involves two different kinds of classification in the echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) has been done, using 2D echo images, 3D Doppler images, and videographic images. Secondly, based on different types of regurgitation, namely, Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), and a combination of the three types of regurgitation are classified using videographic echo images. Two deep-learning methodologies are used for these purposes, a Recurrent Neural Network (RNN) based methodology (Long Short Term Memory (LSTM)) and an Autoencoder based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguished this work from the existing work using SVM (Support Vector Machine) and also application of deep-learning methodologies is the first of many in this particular field. It was found that deep-learning methodologies perform better than SVM methodology in normal or abnormal classification. Overall, VAE performs better in 2D and 3D Doppler images (static images) while LSTM performs better in the case of videographic images.

Download Full-text

Visual Attention and Aging: an Age-Related Cognitive Model of Positive Adaptation

European Journal of Investigation in Health, Psychology and Education ◽

10.3390/ejihpe4030017 ◽

2014 ◽

Vol 4 (3) ◽

pp. 181-191

Author(s):

José Manuel Rodríguez-Ferrer

Keyword(s):

Young People ◽

Visual Attention ◽

Response Times ◽

Age Groups ◽

Cognitive Model ◽

Covert Attention ◽

Positive Adaptation ◽

Healthy Elderly ◽

Age Related ◽

Healthy Young People

We have studied the effects of normal aging on visual attention. Have participated a group of 38 healthy elderly people with an average age of 67.8 years and a group of 39 healthy young people with average age of 19.2 years. In a first experiment of visual detection, response times were recorded, with and without covert attention, to the presentation of stimuli (0.5º in diameter grey circles) appearing in three eccentricities (2.15, 3.83 and 5.53° of visual field) and with three levels of contrast (6, 16 and 78%). In a second experiment of visual form discrimination circles and squares with the same features as in the previous experiment were presented, but in this case subjects only should respond to the emergence of the circles. In both age groups, the covert attention reduced response times. Compared to young people, the older group achieved better results in some aspects of attention tests and response times were reduced more in the stimuli of greater eccentricity. The data suggest that there is a mechanism of adaptation in aging, in which visual attention especially favors the perception of those stimuli more difficult to detec

Download Full-text

The Research of Bottom-Up and Top-Down Combination Visual Attention Calculative Method

Procedia Engineering ◽

10.1016/j.proeng.2012.01.523 ◽

2012 ◽

Vol 29 ◽

pp. 3520-3524

Author(s):

Hui Wang ◽

Gang Liu ◽

Yuanyuan Dang

Keyword(s):

Visual Attention ◽

Top Down ◽

Bottom Up

Download Full-text

How can expert knowledge increase the realism of conceptual hydrological models? A case study based on the concept of dominant runoff process in the Swiss Pre-Alps

Hydrology and Earth System Sciences ◽

10.5194/hess-22-4425-2018 ◽

2018 ◽

Vol 22 (8) ◽

pp. 4425-4447 ◽

Cited By ~ 9

Author(s):

Manuel Antonetti ◽

Massimiliano Zappa

Keyword(s):

Spatial Distribution ◽

Expert Knowledge ◽

Hydrological Models ◽

Top Down ◽

Bottom Up ◽

Process Maps ◽

Uncertainty Sources ◽

Simulation Results ◽

Set Up ◽

Better Than

Abstract. Both modellers and experimentalists agree that using expert knowledge can improve the realism of conceptual hydrological models. However, their use of expert knowledge differs for each step in the modelling procedure, which involves hydrologically mapping the dominant runoff processes (DRPs) occurring on a given catchment, parameterising these processes within a model, and allocating its parameters. Modellers generally use very simplified mapping approaches, applying their knowledge in constraining the model by defining parameter and process relational rules. In contrast, experimentalists usually prefer to invest all their detailed and qualitative knowledge about processes in obtaining as realistic spatial distribution of DRPs as possible, and in defining narrow value ranges for each model parameter.Runoff simulations are affected by equifinality and numerous other uncertainty sources, which challenge the assumption that the more expert knowledge is used, the better will be the results obtained. To test for the extent to which expert knowledge can improve simulation results under uncertainty, we therefore applied a total of 60 modelling chain combinations forced by five rainfall datasets of increasing accuracy to four nested catchments in the Swiss Pre-Alps. These datasets include hourly precipitation data from automatic stations interpolated with Thiessen polygons and with the inverse distance weighting (IDW) method, as well as different spatial aggregations of Combiprecip, a combination between ground measurements and radar quantitative estimations of precipitation. To map the spatial distribution of the DRPs, three mapping approaches with different levels of involvement of expert knowledge were used to derive so-called process maps. Finally, both a typical modellers' top-down set-up relying on parameter and process constraints and an experimentalists' set-up based on bottom-up thinking and on field expertise were implemented using a newly developed process-based runoff generation module (RGM-PRO). To quantify the uncertainty originating from forcing data, process maps, model parameterisation, and parameter allocation strategy, an analysis of variance (ANOVA) was performed.The simulation results showed that (i) the modelling chains based on the most complex process maps performed slightly better than those based on less expert knowledge; (ii) the bottom-up set-up performed better than the top-down one when simulating short-duration events, but similarly to the top-down set-up when simulating long-duration events; (iii) the differences in performance arising from the different forcing data were due to compensation effects; and (iv) the bottom-up set-up can help identify uncertainty sources, but is prone to overconfidence problems, whereas the top-down set-up seems to accommodate uncertainties in the input data best. Overall, modellers' and experimentalists' concept of model realism differ. This means that the level of detail a model should have to accurately reproduce the DRPs expected must be agreed in advance.

Download Full-text

Visual attention strategies for target object detection

10.26686/wgtn.17067635 ◽

2021 ◽

Author(s):

◽

Ibrahim Mohammad Hussain Rahman

Keyword(s):

Visual Attention ◽

Object Detection ◽

Target Object ◽

Detection Accuracy ◽

Estimation Model ◽

Top Down ◽

Bottom Up ◽

Feature Map ◽

Low Level ◽

Visual Tasks

<p>The human visual attention system (HVA) encompasses a set of interconnected neurological modules that are responsible for analyzing visual stimuli by attending to those regions that are salient. Two contrasting biological mechanisms exist in the HVA systems; bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection. Very few computational models have been proposed to model top-down attention, mainly due to three reasons. The first is that the functionality of top-down process involves many influential factors. The second reason is that there is a diversity in top-down responses from task to task. Finally, many biological aspects of the top-down process are not well understood yet. For the above reasons, it is difficult to come up with a generalized top-down model that could be applied to all high level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step for analyzing images to further perform complex visual tasks. Target object detection has not been investigated thoroughly when modelling top-down saliency and hence, constitutes the may domain application for this thesis. The thesis will investigate methods to model top-down attention through various high-level data acquired from images. Furthermore, the thesis will investigate different strategies to dynamically combine bottom-up and top-down processes to improve the detection accuracy, as well as the computational efficiency of the existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency: 1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor plays the role of tuning the weighting of low-level features to maximize detection accuracy. By incorporating context into the feature weighting mechanism we improve the quality of the assigned weights to these features. 2. Two modules of target features combined with contextual weighting to improve detection accuracy of the target object. In this proposed model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a drastic increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects. 3. A top-down and bottom-up attention combination model based on feature interaction. This model provides a dynamic way for combining both processes by formulating the problem as feature selection. The feature selection exploits the interaction between these features, yielding a robust set of features that would maximize both the detection accuracy and the overall efficiency of the system. 4. A feature map quality score estimation model that is able to accurately predict the detection accuracy score of any previously novel feature map without the need of groundtruth data. The model extracts various local, global, geometrical and statistical characteristic features from a feature map. These characteristics guide a regression model to estimate the quality of a novel map. 5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model is able to predict the quality score of any novel feature map accurately, then it is possible to perform dynamic feature map integration based on the estimated value. We propose two frameworks for feature map integration using the estimation model. The proposed integration framework achieves higher human fixation prediction accuracy with minimum number of feature maps than that achieved by combining all feature maps. The proposed works in this thesis provide new directions in modelling top-down saliency for target object detection. In addition, dynamic approaches for top-down and bottom-up combination show considerable improvements over existing approaches in both efficiency and accuracy.</p>

Download Full-text

Bottom-up visual attention model for still image: a preliminary study

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v6i1.469 ◽

2020 ◽

Vol 6 (1) ◽

pp. 82

Author(s):

Adhi Prahara ◽

Murinto Murinto ◽

Dewi Pramudi Ismi

Keyword(s):

Visual Attention ◽

Object Detection ◽

Video Compression ◽

Saliency Map ◽

Bottom Up ◽

Attention Model ◽

Intrinsic Cues ◽

Preliminary Study ◽

Segmentation Image ◽

Human Visual Attention

The philosophy of human visual attention is scientifically explained in the field of cognitive psychology and neuroscience then computationally modeled in the field of computer science and engineering. Visual attention models have been applied in computer vision systems such as object detection, object recognition, image segmentation, image and video compression, action recognition, visual tracking, and so on. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers from the biological perspective of visual attention, including visual pathway, the theory of visual attention, to the computational model of bottom-up visual attention that generates saliency map. The study compares some models at each stage and observes whether the stage is inspired by biological architecture, concept, or behavior of human visual attention. From the study, the use of low-level features, center-surround mechanism, sparse representation, and higher-level guidance with intrinsic cues dominate the bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.

Download Full-text

Where the action could be: Speakers look at graspable objects and meaningful scene regions when describing potential actions

10.31234/osf.io/6uep5 ◽

2019 ◽

Author(s):

Gwendolyn L Rehrig ◽

Candace Elise Peacock ◽

Taylor Hayes ◽

Fernanda Ferreira ◽

John M. Henderson

Keyword(s):

Visual Attention ◽

Real World ◽

Language Production ◽

Native English Speakers ◽

English Speakers ◽

Linguistic Processing ◽

General Meaning ◽

New Evidence ◽

Shed Light ◽

Better Than

The world is visually complex, yet we can efficiently describe it by extracting the information that is most relevant to convey. How do the properties of real-world scenes help us decide where to look and what to say? Image salience has been the dominant explanation for what drives visual attention and production as we describe displays, but new evidence shows scene meaning predicts attention better than image salience. Here we investigated the relevance of one aspect of meaning, graspability (the grasping interactions objects in the scene afford), given that affordances have been implicated in both visual and linguistic processing. We quantified image salience, meaning, and graspability for real-world scenes. In three eyetracking experiments, native English speakers described possible actions that could be carried out in a scene. We hypothesized that graspability would preferentially guide attention due to its task-relevance. In two experiments using stimuli from a previous study, meaning explained visual attention better than graspability or salience did, and graspability explained attention better than salience. In a third experiment we quantified image salience, meaning, graspability, and reach-weighted graspability for scenes that depicted reachable spaces containing graspable objects. Graspability and meaning explained attention equally well in the third experiment, and both explained attention better than salience. We conclude that speakers use object graspability to allocate attention to plan descriptions when scenes depict graspable objects within reach, and otherwise rely more on general meaning. The results shed light on what aspects of meaning guide attention during scene viewing in language production tasks.

Download Full-text