Deep saliency models learn low-, mid-, and high-level features to predict scene attention

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Taylor R. Hayes ◽  
John M. Henderson

Abstract Deep saliency models represent the current state-of-the-art for predicting where humans look in real-world scenes. However, for deep saliency models to inform cognitive theories of attention, we need to know how deep saliency models prioritize different scene features to predict where people look. Here we open the black box of three prominent deep saliency models (MSI-Net, DeepGaze II, and SAM-ResNet) using an approach that models the association between attention, deep saliency model output, and low-, mid-, and high-level scene features. Specifically, we measured the association between each deep saliency model and low-level image saliency, mid-level contour symmetry and junctions, and high-level meaning by applying a mixed effects modeling approach to a large eye movement dataset. We found that all three deep saliency models were most strongly associated with high-level and low-level features, but exhibited qualitatively different feature weightings and interaction patterns. These findings suggest that prominent deep saliency models are primarily learning image features associated with high-level scene meaning and low-level image saliency and highlight the importance of moving beyond simply benchmarking performance.

2021 ◽  
Author(s):  
Taylor R. Hayes ◽  
John M. Henderson

Abstract Deep saliency models represent the current state-of-the-art for predicting where humans look in real-world scenes. However, for deep saliency models to inform cognitive theories of attention, we need to know how deep saliency models predict where people look. Here we open the black box of deep saliency models using an approach that models the association between the output of 3 prominent deep saliency models (MSI-Net, DeepGaze II, and SAM-ResNet) and low-, mid-, and high-level scene features. Specifically, we measured the association between each deep saliency model and low-level image saliency, mid-level contour symmetry and junctions, and high-level meaning by applying a mixed effects modeling approach to a large eye movement dataset. We found that despite different architectures, training regimens, and loss functions, all three deep saliency models were most strongly associated with high-level meaning. These findings suggest that deep saliency models are primarily learning image features associated with scene meaning.
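The map-level association analysis described above can be illustrated with a minimal sketch. This is not the authors' mixed effects pipeline; it only shows the core idea of correlating an attention (fixation density) map with candidate feature maps on a common grid. All maps here are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the maps described in the abstract: an
# attention (fixation density) map and low-/mid-/high-level feature
# maps, all sampled on the same spatial grid.
attention = rng.random((24, 32))
feature_maps = {
    "low_level_saliency": rng.random((24, 32)),
    "mid_level_symmetry": rng.random((24, 32)),
    # Built to covary with attention, so it should rank highest.
    "high_level_meaning": 0.8 * attention + 0.2 * rng.random((24, 32)),
}

def map_correlation(a, b):
    """Pearson correlation between two spatial maps, flattened to vectors."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Rank feature maps by how strongly each is associated with attention.
assoc = {name: map_correlation(attention, fmap)
         for name, fmap in feature_maps.items()}
for name, r in sorted(assoc.items(), key=lambda kv: -kv[1]):
    print(f"{name}: r = {r:.2f}")
```

A full analysis in the spirit of the paper would replace the flat correlation with a mixed effects model that treats scenes as a grouping factor.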


2013 ◽  
Vol 411-414 ◽  
pp. 1372-1376
Author(s):  
Wei Tin Lin ◽  
Shyi Chyi Cheng ◽  
Chih Lang Lin ◽  
Chen Kuei Yang

An approach to automatically improve the accuracy of region-of-interest (ROI) selection for medical images is proposed. The aim of the study is to select the image regions whose features are best suited for detecting or classifying diffuse objects. The Analytic Hierarchy Process (AHP) is used to obtain physicians' high-level diagnosis vectors, which are clustered using the well-known K-Means algorithm. The system also automatically extracts low-level image features to improve the detection of liver diseases in ultrasound images. The weights of the low-level features are adaptively updated according to the feature variances within each class. Finally, the high-level diagnosis decision is made from the diagnosis vectors of the top K nearest neighbors in the database classified by medical experts. Experimental results show the effectiveness of the system.
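The variance-weighted top-K decision step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' system: the feature vectors, labels, and the inverse-variance weighting rule are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical low-level feature vectors for ultrasound images, with
# diagnosis labels assigned by experts (0 = normal, 1 = diffuse disease).
features = np.vstack([rng.normal(0.0, 1.0, (20, 4)),
                      rng.normal(3.0, 1.0, (20, 4))])
labels = np.array([0] * 20 + [1] * 20)

# Adaptive feature weights: features with low within-class variance are
# more discriminative, so weight each by the inverse of that variance.
within_var = np.mean([features[labels == c].var(axis=0)
                      for c in (0, 1)], axis=0)
weights = 1.0 / (within_var + 1e-9)

def diagnose(query, k=5):
    """Majority diagnosis among the top-k weighted nearest neighbors."""
    d = np.sqrt(((features - query) ** 2 * weights).sum(axis=1))
    top_k = labels[np.argsort(d)[:k]]
    return int(np.bincount(top_k).argmax())

print(diagnose(np.full(4, 3.2)))  # query near the "disease" cluster
```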


2021 ◽  
Vol 6 (2) ◽  
pp. 161-167
Author(s):  
Eduard Yakubchykt ◽  
Iryna Yurchak

Finding images similar to a visual sample is a difficult AI task to which many works have been devoted. The problem is to determine the essential properties of images at low and higher semantic levels. From these properties a feature vector is built, which is then used to compare pairs of images. Each pair always includes an image from the collection and the sample image that the user is looking for. The result of the comparison is a quantity called the visual relativity of the images. Image properties are called features and are evaluated by computational algorithms. Image features can be divided into low-level and high-level. Low-level features include basic colors, textures, shapes, and significant elements of the whole image. These features are used as part of more complex recognition tasks. The main progress is in the definition of high-level features, which is associated with understanding the content of images. In this paper, modern algorithms for finding similar images in large multimedia databases are studied. The main problems of determining high-level image features, algorithms for overcoming them, and the application of effective algorithms are described. The algorithms used to quickly determine semantic content and improve the search accuracy for similar images are presented. The aim of this work is to conduct a comparative analysis of modern image retrieval algorithms and identify their weaknesses and strengths.
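The low-level comparison pipeline described here (feature vector per image, pairwise similarity score) can be sketched minimally. The color-histogram feature and cosine similarity are assumed choices for illustration; "visual relativity" in the paper may be defined differently.

```python
import numpy as np

rng = np.random.default_rng(2)

def color_histogram(image, bins=8):
    """Low-level feature: a normalized per-channel intensity histogram."""
    hist = np.concatenate([np.histogram(image[..., c], bins=bins,
                                        range=(0, 256))[0]
                           for c in range(image.shape[-1])])
    return hist / hist.sum()

def visual_relativity(a, b):
    """Cosine similarity between two feature vectors (1.0 = identical)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A sample image and a small collection; the first collection image is
# a copy of the sample, so it should score highest.
sample = rng.integers(0, 256, (32, 32, 3))
collection = [sample.copy(), rng.integers(0, 256, (32, 32, 3))]

q = scores = None
q = color_histogram(sample)
scores = [visual_relativity(q, color_histogram(img)) for img in collection]
print(scores)
```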


2018 ◽  
Vol 232 ◽  
pp. 01061
Author(s):  
Danhua Li ◽  
Xiaofeng Di ◽  
Xuan Qu ◽  
Yunfei Zhao ◽  
Honggang Kong

Pedestrian detection aims to localize and recognize every pedestrian instance in an image with a bounding box. The current state-of-the-art method is Faster RCNN, a network that uses a region proposal network (RPN) to generate high-quality region proposals, while Fast RCNN is used to extract features and classify them into the corresponding categories. The contribution of this paper is the integration of low-level and high-level features into a Faster RCNN-based pedestrian detection framework, which efficiently increases the capacity of the features. Through our experiments we comprehensively evaluate our framework on the Caltech pedestrian detection benchmark, where our method achieves state-of-the-art accuracy and presents a competitive result.
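One common way to integrate low-level and high-level features, sketched below with NumPy arrays standing in for backbone tensors, is to upsample the coarse high-level map to the low-level resolution and concatenate along the channel axis. The shapes and nearest-neighbor upsampling are assumptions for illustration; the paper's actual fusion scheme may differ.

```python
import numpy as np

# Hypothetical feature maps from a shared backbone, in (C, H, W) layout:
# a fine-grained low-level map and a coarse high-level map.
low_level = np.random.rand(64, 56, 56)
high_level = np.random.rand(256, 14, 14)

def upsample_nn(fmap, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

# Fuse: bring the high-level map to the low-level spatial resolution,
# then stack the channels so downstream layers see both feature levels.
fused = np.concatenate([low_level, upsample_nn(high_level, 4)], axis=0)
print(fused.shape)  # (320, 56, 56)
```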


Author(s):  
N. Kuzmina ◽  
V. Antsiferova ◽  
Yagodkin ◽  
...

The current state of the vehicle fleet with respect to road safety, the low level of development of the road network, the high level of harmful impact on the environment, the lack of access to transport services in remote areas, the low level of the technical production base, and the ever-increasing cost of fuel and lubricants…


2020 ◽  
pp. 12-17
Author(s):  
Yu. A. Zinnurova ◽  
E. M. Shironina

A method of scoring human resources capacity depending on the level of problem complexity, the employment function, and the level of responsibility is presented. The complexity of professional activity is determined by the components: type of activity, variety of work, controllability of work performance, and scale and complexity of management. The degree of responsibility is determined by the components: independence, initiative, commitment to the result, and significance of the results of work. For each component, three characteristics have been determined, each of which is assigned a score from 1 (minimum level) to 3 (high level). A matrix model with four groups of personnel, formed on the basis of the scoring data, is proposed in the article: "Stars" (a high level of responsibility and high complexity of the labour functions performed), "Experts" (high complexity of work with a relatively low level of responsibility), "Soldiers" (high responsibility with a low level of complexity), and "Cogs" (a low level of responsibility and complexity of the work performed). Assessment of human resources capacity, which determines the current state, is the basis for developing management decisions and implementing measures aimed at improving the efficiency of the organization through better use of its human resources potential.
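The four-quadrant matrix model can be sketched as a small classifier over the component scores. The averaging of component scores and the midpoint threshold of 2.0 are assumptions for the example; the article does not specify how component scores are aggregated.

```python
# Component names from the article (for reference in a fuller model).
COMPLEXITY = ("type of activity", "variety of work",
              "controllability of performance", "scale of management")
RESPONSIBILITY = ("independence", "initiative",
                  "commitment to result", "significance of results")

def classify(complexity_scores, responsibility_scores, threshold=2.0):
    """Place an employee in one of the four matrix-model groups.

    Each score is 1 (minimum) to 3 (high); the group is chosen by
    comparing the mean of each dimension to an assumed midpoint.
    """
    c = sum(complexity_scores) / len(complexity_scores)
    r = sum(responsibility_scores) / len(responsibility_scores)
    if c >= threshold and r >= threshold:
        return "Stars"      # high complexity, high responsibility
    if c >= threshold:
        return "Experts"    # high complexity, low responsibility
    if r >= threshold:
        return "Soldiers"   # low complexity, high responsibility
    return "Cogs"           # low complexity, low responsibility

print(classify([3, 3, 2, 3], [3, 3, 3, 2]))  # → Stars
print(classify([1, 2, 1, 1], [3, 2, 3, 3]))  # → Soldiers
```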


Author(s):  
Kalaivani Anbarasan ◽  
Chitrakala S.

The content-based image retrieval system retrieves relevant images based on image features. The lack of performance in content-based image retrieval systems is due to the semantic gap. Image annotation is a solution to bridge the semantic gap between low-level content features and high-level semantic concepts. Image annotation is defined as tagging images with single or multiple keywords based on low-level image features. The major issue in building an effective annotation framework is the integration of both low-level visual features and high-level textual information into an annotation model. This chapter focuses on a new statistical image annotation model for semantic-based image retrieval. A multi-label image annotation with a multi-level tagging system is introduced to annotate image regions with class labels and to extract color, location, and topological tags of segmented image regions. The proposed method produced encouraging results, and the experimental results outperformed state-of-the-art methods.
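The multi-level tagging idea (class label plus color and location tags per segmented region) can be sketched as follows. The 3x3 location grid, the tag vocabulary, and the function names are hypothetical; the chapter's actual tagging scheme is richer (it also includes topological tags).

```python
def location_tag(cx, cy, width, height):
    """Coarse location tag from a region centroid, using an assumed 3x3 grid."""
    col = ("left", "center", "right")[min(int(3 * cx / width), 2)]
    row = ("top", "middle", "bottom")[min(int(3 * cy / height), 2)]
    return f"{row}-{col}"

def annotate_region(class_label, dominant_color, cx, cy, width, height):
    """Attach multi-level tags (class, color, location) to one region."""
    return {"class": class_label,
            "color": dominant_color,
            "location": location_tag(cx, cy, width, height)}

# Example: a region labeled "sky" near the top-center of a 640x480 image.
print(annotate_region("sky", "blue", 320, 40, 640, 480))
```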


Author(s):  
Alan Wee-Chung Liew ◽  
Ngai-Fong Law

With the rapid growth of the Internet and multimedia systems, the use of visual information has increased enormously, such that indexing and retrieval techniques have become important. Historically, images are usually manually annotated with metadata such as captions or keywords (Chang & Hsu, 1992). Image retrieval is then performed by searching for images with similar keywords. However, the keywords used may differ from one person to another. Also, many keywords can be used to describe the same image. Consequently, retrieval results are often inconsistent and unreliable. Due to these limitations, there is a growing interest in content-based image retrieval (CBIR). These techniques extract meaningful information or features from an image so that images can be classified and retrieved automatically based on their contents. Existing image retrieval systems such as QBIC and Virage extract so-called low-level features such as color, texture, and shape from an image in the spatial domain for indexing. Low-level features sometimes fail to represent high-level semantic image features, as these are subjective and depend greatly upon user preferences. To bridge the gap, a top-down retrieval approach involving high-level knowledge can complement these low-level features. This article deals with various aspects of CBIR, including bottom-up feature-based image retrieval in both the spatial and compressed domains, as well as top-down task-based image retrieval using prior knowledge.


2019 ◽  
Author(s):  
Taylor R. Hayes ◽  
John M. Henderson

During scene viewing, is attention primarily guided by low-level image salience or by high-level semantics? Recent evidence suggests that overt attention in scenes is primarily guided by semantic features. Here we examined whether the attentional priority given to meaningful scene regions is involuntary. Participants completed a scene-independent visual search task in which they searched for superimposed letter targets whose locations were orthogonal to both the underlying scene semantics and image salience. Critically, the analyzed scenes contained no targets, and participants were unaware of this manipulation. We then directly compared how well the distribution of semantic features and image salience accounted for the overall distribution of overt attention. The results showed that even when the task was completely independent from the scene semantics and image salience, semantics explained significantly more variance in attention than image salience and more than expected by chance. This suggests that salient image features were effectively suppressed in favor of task goals, but semantic features were not suppressed. The semantic bias was present from the very first fixation and increased non-monotonically over the course of viewing. These findings suggest that overt attention in scenes is involuntarily guided by scene semantics.
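The variance-comparison analysis described above can be illustrated with a minimal sketch: compare how much variance in attention each predictor explains on its own. The synthetic data and the single-predictor R-squared are assumptions for the example; the study's actual analysis compares full spatial distributions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-region values: observed attention plus semantic
# ("meaning") and image-salience predictor values for the same regions.
# Attention is built to depend mostly on meaning, mirroring the finding.
meaning = rng.random(300)
salience = rng.random(300)
attention = 0.7 * meaning + 0.1 * salience + 0.2 * rng.random(300)

def r_squared(predictor, outcome):
    """Variance in the outcome explained by a single predictor."""
    return float(np.corrcoef(predictor, outcome)[0, 1] ** 2)

print(f"meaning  R^2 = {r_squared(meaning, attention):.2f}")
print(f"salience R^2 = {r_squared(salience, attention):.2f}")
```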

