Patterns of Saliency and Semantic Features Distinguish Gaze of Expert and Novice Viewers of Surveillance Footage

2022
Author(s):  
Yujia Peng ◽  
Joseph M Burling ◽  
Greta K Todorova ◽  
Catherine Neary ◽  
Frank E Pollick ◽  
...  

When viewing the actions of others, we not only see patterns of body movements, but we also "see" the intentions and social relations of people, enabling us to understand the surrounding social environment. Previous research has shown that experienced forensic examiners, Closed Circuit Television (CCTV) operators, show superior performance to novices in identifying and predicting hostile intentions from surveillance footage. However, it remains largely unknown what visual content CCTV operators actively attend to when viewing surveillance footage, and whether CCTV operators develop different strategies for active information seeking than novices do. In this study, we conducted computational analyses of the gaze-centered stimuli derived from the eye movements of experienced CCTV operators and novices as they viewed the same surveillance footage. These analyses examined how low-level visual features and object-level semantic features contribute to the attentive gaze patterns of the two groups of participants. Low-level image features were extracted by a visual saliency model, whereas object-level semantic features were extracted from gaze-centered regions by a deep convolutional neural network (DCNN), AlexNet. We found that the visual regions attended by CCTV operators versus novices can be reliably classified by their patterns of saliency features and DCNN features. Additionally, CCTV operators showed greater inter-subject correlation than novices in attending to saliency features and DCNN features. These results suggest that the looking behavior of CCTV operators differs from that of novices in actively attending to different patterns of saliency and semantic features in both low-level and high-level visual processing. Expertise in selectively attending to informative features at different levels of the visual hierarchy may play an important role in facilitating the efficient detection of social relationships between agents and the prediction of harmful intentions.
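The classification step described in the abstract, reliably separating operator from novice gaze patches by their feature patterns, can be sketched as follows. All data here are synthetic stand-ins: the feature dimensionality, group sizes, and the nearest-centroid classifier are illustrative assumptions, not the study's saliency model or AlexNet pipeline.

```python
import numpy as np

# Synthetic stand-ins for feature vectors extracted from gaze-centered patches
# (the study used saliency-model outputs and AlexNet activations).
rng = np.random.default_rng(0)
dim = 64                       # feature dimensionality (hypothetical)
n_per_group = 100              # gaze patches per group (hypothetical)

# Two groups whose mean feature patterns differ slightly.
operators = rng.normal(loc=0.5, scale=1.0, size=(n_per_group, dim))
novices = rng.normal(loc=0.0, scale=1.0, size=(n_per_group, dim))

X = np.vstack([operators, novices])
y = np.array([1] * n_per_group + [0] * n_per_group)

# Simple nearest-centroid classifier with an even/odd train/test split.
train = np.arange(0, 2 * n_per_group, 2)
test = np.arange(1, 2 * n_per_group, 2)
c1 = X[train][y[train] == 1].mean(axis=0)   # operator centroid
c0 = X[train][y[train] == 0].mean(axis=0)   # novice centroid

pred = (np.linalg.norm(X[test] - c1, axis=1)
        < np.linalg.norm(X[test] - c0, axis=1)).astype(int)
accuracy = (pred == y[test]).mean()
print(f"classification accuracy: {accuracy:.2f}")
```

With distinguishable group-level feature patterns, even this minimal classifier separates the two groups well above chance, which is the logic behind the abstract's "reliably classified" claim.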

Author(s):  
Nannan Li ◽  
Zhenzhong Chen

In this paper, a novel image captioning approach is proposed to describe the content of images. Inspired by the visual processing of the human cognitive system, we propose a visual-semantic LSTM model that locates attention objects with their low-level features in the visual cell, and then successively extracts high-level semantic features in the semantic cell. In addition, a state perturbation term is introduced into the word sampling strategy of the REINFORCE-based method to explore proper vocabularies during training. Experimental results on MS COCO and Flickr30K validate the effectiveness of our approach compared to state-of-the-art methods.
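The state-perturbation idea, adding noise before word sampling so that REINFORCE training explores a wider vocabulary, can be illustrated in miniature. The noise scale, vocabulary size, and logit values below are hypothetical, not the paper's; the sketch only shows how a perturbation changes the sampling distribution without breaking its normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 10
logits = rng.normal(size=vocab_size)       # stand-in decoder output

def sample_word(logits, sigma=0.0):
    """Sample a word index from softmax(logits + Gaussian perturbation)."""
    perturbed = logits + rng.normal(scale=sigma, size=logits.shape)
    p = np.exp(perturbed - perturbed.max())   # stable softmax
    p /= p.sum()
    return rng.choice(len(p), p=p), p

# With sigma > 0, low-probability words get sampled more often during training.
word, probs = sample_word(logits, sigma=0.5)
```

At test time one would set `sigma=0.0` to recover unperturbed sampling; the exploration noise is a training-time device only.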


2019
Author(s):  
Taylor R. Hayes ◽  
John M. Henderson

During scene viewing, is attention primarily guided by low-level image salience or by high-level semantics? Recent evidence suggests that overt attention in scenes is primarily guided by semantic features. Here we examined whether the attentional priority given to meaningful scene regions is involuntary. Participants completed a scene-independent visual search task in which they searched for superimposed letter targets whose locations were orthogonal to both the underlying scene semantics and image salience. Critically, the analyzed scenes contained no targets, and participants were unaware of this manipulation. We then directly compared how well the distribution of semantic features and image salience accounted for the overall distribution of overt attention. The results showed that even when the task was completely independent from the scene semantics and image salience, semantics explained significantly more variance in attention than image salience and more than expected by chance. This suggests that salient image features were effectively suppressed in favor of task goals, but semantic features were not suppressed. The semantic bias was present from the very first fixation and increased non-monotonically over the course of viewing. These findings suggest that overt attention in scenes is involuntarily guided by scene semantics.
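The core comparison in this abstract, how much variance in the attention map each predictor explains, amounts to correlating a fixation-density map with a semantic map and a saliency map. The sketch below uses synthetic maps with an assumed partial correlation between predictors (as in real scenes); the study used empirically derived meaning and salience maps.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80 * 60                                        # flattened map size (hypothetical)
semantic = rng.random(n)                           # stand-in meaning map
salience = 0.3 * semantic + 0.7 * rng.random(n)    # partly correlated with meaning
attention = 0.8 * semantic + 0.2 * rng.random(n)   # attention tracks semantics here

def r_squared(predictor, target):
    """Squared linear correlation: variance in target explained by predictor."""
    r = np.corrcoef(predictor, target)[0, 1]
    return r ** 2

r2_semantic = r_squared(semantic, attention)
r2_salience = r_squared(salience, attention)
print(f"semantic R^2 = {r2_semantic:.2f}, salience R^2 = {r2_salience:.2f}")
```

Because meaning and salience maps correlate with each other in natural scenes, the interesting result is the gap between the two R² values, not either value alone.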


2020
Vol 11 (1)
pp. 103
Author(s):  
Yadgar I. Abdulkarim ◽  
Fahmi F. Muhammadsharif ◽  
Mehmet Bakır ◽  
Halgurd N. Awl ◽  
Muharrem Karaaslan ◽  
...  

In this work, a new design for a real-time noninvasive metamaterial sensor, based on a corona-shaped resonator, is proposed. The sensor was designed numerically and fabricated experimentally for the efficient detection of glucose in aqueous solutions such as water and blood. The sensor was inspired by a corona in-plane-shaped design, with the presumption that its circular structure might produce a broader interaction of the electromagnetic waves with the glucose samples. A clear shift in the resonance frequency was observed for the various glucose samples, which implies that the proposed sensor has good sensitivity and can readily distinguish glucose concentrations even when their dielectric coefficients are close. Results showed a superior performance in terms of resonance frequency shift (1.51 GHz) and quality factor (246) compared to those reported in the literature. The transmission variation level ∆|S21| was investigated for glucose concentrations in both water and blood. The sensing mechanism was elaborated through the surface current, electric field, and magnetic field distributions on the corona resonator. The proposed metamaterial sensor is considered a promising candidate for biosensing and medical applications in human glycaemia monitoring.
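A quick way to see why these figures indicate good sensitivity: quality factor relates resonance frequency to linewidth via Q = f0 / FWHM, so a large Q means a narrow resonance and a 1.51 GHz shift spans many linewidths. The resonance frequency f0 below is an assumed illustrative value, not taken from the paper; only Q = 246 and the 1.51 GHz shift are reported figures.

```python
# Back-of-envelope check: Q = f0 / FWHM.
f0_ghz = 5.0                      # assumed resonance frequency (GHz), for illustration
q_factor = 246                    # reported quality factor
fwhm_ghz = f0_ghz / q_factor      # implied -3 dB linewidth
shift_ghz = 1.51                  # reported shift between glucose samples

# A shift spanning many linewidths is easy to resolve in |S21|.
shift_in_linewidths = shift_ghz / fwhm_ghz
print(f"FWHM ~ {fwhm_ghz * 1e3:.1f} MHz; shift ~ {shift_in_linewidths:.0f} linewidths")
```

Under this assumed f0 the linewidth is roughly 20 MHz, so even small concentration-dependent shifts remain resolvable, which is the practical benefit of a high-Q resonator.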


2013
Vol 765-767
pp. 1401-1405
Author(s):  
Chi Zhang ◽  
Wei Qiang Wang

Object-level saliency detection is an important branch of visual saliency. In this paper, we propose a novel method that conducts object-level saliency detection in both images and videos in a unified way. We employ a more effective spatial-compactness assumption to measure saliency instead of the popular contrast assumption. In addition, we present a combination framework that integrates multiple saliency maps generated from different feature maps. The proposed algorithm automatically selects saliency maps of high quality according to a quality evaluation score we define. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods on datasets of both still images and video sequences.
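The select-then-combine framework can be sketched as: score each candidate saliency map, keep the high-quality ones, and fuse them. The quality score below is a simple spatial-compactness proxy (inverse spatial variance of the salient mass) invented for illustration; the paper defines its own evaluation score.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]

def compactness_score(s):
    """Higher score for saliency mass concentrated around one location."""
    s = s / s.sum()
    cy, cx = (s * ys).sum(), (s * xs).sum()
    var = (s * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum()
    return 1.0 / (1.0 + var)

compact = np.exp(-((ys - 16) ** 2 + (xs - 16) ** 2) / 20.0)  # compact blob
diffuse = rng.random((h, w))                                  # spread-out noise map
maps = [compact, diffuse]
scores = np.array([compactness_score(m) for m in maps])

# Keep maps scoring at or above the mean; average them into the fused map.
selected = [m for m, sc in zip(maps, scores) if sc >= scores.mean()]
fused = np.mean(selected, axis=0)
fused = (fused - fused.min()) / (fused.max() - fused.min())   # normalize to [0, 1]
```

The diffuse map scores poorly and is dropped, so the fused result keeps the compact object hypothesis, which is the intent of quality-based map selection.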


2018
Vol 10 (12)
pp. 1934
Author(s):  
Bao-Di Liu ◽  
Wen-Yang Xie ◽  
Jie Meng ◽  
Ye Li ◽  
Yanjiang Wang

In recent years, the collaborative representation-based classification (CRC) method has achieved great success in visual recognition by directly utilizing training images as dictionary bases. However, it describes a test sample with all training samples to extract shared attributes and does not consider representing the test sample with the training samples of a specific class to extract class-specific attributes. For remote-sensing images, both the shared attributes and the class-specific attributes are important for classification. In this paper, we propose a hybrid collaborative representation-based classification approach. The proposed method is capable of improving the performance of classifying remote-sensing images by embedding class-specific collaborative representation into conventional collaborative representation-based classification. Moreover, we extend the proposed method to arbitrary kernel spaces to explore the nonlinear characteristics hidden in remote-sensing image features and further enhance classification performance. Extensive experiments conducted on several benchmark remote-sensing image datasets clearly demonstrate the superior performance of the proposed algorithm over state-of-the-art approaches.
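The conventional CRC baseline that this paper builds on has a compact closed form: code the test sample over all training samples with ridge regularization, then assign the class whose portion of the code reconstructs the sample best. The sketch below implements that baseline on synthetic data; the dimensions, lambda, and data are hypothetical, and the paper's hybrid variant additionally embeds a class-specific representation term.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per_class = 20, 15
class0 = rng.normal(loc=0.0, size=(d, n_per_class))
class1 = rng.normal(loc=2.0, size=(d, n_per_class))
X = np.hstack([class0, class1])                  # dictionary of training samples
labels = np.array([0] * n_per_class + [1] * n_per_class)

y = rng.normal(loc=2.0, size=d)                  # test sample drawn near class 1
lam = 0.1                                        # ridge parameter (hypothetical)

# Collaborative code over ALL training samples: argmin ||y - X a||^2 + lam ||a||^2.
alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Classify by class-specific reconstruction residual.
residuals = []
for c in (0, 1):
    mask = labels == c
    residuals.append(np.linalg.norm(y - X[:, mask] @ alpha[mask]))
predicted = int(np.argmin(residuals))
print("predicted class:", predicted)
```

The kernel extension mentioned in the abstract replaces the inner products `X.T @ X` and `X.T @ y` with kernel evaluations, leaving the same linear solve over kernel matrices.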


Author(s):  
D. Duarte ◽  
F. Nex ◽  
N. Kerle ◽  
G. Vosselman

Urban search and rescue (USaR) teams require a fast and thorough building damage assessment to focus their rescue efforts accordingly. Unmanned aerial vehicles (UAV) can capture relevant data in a short time frame and survey otherwise inaccessible areas after a disaster, and have thus been identified as useful for façade damage detection when coupled with RGB cameras. Existing literature focuses on the extraction of 3D and/or image features as cues for damage. However, little attention has been given to the efficiency of the proposed methods, which hinders their use in an urban search and rescue context. The framework proposed in this paper aims at more efficient façade damage detection using UAV multi-view imagery. This is achieved by directing all damage classification computations only to the image regions containing façades, hence discarding the irrelevant areas of the acquired images and consequently reducing the time needed for this task. To accomplish this, a three-step approach is proposed: i) building extraction from the sparse point cloud computed from the nadir images collected in an initial flight; ii) use of these buildings as a proxy for façade location in the oblique images captured in subsequent flights; and iii) selection of the façade image regions to be fed to a damage classification routine. The results show that the proposed framework successfully reduces the extracted façade image regions to be assessed for damage six-fold, hence increasing the efficiency of subsequent damage detection routines. The framework was tested on a set of UAV multi-view images over a neighborhood of the city of L’Aquila, Italy, affected in 2009 by an earthquake.
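Step ii) of the pipeline, using extracted building geometry as a proxy for façade location in the oblique images, reduces to projecting 3D façade corners into each image and cropping the bounding box. The pinhole-camera sketch below uses invented intrinsics, pose, and corner coordinates purely to show the mechanics and the kind of area reduction involved.

```python
import numpy as np

# Hypothetical camera intrinsics and pose (not from the paper).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 480.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                            # camera rotation
t = np.array([0.0, 0.0, 0.0])            # camera position

# Four façade corners in world coordinates (metres), 30 m in front of camera.
corners = np.array([[-5.0, -3.0, 30.0],
                    [ 5.0, -3.0, 30.0],
                    [ 5.0,  3.0, 30.0],
                    [-5.0,  3.0, 30.0]])

cam = (R @ (corners - t).T).T            # world frame -> camera frame
uv = (K @ cam.T).T
uv = uv[:, :2] / uv[:, 2:3]              # perspective divide -> pixel coordinates

u_min, v_min = uv.min(axis=0)
u_max, v_max = uv.max(axis=0)
roi_area = (u_max - u_min) * (v_max - v_min)
image_area = 1280 * 960
print(f"facade ROI covers {roi_area / image_area:.1%} of the image")
```

Only the cropped ROI is passed to the damage classifier; discarding the rest of the frame is what yields the reported reduction in regions to be assessed.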

