visual saliency
Recently Published Documents


TOTAL DOCUMENTS

994
(FIVE YEARS 243)

H-INDEX

46
(FIVE YEARS 7)

Author(s):  
Shilpa Pandey ◽  
Gaurav Harit

In this article, we address the problem of localizing text and symbolic annotations on the scanned image of a printed document. Previous approaches have treated annotation extraction as binary classification into printed and handwritten text. In this work, we further subcategorize the annotations as underlines, encirclements, inline text, and marginal text. We have collected a new dataset of 300 documents containing all classes of annotations marked around or in between printed text. Using the dataset as a benchmark, we report the results of two saliency formulations, CRF Saliency and Discriminant Saliency, for predicting salient patches that can correspond to different types of annotations. We also compare our work with recent semantic segmentation techniques using deep models. Our analysis shows that Discriminant Saliency can be considered the preferred approach for fast localization of patches containing different types of annotations. The saliency models were learned on a small dataset, yet give performance comparable to deep networks for pixel-level semantic segmentation. We show that saliency-based methods give better outcomes with limited annotated data than more sophisticated segmentation techniques that require a large training set to learn the model.
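The core idea of discriminant saliency, scoring a patch by how well its feature statistics are discriminated from the background, can be sketched very simply. The following is a minimal single-feature illustration, not the paper's method: it uses only an intensity histogram and KL divergence against the whole-page distribution, whereas the paper learns the models from data.

```python
import numpy as np

def discriminant_saliency(image, patch=16, bins=16):
    """Score each patch by the KL divergence between its intensity
    histogram and the whole-page histogram; patches whose statistics
    stand out from the printed-text background score high. A crude
    single-feature stand-in for a learned discriminant saliency model."""
    h, w = image.shape
    g_hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    g = (g_hist + 1) / (g_hist.sum() + bins)          # Laplace smoothing
    scores = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            block = image[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = (hist + 1) / (hist.sum() + bins)
            scores[i, j] = np.sum(p * np.log(p / g))  # KL(patch || page)
    return scores
```

Thresholding the resulting score map yields candidate annotation patches for the subsequent classification step.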


2022 ◽  
Vol 15 ◽  
Author(s):  
Ying Yu ◽  
Jun Qian ◽  
Qinglong Wu

This article proposes a bottom-up visual saliency model that uses the wavelet transform to conduct multiscale analysis and computation in the frequency domain. First, we compute the multiscale magnitude spectra by performing a wavelet transform to decompose the magnitude spectrum of the discrete cosine coefficients of an input image. Next, we obtain multiple saliency maps of different spatial scales through an inverse transformation from the frequency domain to the spatial domain, which utilizes the discrete cosine magnitude spectra after multiscale wavelet decomposition. Then, we employ an evaluation function to automatically select the two best multiscale saliency maps. A final saliency map is generated via an adaptive integration of the two selected multiscale saliency maps. The proposed model is fast, efficient, and can simultaneously detect salient regions or objects of different sizes. It outperforms state-of-the-art bottom-up saliency approaches in experiments on psychophysical consistency, eye-fixation prediction, and saliency detection for natural images. In addition, the proposed model is applied to automatic ship detection in optical satellite images. Ship-detection tests on visible-spectrum optical satellite data not only demonstrate the model's effectiveness in detecting both small and large salient targets but also verify its robustness against various sea-background disturbances.
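A simplified sketch of the pipeline is given below. To keep it self-contained it deviates from the paper in two assumed ways: it uses the FFT magnitude spectrum rather than DCT coefficients, and it replaces the wavelet decomposition with box smoothing of the log spectrum at several scales; the low-entropy selection criterion is also an assumption standing in for the paper's evaluation function.

```python
import numpy as np

def smooth(a, k):
    """Separable box blur, a crude stand-in for one level of
    wavelet-based multiscale analysis."""
    kernel = np.ones(k) / k
    a = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 0, a)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 1, a)

def multiscale_spectral_saliency(img, scales=(3, 7, 15)):
    """Multiscale frequency-domain saliency (simplified sketch)."""
    F = np.fft.fft2(img)
    mag, phase = np.abs(F), np.angle(F)
    log_mag = np.log(mag + 1e-9)
    maps = []
    for k in scales:
        # the part of the log spectrum not explained by a smooth trend
        # at this scale carries the salient structure
        residual = log_mag - smooth(log_mag, k)
        recon = np.fft.ifft2(np.exp(residual + 1j * phase))
        maps.append(smooth(np.abs(recon) ** 2, 5))
    # assumed evaluation function: prefer the most peaked (lowest-entropy) maps
    def entropy(m):
        p = m.ravel() / (m.sum() + 1e-12)
        return -np.sum(p * np.log(p + 1e-12))
    maps.sort(key=entropy)
    fused = maps[0] + maps[1]          # integrate the two best scales
    return fused / (fused.max() + 1e-12)
```

Because each scale suppresses a differently smoothed portion of the spectrum, small and large salient regions are picked up by different maps, and fusing the two best covers both.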


2022 ◽  
Author(s):  
Yujia Peng ◽  
Joseph M Burling ◽  
Greta K Todorova ◽  
Catherine Neary ◽  
Frank E Pollick ◽  
...  

When viewing the actions of others, we not only see patterns of body movements, but we also "see" the intentions and social relations of people, enabling us to understand the surrounding social environment. Previous research has shown that experienced forensic examiners, Closed-Circuit Television (CCTV) operators, demonstrate superior performance to novices in identifying and predicting hostile intentions from surveillance footage. However, it remains largely unknown what visual content CCTV operators actively attend to when viewing surveillance footage, and whether they develop different strategies for active information seeking than novices do. In this study, we conducted computational analyses of gaze-centered stimuli captured from the eye movements of experienced CCTV operators and novices as they viewed the same surveillance footage. These analyses examined how low-level visual features and object-level semantic features contribute to the attentive gaze patterns of the two groups of participants. Low-level image features were extracted by a visual saliency model, whereas object-level semantic features were extracted from gaze-centered regions by a deep convolutional neural network (DCNN), AlexNet. We found that visual regions attended to by CCTV operators versus novices can be reliably classified by patterns of saliency features and DCNN features. Additionally, CCTV operators showed greater inter-subject correlation than novices in attending to saliency features and DCNN features. These results suggest that the looking behavior of CCTV operators differs from that of novices in actively attending to different patterns of saliency and semantic features in both low-level and high-level visual processing. Expertise in selectively attending to informative features at different levels of the visual hierarchy may play an important role in facilitating efficient detection of social relationships between agents and prediction of harmful intentions.
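The two analyses, classifying which group a gaze-centered feature vector came from and measuring inter-subject correlation, can be sketched with plain NumPy. This is an illustrative stand-in (a nearest-centroid classifier and mean pairwise Pearson correlation), not the paper's actual statistical pipeline, and the feature vectors here are synthetic.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean feature vector per group."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each feature vector to the closest group centroid."""
    labels = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in labels])
    return np.array(labels)[np.argmin(dists, axis=0)]

def inter_subject_correlation(F):
    """Mean pairwise Pearson correlation across subjects' feature vectors
    (rows of F); higher values mean subjects attend to similar features."""
    c = np.corrcoef(F)
    i, j = np.triu_indices_from(c, k=1)
    return c[i, j].mean()
```

If the two groups' attended features can be separated by such a simple classifier, the groups genuinely differ in what they look at, which is the logic behind the reliable-classification result.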


2022 ◽  
Vol 15 (0) ◽  
pp. 1-9
Author(s):  
ZHAO Peng-peng ◽  
LI Shu-zhong ◽  
LI Xun ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
pp. 309
Author(s):  
Fei Yan ◽  
Cheng Chen ◽  
Peng Xiao ◽  
Siyu Qi ◽  
Zhiliang Wang ◽  
...  

The human attention mechanism can be understood and simulated by closely associating the saliency prediction task with neuroscience and psychology. Saliency prediction is also widely used in computer vision and interdisciplinary subjects. In recent years, with the rapid development of deep learning, deep models have made remarkable progress in saliency prediction. Deep learning models can automatically learn features, overcoming many drawbacks of classic models, such as reliance on handcrafted features and task-specific settings. Nevertheless, deep models still have limitations, for example in tasks involving multi-modality and semantic understanding. This study summarizes the relevant achievements in the field of saliency prediction, including the early neurological and psychological mechanisms and the guiding role of classic models, followed by the development process and data-based comparison of classic and deep saliency prediction models. It also discusses the relationship between models and human vision, the factors that cause semantic gaps, the influence of attention in cognitive research, the limitations of saliency models, and emerging applications, in order to provide guidance and direction for follow-up work on saliency prediction.


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 33
Author(s):  
Chaowei Duan ◽  
Yiliu Liu ◽  
Changda Xing ◽  
Zhisheng Wang

An efficient method for infrared and visible image fusion is presented using truncated Huber penalty function smoothing and visual-saliency-based threshold optimization. The method merges complementary information from multimodal source images into a more informative composite image in a two-scale domain, in which significant objects/regions are highlighted and rich feature information is preserved. First, source images are decomposed into two-scale image representations, namely the approximate and residual layers, using truncated Huber penalty function smoothing. Benefiting from its edge- and structure-preserving characteristics, the significant objects and regions in the source images are effectively extracted without halo artifacts around the edges. Second, a visual-saliency-based threshold optimization fusion rule is designed to fuse the approximate layers, aiming to highlight the salient targets in infrared images and retain the high-intensity regions in visible images. A sparse-representation-based fusion rule is adopted to fuse the residual layers with the goal of acquiring rich detail texture information. Finally, combining the fused approximate and residual layers reconstructs the fused image with more natural visual effects. Extensive experimental results demonstrate that the proposed method achieves comparable or superior performance to several state-of-the-art fusion methods in both visual results and objective assessments.
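The two-scale decompose/fuse/reconstruct structure can be sketched as follows. This is a heavily simplified illustration, not the paper's method: plain box smoothing stands in for the edge-preserving truncated Huber smoother, a deviation-from-mean weight stands in for the threshold-optimized saliency rule, and a max-absolute rule replaces the sparse-representation fusion of the residual layers.

```python
import numpy as np

def smooth(img, k=7):
    """Box smoothing: a simple, non-edge-preserving stand-in for
    truncated Huber penalty function smoothing."""
    kernel = np.ones(k) / k
    img = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode='same'), 1, img)

def fuse(ir, vis):
    """Two-scale fusion: saliency-weighted approximate layers plus
    max-absolute residual layers."""
    base_ir, base_vis = smooth(ir), smooth(vis)
    det_ir, det_vis = ir - base_ir, vis - base_vis   # residual layers
    # saliency as deviation from the mean intensity of each base layer
    w_ir = np.abs(base_ir - base_ir.mean())
    w_vis = np.abs(base_vis - base_vis.mean())
    w = w_ir / (w_ir + w_vis + 1e-9)
    base = w * base_ir + (1.0 - w) * base_vis        # fused approximate layer
    det = np.where(np.abs(det_ir) >= np.abs(det_vis), det_ir, det_vis)
    return np.clip(base + det, 0.0, 1.0)             # reconstruction
```

Hot infrared targets deviate strongly from the mean, so they dominate the approximate-layer weight, while the residual rule keeps whichever image carries the stronger local texture.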


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 40
Author(s):  
Chaowei Duan ◽  
Changda Xing ◽  
Yiliu Liu ◽  
Zhisheng Wang

As a powerful technique for merging complementary information from original images, infrared (IR) and visible image fusion approaches are widely used in surveillance, target detection, tracking, biological recognition, etc. In this paper, an efficient IR and visible image fusion method is proposed to simultaneously enhance the significant targets/regions in all source images and preserve rich background details from visible images. A multi-scale representation based on the fast global smoother is first used to decompose source images into base and detail layers, aiming to extract the salient structure information and suppress halos around the edges. Then, a target-enhanced parallel Gaussian fuzzy logic-based fusion rule is proposed to merge the base layers, which avoids brightness loss and highlights significant targets/regions. In addition, a visual saliency map-based fusion rule is designed to merge the detail layers with the purpose of obtaining rich details. Finally, the fused image is reconstructed. Extensive experiments are conducted on 21 image pairs and a NATO-camp sequence (32 image pairs) to verify the effectiveness and superiority of the proposed method. Compared with several state-of-the-art methods, experimental results demonstrate that the proposed method achieves competitive or superior performance in both visual results and objective evaluation.
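The distinctive element here is the Gaussian fuzzy logic weighting of the base layers. A minimal sketch of that idea is shown below; the membership function, its default `mu` and `sigma`, and the simple weighted average are assumptions for illustration, not the paper's parallel target-enhanced rule.

```python
import numpy as np

def gaussian_fuzzy_weight(base_ir, mu=None, sigma=None):
    """Gaussian fuzzy membership of each IR base-layer pixel to the
    'bright target' set. Defaults (mu at the brightest intensity,
    sigma from the layer's spread) are illustrative assumptions."""
    mu = base_ir.max() if mu is None else mu
    sigma = (base_ir.std() + 1e-9) if sigma is None else sigma
    return np.exp(-((base_ir - mu) ** 2) / (2.0 * sigma ** 2))

def fuse_base(base_ir, base_vis):
    """Weighted average: bright IR targets keep their IR intensity,
    while the rest of the scene follows the visible base layer,
    avoiding global brightness loss."""
    w = gaussian_fuzzy_weight(base_ir)
    return w * base_ir + (1.0 - w) * base_vis
```

Because the membership falls off smoothly rather than switching at a hard threshold, target boundaries blend gradually into the visible background instead of producing seams.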


2021 ◽  
Author(s):  
◽  
Aisha Ajmal

The human vision system (HVS) collects a huge amount of information and performs a variety of biological mechanisms to select relevant information. Computational models based on these biological mechanisms are used in machine vision to select interesting or salient regions in images for applications in scene analysis, object detection, and object tracking. Different object tracking techniques have been proposed, often using complex processing methods. On the other hand, attention-based computational models have shown significant performance advantages in various applications. We hypothesise that integrating a visual attention model with object tracking can improve performance by reducing detection complexity in challenging environments such as illumination change, occlusion, and camera motion. The overall objective of this thesis is to develop a visual-saliency-based object tracker that alternates between targets using a measure of current uncertainty derived from a Kalman filter. The thesis demonstrates the effectiveness of the tracker, measured by mean square error, compared with a tracker without the uncertainty mechanism. Specific colour spaces can contribute to the identification of salient regions. We compare opponencies derived from the non-uniform red, green and blue (RGB) colour space with the hue, saturation and value (HSV) colour space, using video information. The main motivation for this comparison is to improve the quality of saliency detection in challenging situations such as lighting changes. Precision-recall curves are used to compare the colour spaces using pyramidal and non-pyramidal saliency models.
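The uncertainty-driven alternation between targets can be sketched with a scalar Kalman filter per target: predicting without a measurement grows a track's variance, updating shrinks it, and the tracker attends next to the least certain track. This is an illustrative one-dimensional, constant-position sketch, not the thesis's tracker, which would operate on 2-D positions with a richer motion model.

```python
class KalmanTrack:
    """One-dimensional constant-position Kalman filter for a single target."""
    def __init__(self, x0, p0=1.0, q=0.01, r=0.1):
        self.x, self.p = x0, p0      # state estimate and its variance
        self.q, self.r = q, r        # process and measurement noise variances
    def predict(self):
        self.p += self.q             # uncertainty grows between measurements
    def update(self, z):
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)          # uncertainty shrinks after measuring

def most_uncertain(tracks):
    """Attend next to the target whose estimate is least certain."""
    return max(range(len(tracks)), key=lambda i: tracks[i].p)
```

Per frame, the tracker predicts all tracks, runs (expensive) saliency-based detection only around the target chosen by `most_uncertain`, and updates that track, so attention naturally alternates as variances grow and shrink.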


