Pre-Processing Filter Reflecting Human Visual Perception to Improve Saliency Detection Performance

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2892
Author(s):  
Kyungjun Lee ◽  
Seungwoo Wee ◽  
Jechang Jeong

Salient object detection is a method of finding the object within an image that a person determines to be important and is expected to focus on. Various features are used to compute visual saliency; among spatial features, the color and luminance of the scene are the most widely used. However, humans perceive the same color and luminance differently depending on the influence of the surrounding environment. Because the human visual system (HVS) operates through a very complex mechanism, both neurobiological and psychological aspects must be considered for accurate detection of salient objects. To reflect this characteristic in the saliency detection process, we propose two pre-processing methods applied to the input image. First, we apply a bilateral filter, which smooths the image so that only its overall context remains while preserving its important borders, thereby improving the segmentation results. Second, even when the amount of light is identical, perceived brightness can differ owing to the influence of the surrounding environment. We therefore apply oriented difference-of-Gaussians (ODOG) and locally normalized ODOG (LODOG) filters, which adjust the input image by predicting the brightness as perceived by humans. Experiments on five public benchmark datasets with ground truth show that the proposed pre-processing further improves the performance of previous state-of-the-art methods.
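As a rough illustration of this pre-processing stage, the Python sketch below applies an edge-preserving bilateral filter and then a crude ODOG-style brightness estimate with OpenCV and NumPy. All parameter values (kernel sizes, sigmas, six orientations) are illustrative assumptions; the full ODOG/LODOG models additionally span seven spatial frequencies and normalize each orientation channel by its global (ODOG) or local (LODOG) RMS contrast, which this sketch omits.

```python
import cv2
import numpy as np

def preprocess(image_bgr, n_orientations=6, sigma=2.0, elongation=3.0):
    """Edge-preserving smoothing followed by a crude ODOG-style
    brightness estimate. Parameter values are illustrative."""
    # Step 1: bilateral filter removes texture but keeps object borders.
    smoothed = cv2.bilateralFilter(image_bgr, d=9,
                                   sigmaColor=50, sigmaSpace=9)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # Step 2: oriented difference-of-Gaussians. Each orientation is an
    # elongated Gaussian "center" minus an isotropic surround.
    ksize = int(6 * sigma * elongation) | 1          # odd kernel width
    surround = cv2.GaussianBlur(gray, (ksize, ksize), sigma * elongation)
    responses = []
    for k in range(n_orientations):
        gx = cv2.getGaussianKernel(ksize, sigma * elongation)
        gy = cv2.getGaussianKernel(ksize, sigma)
        kernel = (gx @ gy.T).astype(np.float32)      # elongated center
        rot = cv2.getRotationMatrix2D((ksize // 2, ksize // 2),
                                      180.0 * k / n_orientations, 1.0)
        kernel = cv2.warpAffine(kernel, rot, (ksize, ksize))
        kernel /= kernel.sum()                       # unit DC gain
        responses.append(cv2.filter2D(gray, -1, kernel) - surround)

    # Average across orientations as a stand-in for the full model's
    # frequency-weighted, contrast-normalized combination.
    brightness = np.mean(responses, axis=0)
    return smoothed, brightness
```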

Author(s):  
Zijun Deng ◽  
Xiaowei Hu ◽  
Lei Zhu ◽  
Xuemiao Xu ◽  
Jing Qin ◽  
...  

Saliency detection is a fundamental yet challenging task in computer vision, aiming to highlight the most visually distinctive objects in an image. We propose a novel recurrent residual refinement network (R^3Net) equipped with residual refinement blocks (RRBs) to more accurately detect salient regions of an input image. Our RRBs learn the residual between the intermediate saliency prediction and the ground truth by alternately leveraging the low-level and high-level integrated features of a fully convolutional network (FCN). While the low-level integrated features capture more saliency detail, the high-level integrated features suppress non-salient regions in the intermediate prediction. Furthermore, the RRBs obtain saliency information complementary to the intermediate prediction and add the residual back into it to refine the saliency maps. We evaluate the proposed R^3Net on five widely used saliency detection benchmarks, comparing it with 16 state-of-the-art saliency detectors. Experimental results show that our network outperforms the competitors on all benchmark datasets.
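To make the residual-refinement idea concrete, here is a minimal PyTorch sketch of a block that predicts a residual from the current prediction plus a set of integrated features and adds it back. The channel counts and layer choices are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualRefinementBlock(nn.Module):
    """Sketch of an RRB: concatenate the current saliency prediction
    with integrated features, predict a residual, add it back."""
    def __init__(self, feat_channels=256):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_channels + 1, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=3, padding=1),  # residual map
        )

    def forward(self, saliency, features):
        residual = self.refine(torch.cat([saliency, features], dim=1))
        return saliency + residual      # refined saliency prediction

# Alternating refinement: low-level features recover detail, high-level
# features suppress non-salient regions (shapes are assumptions).
low_rrb, high_rrb = ResidualRefinementBlock(256), ResidualRefinementBlock(512)
saliency = torch.rand(1, 1, 64, 64)            # coarse initial prediction
low_feats = torch.rand(1, 256, 64, 64)
high_feats = torch.rand(1, 512, 64, 64)
saliency = low_rrb(saliency, low_feats)        # add saliency detail
saliency = high_rrb(saliency, high_feats)      # remove false positives
```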


Author(s):  
Anass Nouri ◽  
Christophe Charrier ◽  
Olivier Lezoray

This chapter concerns visual saliency and perceptual quality assessment of 3D meshes. First, the chapter proposes a definition of visual saliency and describes the state-of-the-art methods for its detection on 3D mesh surfaces. A focus is placed on a recent model of visual saliency detection for 3D colored and non-colored meshes, whose results are compared with ground-truth saliency as well as with methods from the literature. Since this model can estimate visual saliency on 3D colored meshes, termed colorimetric saliency, the chapter describes the construction of the 3D colored mesh database used to assess its relevance. The authors also describe three applications of the detailed model, addressing the problems of viewpoint selection, adaptive simplification, and adaptive smoothing. Second, two perceptual quality assessment metrics for 3D non-colored meshes are described, analyzed, and compared with state-of-the-art approaches.


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6647
Author(s):  
Xiang Yan ◽  
Syed Zulqarnain Gilani ◽  
Hanlin Qin ◽  
Ajmal Mian

Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods resort to adding Gaussian blur to focused images to simulate defocus, thereby generating data (with ground truth) for supervised learning. Moreover, they classify pixels as ‘focused’ or ‘defocused’ and use the classification results to construct fusion weight maps, which then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach that directly predicts the fully focused output image from a pair of multi-focus input images. The approach trains a CNN to perform fusion without the need for ground-truth fused images: the loss is computed from image structural similarity (SSIM), a metric widely accepted for evaluating fused image quality. Moreover, the standard deviation of a local image window is used in the loss function to automatically estimate the importance of each source image to the final fused image. The model is a feed-forward, fully convolutional network that accepts images of variable size at both training and test time; hence, we can train on real benchmark datasets instead of simulated ones. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
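The loss design can be sketched in PyTorch as follows: a simplified local-statistics SSIM, with each source image's contribution weighted by its local standard deviation so that the in-focus (higher-variance) source dominates in each window. The window size and exact weighting scheme are assumptions, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def local_stats(x, win=7):
    """Local mean and standard deviation of an (N, C, H, W) tensor."""
    mu = F.avg_pool2d(x, win, stride=1, padding=win // 2)
    var = F.avg_pool2d(x * x, win, stride=1, padding=win // 2) - mu * mu
    return mu, var.clamp_min(0).sqrt()

def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified per-pixel SSIM between images x and y in [0, 1]."""
    mu_x, sd_x = local_stats(x, win)
    mu_y, sd_y = local_stats(y, win)
    cov = F.avg_pool2d(x * y, win, stride=1, padding=win // 2) - mu_x * mu_y
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sd_x ** 2 + sd_y ** 2 + c2))

def fusion_loss(fused, src1, src2, win=7):
    """Weight each source's SSIM term by its local standard deviation,
    so the sharper source matters more in that window."""
    _, sd1 = local_stats(src1, win)
    _, sd2 = local_stats(src2, win)
    w1 = sd1 / (sd1 + sd2 + 1e-8)
    loss_map = w1 * (1 - ssim_map(fused, src1)) + \
               (1 - w1) * (1 - ssim_map(fused, src2))
    return loss_map.mean()
```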


Information ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 257 ◽  
Author(s):  
Bashir Ghariba ◽  
Mohamed S. Shehata ◽  
Peter McGuire

Human eye movement is one of the most important functions for understanding our surroundings. When the human eye processes a scene, it quickly focuses on its dominant parts; predicting these regions is commonly known as visual saliency detection or visual attention prediction. Recently, neural networks have been used to predict visual saliency. This paper proposes a deep learning encoder-decoder architecture, based on a transfer learning technique, to predict visual saliency. In the proposed model, visual features are extracted from raw images through convolutional layers. In addition, the model uses the VGG-16 network for semantic segmentation, with a pixel classification layer that predicts the categorical label of every pixel in an input image. The proposed model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to illustrate its efficiency. Its results are compared quantitatively and qualitatively to classic and state-of-the-art deep learning models. The proposed model achieves a global accuracy of up to 96.22% for visual saliency prediction.
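A minimal transfer-learning sketch of such an encoder-decoder follows, using the torchvision VGG-16 feature extractor as the encoder. The upsampling decoder and the single-channel sigmoid head are assumptions (the paper's pixel classification layer predicts categorical labels); pretrained weights require torchvision >= 0.13.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SaliencyEncoderDecoder(nn.Module):
    """Sketch: frozen-signature VGG-16 encoder (transfer learning)
    plus a small assumed upsampling decoder with a per-pixel head."""
    def __init__(self, pretrained=True):
        super().__init__()
        self.encoder = vgg16(weights="DEFAULT" if pretrained else None).features
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 1),   # per-pixel salient/non-salient logit
        )

    def forward(self, x):
        return torch.sigmoid(self.decoder(self.encoder(x)))

# Usage (pretrained=False here just to avoid a weight download):
model = SaliencyEncoderDecoder(pretrained=False)
pred = model(torch.rand(1, 3, 224, 224))   # -> (1, 1, 224, 224) saliency map
```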


2013 ◽  
Vol 2013 ◽  
pp. 1-9
Author(s):  
Yuantao Chen ◽  
Weihong Xu ◽  
Fangjun Kuang ◽  
Shangbing Gao

Image segmentation aimed at producing a high-quality visual saliency map depends heavily on the underlying visual saliency metrics. Existing metrics mostly yield only a sketchy saliency map, and such a coarse map degrades the subsequent segmentation results. This paper presents a randomized visual saliency detection algorithm that quickly generates a detailed saliency map at the same resolution as the original input image. The method meets real-time requirements and can be applied to content-based image scaling. For fast saliency region detection in video, the algorithm requires only a small amount of memory while still producing a detailed saliency map. The presented results show that using the resulting saliency maps in the segmentation process yields close to ideal segmentation results.


2020 ◽  
Vol 34 (07) ◽  
pp. 12063-12070
Author(s):  
Chang Tang ◽  
Xinwang Liu ◽  
Xinzhong Zhu ◽  
En Zhu ◽  
Kun Sun ◽  
...  

Defocus blur detection aims to separate the in-focus and out-of-focus regions of an image. Although the task is attracting increasing attention due to its remarkable potential applications, several challenges remain for accurate defocus blur detection, such as interference from background clutter, sensitivity to scale, and missing boundary details of defocus blur regions. To address these issues, we propose a deep neural network that Recurrently Refines Multi-scale Residual Features (R2MRF) for defocus blur detection. We first extract multi-scale deep features using a fully convolutional network. For each layer, we design a novel recurrent residual refinement branch embedded with multiple residual refinement modules (RRMs) to more accurately detect blur regions in the input image. Considering that features from bottom layers capture the rich low-level detail needed to preserve boundaries while features from top layers characterize the semantic information needed to locate blur regions, we aggregate the deep features from different layers to learn the residual between the intermediate prediction and the ground truth at each recurrent step of each refinement branch. Since the degree of defocus is sensitive to image scale, we finally fuse the side output of each branch to obtain the final blur detection map. We evaluate the proposed network on two commonly used defocus blur detection benchmarks, comparing it with 11 other state-of-the-art methods. Extensive experimental results with ablation studies demonstrate that R2MRF consistently and significantly outperforms the competitors in terms of both efficiency and accuracy.
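As an illustration of the final fusion step, the following PyTorch sketch upsamples each branch's side-output blur map to full resolution and fuses them with a learned 1x1 convolution. The number of branches, their resolutions, and the fusion layer are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideOutputFusion(nn.Module):
    """Sketch: fuse multi-scale side outputs into one blur map."""
    def __init__(self, num_branches=5):
        super().__init__()
        self.fuse = nn.Conv2d(num_branches, 1, kernel_size=1)

    def forward(self, side_outputs, out_size):
        # Bring every side output to full resolution, then fuse.
        ups = [F.interpolate(s, size=out_size, mode="bilinear",
                             align_corners=False) for s in side_outputs]
        return torch.sigmoid(self.fuse(torch.cat(ups, dim=1)))

# Example: five branches at decreasing resolutions (shapes assumed).
sides = [torch.rand(1, 1, 320 // 2 ** k, 320 // 2 ** k) for k in range(5)]
fusion = SideOutputFusion(num_branches=5)
blur_map = fusion(sides, out_size=(320, 320))   # final defocus blur map
```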


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Shahzad Anwar ◽  
Qingjie Zhao ◽  
Muhammad Farhan Manzoor ◽  
Saqib Ishaq Khan

An important aspect of visual saliency detection is how the features that form an input image are represented. One popular theory supports sparse feature representation, in which an image is represented by a basis dictionary with sparse weighting coefficients. Another method uses a nonlinear combination of image features. In our work, we combine the two and propose a scheme that takes advantage of both sparse and nonlinear feature representation, using independent component analysis (ICA) and covariance matrices, respectively. To compute saliency, we use a biologically plausible center-surround difference (CSD) mechanism. Our sparse features are adaptive in nature: the ICA basis functions are learned for each image rather than being fixed. We show that adaptive sparse features, when used with a CSD mechanism, yield better results than fixed sparse representations. We also show that covariance matrices built from a nonlinear integration of color information alone are sufficient to efficiently estimate saliency from an image. The proposed dual-representation scheme is evaluated on human eye-fixation prediction, response to psychological patterns, and salient object detection on well-known datasets. We conclude that the two forms of representation complement one another and result in better saliency detection.
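The adaptive-sparse-features-plus-CSD pipeline can be sketched in Python as follows: an ICA basis is learned from patches of the input image itself (so the sparse features adapt per image), and saliency is scored by a center-surround difference on the rectified feature responses. Patch size, component count, and window sizes are illustrative assumptions, and the covariance-matrix branch is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import fftconvolve
from sklearn.decomposition import FastICA
from sklearn.feature_extraction.image import extract_patches_2d

def adaptive_sparse_saliency(gray, patch=8, n_components=16,
                             center=5, surround=31):
    """Sketch: per-image ICA basis + center-surround difference."""
    # Learn an ICA basis from the image's own patches (adaptive, not fixed).
    patches = extract_patches_2d(gray, (patch, patch),
                                 max_patches=2000, random_state=0)
    X = patches.reshape(len(patches), -1).astype(np.float64)
    X -= X.mean(axis=1, keepdims=True)          # remove patch DC component
    ica = FastICA(n_components=n_components, random_state=0, max_iter=500)
    ica.fit(X)

    saliency = np.zeros_like(gray, dtype=np.float64)
    for w in ica.components_:
        # Response of one learned basis function over the whole image.
        resp = np.abs(fftconvolve(gray, w.reshape(patch, patch), mode="same"))
        # Center-surround difference: small-window mean minus
        # large-window mean, rectified.
        csd = uniform_filter(resp, center) - uniform_filter(resp, surround)
        saliency += np.maximum(csd, 0)
    return saliency / saliency.max()
```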


2014 ◽  
Vol 35 (7) ◽  
pp. 1636-1643
Author(s):  
Xiao-liang Qian ◽  
Lei Guo ◽  
Jun-wei Han ◽  
Xin-tao Hu ◽  
Gong Cheng

2012 ◽  
Vol 48 (25) ◽  
pp. 1591-1593 ◽  
Author(s):  
Di Wu ◽  
Xiudong Sun ◽  
Yongyuan Jiang ◽  
Chunfeng Hou
