scholarly journals R³Net: Recurrent Residual Refinement Network for Saliency Detection

Author(s):  
Zijun Deng ◽  
Xiaowei Hu ◽  
Lei Zhu ◽  
Xuemiao Xu ◽  
Jing Qin ◽  
...  

Saliency detection is a fundamental yet challenging task in computer vision, aiming at highlighting the most visually distinctive objects in an image. We propose a novel recurrent residual refinement network (R^3Net) equipped with residual refinement blocks (RRBs) to more accurately detect salient regions of an input image. Our RRBs learn the residual between the intermediate saliency prediction and the ground truth by alternatively leveraging the low-level integrated features and the high-level integrated features of a fully convolutional network (FCN). While the low-level integrated features are capable of capturing more saliency details, the high-level integrated features can reduce non-salient regions in the intermediate prediction. Furthermore, the RRBs can obtain complementary saliency information of the intermediate prediction, and add the residual into the intermediate prediction to refine the saliency maps. We evaluate the proposed R^3Net on five widely-used saliency detection benchmarks by comparing it with 16 state-of-the-art saliency detectors. Experimental results show that our network outperforms our competitors in all the benchmark datasets.

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2892
Author(s):  
Kyungjun Lee ◽  
Seungwoo Wee ◽  
Jechang Jeong

Salient object detection is a method of finding an object within an image that a person determines to be important and is expected to focus on. Various features are used to compute the visual saliency, and in general, the color and luminance of the scene are widely used among the spatial features. However, humans perceive the same color and luminance differently depending on the influence of the surrounding environment. As the human visual system (HVS) operates through a very complex mechanism, both neurobiological and psychological aspects must be considered for the accurate detection of salient objects. To reflect this characteristic in the saliency detection process, we have proposed two pre-processing methods to apply to the input image. First, we applied a bilateral filter to improve the segmentation results by smoothing the image so that only the overall context of the image remains while preserving the important borders of the image. Second, although the amount of light is the same, it can be perceived with a difference in the brightness owing to the influence of the surrounding environment. Therefore, we applied oriented difference-of-Gaussians (ODOG) and locally normalized ODOG (LODOG) filters that adjust the input image by predicting the brightness as perceived by humans. Experiments on five public benchmark datasets for which ground truth exists show that our proposed method further improves the performance of previous state-of-the-art methods.


Author(s):  
Lai Jiang ◽  
Zhe Wang ◽  
Mai Xu ◽  
Zulin Wang

The transformed domain fearures of images show effectiveness in distinguishing salient and non-salient regions. In this paper, we propose a novel deep complex neural network, named SalDCNN, to predict image saliency by learning features in both pixel and transformed domains. Before proposing Sal-DCNN, we analyze the saliency cues encoded in discrete Fourier transform (DFT) domain. Consequently, we have the following findings: 1) the phase spectrum encodes most saliency cues; 2) a certain pattern of the amplitude spectrum is important for saliency prediction; 3) the transformed domain spectrum is robust to noise and down-sampling for saliency prediction. According to these findings, we develop the structure of SalDCNN, including two main stages: the complex dense encoder and three-stream multi-domain decoder. Given the new SalDCNN structure, the saliency maps can be predicted under the supervision of ground-truth fixation maps in both pixel and transformed domains. Finally, the experimental results show that our Sal-DCNN method outperforms other 8 state-of-theart methods for image saliency prediction on 3 databases.


2020 ◽  
Vol 34 (07) ◽  
pp. 12063-12070
Author(s):  
Chang Tang ◽  
Xinwang Liu ◽  
Xinzhong Zhu ◽  
En Zhu ◽  
Kun Sun ◽  
...  

Defocus blur detection aims to separate the in-focus and out-of-focus regions in an image. Although attracting more and more attention due to its remarkable potential applications, there are still several challenges for accurate defocus blur detection, such as the interference of background clutter, sensitivity to scales and missing boundary details of defocus blur regions. In order to address these issues, we propose a deep neural network which Recurrently Refines Multi-scale Residual Features (R2MRF) for defocus blur detection. We firstly extract multi-scale deep features by utilizing a fully convolutional network. For each layer, we design a novel recurrent residual refinement branch embedded with multiple residual refinement modules (RRMs) to more accurately detect blur regions from the input image. Considering that the features from bottom layers are able to capture rich low-level features for details preservation while the features from top layers are capable of characterizing the semantic information for locating blur regions, we aggregate the deep features from different layers to learn the residual between the intermediate prediction and the ground truth for each recurrent step in each residual refinement branch. Since the defocus degree is sensitive to image scales, we finally fuse the side output of each branch to obtain the final blur detection map. We evaluate the proposed network on two commonly used defocus blur detection benchmark datasets by comparing it with other 11 state-of-the-art methods. Extensive experimental results with ablation studies demonstrate that R2MRF consistently and significantly outperforms the competitors in terms of both efficiency and accuracy.


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 970
Author(s):  
Miguel Ángel Martínez-Domingo ◽  
Juan Luis Nieves ◽  
Eva M. Valero

Saliency prediction is a very important and challenging task within the computer vision community. Many models exist that try to predict the salient regions on a scene from its RGB image values. Several new models are developed, and spectral imaging techniques may potentially overcome the limitations found when using RGB images. However, the experimental study of such models based on spectral images is difficult because of the lack of available data to work with. This article presents the first eight-channel multispectral image database of outdoor urban scenes together with their gaze data recorded using an eyetracker over several observers performing different visualization tasks. Besides, the information from this database is used to study whether the complexity of the images has an impact on the saliency maps retrieved from the observers. Results show that more complex images do not correlate with higher differences in the saliency maps obtained.


2021 ◽  
Author(s):  
Tham Vo

Abstract In abstractive summarization task, most of proposed models adopt the deep recurrent neural network (RNN)-based encoder-decoder architecture to learn and generate meaningful summary for a given input document. However, most of recent RNN-based models always suffer the challenges related to the involvement of much capturing high-frequency/reparative phrases in long documents during the training process which leads to the outcome of trivial and generic summaries are generated. Moreover, the lack of thorough analysis on the sequential and long-range dependency relationships between words within different contexts while learning the textual representation also make the generated summaries unnatural and incoherent. To deal with these challenges, in this paper we proposed a novel semantic-enhanced generative adversarial network (GAN)-based approach for abstractive text summarization task, called as: SGAN4AbSum. We use an adversarial training strategy for our text summarization model in which train the generator and discriminator to simultaneously handle the summary generation and distinguishing the generated summary with the ground-truth one. The input of generator is the jointed rich-semantic and global structural latent representations of training documents which are achieved by applying a combined BERT and graph convolutional network (GCN) textual embedding mechanism. Extensive experiments in benchmark datasets demonstrate the effectiveness of our proposed SGAN4AbSum which achieve the competitive ROUGE-based scores in comparing with state-of-the-art abstractive text summarization baselines.


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Hai Wang ◽  
Lei Dai ◽  
Yingfeng Cai ◽  
Long Chen ◽  
Yong Zhang

Traditional salient object detection models are divided into several classes based on low-level features and contrast between pixels. In this paper, we propose a model based on a multilevel deep pyramid (MLDP), which involves fusing multiple features on different levels. Firstly, the MLDP uses the original image as the input for a VGG16 model to extract high-level features and form an initial saliency map. Next, the MLDP further extracts high-level features to form a saliency map based on a deep pyramid. Then, the MLDP obtains the salient map fused with superpixels by extracting low-level features. After that, the MLDP applies background noise filtering to the saliency map fused with superpixels in order to filter out the interference of background noise and form a saliency map based on the foreground. Lastly, the MLDP combines the saliency map fused with the superpixels with the saliency map based on the foreground, which results in the final saliency map. The MLDP is not limited to low-level features while it fuses multiple features and achieves good results when extracting salient targets. As can be seen in our experiment section, the MLDP is better than the other 7 state-of-the-art models across three different public saliency datasets. Therefore, the MLDP has superiority and wide applicability in extraction of salient targets.


Author(s):  
Monika Singh ◽  
Anand Singh Singh Jalal ◽  
Ruchira Manke ◽  
Aamir Khan

Saliency detection has always been a challenging and interesting research area for researchers. The existing methodologies either focus on foreground regions or background regions of an image by computing low-level features. However, considering only low-level features did not produce worthy results. In this paper, low-level features, which are extracted using super pixels, are embodied with high-level priors. The background features are assumed as the low-level prior due to the similarity in the background areas and boundary of an image which are interconnected and have minimum distance in between them. High-level priors such as location, color, and semantic prior are incorporated with low-level prior to spotlight the salient area in the image. The experimental results illustrate that the proposed approach outperform the sate-of-the-art methods.


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6647
Author(s):  
Xiang Yan ◽  
Syed Zulqarnain Gilani ◽  
Hanlin Qin ◽  
Ajmal Mian

Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods have resorted to adding Gaussian blur to focused images, to simulate defocus, thereby generating data (with ground-truth) for supervised learning. Moreover, they classify pixels as ‘focused’ or ‘defocused’, and use the classified results to construct the fusion weight maps. This then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The suggested approach uses a CNN architecture trained to perform fusion, without the need for ground truth fused images. The CNN exploits the image structural similarity (SSIM) to calculate the loss, a metric that is widely accepted for fused image quality evaluation. What is more, we also use the standard deviation of a local window of the image to automatically estimate the importance of the source images in the final fused image when designing the loss function. Our network can accept images of variable sizes and hence, we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluation on benchmark datasets show that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.


Author(s):  
Liming Li ◽  
Xiaodong Chai ◽  
Shuguang Zhao ◽  
Shubin Zheng ◽  
Shengchao Su

This paper proposes an effective method to elevate the performance of saliency detection via iterative bootstrap learning, which consists of two tasks including saliency optimization and saliency integration. Specifically, first, multiscale segmentation and feature extraction are performed on the input image successively. Second, prior saliency maps are generated using existing saliency models, which are used to generate the initial saliency map. Third, prior maps are fed into the saliency regressor together, where training samples are collected from the prior maps at multiple scales and the random forest regressor is learned from such training data. An integration of the initial saliency map and the output of saliency regressor is deployed to generate the coarse saliency map. Finally, in order to improve the quality of saliency map further, both initial and coarse saliency maps are fed into the saliency regressor together, and then the output of the saliency regressor, the initial saliency map as well as the coarse saliency map are integrated into the final saliency map. Experimental results on three public data sets demonstrate that the proposed method consistently achieves the best performance and significant improvement can be obtained when applying our method to existing saliency models.


Author(s):  
Xiaowang Zhang ◽  
Qiang Gao ◽  
Zhiyong Feng

In this paper, we present a neural network (InteractionNN) for sparse predictive analysis where hidden features of sparse data can be learned by multilevel feature interaction. To characterize multilevel interaction of features, InteractionNN consists of three modules, namely, nonlinear interaction pooling, layer-lossing, and embedding. Nonlinear interaction pooling (NI pooling) is a hierarchical structure and, by shortcut connection, constructs low-level feature interactions from basic dense features to elementary features. Layer-lossing is a feed-forward neural network where high-level feature interactions can be learned from low-level feature interactions via correlation of all layers with target. Moreover, embedding is to extract basic dense features from sparse features of data which can help in reducing our proposed model computational complex. Finally, our experiment evaluates on the two benchmark datasets and the experimental results show that InteractionNN performs better than most of state-of-the-art models in sparse regression.


Sign in / Sign up

Export Citation Format

Share Document