Progressive Feature Polishing Network for Salient Object Detection

Bo Wang; Quan Chen; Min Zhou; Zhiqiang Zhang; Xiaogang Jin; Kun Gai

doi:10.1609/aaai.v34i07.6892

Progressive Feature Polishing Network for Salient Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6892 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12128-12135 ◽

Cited By ~ 1

Author(s):

Bo Wang ◽

Quan Chen ◽

Min Zhou ◽

Zhiqiang Zhang ◽

Xiaogang Jin ◽

...

Keyword(s):

Object Detection ◽

State Of The Art ◽

Hierarchical Structures ◽

Salient Object Detection ◽

Salient Object ◽

Post Processing ◽

Feature Maps ◽

Multiple Feature ◽

Benchmark Datasets ◽

Multi Level

Features matter for salient object detection. Existing methods mainly focus on designing sophisticated structures to incorporate multi-level features and filter out cluttered features. We present the Progressive Feature Polishing Network (PFPN), a simple yet effective framework that progressively polishes multi-level features to make them more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach detects salient objects with fine details without any post-processing. An FPM updates the features of each level in parallel by directly incorporating all higher-level context information. Moreover, it preserves the dimensions and hierarchical structure of the feature maps, which makes it easy to integrate with any CNN-based model. Experiments show that results improve monotonically as the number of FPMs increases. Without bells and whistles, PFPN significantly outperforms state-of-the-art methods on five benchmark datasets under various evaluation metrics. Our code is available at: https://github.com/chenquan-cq/PFPN.
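
The polishing idea in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the 1-D feature lists, the nearest-neighbour upsampling, and the plain averaging used as the fusion step are all assumptions made for illustration, and the function names `upsample_nearest`, `polish_once`, and `polish` are hypothetical. It only shows the two structural properties the abstract describes: each level is updated from all higher levels in parallel, and the pyramid keeps its shape so the step can be repeated.

```python
def upsample_nearest(feat, target_len):
    """Nearest-neighbour upsampling of a 1-D feature list to target_len."""
    src_len = len(feat)
    return [feat[i * src_len // target_len] for i in range(target_len)]


def polish_once(levels):
    """One polishing step over a feature pyramid.

    levels[0] is the finest (lowest) level, levels[-1] the coarsest (highest).
    Each level is replaced by the average of itself and all higher levels,
    upsampled to its own resolution. Averaging is a stand-in for a learned
    fusion. Every new level is computed from the old pyramid, so the levels
    update in parallel.
    """
    polished = []
    for k, feat in enumerate(levels):
        context = [upsample_nearest(h, len(feat)) for h in levels[k + 1:]]
        stack = [feat] + context
        polished.append([sum(col) / len(stack) for col in zip(*stack)])
    return polished


def polish(levels, num_modules):
    """Apply the polishing step num_modules times (the recurrent stacking)."""
    for _ in range(num_modules):
        levels = polish_once(levels)
    return levels


pyramid = [
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # fine, noisy level
    [0.0, 0.0, 1.0, 1.0],                      # middle level
    [0.0, 1.0],                                # coarse, semantic level
]
out = polish(pyramid, num_modules=2)
print([len(f) for f in out])  # lengths of each level are unchanged
```

Because the output pyramid has the same layout as the input, any number of polishing steps can be chained, which is the property the abstract relies on when stacking FPMs.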

F³Net: Fusion, Feedback and Focus for Salient Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6916 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12321-12328 ◽

Cited By ~ 6

Author(s):

Jun Wei ◽

Shuhui Wang ◽

Qingming Huang

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Receptive Fields ◽

Feedback Mechanism ◽

Salient Object Detection ◽

Salient Object ◽

Structure Information ◽

Multi Stage ◽

Benchmark Datasets ◽

Multi Level

Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because different convolutional layers have different receptive fields, there are large differences between the features generated by these layers. Common feature fusion strategies (addition or concatenation) ignore these differences and may lead to suboptimal solutions. In this paper, we propose F3Net to solve this problem. It mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD), trained by minimizing a new pixel position aware (PPA) loss. Specifically, CFM aims to selectively aggregate multi-level features. Unlike addition and concatenation, CFM adaptively selects complementary components from the input features before fusion, which effectively avoids introducing too much redundant information that may destroy the original features. In addition, CFD adopts a multi-stage feedback mechanism, in which features close to the supervision are introduced to the output of previous layers to supplement them and eliminate the differences between features. These refined features go through multiple similar iterations before the final saliency maps are generated. Furthermore, unlike binary cross entropy, the proposed PPA loss does not treat pixels equally; it synthesizes the local structure information of a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone parts are given more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics. Code will be released at https://github.com/weijun88/F3Net.
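
The notion of a loss that does not treat pixels equally can be sketched as follows. This is a minimal illustration, not the paper's PPA loss: the weighting rule (1 plus a scaled difference between a pixel's label and the mean label of its neighbourhood), the 1-D mask, and the parameters `radius` and `gamma` are assumptions chosen so that pixels near a boundary receive larger weights, as the abstract describes.

```python
import math


def local_mean(mask, i, radius):
    """Mean of mask over a window centred at i, clipped at the borders."""
    lo, hi = max(0, i - radius), min(len(mask), i + radius + 1)
    window = mask[lo:hi]
    return sum(window) / len(window)


def pixel_weights(mask, radius=1, gamma=5.0):
    """Weight = 1 + gamma * |local mean - label|; larger near boundaries."""
    return [1.0 + gamma * abs(local_mean(mask, i, radius) - m)
            for i, m in enumerate(mask)]


def weighted_bce(pred, mask, weights, eps=1e-7):
    """Binary cross-entropy averaged with per-pixel weights."""
    total = 0.0
    for p, m, w in zip(pred, mask, weights):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -w * (m * math.log(p) + (1 - m) * math.log(1 - p))
    return total / sum(weights)


mask = [0, 0, 0, 1, 1, 1]                 # object boundary between index 2 and 3
weights = pixel_weights(mask)
pred = [0.1, 0.1, 0.4, 0.6, 0.9, 0.9]     # least confident at the boundary
print(weights)
print(weighted_bce(pred, mask, weights))
```

With this prediction, the weighted loss is larger than the uniformly weighted one, because the uncertain pixels sit exactly where the weights are highest.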

Multi-level Features Selection Network Based on Multi-attention for Salient Object Detection

10.1007/978-3-030-87355-4_27 ◽

2021 ◽

pp. 315-326

Author(s):

Jianyi Ren ◽

Zheng Wang ◽

Meijun Sun

Keyword(s):

Object Detection ◽

Salient Object Detection ◽

Features Selection ◽

Salient Object ◽

Multi Level

SuperVAE: Superpixelwise Variational Autoencoder for Salient Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018569 ◽

2019 ◽

Vol 33 ◽

pp. 8569-8576 ◽

Cited By ~ 2

Author(s):

Bo Li ◽

Zhengxing Sun ◽

Yuqi Guo

Keyword(s):

Deep Learning ◽

Object Detection ◽

Saliency Detection ◽

Salient Object Detection ◽

Salient Object ◽

Image Saliency ◽

Spatial Consistency ◽

Variational Autoencoder ◽

Benchmark Datasets ◽

Supervised Methods

Image saliency detection has recently witnessed rapid progress due to deep neural networks. However, many important problems remain in existing deep learning based methods. Pixel-wise convolutional neural network (CNN) methods suffer from blurry boundaries due to the convolutional and pooling operations, while region-based deep learning methods lack spatial consistency since they deal with each region independently. In this paper, we propose a novel salient object detection framework using a superpixelwise variational autoencoder (SuperVAE) network. We first use a VAE to model the image background and then separate salient objects from the background through the reconstruction residuals. To better capture semantic and spatial context information, we also propose a perceptual loss that takes advantage of deep pre-trained CNNs to train our SuperVAE network. Without the supervision of mask-level annotated data, our method generates high-quality saliency results that better preserve object boundaries and maintain spatial consistency. Extensive experiments on five widely used benchmark datasets show that the proposed method achieves superior or competitive performance compared to other algorithms, including very recent state-of-the-art supervised methods.
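
The residual-based separation described above can be sketched under strong simplifications. In the sketch below the background model is not a VAE: it is a stand-in that always returns the mean colour of a few superpixels assumed to be background, and the function name `residual_saliency` and the 3-value colour features are hypothetical. The point is only that a model fitted to the background reconstructs background regions well and salient regions poorly, so the reconstruction error can serve as a saliency score.

```python
def residual_saliency(features, reconstruct):
    """Score each superpixel by its squared reconstruction error, scaled to [0, 1]."""
    residuals = []
    for feat in features:
        recon = reconstruct(feat)
        residuals.append(sum((a - b) ** 2 for a, b in zip(feat, recon)))
    lo, hi = min(residuals), max(residuals)
    if hi == lo:
        return [0.0 for _ in residuals]
    return [(r - lo) / (hi - lo) for r in residuals]


# Stand-in background model: always returns the mean colour of superpixels
# assumed to be background. A trained VAE would take its place.
background = [[0.2, 0.6, 0.2], [0.25, 0.55, 0.2], [0.15, 0.65, 0.2]]
bg_mean = [sum(c) / len(background) for c in zip(*background)]

superpixels = background + [[0.9, 0.1, 0.1]]  # the last one is a red object
saliency = residual_saliency(superpixels, lambda feat: bg_mean)
print(saliency)
```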

Multi-level and multi-scale deep saliency network for salient object detection

Journal of Visual Communication and Image Representation ◽

10.1016/j.jvcir.2019.01.034 ◽

2019 ◽

Vol 59 ◽

pp. 415-424 ◽

Cited By ~ 1

Author(s):

Qing Zhang ◽

Jiajun Lin ◽

JingJing Zhuge ◽

Wenhao Yuan

Keyword(s):

Object Detection ◽

Salient Object Detection ◽

Salient Object ◽

Multi Scale ◽

Multi Level

Deeper and Mixed Supervision for Salient Object Detection in Automated Surface Inspection

Mathematical Problems in Engineering ◽

10.1155/2020/3751053 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Senbo Yan ◽

Xiaowen Song ◽

Guocong Liu

Keyword(s):

Object Detection ◽

Ground Truth ◽

Salient Object Detection ◽

Surface Inspection ◽

Layer By Layer ◽

Salient Object ◽

Feature Maps ◽

Saliency Maps ◽

Refinement Mechanism ◽

Inspection Tasks

In recent years, research on salient object detection has been widely applied to industrial visual inspection tasks. Automated surface inspection (ASI) can be regarded as one of the most challenging tasks in computer vision because of its high cost of data acquisition, serious imbalance of test samples, and strict real-time requirements. Inspired by the requirements of industrial ASI and the methods of salient object detection (SOD), we propose a task mode of defect type classification plus defect area segmentation, together with a novel deeper and mixed supervision (DMS) network architecture. The backbone network, ResNeXt-101, is pretrained on ImageNet. First, we extract five multiscale feature maps from the backbone and concatenate them layer by layer. In addition, to obtain the classification prediction and saliency maps in one stage, the image-level and pixel-level ground truth are used to train the same side-output network. A supervision signal is imposed on each side layer to realize deeper and mixed training of the network. Furthermore, the DMS network is equipped with a residual refinement mechanism to refine the saliency maps of input images. We evaluate the DMS network on 4 open-access ASI datasets and compare it with 20 other methods; the comparison indicates that mixed supervision can significantly improve the accuracy of saliency segmentation. Experimental results show that the proposed method achieves state-of-the-art performance.

Unsupervised Salient Object Detection by Aggregating Multi-Level Cues

IEEE Photonics Journal ◽

10.1109/jphot.2018.2881271 ◽

2018 ◽

Vol 10 (6) ◽

pp. 1-11

Author(s):

Chenxing Xia ◽

Hanling Zhang

Keyword(s):

Object Detection ◽

Salient Object Detection ◽

Salient Object ◽

Multi Level

Revise-Net: Exploiting Reverse Attention Mechanism for Salient Object Detection

Remote Sensing ◽

10.3390/rs13234941 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4941

Author(s):

Rukhshanda Hussain ◽

Yash Karbhari ◽

Muhammad Fazal Ijaz ◽

Marcin Woźniak ◽

Pawan Kumar Singh ◽

...

Keyword(s):

Object Detection ◽

State Of The Art ◽

Similarity Index ◽

Saliency Map ◽

Salient Object Detection ◽

Semantic Features ◽

Salient Object ◽

Fully Convolutional Neural Networks ◽

Boundary Estimation ◽

Prediction Module

Recently, deep learning-based methods, especially those utilizing fully convolutional neural networks, have shown extraordinary performance in salient object detection. Despite their success, clean boundary detection of salient objects is still a challenging task. Most contemporary methods rely on dedicated edge detection modules to avoid noisy boundaries. In this work, we propose extracting finer semantic features from multiple encoding layers and attentively re-utilizing them in the generation of the final segmentation result. The proposed Revise-Net model is divided into three parts: (a) the prediction module, (b) a residual enhancement module (REM), and (c) reverse attention modules. First, we generate the coarse saliency map through the prediction module, which is then fine-tuned in the enhancement module. Finally, multiple reverse attention modules at varying scales are cascaded between the two networks to guide the prediction module by employing the intermediate segmentation maps generated at each downsampling level of the REM. Our method efficiently classifies the boundary pixels using a combination of binary cross-entropy, similarity index, and intersection over union losses at the pixel, patch, and map levels, thereby effectively segmenting the salient objects in an image. Compared with several state-of-the-art frameworks, our proposed Revise-Net model outperforms them by a significant margin on three publicly available datasets, DUTS-TE, ECSSD, and HKU-IS, on both regional and boundary estimation measures.
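
The three-term loss mentioned above (binary cross-entropy at the pixel level, a similarity index at the patch level, and intersection over union at the map level) can be sketched as follows. This is an illustrative sketch rather than the Revise-Net implementation: inputs are 1-D lists, the structural similarity index is computed over a single patch covering the whole input, the constants `c1` and `c2` are the usual choices for a value range of 1, and summing the three terms with equal weight is an assumption.

```python
import math


def bce(pred, mask, eps=1e-7):
    """Pixel level: mean binary cross-entropy."""
    total = 0.0
    for p, m in zip(pred, mask):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= m * math.log(p) + (1 - m) * math.log(1 - p)
    return total / len(mask)


def ssim(pred, mask, c1=0.01 ** 2, c2=0.03 ** 2):
    """Patch level: structural similarity over one patch (here the whole input)."""
    n = len(mask)
    mu_p, mu_m = sum(pred) / n, sum(mask) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_m = sum((m - mu_m) ** 2 for m in mask) / n
    cov = sum((p - mu_p) * (m - mu_m) for p, m in zip(pred, mask)) / n
    return ((2 * mu_p * mu_m + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_m ** 2 + c1) * (var_p + var_m + c2))


def soft_iou(pred, mask, eps=1e-7):
    """Map level: intersection over union with soft predictions."""
    inter = sum(p * m for p, m in zip(pred, mask))
    union = sum(p + m - p * m for p, m in zip(pred, mask))
    return (inter + eps) / (union + eps)


def hybrid_loss(pred, mask):
    """Unweighted sum of the three terms (the equal weighting is an assumption)."""
    return bce(pred, mask) + (1.0 - ssim(pred, mask)) + (1.0 - soft_iou(pred, mask))


mask = [0.0, 0.0, 1.0, 1.0]
good = [0.05, 0.1, 0.9, 0.95]   # close to the mask
bad = [0.6, 0.5, 0.4, 0.5]      # nearly uninformative
print(hybrid_loss(good, mask), hybrid_loss(bad, mask))
```

The three terms penalize different failure modes: per-pixel confidence, local structure, and overall region overlap, so a prediction has to agree with the mask at all three scales to reach a low total.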

Salient Object Detection with Semantic Priors

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/628 ◽

2017 ◽

Cited By ~ 5

Author(s):

Tam V. Nguyen ◽

Luoqi Liu

Keyword(s):

Object Detection ◽

State Of The Art ◽

Semantic Segmentation ◽

Saliency Map ◽

Salient Object Detection ◽

Salient Object ◽

Artificial Intelligence Research ◽

Semantic Map ◽

Regional Features ◽

Computational Sciences

Salient object detection has become an increasingly popular topic in cognitive and computational sciences, including computer vision and artificial intelligence research. In this paper, we propose integrating semantic priors into the salient object detection process. Our algorithm consists of three basic steps. First, the explicit saliency map is obtained based on the semantic segmentation, refined by the explicit saliency priors learned from the data. Next, the implicit saliency map is computed with a trained model that maps the implicit saliency priors, embedded in regional features, to saliency values. Finally, the explicit semantic map and the implicit map are adaptively fused to form a pixel-accurate saliency map which uniformly covers the objects of interest. We further evaluate the proposed framework on two challenging datasets, namely ECSSD and HKU-IS. The extensive experimental results demonstrate that our method outperforms other state-of-the-art methods.

Revisiting Multi-Level Feature Fusion: A Simple Yet Effective Network for Salient Object Detection

2019 IEEE International Conference on Image Processing (ICIP) ◽

10.1109/icip.2019.8803646 ◽

2019 ◽

Author(s):

Yu Qiu ◽

Yun Liu ◽

Xiaoxu Ma ◽

Lei Liu ◽

Hongcan Gao ◽

...

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Salient Object Detection ◽

Salient Object ◽

Multi Level ◽

Effective Network

Top-Down Fusing Multi-level Contextual Features for Salient Object Detection

Pattern Recognition and Computer Vision - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60636-7_5 ◽

2020 ◽

pp. 54-65

Author(s):

Mingyuan Pan ◽

Huihui Song ◽

Junxia Li ◽

Kaihua Zhang ◽

Qingshan Liu

Keyword(s):

Object Detection ◽

Salient Object Detection ◽

Salient Object ◽

Top Down ◽

Contextual Features ◽

Multi Level
