Context-aware Cross-level Fusion Network for Camouflaged Object Detection

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/142 ◽

2021 ◽

Author(s):

Yujia Sun ◽

Geng Chen ◽

Tao Zhou ◽

Yi Zhang ◽

Nian Liu

Keyword(s):

Object Detection ◽

State Of The Art ◽

Context Aware ◽

Global Context ◽

Feature Representations ◽

Multi Scale ◽

Benchmark Datasets ◽

Multi Level ◽

High Level ◽

Level Fusion

Camouflaged object detection (COD) is a challenging task due to the low boundary contrast between the object and its surroundings. In addition, the appearance of camouflaged objects varies significantly, e.g., object size and shape, aggravating the difficulties of accurate COD. In this paper, we propose a novel Context-aware Cross-level Fusion Network (C2F-Net) to address the challenging COD task. Specifically, we propose an Attention-induced Cross-level Fusion Module (ACFM) to integrate the multi-level features with informative attention coefficients. The fused features are then fed to the proposed Dual-branch Global Context Module (DGCM), which yields multi-scale feature representations for exploiting rich global context information. In C2F-Net, the two modules are conducted on high-level features using a cascaded manner. Extensive experiments on three widely used benchmark datasets demonstrate that our C2F-Net is an effective COD model and outperforms state-of-the-art models remarkably. Our code is publicly available at: https://github.com/thograce/C2FNet.

Download Full-text

Y-Net: Dual-branch Joint Network for Semantic Segmentation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3460940 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-22

Author(s):

Yizhen Chen ◽

Haifeng Hu

Keyword(s):

Feature Vector ◽

State Of The Art ◽

Computational Cost ◽

Receptive Fields ◽

Semantic Segmentation ◽

Global Context ◽

Multi Level ◽

The One ◽

Public Datasets ◽

High Level

Most existing segmentation networks are built upon a “ U -shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in improving segmentation performance, there are two main drawbacks. On the one hand, the introduction of low-level features brings a significant increase in calculations without an obvious performance gain. On the other hand, general strategies of feature aggregation such as addition and concatenation fuse features without considering the usefulness of each feature vector, which mixes the useful information with massive noises. In this article, we abandon the traditional “ U -shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it only aggregates the high-level features with low-resolution and utilizes the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as the combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) to adaptively integrate features from different receptive fields by assigning specific channelwise weights, where the weights are input-dependent. Our Y-Net is capable of breaking through the limit of singe-branch network and attaining higher performance with less computational cost than “ U -shaped” structure. The proposed CSD can better integrate useful information and suppress interference noises. Comprehensive experiments are carried out on three public datasets to evaluate the effectiveness of our method. Eventually, our Y-Net achieves state-of-the-art performance on PASCAL VOC 2012, PASCAL Person-Part, and ADE20K dataset without pre-training on extra datasets.

Download Full-text

Global Context-Aware Progressive Aggregation Network for Salient Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6633 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10599-10606 ◽

Cited By ~ 4

Author(s):

Zuyao Chen ◽

Qianqian Xu ◽

Runmin Cong ◽

Qingming Huang

Keyword(s):

Object Detection ◽

Critical Role ◽

Saliency Map ◽

Salient Object Detection ◽

Semantic Features ◽

Context Aware ◽

Salient Object ◽

Deep Convolutional Neural Networks ◽

Global Context ◽

High Level

Deep convolutional neural networks have achieved competitive performance in salient object detection, in which how to learn effective and comprehensive features plays a critical role. Most of the previous works mainly adopted multiple-level feature integration yet ignored the gap between different features. Besides, there also exists a dilution process of high-level features as they passed on the top-down pathway. To remedy these issues, we propose a novel network named GCPANet to effectively integrate low-level appearance features, high-level semantic features, and global context features through some progressive context-aware Feature Interweaved Aggregation (FIA) modules and generate the saliency map in a supervised way. Moreover, a Head Attention (HA) module is used to reduce information redundancy and enhance the top layers features by leveraging the spatial and channel-wise attention, and the Self Refinement (SR) module is utilized to further refine and heighten the input features. Furthermore, we design the Global Context Flow (GCF) module to generate the global context information at different stages, which aims to learn the relationship among different salient regions and alleviate the dilution effect of high-level features. Experimental results on six benchmark datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.

Download Full-text

Context-Aware Image Inpainting with Learned Semantic Priors

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/183 ◽

2021 ◽

Author(s):

Wendong Zhang ◽

Junwei Zhu ◽

Ying Tai ◽

Yunbo Wang ◽

Wenqing Chu ◽

...

Keyword(s):

State Of The Art ◽

Contextual Information ◽

Image Inpainting ◽

Context Aware ◽

Global Context ◽

Low Level ◽

Complex Scenes ◽

Knowledge Distillation ◽

Image Generator ◽

High Level

Recent advances in image inpainting have shown impressive results for generating plausible visual details on rather simple backgrounds. However, for complex scenes, it is still challenging to restore reasonable contents as the contextual information within the missing regions tends to be ambiguous. To tackle this problem, we introduce pretext tasks that are semantically meaningful to estimating the missing contents. In particular, we perform knowledge distillation on pretext models and adapt the features to image inpainting. The learned semantic priors ought to be partially invariant between the high-level pretext task and low-level image inpainting, which not only help to understand the global context but also provide structural guidance for the restoration of local textures. Based on the semantic priors, we further propose a context-aware image inpainting model, which adaptively integrates global semantics and local features in a unified image generator. The semantic learner and the image generator are trained in an end-to-end manner. We name the model SPL to highlight its ability to learn and leverage semantic priors. It achieves the state of the art on Places2, CelebA, and Paris StreetView datasets

Download Full-text

Multi-Scale Dense Attention Network for Stereo Matching

Electronics ◽

10.3390/electronics9111881 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1881

Author(s):

Yuhui Chang ◽

Jiangtao Xu ◽

Zhiyuan Gao

Keyword(s):

Feature Extraction ◽

Stereo Matching ◽

State Of The Art ◽

Ground Truth ◽

Context Information ◽

Context Aware ◽

Feature Maps ◽

Attention Network ◽

Multi Scale ◽

Benchmark Datasets

To improve the accuracy of stereo matching, the multi-scale dense attention network (MDA-Net) is proposed. The network introduces two novel modules in the feature extraction stage to achieve better exploit of context information: dual-path upsampling (DU) block and attention-guided context-aware pyramid feature extraction (ACPFE) block. The DU block is introduced to fuse different scale feature maps. It introduces sub-pixel convolution to compensate for the loss of information caused by the traditional interpolation upsampling method. The ACPFE block is proposed to extract multi-scale context information. Pyramid atrous convolution is adopted to exploit multi-scale features and the channel-attention is used to fuse the multi-scale features. The proposed network has been evaluated on several benchmark datasets. The three-pixel-error evaluated over all ground truth pixels is 2.10% on KITTI 2015 dataset. The experiment results prove that MDA-Net achieves state-of-the-art accuracy on KITTI 2012 and 2015 datasets.

Download Full-text

Progressive Feature Polishing Network for Salient Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6892 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12128-12135 ◽

Cited By ~ 1

Author(s):

Bo Wang ◽

Quan Chen ◽

Min Zhou ◽

Zhiqiang Zhang ◽

Xiaogang Jin ◽

...

Keyword(s):

Object Detection ◽

State Of The Art ◽

Hierarchical Structures ◽

Salient Object Detection ◽

Salient Object ◽

Post Processing ◽

Feature Maps ◽

Multiple Feature ◽

Benchmark Datasets ◽

Multi Level

Feature matters for salient object detection. Existing methods mainly focus on designing a sophisticated structure to incorporate multi-level features and filter out cluttered features. We present Progressive Feature Polishing Network (PFPN), a simple yet effective framework to progressively polish the multi-level features to be more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach is able to detect salient objects with fine details without any post-processing. A FPM parallelly updates the features of each level by directly incorporating all higher level context information. Moreover, it can keep the dimensions and hierarchical structures of the feature maps, which makes it flexible to be integrated with any CNN-based models. Empirical experiments show that our results are monotonically getting better with increasing number of FPMs. Without bells and whistles, PFPN outperforms the state-of-the-art methods significantly on five benchmark datasets under various evaluation metrics. Our code is available at: https://github.com/chenquan-cq/PFPN.

Download Full-text

Feature Augmented Memory with Global Attention Network for VideoQA

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/139 ◽

2020 ◽

Author(s):

Jiayin Cai ◽

Chun Yuan ◽

Cheng Shi ◽

Lei Li ◽

Yangyang Cheng ◽

...

Keyword(s):

Question Answering ◽

Temporal Order ◽

State Of The Art ◽

Memory Capacity ◽

Coarse Grain ◽

Fine Grained ◽

Feature Representations ◽

Benchmark Datasets ◽

High Level ◽

Video Question Answering

Recently, Recurrent Neural Network (RNN) based methods and Self-Attention (SA) based methods have achieved promising performance in Video Question Answering (VideoQA). Despite the success of these works, RNN-based methods tend to forget the global semantic contents due to the inherent drawbacks of the recurrent units themselves, while SA-based methods cannot precisely capture the dependencies of the local neighborhood, leading to insufficient modeling for temporal order. To tackle these problems, we propose a novel VideoQA framework which progressively refines the representations of videos and questions from fine to coarse grain in a sequence-sensitive manner. Specifically, our model improves the feature representations via the following two steps: (1) introducing two fine-grained feature-augmented memories to strengthen the information augmentation of video and text which can improve memory capacity by memorizing more relevant and targeted information. (2) appending the self-attention and co-attention module to the memory output thus the module is able to capture global interaction between high-level semantic informations. Experimental results show that our approach achieves state-of-the-art performance on VideoQA benchmark datasets.

Download Full-text

A Fast and Lightweight Method with Feature Fusion and Multi-Context for Face Detection

Future Internet ◽

10.3390/fi10080080 ◽

2018 ◽

Vol 10 (8) ◽

pp. 80

Author(s):

Lei Zhang ◽

Xiaoli Zhi

Keyword(s):

Face Detection ◽

Graphics Processing Units ◽

High Performance ◽

Feature Fusion ◽

Local Context ◽

Data Set ◽

Global Context ◽

Detection Algorithms ◽

Multi Scale ◽

Benchmark Datasets

Convolutional neural networks (CNN for short) have made great progress in face detection. They mostly take computation intensive networks as the backbone in order to obtain high precision, and they cannot get a good detection speed without the support of high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially in some speed dependent ones. To alleviate this problem, we propose a lightweight face detector in this paper, which takes a fast residual network as backbone. Our method can run fast even on cheap and ordinary GPUs. To guarantee its detection precision, multi-scale features and multi-context are fully exploited in efficient ways. Specifically, feature fusion is used to obtain semantic strongly multi-scale features firstly. Then multi-context including both local and global context is added to these multi-scale features without extra computational burden. The local context is added through a depthwise separable convolution based approach, and the global context by a simple global average pooling way. Experimental results show that our method can run at about 110 fps on VGA (Video Graphics Array)-resolution images, while still maintaining competitive precision on WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets as compared with its state-of-the-art counterparts.

Download Full-text

LGCNet: A Local-to-global Context-aware Feature Augmentation Network for Salient Object Detection

Information Sciences ◽

10.1016/j.ins.2021.10.055 ◽

2021 ◽

Author(s):

Yuzhu Ji ◽

Haijun Zhang ◽

Feng Gao ◽

Haofei Sun ◽

Haokun Wei ◽

...

Keyword(s):

Object Detection ◽

Salient Object Detection ◽

Context Aware ◽

Salient Object ◽

Global Context ◽

Feature Augmentation

Download Full-text

High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

Lecture Notes in Computer Science - Advances in Multimedia Information Processing – PCM 2013 ◽

10.1007/978-3-319-03731-8_61 ◽

2013 ◽

pp. 658-668 ◽

Cited By ~ 1

Author(s):

Lijuan Liu ◽

Haojie Li ◽

Fuming Sun ◽

Yaomin Yin ◽

Chenxin Liu

Keyword(s):

Semantic Concept ◽

Concept Detection ◽

Feature Representations ◽

Multi Level ◽

High Level

Download Full-text

DCPNet: A Densely Connected Pyramid Network for Monocular Depth Estimation

Sensors ◽

10.3390/s21206780 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6780

Author(s):

Zhitong Lai ◽

Rui Tian ◽

Zhiguo Wu ◽

Nannan Ding ◽

Linjian Sun ◽

...

Keyword(s):

Multiple Scales ◽

Feature Fusion ◽

State Of The Art ◽

Depth Estimation ◽

Multi Scale ◽

Pyramid Structure ◽

Benchmark Datasets ◽

The Common ◽

Monocular Depth ◽

Multiple Stages

Pyramid architecture is a useful strategy to fuse multi-scale features in deep monocular depth estimation approaches. However, most pyramid networks fuse features only within the adjacent stages in a pyramid structure. To take full advantage of the pyramid structure, inspired by the success of DenseNet, this paper presents DCPNet, a densely connected pyramid network that fuses multi-scale features from multiple stages of the pyramid structure. DCPNet not only performs feature fusion between the adjacent stages, but also non-adjacent stages. To fuse these features, we design a simple and effective dense connection module (DCM). In addition, we offer a new consideration of the common upscale operation in our approach. We believe DCPNet offers a more efficient way to fuse features from multiple scales in a pyramid-like network. We perform extensive experiments using both outdoor and indoor benchmark datasets (i.e., the KITTI and the NYU Depth V2 datasets) and DCPNet achieves the state-of-the-art results.

Download Full-text