Deep Feature Fusion with Integration of Residual Connection and Attention Model for Classification of VHR Remote Sensing Images

The classification of very-high-resolution (VHR) remote sensing images is essential in many applications. However, high intraclass and low interclass variations in these kinds of images pose serious challenges. Fully convolutional network (FCN) models, which benefit from a powerful feature learning ability, have shown impressive performance and great potential. Nevertheless, only classification results with coarse resolution can be obtained from the original FCN method. Deep feature fusion is often employed to improve the resolution of outputs. Existing strategies for such fusion are not capable of properly utilizing the low-level features and considering the importance of features at different scales. This paper proposes a novel, end-to-end, fully convolutional network to integrate a multiconnection ResNet model and a class-specific attention model into a unified framework to overcome these problems. The former fuses multilevel deep features without introducing any redundant information from low-level features. The latter can learn the contributions from different features of each geo-object at each scale. Extensive experiments on two open datasets indicate that the proposed method can achieve class-specific scale-adaptive classification results and it outperforms other state-of-the-art methods. The results were submitted to the International Society for Photogrammetry and Remote Sensing (ISPRS) online contest for comparison with more than 50 other methods. The results indicate that the proposed method (ID: SWJ_2) ranks #1 in terms of overall accuracy, even though no additional digital surface model (DSM) data that were offered by ISPRS were used and no postprocessing was applied.
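The class-specific attention idea in the abstract above, weighting each scale's contribution separately for each class, can be illustrated with a minimal sketch. This is not the authors' network: the function names, the per-pixel list layout, and the use of a softmax over scales are assumptions made for illustration only.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scales(scores_per_scale, attention_logits):
    """Fuse per-scale class scores for one pixel.

    scores_per_scale[s][c]: score of class c predicted at scale s.
    attention_logits[s][c]: hypothetical learned logit expressing how much
    scale s should contribute to class c (an assumption, not from the paper).
    """
    num_scales = len(scores_per_scale)
    num_classes = len(scores_per_scale[0])
    fused = []
    for c in range(num_classes):
        # Each class gets its own distribution over scales.
        weights = softmax([attention_logits[s][c] for s in range(num_scales)])
        fused.append(sum(weights[s] * scores_per_scale[s][c]
                         for s in range(num_scales)))
    return fused

# Two scales, two classes; equal logits reduce to a plain average.
scores = [[1.0, 0.0], [3.0, 4.0]]
print(fuse_scales(scores, [[0.0, 0.0], [0.0, 0.0]]))
```

With equal logits every scale contributes equally; raising one class's logit at a given scale shifts that class's fused score toward that scale without affecting the other classes.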

Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs11171996 ◽

2019 ◽

Vol 11 (17) ◽

pp. 1996 ◽

Cited By ~ 7

Author(s):

Zhu ◽

Yan ◽

Mo ◽

Liu

Keyword(s):

Remote Sensing ◽

Loss Function ◽

Feature Fusion ◽

Cross Entropy ◽

Scene Classification ◽

Remote Sensing Images ◽

Graphic Processing Units ◽

Entropy Loss ◽

Deep Feature

Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI due to the advancement of graphic processing units (GPU). However, they tend to extract features from the whole images rather than discriminative regions. The visual attention mechanism can force the CNN to focus on discriminative regions, but it may suffer from the influence of intraclass diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that comprises three parts, namely attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features and the center-based cross-entropy loss function. First of all, we propose to make attention maps generated by Grad-CAM an explicit input in order to force the network to concentrate on discriminative regions. Then, deep features derived from original images and attention maps are proposed to be fused by multiplicative fusion in order to consider both improved abilities to distinguish scenes of repeated texture and the salient regions. Finally, the center-based cross-entropy loss function that utilizes both the cross-entropy loss and center loss function is proposed to backpropagate fused features so as to reduce the effect of intraclass diversity on feature representations. The proposed ADFF architecture is tested on three benchmark datasets to show its performance in scene classification. The experiments confirm that the proposed method outperforms most competitive scene classification methods with an average overall accuracy of 94% under different training ratios.
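The two numeric ingredients named in the abstract above, multiplicative fusion and a loss that combines cross-entropy with center loss, can be sketched as follows. This is a minimal illustration and not the published implementation: the element-wise product, the common center-loss form (half the squared distance to the class center), and the weighting factor `lam` are all assumptions.

```python
import math

def multiplicative_fusion(image_features, attention_features):
    """Element-wise product of two equally sized feature vectors (assumed form)."""
    return [a * b for a, b in zip(image_features, attention_features)]

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def center_based_cross_entropy(logits, label, feature, centers, lam=0.01):
    """Cross-entropy plus a weighted center-loss term.

    lam is a hypothetical balancing weight; the abstract does not state one.
    """
    return cross_entropy(logits, label) + lam * center_loss(feature, centers[label])

fused = multiplicative_fusion([1.0, 2.0], [3.0, 0.5])
centers = {0: [0.0, 0.0], 1: [1.0, 1.0]}
print(center_based_cross_entropy([0.0, 0.0], 0, fused, centers, lam=0.1))
```

The center term pulls features of the same class toward a shared center, which is how such a loss is usually motivated as a remedy for intraclass diversity.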

Retraction: Zhu R. et al. Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images. Remote Sensing. 2019, 11(17), 1996

Remote Sensing ◽

10.3390/rs12040742 ◽

2020 ◽

Vol 12 (4) ◽

pp. 742 ◽

Cited By ~ 1

Author(s):

Ruixi Zhu ◽

Li Yan ◽

Nan Mo ◽

Yi Liu

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Research Method ◽

Feature Fusion ◽

Scene Classification ◽

Remote Sensing Images ◽

Deep Feature

We have been made aware that the innovative contributions, research method and the majority of the content of this article [...]

Classification of High-Resolution Remote Sensing Images in the Feilaixia Reservoir Based on a Fully Convolutional Network

IEEE Access ◽

10.1109/access.2020.3021071 ◽

2020 ◽

Vol 8 ◽

pp. 161752-161764

Author(s):

Pinghao Wu ◽

Kaiwen Zhong ◽

Hongda Hu ◽

Jianhui Xu ◽

Yunpeng Wang ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Remote Sensing Images ◽

Convolutional Network ◽

Fully Convolutional Network

Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing ◽

10.3390/rs13163211 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3211

Author(s):

Tian Tian ◽

Zhengquan Chu ◽

Qian Hu ◽

Li Ma

Keyword(s):

Remote Sensing ◽

Image Interpretation ◽

Semantic Segmentation ◽

Remote Sensing Images ◽

Feature Maps ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Semantic Labeling ◽

Benchmark Datasets ◽

Semantic Label

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label for every pixel in the given image. Accurate semantic segmentation is still challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by fully convolutional network (FCN) has made remarkable progress on this problem, but the segmentation accuracy is still far from expectations. This paper focuses on the importance of class-specific features of different land cover objects, and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) is shaped in the form of an encoder-decoder structure with skip-connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. To be detailed, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture with regard to the control of parameter numbers. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves the segmentation performances compared with many state-of-the-art FCN-based networks, revealing its potentials on accurate segmentation of complex remote sensing images.
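The abstract above mentions group convolutions as a way to control parameter numbers. The saving follows from the standard parameter count of a 2D convolution, sketched below. The channel sizes in the usage example (6 classes with 16 channels each) are hypothetical and not taken from the paper.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Parameter count of a 2D convolution with square kernels.

    With `groups` groups, each output channel only sees
    in_channels / groups input channels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = kernel_size * kernel_size * (in_channels // groups) * out_channels
    return weights + (out_channels if bias else 0)

# Hypothetical class-wise layout: 6 classes x 16 channels = 96 channels.
dense = conv_params(96, 96, 3, groups=1, bias=False)
grouped = conv_params(96, 96, 3, groups=6, bias=False)
print(dense, grouped, dense // grouped)
```

Setting the number of groups to the number of classes keeps each class's channels separate and divides the weight count by that number.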

Attention-Based Pyramid Network for Segmentation and Classification of High-Resolution and Hyperspectral Remote Sensing Images

Remote Sensing ◽

10.3390/rs12213501 ◽

2020 ◽

Vol 12 (21) ◽

pp. 3501

Author(s):

Qingsong Xu ◽

Xin Yuan ◽

Chaojun Ouyang ◽

Yue Zeng

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Image Classification ◽

Large Scale ◽

Feature Fusion ◽

Spectral Information ◽

Spatial Problem ◽

Remote Sensing Images ◽

Heavy Weight

Unlike conventional natural (RGB) images, the inherent large scale and complex structures of remote sensing images pose major challenges such as spatial object distribution diversity and spectral information extraction when existing models are directly applied for image classification. In this study, we develop an attention-based pyramid network for segmentation and classification of remote sensing datasets. Attention mechanisms are used to develop the following modules: (i) a novel and robust attention-based multi-scale fusion method effectively fuses useful spatial or spectral information at different and same scales; (ii) a region pyramid attention mechanism using region-based attention addresses the target geometric size diversity in large-scale remote sensing images; and (iii) cross-scale attention in our adaptive atrous spatial pyramid pooling network adapts to varied contents in a feature-embedded space. Different forms of feature fusion pyramid frameworks are established by combining these attention-based modules. First, a novel segmentation framework, called the heavy-weight spatial feature fusion pyramid network (FFPNet), is proposed to address the spatial problem of high-resolution remote sensing images. Second, an end-to-end spatial-spectral FFPNet is presented for classifying hyperspectral images. Experiments conducted on ISPRS Vaihingen and ISPRS Potsdam high-resolution datasets demonstrate the competitive segmentation accuracy achieved by the proposed heavy-weight spatial FFPNet. Furthermore, experiments on the Indian Pines and the University of Pavia hyperspectral datasets indicate that the proposed spatial-spectral FFPNet outperforms the current state-of-the-art methods in hyperspectral image classification.
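Atrous (dilated) convolution, the building block of the atrous spatial pyramid pooling mentioned above, enlarges the area a kernel covers without adding weights. The short sketch below computes the effective kernel extent for a given dilation rate. The rates used in the example are illustrative assumptions and are not taken from the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis.

    A kernel with k taps and dilation d spans k + (k - 1) * (d - 1) pixels.
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive")
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# Illustrative pyramid of dilation rates for a 3x3 kernel.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))
```

Running several such branches in parallel gives one layer views of the same feature map at several context sizes, which is the general idea behind pyramid pooling.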

Classification of Very High-Resolution Remote Sensing Imagery Using a Fully Convolutional Network With Global and Local Context Information Enhancements

IEEE Access ◽

10.1109/access.2020.2964760 ◽

2020 ◽

Vol 8 ◽

pp. 14606-14619

Author(s):

Huanjun Hu ◽

Zheng Li ◽

Lin Li ◽

Hui Yang ◽

Haihong Zhu

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Local Context ◽

Context Information ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Remote Sensing Imagery ◽

Global And Local ◽

Very High

Dual Attention Feature Fusion and Adaptive Context for Accurate Segmentation of Very High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs13183715 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3715

Author(s):

Hao Shi ◽

Jiahe Fan ◽

Yupei Wang ◽

Liang Chen

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Land Cover ◽

Feature Fusion ◽

Semantic Segmentation ◽

Land Cover Classification ◽

Contextual Cues ◽

Remote Sensing Images ◽

Object Boundary ◽

Convolutional Network

Land cover classification of high-resolution remote sensing images aims to obtain pixel-level land cover understanding, which is often modeled as semantic segmentation of remote sensing images. In recent years, convolutional network (CNN)-based land cover classification methods have achieved great advancement. However, previous methods fail to generate fine segmentation results, especially for the object boundary pixels. In order to obtain boundary-preserving predictions, we first propose to incorporate spatially adapting contextual cues. In this way, objects with similar appearance can be effectively distinguished with the extracted global contextual cues, which are very helpful to identify pixels near object boundaries. On this basis, low-level spatial details and high-level semantic cues are effectively fused with the help of our proposed dual attention mechanism. Concretely, when fusing multi-level features, we utilize the dual attention feature fusion module based on both spatial and channel attention mechanisms to relieve the influence of the large gap, and further improve the segmentation accuracy of pixels near object boundaries. Extensive experiments were carried out on the ISPRS 2D Semantic Labeling Vaihingen data and GaoFen-2 data to demonstrate the effectiveness of our proposed method. Our method achieves better performance compared with other state-of-the-art methods.
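A rough sketch of fusing low-level and high-level features with spatial and channel attention, as described in the abstract above, is given below. It is a strong simplification: the learned layers that normally produce the attention gates are omitted, the gates are derived from plain averages, and the choice of which gate refines which input is an assumption rather than the authors' design.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature):
    """Scale each channel by a gate computed from its global average.

    feature[c][i]: value of channel c at flattened spatial position i.
    """
    gates = [sigmoid(sum(channel) / len(channel)) for channel in feature]
    return [[g * v for v in channel] for g, channel in zip(gates, feature)]

def spatial_attention(feature):
    """Scale each spatial position by a gate computed from its channel average."""
    num_channels = len(feature)
    num_positions = len(feature[0])
    gates = [
        sigmoid(sum(feature[c][i] for c in range(num_channels)) / num_channels)
        for i in range(num_positions)
    ]
    return [[gates[i] * channel[i] for i in range(num_positions)]
            for channel in feature]

def dual_attention_fuse(low_level, high_level):
    """Refine low-level features spatially and high-level features per channel,
    then add them element-wise (assumed fusion rule)."""
    refined_low = spatial_attention(low_level)
    refined_high = channel_attention(high_level)
    return [[a + b for a, b in zip(lo, hi)]
            for lo, hi in zip(refined_low, refined_high)]

low = [[0.0, 2.0], [0.0, 2.0]]    # 2 channels, 2 positions
high = [[1.0, 1.0], [0.0, 0.0]]
print(dual_attention_fuse(low, high))
```

In a trained network the gates would come from small learned sub-networks, so that positions near object boundaries and informative channels can be emphasized.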
