Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images

Semantic segmentation is a fundamental task in remote sensing image interpretation; it aims to assign a semantic label to every pixel in a given image. Accurate semantic segmentation remains challenging because of the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but segmentation accuracy is still far from expectations. This paper focuses on the importance of class-specific features of different land cover objects and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) takes the form of an encoder-decoder structure with skip connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. Specifically, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.
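
The parameter saving behind the use of group convolutions can be illustrated with a quick count. The sketch below is not taken from the paper: the helper `conv_params` and the figures of 6 classes and 32 channels per class are hypothetical, chosen only to show how giving each class its own convolution group shrinks the weight tensor.

```python
def conv_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Count learnable parameters of a 2D convolution with square kernels.

    With groups > 1, each output channel only sees in_ch // groups inputs,
    so the weight count drops by a factor of `groups`.
    """
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = out_ch * (in_ch // groups) * kernel * kernel
    return weights + (out_ch if bias else 0)


# Hypothetical decoder layer: 6 classes, 32 feature channels per class.
num_classes, ch_per_class = 6, 32
channels = num_classes * ch_per_class  # 192 channels in total

# Ordinary convolution: every output channel sees every input channel.
dense = conv_params(channels, channels, kernel=3)
# Grouped convolution: one group per class, no cross-class weights.
grouped = conv_params(channels, channels, kernel=3, groups=num_classes)

print(dense, grouped)  # 331968 55488
```

Under these assumed sizes the grouped layer needs roughly one sixth of the weights, because each class-specific group is connected only to its own channels.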

Fully Convolutional Network Method of Semantic Segmentation of Class Imbalance Remote Sensing Images

Acta Optica Sinica ◽ 10.3788/aos201939.0428004 ◽ 2019 ◽ Vol 39 (4) ◽ pp. 0428004 ◽ Cited By ~ 1

Author(s): 吴止锾 Wu Zhihuan ◽ 高永明 Gao Yongming ◽ 李磊 Li Lei ◽ 薛俊诗 Xue Junshi

Keyword(s): Remote Sensing ◽ Class Imbalance ◽ Semantic Segmentation ◽ Remote Sensing Images ◽ Convolutional Network ◽ Fully Convolutional Network ◽ Network Method

Maritime Semantic Labeling of Optical Remote Sensing Images with Multi-Scale Fully Convolutional Network

Remote Sensing ◽ 10.3390/rs9050480 ◽ 2017 ◽ Vol 9 (5) ◽ pp. 480 ◽ Cited By ~ 37

Author(s): Haoning Lin ◽ Zhenwei Shi ◽ Zhengxia Zou

Keyword(s): Remote Sensing ◽ Optical Remote Sensing ◽ Remote Sensing Images ◽ Convolutional Network ◽ Fully Convolutional Network ◽ Multi Scale ◽ Semantic Labeling

Real-Time Object Detection in Remote Sensing Images Based on Visual Perception and Memory Reasoning

Electronics ◽ 10.3390/electronics8101151 ◽ 2019 ◽ Vol 8 (10) ◽ pp. 1151 ◽ Cited By ~ 4

Author(s): Xia Hua ◽ Xinqing Wang ◽ Ting Rui ◽ Dong Wang ◽ Faming Shao

Keyword(s): Remote Sensing ◽ Visual Perception ◽ Object Detection ◽ Real Time ◽ Detection Accuracy ◽ Small Object ◽ Remote Sensing Images ◽ Feature Maps ◽ Convolutional Network ◽ Fully Convolutional Network

Aiming at the real-time detection of multiple objects and micro-objects in large-scene remote sensing images, a cascaded convolutional neural network real-time object-detection framework for remote sensing images is proposed, which integrates visual perception and convolutional memory network reasoning. The detection framework is composed of two fully convolutional networks, namely, the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a depth feature pyramid to optimize the attention feature maps by combining convolutional long-term and short-term memory networks. It guides the acquisition of potential sub-regions of the object in the scene, reduces the computational complexity, and enhances the network’s ability to extract multi-scale object features. It adapts to the complex background and small object characteristics of a large-scene remote sensing image. In AD-FCN, the object mask and object orientation estimation layer are designed to achieve fine positioning of candidate frames. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU_VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while ensuring detection accuracy and has high adaptability. It has extensive engineering application prospects.

Semantic segmentation of high-resolution remote sensing images using fully convolutional network with adaptive threshold

Connection Science ◽ 10.1080/09540091.2018.1510902 ◽ 2018 ◽ Vol 31 (2) ◽ pp. 169-184 ◽ Cited By ~ 8

Author(s): Zhihuan Wu ◽ Yongming Gao ◽ Lei Li ◽ Junshi Xue ◽ Yuntao Li

Keyword(s): Remote Sensing ◽ High Resolution ◽ Semantic Segmentation ◽ Adaptive Threshold ◽ Remote Sensing Images ◽ Convolutional Network ◽ Fully Convolutional Network

Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset

Remote Sensing ◽ 10.3390/rs10060964 ◽ 2018 ◽ Vol 10 (6) ◽ pp. 964 ◽ Cited By ~ 34

Author(s): Zhenfeng Shao ◽ Ke Yang ◽ Weixun Zhou

Keyword(s): Remote Sensing ◽ Performance Evaluation ◽ Deep Learning ◽ Image Retrieval ◽ Semantic Segmentation ◽ Semantic Content ◽ Remote Sensing Image ◽ Remote Sensing Images ◽ Benchmark Datasets ◽ Feature Based

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most of the existing datasets are single-labeled, with each image annotated by a single label representing its most significant semantic content. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and presented a dense labeling remote sensing dataset termed "DLRSD". DLRSD contains a total of 17 classes, and the pixels of each image are assigned one of the 17 pre-defined labels. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted feature-based methods to deep learning-based ones. More specifically, we evaluated the performance of RSIR methods from both single-label and multi-label perspectives. The results demonstrate the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provides the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.
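
The relationship between the three annotation levels can be shown in a few lines: a dense (pixel) labeling contains enough information to derive both a multi-label and a single-label annotation, but not the other way round. The sketch below is illustrative only. The function `labels_from_dense`, the toy mask, and the rule of taking the most frequent class as the single label are assumptions, not the procedure used to build DLRSD.

```python
from collections import Counter


def labels_from_dense(mask):
    """Derive single-label and multi-label annotations from a dense mask.

    `mask` is a 2D list holding one class name per pixel.
    """
    counts = Counter(label for row in mask for label in row)
    multi_label = sorted(counts)  # every class present in the image
    # Assumed rule: the dominant class stands in for the "most significant"
    # content; ties are broken alphabetically.
    single_label = max(multi_label, key=counts.get)
    return single_label, multi_label


# Hypothetical 3x3 image with three classes.
mask = [
    ["sea", "sea", "sand"],
    ["sea", "sand", "sand"],
    ["sand", "sand", "tree"],
]
single, multi = labels_from_dense(mask)
print(single)  # sand
print(multi)   # ['sand', 'sea', 'tree']
```

The single label discards the sea and tree regions entirely, which is the information loss that multi-label and dense annotations avoid.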

Deep Feature Fusion with Integration of Residual Connection and Attention Model for Classification of VHR Remote Sensing Images

Remote Sensing ◽ 10.3390/rs11131617 ◽ 2019 ◽ Vol 11 (13) ◽ pp. 1617 ◽ Cited By ~ 5

Author(s): Jicheng Wang ◽ Li Shen ◽ Wenfan Qiao ◽ Yanshuai Dai ◽ Zhilin Li

Keyword(s): Remote Sensing ◽ Feature Fusion ◽ Learning Ability ◽ Remote Sensing Images ◽ Convolutional Network ◽ Fully Convolutional Network ◽ Attention Model ◽ Low Level ◽ Deep Feature

The classification of very-high-resolution (VHR) remote sensing images is essential in many applications. However, high intraclass and low interclass variations in these kinds of images pose serious challenges. Fully convolutional network (FCN) models, which benefit from a powerful feature learning ability, have shown impressive performance and great potential. Nevertheless, only classification results with coarse resolution can be obtained from the original FCN method. Deep feature fusion is often employed to improve the resolution of outputs. Existing strategies for such fusion are not capable of properly utilizing the low-level features and considering the importance of features at different scales. This paper proposes a novel, end-to-end, fully convolutional network to integrate a multiconnection ResNet model and a class-specific attention model into a unified framework to overcome these problems. The former fuses multilevel deep features without introducing any redundant information from low-level features. The latter can learn the contributions from different features of each geo-object at each scale. Extensive experiments on two open datasets indicate that the proposed method can achieve class-specific scale-adaptive classification results and it outperforms other state-of-the-art methods. The results were submitted to the International Society for Photogrammetry and Remote Sensing (ISPRS) online contest for comparison with more than 50 other methods. The results indicate that the proposed method (ID: SWJ_2) ranks #1 in terms of overall accuracy, even though no additional digital surface model (DSM) data that were offered by ISPRS were used and no postprocessing was applied.

Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images

ISPRS International Journal of Geo-Information ◽ 10.3390/ijgi11010009 ◽ 2021 ◽ Vol 11 (1) ◽ pp. 9

Author(s): Shengfu Li ◽ Cheng Liao ◽ Yulin Ding ◽ Han Hu ◽ Yang Jia ◽ ...

Keyword(s): Remote Sensing ◽ Spatial Information ◽ Semantic Segmentation ◽ Road Extraction ◽ Remote Sensing Images ◽ Long Distance ◽ Features Fusion ◽ Multi Scale ◽ Boundary Recognition ◽ Benchmark Datasets

Efficient and accurate road extraction from remote sensing imagery is important for applications related to navigation and Geographic Information System updating. Existing data-driven methods based on semantic segmentation recognize roads from images pixel by pixel; they generally use only local spatial information, which causes discontinuous extraction and jagged boundary recognition. To address these problems, we propose a cascaded attention-enhanced architecture to extract boundary-refined roads from remote sensing images. Our proposed architecture uses spatial attention residual blocks on multi-scale features to capture long-distance relations and introduces channel attention layers to optimize the multi-scale feature fusion. Furthermore, a lightweight encoder-decoder network is connected to adaptively optimize the boundaries of the extracted roads. Our experiments showed that the proposed method outperformed existing methods and achieved state-of-the-art results on the Massachusetts dataset. In addition, our method achieved competitive results on more recent benchmark datasets, e.g., the DeepGlobe and the Huawei Cloud road extraction challenge.

A Remote Sensing Image Segmentation Method Based on Fusion Mechanism

Journal of Physics Conference Series ◽ 10.1088/1742-6596/2138/1/012016 ◽ 2021 ◽ Vol 2138 (1) ◽ pp. 012016

Author(s): Shuangling Zhu ◽ Guli Nazi·Aili Mujiang ◽ Huxidan Jumahong ◽ Pazi Laiti·Nuer Maiti

Keyword(s): Remote Sensing ◽ Semantic Segmentation ◽ Remote Sensing Image ◽ Detection Algorithm ◽ Attention Mechanism ◽ Segmentation Method ◽ Remote Sensing Images ◽ Convolutional Network ◽ Input Layer ◽ Basic Network

A U-Net convolutional network structure is fully capable of end-to-end training with very little data and can achieve better results. When a convolutional network has short links between layers near the input and layers near the output, it can be trained in a deeper, more accurate, and more effective way. This paper mainly proposes a high-resolution remote sensing image change detection algorithm based on a dense convolutional channel attention mechanism. The detection algorithm uses the U-Net network module as the basic network to extract features, combines the Dense-Net dense module to enhance U-Net, and introduces the dense convolution channel attention mechanism into the basic convolution unit to highlight important features, thus completing semantic segmentation of dense convolutional remote sensing images. Simulation results have verified the effectiveness and robustness of this study.
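
The abstract does not specify how its channel attention is built, so the following is only a generic sketch of the idea of reweighting channels to highlight important features. It uses a squeeze-and-excitation style gate, which is a substitute technique chosen for illustration. The function `channel_attention`, the reduction ratio `r`, and the random weights are all assumptions.

```python
import numpy as np


def channel_attention(x, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); together they
    form a small bottleneck that maps channel statistics to channel gates.
    Returns the reweighted map and the per-channel gate values.
    """
    squeezed = x.mean(axis=(1, 2))                    # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)           # ReLU bottleneck -> (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate -> (C,)
    return x * weights[:, None, None], weights


# Hypothetical sizes and random (untrained) weights, for shape checking only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))

out, weights = channel_attention(x, w1, w2)
print(out.shape, weights.shape)  # (8, 4, 4) (8,)
```

In a trained network the two weight matrices are learned, so channels that carry useful evidence receive gates close to 1 and the rest are suppressed.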

HQ-ISNet: High-Quality Instance Segmentation for Remote Sensing Imagery

Remote Sensing ◽ 10.3390/rs12060989 ◽ 2020 ◽ Vol 12 (6) ◽ pp. 989 ◽ Cited By ~ 1

Author(s): Hao Su ◽ Shunjun Wei ◽ Shan Liu ◽ Jiadian Liang ◽ Chen Wang ◽ ...

Keyword(s): Remote Sensing ◽ High Resolution ◽ Object Detection ◽ Prediction Accuracy ◽ Semantic Segmentation ◽ Remote Sensing Images ◽ Feature Maps ◽ High Quality ◽ Remote Sensing Imagery ◽ Instance Segmentation

Instance segmentation in high-resolution (HR) remote sensing imagery is one of the most challenging tasks and is more difficult than object detection and semantic segmentation. It aims to predict class labels and pixel-wise instance masks to locate instances in an image. However, few methods are currently suitable for instance segmentation in HR remote sensing images, and the complex background of such images makes the task even harder. In this article, a novel instance segmentation approach for HR remote sensing imagery based on Cascade Mask R-CNN is proposed, called the high-quality instance segmentation network (HQ-ISNet). In this scheme, the HQ-ISNet exploits an HR feature pyramid network (HRFPN) to fully utilize multi-level feature maps and maintain HR feature maps for instance segmentation of remote sensing images. Next, to refine mask information flow between mask branches, the instance segmentation network version 2 (ISNetV2) is proposed to promote further improvements in mask prediction accuracy. Then, we construct a new, more challenging dataset based on the synthetic aperture radar (SAR) ship detection dataset (SSDD) and the Northwestern Polytechnical University very-high-resolution 10-class geospatial object detection dataset (NWPU VHR-10) for instance segmentation of remote sensing images, which can be used as a benchmark for evaluating instance segmentation algorithms on high-resolution remote sensing images. Finally, extensive experimental analyses and comparisons on the SSDD and the NWPU VHR-10 dataset show that (1) the HRFPN makes the predicted instance masks more accurate, which effectively enhances instance segmentation performance on high-resolution remote sensing imagery; (2) the ISNetV2 is effective and promotes further improvements in mask prediction accuracy; and (3) our proposed framework HQ-ISNet is effective and more accurate for instance segmentation in remote sensing imagery than the existing algorithms.

LaeNet: A Novel Lightweight Multitask CNN for Automatically Extracting Lake Area and Shoreline from Remote Sensing Images

Remote Sensing ◽ 10.3390/rs13010056 ◽ 2020 ◽ Vol 13 (1) ◽ pp. 56

Author(s): Wei Liu ◽ Xingyu Chen ◽ Jiangjun Ran ◽ Lin Liu ◽ Qiang Wang ◽ ...

Keyword(s): Remote Sensing ◽ Network Architecture ◽ Semantic Segmentation ◽ Landsat 8 ◽ The Tibetan Plateau ◽ Lake Area ◽ Remote Sensing Images ◽ Feature Maps ◽ Typical Part ◽ Application Fields

Variations in lake area and shoreline can effectively indicate hydrological and climatic changes. Accordingly, how to automatically and simultaneously extract lake area and shoreline from remote sensing images attracts our attention. In this paper, we formulate lake area and shoreline extraction as a multitask learning problem. Unlike existing models that use a deep and complex network architecture as the backbone to extract feature maps, we present LaeNet, a novel end-to-end lightweight multitask fully convolutional CNN without downsampling that automatically extracts lake area and shoreline from remote sensing images. Landsat-8 images over Selenco and its vicinity in the Tibetan Plateau are used to train and evaluate our model. Experimental results on the testing image patches achieve an Accuracy of 0.9962, Precision of 0.9912, Recall of 0.9982, F1-score of 0.9941, and mIoU of 0.9879, which match or even exceed those of mainstream semantic segmentation models (UNet, DeepLabV3+, etc.). Notably, the running time of each epoch and the size of our model are only 6 s and 0.047 megabytes, a significant reduction compared to the other models. Finally, we conducted fieldwork to collect the in-situ shoreline position for one typical part of Lake Selenco in order to further evaluate the performance of our model. The validation indicates high accuracy in our results (DRMSE: 30.84 m, DMAE: 22.49 m, DSTD: 21.11 m), a deviation of only about one pixel for Landsat-8 images. LaeNet can potentially be extended to area segmentation and edge extraction tasks in other application fields.
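
For readers unfamiliar with the reported scores, the sketch below shows how they are commonly computed from the confusion-matrix counts of a binary (lake versus background) segmentation. It is a hedged illustration and not the authors' evaluation code. The function `binary_segmentation_metrics` and the pixel counts are hypothetical, and taking mIoU as the unweighted mean over the two classes is an assumption. The last lines convert the reported DRMSE into pixels, assuming the 30 m Landsat-8 multispectral pixel size.

```python
def binary_segmentation_metrics(tp, fp, fn, tn):
    """Compute pixel-level scores from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou_fg = tp / (tp + fp + fn)  # IoU of the lake (foreground) class
    iou_bg = tn / (tn + fp + fn)  # IoU of the background class
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "miou": (iou_fg + iou_bg) / 2,  # assumed: unweighted mean over both classes
    }


# Hypothetical pixel counts, not results from the paper.
scores = binary_segmentation_metrics(tp=6, fp=2, fn=2, tn=10)
print(scores)

# Reported shoreline DRMSE expressed in pixels, assuming 30 m pixels.
drmse_pixels = 30.84 / 30.0
print(round(drmse_pixels, 2))  # 1.03
```

With 30 m pixels, the reported 30.84 m DRMSE corresponds to roughly one pixel, which is consistent with the abstract's statement.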
