MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

The semantic segmentation of fine-resolution remotely sensed images is an urgent issue in satellite image processing. Solving this problem can help overcome various obstacles in urban planning, land cover classification, and environmental protection, paving the way for scene-level landscape pattern analysis and decision making. Encoder-decoder structures based on attention mechanisms have been frequently used for fine-resolution image segmentation. In this paper, we incorporate a coordinate attention (CA) mechanism, adopt an asymmetric convolution block (ACB), and design a refinement fusion block (RFB), forming a network named the fusion coordinate and asymmetry-based U-Net (FCAU-Net). Furthermore, we propose novel convolutional neural network (CNN) architecture to fully capture long-term dependencies and fine-grained details in fine-resolution remotely sensed imagery. This approach has the following advantages: (1) the CA mechanism embeds position information into a channel attention mechanism to enhance the feature representations produced by the network while effectively capturing position information and channel relationships; (2) the ACB enhances the feature representation ability of the standard convolution layer and captures and refines the feature information in each layer of the encoder; and (3) the RFB effectively integrates low-level spatial information and high-level abstract features to eliminate background noise when extracting feature information, reduces the fitting residuals of the fused features, and improves the ability of the network to capture information flows. Extensive experiments conducted on two public datasets (ZY-3 and DeepGlobe) demonstrate the effectiveness of the FCAU-Net. The proposed FCAU-Net transcends U-Net, Attention U-Net, the pyramid scene parsing network (PSPNet), DeepLab v3+, the multistage attention residual U-Net (MAResU-Net), MACU-Net, and the Transformer U-Net (TransUNet). Specifically, the FCAU-Net achieves a 97.97% (95.05%) pixel accuracy (PA), a 98.53% (91.27%) mean PA (mPA), a 95.17% (85.54%) mean intersection over union (mIoU), and a 96.07% (90.74%) frequency-weighted IoU (FWIoU) on the ZY-3 (DeepGlobe) dataset.

Download Full-text

Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2018.2795531 ◽

2018 ◽

Vol 15 (3) ◽

pp. 474-478 ◽

Cited By ~ 51

Author(s):

Weiwei Sun ◽

Ruisheng Wang

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Remotely Sensed Images ◽

Very High

Download Full-text

ABCNet: Attentive bilateral contextual network for efficient semantic segmentation of Fine-Resolution remotely sensed imagery

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2021.09.005 ◽

2021 ◽

Vol 181 ◽

pp. 84-98

Author(s):

Rui Li ◽

Shunyi Zheng ◽

Ce Zhang ◽

Chenxi Duan ◽

Libo Wang ◽

...

Keyword(s):

Semantic Segmentation ◽

Remotely Sensed ◽

Remotely Sensed Imagery ◽

Fine Resolution

Download Full-text

Semantic Segmentation on Remotely-Sensed Images Using Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning

10.20944/preprints201812.0090.v1 ◽

2018 ◽

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Network ◽

Domain Specific ◽

Medium Resolution ◽

Remotely Sensed Images ◽

Very High ◽

Resolution Data

In remote sensing domain, it is crucial to automatically annotate semantics, e.g., river, building, forest, etc, on the raster images. Deep Convolutional Encoder Decoder (DCED) network is the state-of-the-art semantic segmentation for remotely-sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN network for semantic segmentation particularly for remote sensing corpora with three main contributions. First, we propose to apply a recent CNN network call ''Global Convolutional Network (GCN)'', since it can capture different resolutions by extracting multi-scale features from different stages of the network. Also, we further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, ''Channel Attention'' is presented into our network in order to select most discriminative filters (features). Third, ''Domain Specific Transfer Learning'' is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given data sets: ($i$) medium resolution data collected from Landsat-8 satellite and ($ii$) very high resolution data called ''ISPRS Vaihingen Challenge Data Set''. The results show that our networks outperformed DCED in terms of $F1$ for 17.48% and 2.49% on medium and very high resolution corpora, respectively.

Download Full-text

Semantic Segmentation on Remotely-Sensed Images Using Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning

10.20944/preprints201812.0090.v2 ◽

2018 ◽

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Network ◽

Domain Specific ◽

Medium Resolution ◽

Remotely Sensed Images ◽

Very High ◽

Resolution Data

In remote sensing domain, it is crucial to annotate semantics, e.g., river, building, forest, etc, on the raster images. Deep Convolutional Encoder Decoder (DCED) network is the state-of-the-art semantic segmentation for remotely-sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN for semantic segmentation particularly for remote sensing corpora with three main contributions. First, we propose to apply a recent CNN call ``Global Convolutional Network (GCN)'', since it can capture different resolutions by extracting multi-scale features from different stages of the network. Also, we further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, ``Channel Attention'' is presented into our network in order to select most discriminative filters (features). Third, ``Domain Specific Transfer Learning'' is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given data sets: ($i$) medium resolution data collected from Landsat-8 satellite and ($ii$) very high resolution data called ``ISPRS Vaihingen Challenge Data Set''. The results show that our networks outperformed DCED in terms of $F1$ for 17.48% and 2.49% on medium and very high resolution corpora, respectively.

Download Full-text

Deep Residual Autoencoder with Multiscaling for Semantic Segmentation of Land-Use Images

Remote Sensing ◽

10.3390/rs11182142 ◽

2019 ◽

Vol 11 (18) ◽

pp. 2142 ◽

Cited By ~ 5

Author(s):

Lianfa Li

Keyword(s):

Land Use ◽

Deep Learning ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Network ◽

Convolutional Networks ◽

Residual Learning ◽

Fully Convolutional Networks ◽

Remotely Sensed Images ◽

Real World Datasets

Semantic segmentation is a fundamental means of extracting information from remotely sensed images at the pixel level. Deep learning has enabled considerable improvements in efficiency and accuracy of semantic segmentation of general images. Typical models range from benchmarks such as fully convolutional networks, U-Net, Micro-Net, and dilated residual networks to the more recently developed DeepLab 3+. However, many of these models were originally developed for segmentation of general or medical images and videos, and are not directly relevant to remotely sensed images. The studies of deep learning for semantic segmentation of remotely sensed images are limited. This paper presents a novel flexible autoencoder-based architecture of deep learning that makes extensive use of residual learning and multiscaling for robust semantic segmentation of remotely sensed land-use images. In this architecture, a deep residual autoencoder is generalized to a fully convolutional network in which residual connections are implemented within and between all encoding and decoding layers. Compared with the concatenated shortcuts in U-Net, these residual connections reduce the number of trainable parameters and improve the learning efficiency by enabling extensive backpropagation of errors. In addition, resizing or atrous spatial pyramid pooling (ASPP) can be leveraged to capture multiscale information from the input images to enhance the robustness to scale variations. The residual learning and multiscaling strategies improve the trained model’s generalizability, as demonstrated in the semantic segmentation of land-use types in two real-world datasets of remotely sensed images. Compared with U-Net, the proposed method improves the Jaccard index (JI) or the mean intersection over union (MIoU) by 4-11% in the training phase and by 3-9% in the validation and testing phases. With its flexible deep learning architecture, the proposed approach can be easily applied for and transferred to semantic segmentation of land-use variables and other surface variables of remotely sensed images.

Download Full-text

Duplex Restricted Network with Guided Upsampling for the Semantic Segmentation of Remotely Sensed Images

IEEE Access ◽

10.1109/access.2021.3065695 ◽

2021 ◽

pp. 1-1

Author(s):

Xiaoyu Wang ◽

Longxue Liang ◽

Haowen Yan ◽

Xiaosuo Wu ◽

Wanzhen Lu ◽

...

Keyword(s):

Semantic Segmentation ◽

Remotely Sensed ◽

Remotely Sensed Images

Download Full-text

VPRS-Based Regional Decision Fusion of CNN and MRF Classifications for Very Fine Resolution Remotely Sensed Images

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2018.2822783 ◽

2018 ◽

Vol 56 (8) ◽

pp. 4507-4521 ◽

Cited By ~ 26

Author(s):

Ce Zhang ◽

Isabel Sargent ◽

Xin Pan ◽

Andy Gardiner ◽

Jonathon Hare ◽

...

Keyword(s):

Remotely Sensed ◽

Decision Fusion ◽

Remotely Sensed Images ◽

Fine Resolution

Download Full-text

Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning

Remote Sensing ◽

10.3390/rs11010083 ◽

2019 ◽

Vol 11 (1) ◽

pp. 83 ◽

Cited By ~ 23

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Network ◽

Domain Specific ◽

Medium Resolution ◽

Remotely Sensed Images ◽

Very High ◽

Resolution Data

In the remote sensing domain, it is crucial to complete semantic segmentation on the raster images, e.g., river, building, forest, etc., on raster images. A deep convolutional encoder–decoder (DCED) network is the state-of-the-art semantic segmentation method for remotely sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN for semantic segmentation particularly for remote sensing corpora with three main contributions. First, we propose applying a recent CNN called a global convolutional network (GCN), since it can capture different resolutions by extracting multi-scale features from different stages of the network. Additionally, we further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, “channel attention” is presented in our network in order to select the most discriminative filters (features). Third, “domain-specific transfer learning” is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given datasets: (i) medium resolution data collected from Landsat-8 satellite and (ii) very high resolution data called the ISPRS Vaihingen Challenge Dataset. The results show that our networks outperformed DCED in terms of F 1 for 17.48% and 2.49% on medium and very high resolution corpora, respectively.

Download Full-text

Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning

10.20944/preprints201812.0090.v3 ◽

2019 ◽

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Remotely Sensed ◽

Convolutional Network ◽

Domain Specific ◽

Medium Resolution ◽

Remotely Sensed Images ◽

Very High ◽

Resolution Data

In the remote sensing domain, it is crucial to complete semantic segmentation on the raster images, e.g., river, building, forest, etc, on raster images. A deep convolutional encoder--decoder (DCED) network is the state-of-the-art semantic segmentation method for remotely sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN for semantic segmentation particularly for remote sensing corpora with three main contributions. First, we propose applying a recent CNN called a global convolutional network (GCN), since it can capture different resolutions by extracting multi-scale features from different stages of the network. Additionally, we further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, "channel attention'' is presented in our network in order to select the most discriminative filters (features). Third, "domain-specific transfer learning'' is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given datasets: (i) medium resolution data collected from Landsat-8 satellite and (ii) very high resolution data called the ISPRS Vaihingen Challenge Dataset. The results show that our networks outperformed DCED in terms of $F1$ for 17.48% and 2.49% on medium and very high resolution corpora, respectively.

Download Full-text