Semantic Segmentation of Aerial Imagery via Split-Attention Networks with Disentangled Nonlocal and Edge Supervision

In this work, we propose a new deep convolution neural network (DCNN) architecture for semantic segmentation of aerial imagery. Taking advantage of recent research, we use split-attention networks (ResNeSt) as the backbone for high-quality feature expression. Additionally, a disentangled nonlocal (DNL) block is integrated into our pipeline to express the inter-pixel long-distance dependence and highlight the edge pixels simultaneously. Moreover, the depth-wise separable convolution and atrous spatial pyramid pooling (ASPP) modules are combined to extract and fuse multiscale contextual features. Finally, an auxiliary edge detection task is designed to provide edge constraints for semantic segmentation. Evaluation of algorithms is conducted on two benchmarks provided by the International Society for Photogrammetry and Remote Sensing (ISPRS). Extensive experiments demonstrate the effectiveness of each module of our architecture. Precision evaluation based on the Potsdam benchmark shows that the proposed DCNN achieves competitive performance over the state-of-the-art methods.

Download Full-text

Real-Time Semantic Segmentation Network Based on Lite Reduced Atrous Spatial Pyramid Pooling Module Group

2020 5th International Conference on Control, Robotics and Cybernetics (CRC) ◽

10.1109/crc51253.2020.9253492 ◽

2020 ◽

Author(s):

Yangsheng Tian ◽

Fangyuan Chen ◽

Haihui Wang ◽

Shuiping Zhang

Keyword(s):

Real Time ◽

Semantic Segmentation ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Download Full-text

Mapping the Unseen: Exploiting Super-Resolution for Semantic Segmentation in Low-Resolution Images

10.5753/sibgrapi.est.2020.12987 ◽

2020 ◽

Author(s):

Matheus B. Pereira ◽

Jefersson Alex Dos Santos

Keyword(s):

Remote Sensing ◽

Pattern Recognition ◽

Super Resolution ◽

Remote Sensing Data ◽

Semantic Segmentation ◽

The Other ◽

Aerial Imagery ◽

Aerial Images ◽

Remote Sensing Images ◽

Low Resolution

High-resolution aerial images are usually not accessible or affordable. On the other hand, low-resolution remote sensing data is easily found in public open repositories. The problem is that the low-resolution representation can compromise pattern recognition algorithms, especially semantic segmentation. In this M.Sc. dissertation1 , we design two frameworks in order to evaluate the effectiveness of super-resolution in the semantic segmentation of low-resolution remote sensing images. We carried out an extensive set of experiments on different remote sensing datasets. The results show that super-resolution is effective to improve semantic segmentation performance on low-resolution aerial imagery, outperforming unsupervised interpolation and achieving semantic segmentation results comparable to highresolution data.

Download Full-text

Continual Learning With Structured Inheritance for Semantic Segmentation in Aerial Imagery

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2021.3076664 ◽

2021 ◽

pp. 1-17

Author(s):

Yingchao Feng ◽

Xian Sun ◽

Wenhui Diao ◽

Jihao Li ◽

Xin Gao ◽

...

Keyword(s):

Semantic Segmentation ◽

Aerial Imagery ◽

Continual Learning

Download Full-text

Copy move and splicing forgery detection using deep convolution neural network, and semantic segmentation

Multimedia Tools and Applications ◽

10.1007/s11042-020-09816-3 ◽

2020 ◽

Author(s):

Abhishek ◽

Neeru Jindal

Keyword(s):

Neural Network ◽

Semantic Segmentation ◽

Convolution Neural Network ◽

Forgery Detection ◽

Deep Convolution Neural Network

Download Full-text

Developing a multi-filter convolutional neural network for semantic segmentation using high-resolution aerial imagery and LiDAR data

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2018.06.005 ◽

2018 ◽

Vol 143 ◽

pp. 3-14 ◽

Cited By ~ 22

Author(s):

Ying Sun ◽

Xinchang Zhang ◽

Qinchuan Xin ◽

Jianfeng Huang

Keyword(s):

Neural Network ◽

High Resolution ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Lidar Data

Download Full-text

Multiscale Road Extraction in Remote Sensing Images

Computational Intelligence and Neuroscience ◽

10.1155/2019/2373798 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Aziguli Wulamu ◽

Zuxian Shi ◽

Dezheng Zhang ◽

Zheyu He

Keyword(s):

Remote Sensing ◽

Network Architecture ◽

Semantic Segmentation ◽

Road Extraction ◽

Remote Sensing Images ◽

The Road ◽

Proposed Model ◽

Different Types ◽

Spatial Pyramid Pooling ◽

The One

Recent advances in convolutional neural networks (CNNs) have shown impressive results in semantic segmentation. Among the successful CNN-based methods, U-Net has achieved exciting performance. In this paper, we proposed a novel network architecture based on U-Net and atrous spatial pyramid pooling (ASPP) to deal with the road extraction task in the remote sensing field. On the one hand, U-Net structure can effectively extract valuable features; on the other hand, ASPP is able to utilize multiscale context information in remote sensing images. Compared to the baseline, this proposed model has improved the pixelwise mean Intersection over Union (mIoU) of 3 points. Experimental results show that the proposed network architecture can deal with different types of road surface extraction tasks under various terrains in Yinchuan city, solve the road connectivity problem to some extent, and has certain tolerance to shadows and occlusion.

Download Full-text

Expectation-Maximization Attention Networks for Semantic Segmentation

2019 IEEE/CVF International Conference on Computer Vision (ICCV) ◽

10.1109/iccv.2019.00926 ◽

2019 ◽

Cited By ~ 25

Author(s):

Xia Li ◽

Zhisheng Zhong ◽

Jianlong Wu ◽

Yibo Yang ◽

Zhouchen Lin ◽

...

Keyword(s):

Expectation Maximization ◽

Semantic Segmentation ◽

Attention Networks

Download Full-text

DIANet: Dense-and-Implicit Attention Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5842 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4206-4214

Author(s):

Zhongzhan Huang ◽

Senwei Liang ◽

Mingfu Liang ◽

Haizhao Yang

Keyword(s):

Classification Accuracy ◽

Short Term Memory ◽

Residual Network ◽

Short Term ◽

Long Distance ◽

Term Memory ◽

Attention Networks ◽

Network Layers ◽

Benchmark Datasets ◽

Long Short Term Memory

Attention networks have successfully boosted the performance in various vision problems. Previous works lay emphasis on designing a new attention module and individually plug them into the networks. Our paper proposes a novel-and-simple framework that shares an attention module throughout different network layers to encourage the integration of layer-wise information and this parameter-sharing module is referred to as Dense-and-Implicit-Attention (DIA) unit. Many choices of modules can be used in the DIA unit. Since Long Short Term Memory (LSTM) has a capacity of capturing long-distance dependency, we focus on the case when the DIA unit is the modified LSTM (called DIA-LSTM). Experiments on benchmark datasets show that the DIA-LSTM unit is capable of emphasizing layer-wise feature interrelation and leads to significant improvement of image classification accuracy. We further empirically show that the DIA-LSTM has a strong regularization ability on stabilizing the training of deep networks by the experiments with the removal of skip connections (He et al. 2016a) or Batch Normalization (Ioffe and Szegedy 2015) in the whole residual network.

Download Full-text

Adaptive Context Encoding Module for Semantic Segmentation

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-027 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 27-1-27-7

Author(s):

Congcong Wang ◽

Faouzi Alaya Cheikh ◽

Azeddine Beghdadi ◽

Ole Jakob Elle

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Experimental Studies ◽

Semantic Segmentation ◽

Multiple Scale ◽

Context Information ◽

Convolution Operation ◽

Sampling Locations ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

The object sizes in images are diverse, therefore, capturing multiple scale context information is essential for semantic segmentation. Existing context aggregation methods such as pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling size or atrous rate, such that multiple scale information is captured. However, the pooling sizes and atrous rates are chosen empirically. Rethinking of ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network learnable fieldof- view, thus the ability of capturing object context information adaptively. Following this observation, in this paper, we propose an adaptive context encoding (ACE) module based on deformable convolution operation where sampling locations of the convolution operation are learnable. Our ACE module can be embedded into other Convolutional Neural Networks (CNNs) easily for context aggregation. The effectiveness of the proposed module is demonstrated on Pascal-Context and ADE20K datasets. Although our proposed ACE only consists of three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection of Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to the state-of-the-art methods.

Download Full-text

Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi11010009 ◽

2021 ◽

Vol 11 (1) ◽

pp. 9

Author(s):

Shengfu Li ◽

Cheng Liao ◽

Yulin Ding ◽

Han Hu ◽

Yang Jia ◽

...

Keyword(s):

Remote Sensing ◽

Spatial Information ◽

Semantic Segmentation ◽

Road Extraction ◽

Remote Sensing Images ◽

Long Distance ◽

Features Fusion ◽

Multi Scale ◽

Boundary Recognition ◽

Benchmark Datasets

Efficient and accurate road extraction from remote sensing imagery is important for applications related to navigation and Geographic Information System updating. Existing data-driven methods based on semantic segmentation recognize roads from images pixel by pixel, which generally uses only local spatial information and causes issues of discontinuous extraction and jagged boundary recognition. To address these problems, we propose a cascaded attention-enhanced architecture to extract boundary-refined roads from remote sensing images. Our proposed architecture uses spatial attention residual blocks on multi-scale features to capture long-distance relations and introduce channel attention layers to optimize the multi-scale features fusion. Furthermore, a lightweight encoder-decoder network is connected to adaptively optimize the boundaries of the extracted roads. Our experiments showed that the proposed method outperformed existing methods and achieved state-of-the-art results on the Massachusetts dataset. In addition, our method achieved competitive results on more recent benchmark datasets, e.g., the DeepGlobe and the Huawei Cloud road extraction challenge.

Download Full-text