Semi-Supervised Remote Sensing Image Semantic Segmentation via Consistency Regularization and Average Update of Pseudo-Label

2020 ◽  
Vol 12 (21) ◽  
pp. 3603 ◽  
Author(s):  
Jiaxin Wang ◽  
Chris H. Q. Ding ◽  
Sibao Chen ◽  
Chenggang He ◽  
Bin Luo

Image segmentation has made great progress in recent years, but the annotation required for it is usually expensive, especially for remote sensing images. To address this problem, we explore semi-supervised learning methods that appropriately utilize a large amount of unlabeled data to improve the performance of remote sensing image segmentation. This paper proposes a method for remote sensing image segmentation based on semi-supervised learning. We first design a Consistency Regularization (CR) training method for semi-supervised training, then employ the newly learned model for Average Update of Pseudo-label (AUP), and finally combine pseudo labels and strong labels to train the semantic segmentation network. We demonstrate the effectiveness of the proposed method on three remote sensing datasets, achieving better performance without additional labeled data. Extensive experiments show that our semi-supervised method can learn latent information from the unlabeled data to improve segmentation performance.
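The Average Update of Pseudo-label step described above can be sketched as a running average of per-pixel class probabilities that is then thresholded into hard pseudo-labels. The momentum value, confidence threshold, and function names below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of the "Average Update of Pseudo-label" (AUP) idea:
# per-pixel class probabilities predicted in successive training rounds are
# blended into a running average, which is then converted to a hard label.

def average_update(avg_probs, new_probs, momentum=0.7):
    """Blend the running average probabilities with a new prediction.

    avg_probs, new_probs: per-class probabilities for one pixel.
    momentum: weight kept for the historical average (assumed value).
    """
    return [momentum * a + (1.0 - momentum) * n
            for a, n in zip(avg_probs, new_probs)]

def to_pseudo_label(probs, threshold=0.5):
    """Return the argmax class if confident enough, else None (pixel ignored)."""
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best if probs[best] >= threshold else None

# Example: one pixel with 3 classes, updated with a new confident prediction.
avg = [0.2, 0.5, 0.3]                       # running average from earlier rounds
avg = average_update(avg, [0.1, 0.8, 0.1])  # new prediction for the same pixel
label = to_pseudo_label(avg)
```

Averaging several rounds of predictions smooths out the noise of any single model snapshot, which is why the pseudo-labels become more reliable than one-shot predictions.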

Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3232 ◽  
Author(s):  
Yan Liu ◽  
Qirui Ren ◽  
Jiahui Geng ◽  
Meng Ding ◽  
Jiangyun Li

Efficient and accurate semantic segmentation is the key technique for automatic remote sensing image analysis. While there have been many segmentation methods based on traditional hand-crafted feature extractors, it is still challenging to process high-resolution and large-scale remote sensing images. In this work, a novel patch-wise semantic segmentation method with a new training strategy based on fully convolutional networks is presented to segment common land resources. First, to handle the high-resolution imagery, the images are split into local patches and a patch-wise network is built. Second, the training data are preprocessed in several ways to meet the specific characteristics of remote sensing images, i.e., color imbalance, object rotation variations and lens distortion. Third, a multi-scale training strategy is developed to address the severe scale variation problem. In addition, the impact of the conditional random field (CRF) on precision is studied. The proposed method was evaluated on a dataset collected from a capital city in West China with the Gaofen-2 satellite. The dataset contains ten common land resources (Grassland, Road, etc.). The experimental results show that the proposed algorithm achieves 54.96% mean intersection over union (MIoU) and outperforms other state-of-the-art methods in remote sensing image segmentation.
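The patch-wise splitting described above can be sketched as computing the top-left offsets of overlapping tiles over the image grid. The border-shifting convention and parameter values are assumptions; the paper's exact tiling scheme is not given in the abstract.

```python
def split_into_patches(height, width, patch, stride):
    """Return top-left (row, col) offsets of patches covering the image.

    The last row/column of patches is shifted inward so every patch stays
    inside the image bounds (one common convention).
    """
    def starts(size):
        s = list(range(0, max(size - patch, 0) + 1, stride))
        if s[-1] != size - patch:   # make sure the image border is covered
            s.append(size - patch)
        return s
    return [(r, c) for r in starts(height) for c in starts(width)]

# Example: tile a 1000 x 1000 image into 512 x 512 patches with stride 256.
offsets = split_into_patches(height=1000, width=1000, patch=512, stride=256)
```

At inference time the per-patch predictions are stitched back together at these offsets; overlapping regions are typically averaged or resolved by taking the more confident prediction.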


2018 ◽  
Vol 10 (6) ◽  
pp. 964 ◽  
Author(s):  
Zhenfeng Shao ◽  
Ke Yang ◽  
Weixun Zhou

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most of the existing datasets are single-labeled, with each image annotated by a single label representing the most significant semantic content of the image. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and present a dense labeling remote sensing dataset termed "DLRSD". DLRSD contains a total of 17 classes, and the pixels of each image are assigned one of the 17 pre-defined labels. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted-feature-based methods to deep learning-based ones. More specifically, we evaluated the performance of RSIR methods from both single-label and multi-label perspectives. The results demonstrate the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provides the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.


2020 ◽  
Vol 12 (24) ◽  
pp. 4115
Author(s):  
Xiaoli Li ◽  
Jinsong Chen ◽  
Longlong Zhao ◽  
Shanxin Guo ◽  
Luyi Sun ◽  
...  

The spatial fragmentation of high-resolution remote sensing images places strong demands on the noise immunity of segmentation algorithms. However, the stronger the noise immunity, the more serious the loss of detailed information, which easily leads to effective characteristics being neglected. In view of the difficulty of balancing noise immunity and the retention of effective characteristics, an adaptive distance-weighted Voronoi tessellation technique is proposed for remote sensing image segmentation. The distance between pixels and seed points in the Voronoi tessellation is established by adaptively weighting spatial distance and spectral distance. The weight coefficient that controls the influence of spatial distance is defined by a monotone decreasing function. Following the fuzzy clustering framework, a fuzzy segmentation model with Kullback–Leibler (KL) entropy regularization is established, using a multivariate Gaussian distribution to describe the spectral characteristics and a Markov Random Field (MRF) to account for the neighborhood effect of sub-regions. Finally, a series of parameter optimization schemes is designed according to the parameter characteristics to obtain the optimal segmentation results. The proposed algorithm is validated on many multispectral remote sensing images against five comparison algorithms by qualitative and quantitative analysis. Extensive experiments show that the proposed algorithm can suppress complex noise while better preserving effective characteristics.
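The adaptively weighted pixel-to-seed distance described above can be sketched as follows. The exponential decay used for the monotone decreasing weight and the parameter `alpha` are illustrative assumptions; the paper only states that the spatial weight decreases monotonically.

```python
import math

def weighted_voronoi_distance(pixel_xy, pixel_spec, seed_xy, seed_spec,
                              alpha=0.01):
    """Hypothetical sketch of an adaptively weighted pixel-to-seed distance.

    Combines spatial and spectral Euclidean distances; the spatial weight
    decays monotonically (here, exponentially -- an assumed choice) with the
    spatial distance itself, so distant seeds are judged mostly by spectral
    similarity while nearby seeds are judged mostly by position.
    """
    d_spatial = math.dist(pixel_xy, seed_xy)
    d_spectral = math.dist(pixel_spec, seed_spec)
    w = math.exp(-alpha * d_spatial)          # monotone decreasing weight
    return w * d_spatial + (1.0 - w) * d_spectral

# Example: two spectrally identical seeds at different spatial distances.
d_near = weighted_voronoi_distance((0, 0), (100,), (3, 4), (100,))
d_far = weighted_voronoi_distance((0, 0), (100,), (30, 40), (100,))
```

Each pixel would then be assigned to the seed minimizing this distance, yielding the adaptive Voronoi cells used as sub-regions in the fuzzy segmentation model.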


Author(s):  
Jingtan Li ◽  
Maolin Xu ◽  
Hongling Xiu

As the resolution of remote sensing images continues to increase, high-resolution remote sensing images are widely used in many areas. Among their uses, information extraction is one of the basic applications of remote sensing images. Faced with massive volumes of high-resolution remote sensing image data, traditional target recognition methods struggle to cope. Therefore, this paper proposes a remote sensing image extraction method based on the U-Net network. First, the U-Net semantic segmentation network is trained on the training set while the validation set is used to monitor training, and finally the test set is used for evaluation. The experimental results show that U-Net can be applied to the extraction of buildings.


2021 ◽  
Vol 2138 (1) ◽  
pp. 012016
Author(s):  
Shuangling Zhu ◽  
Guli Nazi·Aili Mujiang ◽  
Huxidan Jumahong ◽  
Pazi Laiti·Nuer Maiti

Abstract A U-Net convolutional network structure is fully capable of completing end-to-end training with extremely little data and can achieve good results. When the convolutional network has short links between layers near the input and layers near the output, it can be trained in a deeper, more accurate and more effective way. This paper proposes a high-resolution remote sensing image change detection algorithm based on a dense convolutional channel attention mechanism. The detection algorithm uses a U-Net module as the basic network for feature extraction, combines Dense-Net dense modules to enhance U-Net, and introduces a dense convolution channel attention mechanism into the basic convolution unit to highlight important features, thus completing semantic segmentation of remote sensing images. Simulation results verify the effectiveness and robustness of this approach.
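The channel attention step described above can be sketched as a squeeze-and-excitation-style gate over channels: each channel is globally pooled, passed through a sigmoid, and used to rescale itself. A real implementation would insert a small learned bottleneck between pooling and the sigmoid; this parameter-free version is an assumed simplification that only illustrates the rescaling idea.

```python
import math

def channel_attention(feature_maps):
    """Sketch of a channel attention gate (assumed simplification).

    feature_maps: list of channels, each a 2D list of activations.
    Returns (rescaled channels, per-channel gates). Each channel is scaled
    by a sigmoid of its global mean, so strongly activated channels are
    emphasized and weak ones are suppressed.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    gates = []
    for ch in feature_maps:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))  # squeeze
        gates.append(sigmoid(mean))                                  # excite
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates

# Example: a zero channel keeps gate 0.5; a strong channel gets gate near 1.
scaled, gates = channel_attention([[[0.0, 0.0], [0.0, 0.0]],
                                   [[4.0, 4.0], [4.0, 4.0]]])
```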


2020 ◽  
Vol 12 (9) ◽  
pp. 1501
Author(s):  
Chu He ◽  
Shenglin Li ◽  
Dehui Xiong ◽  
Peizhang Fang ◽  
Mingsheng Liao

Semantic segmentation is an important field in the automatic processing of remote sensing image data. Existing algorithms based on Convolutional Neural Networks (CNNs) have made rapid progress, especially the Fully Convolutional Network (FCN). However, problems still exist when directly feeding remote sensing images to an FCN: the segmentation result is not fine enough, and it lacks guidance from prior knowledge. To obtain more accurate segmentation results, this paper introduces edge information as prior knowledge into the FCN to revise the segmentation results. Specifically, the Edge-FCN network is proposed, which uses edge information detected by the Holistically Nested Edge Detection (HED) network to correct the FCN segmentation results. The experimental results on the ESAR and GID datasets demonstrate the validity of Edge-FCN.


2019 ◽  
Vol 9 (9) ◽  
pp. 1816 ◽  
Author(s):  
Guangsheng Chen ◽  
Chao Li ◽  
Wei Wei ◽  
Weipeng Jing ◽  
Marcin Woźniak ◽  
...  

Recent developments in Convolutional Neural Networks (CNNs) have allowed solid advances in the semantic segmentation of high-resolution remote sensing (HRRS) images. Nevertheless, the problems of poor classification of small objects and unclear boundaries caused by the characteristics of HRRS image data have not been fully considered by previous works. To tackle these challenging problems, we propose an improved semantic segmentation neural network, which adopts dilated convolution, a fully connected (FC) fusion path and a pre-trained encoder for the semantic segmentation of HRRS imagery. The network is built on the computationally efficient DeepLabv3 architecture, with added Augmented Atrous Spatial Pyramid Pool and FC Fusion Path layers. Dilated convolution enlarges the receptive field of feature points without decreasing the feature map resolution. The improved architecture enhances HRRS image segmentation, reaching a classification accuracy of 91%, and improves the precision of small-object recognition. The applicability of the improved model to the remote sensing image segmentation task is verified.
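The receptive-field enlargement from dilated convolution can be made concrete with the standard formulas: a k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels, and stacking layers grows the receptive field by the usual recurrence. This is a generic calculation, not code from the paper.

```python
def effective_kernel_size(k, dilation):
    """Effective spatial extent of a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of a stack of convolution layers.

    layers: list of (kernel_size, dilation, stride) tuples.
    Uses the standard recurrence: rf += (k_eff - 1) * jump; jump *= stride.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel_size(k, d) - 1) * jump
        jump *= s
    return rf
```

For example, three 3x3 layers with dilations 1, 2 and 4 (all stride 1) reach a 15-pixel receptive field without any downsampling, whereas three undilated 3x3 layers only reach 7 pixels. This is exactly the trade-off the abstract describes: a larger field of view at full feature-map resolution.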


2020 ◽  
Vol 12 (20) ◽  
pp. 3276 ◽  
Author(s):  
Zhicheng Zhao ◽  
Ze Luo ◽  
Jian Li ◽  
Can Chen ◽  
Yingchao Piao

In recent years, the development of convolutional neural networks (CNNs) has promoted continuous progress in scene classification of remote sensing images. Compared with natural image datasets, however, the acquisition of remote sensing scene images is more difficult, and consequently the scale of remote sensing image datasets is generally small. In addition, many problems related to small objects and complex backgrounds arise in remote sensing image scenes, presenting great challenges for CNN-based recognition methods. In this article, to improve the feature extraction ability and generalization ability of such models and to enable better use of the information contained in the original remote sensing images, we introduce a multitask learning framework which combines the tasks of self-supervised learning and scene classification. Unlike previous multitask methods, we adopt a new mixup loss strategy to combine the two tasks with a dynamic weight. The proposed multitask learning framework empowers a deep neural network to learn more discriminative features without increasing the number of parameters. Comprehensive experiments were conducted on four representative remote sensing scene classification datasets. We achieved state-of-the-art performance, with average accuracies of 94.21%, 96.89%, 99.11%, and 98.98% on the NWPU, AID, UC Merced, and WHU-RS19 datasets, respectively. The experimental results and visualizations show that our proposed method can learn more discriminative features and simultaneously encode orientation information while effectively improving the accuracy of remote sensing scene classification.
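The dynamic mixup-style loss weighting described above can be sketched as drawing a per-step mixing coefficient whose expectation shifts between the two task losses as training progresses. The Beta-distribution schedule below is an illustrative assumption, not the paper's exact rule.

```python
import random

def combined_loss(loss_cls, loss_ssl, epoch, total_epochs):
    """Hypothetical dynamic weighting of the two task losses.

    loss_cls: scene-classification loss; loss_ssl: self-supervised loss.
    A mixing coefficient lambda is drawn per step (mixup-style); its
    expected value drifts toward the classification task as training
    progresses, so the self-supervised task dominates early training.
    """
    progress = epoch / total_epochs
    lam = random.betavariate(2.0 + 4.0 * progress, 2.0)  # skews toward 1 late
    return lam * loss_cls + (1.0 - lam) * loss_ssl

# Example: combine two loss values partway through training.
random.seed(0)
out = combined_loss(1.0, 3.0, epoch=10, total_epochs=100)
```

Because lambda stays in (0, 1), the combined loss is always a convex mixture of the two task losses, so neither task's gradient direction is ever fully discarded.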


2021 ◽  
Vol 13 (18) ◽  
pp. 3585
Author(s):  
Zhiyong Xu ◽  
Weicun Zhang ◽  
Tianxiang Zhang ◽  
Zhifang Yang ◽  
Jiangyun Li

Semantic segmentation for remote sensing images (RSIs) is widely applied in geological surveys, urban resource management, and disaster monitoring. Recent remote sensing segmentation tasks are generally addressed by CNN-based and transformer-based models. In particular, transformer-based architectures generally struggle with two main problems: a high computation load and inaccurate edge classification. To overcome these problems, we propose a novel transformer model that realizes lightweight edge classification. First, based on a Swin transformer backbone, a pure Efficient transformer with mlphead is proposed to accelerate the inference speed. Moreover, explicit and implicit edge enhancement methods are proposed to cope with object edge problems. The experimental results on the Potsdam and Vaihingen datasets show that the proposed approach significantly improves the final accuracy, achieving a trade-off between computational complexity (FLOPs) and accuracy (Efficient-L obtains a 3.23% mIoU improvement on Vaihingen and a 2.46% mIoU improvement on Potsdam compared with HRCNet_W48). As a result, it is believed that the proposed Efficient transformer will have an advantage in dealing with remote sensing image segmentation problems.


2021 ◽  
Vol 13 (17) ◽  
pp. 3521
Author(s):  
Bo Fang ◽  
Gang Chen ◽  
Jifa Chen ◽  
Guichong Ouyang ◽  
Rong Kou ◽  
...  

As the fastest-growing trend in big data analysis, deep learning technology has proven to be both an unprecedented breakthrough and a powerful tool in many fields, particularly for image segmentation tasks. Nevertheless, most achievements depend on high-quality pre-labeled training samples, which are labor-intensive and time-consuming to produce. Furthermore, unlike conventional natural images, coastal remote sensing images generally carry far more complicated and extensive land cover information, making it difficult to produce pre-labeled references for supervised image segmentation. Motivated by this observation, we conduct an in-depth investigation into the use of neural networks for unsupervised learning and propose a novel method, namely conditional co-training (CCT), specifically for truly unsupervised remote sensing image segmentation in coastal areas. In our approach, a multi-model framework consisting of two parallel data streams, superpixel-based over-segmentation and pixel-level semantic segmentation, simultaneously performs the pixel-level classification. The former processes the input image into multiple over-segments, providing self-constrained guidance for model training. Meanwhile, with this guidance, the latter continuously processes the input image into multi-channel response maps until the model converges. Incentivized by multiple conditional constraints, our framework learns to extract high-level semantic knowledge and produce full-resolution segmentation maps without pre-labeled ground truths. Compared to the black-box solutions of conventional supervised learning, this method offers stronger explainability and transparency owing to its specific architecture and mechanism. The experimental results on two representative real-world coastal remote sensing segmentation datasets, and the comparison with other state-of-the-art truly unsupervised methods, validate the strong performance and excellent efficiency of our proposed CCT.
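The self-constrained guidance from superpixel over-segmentation can be sketched as a majority vote: every pixel inside an over-segment is re-assigned the segment's most common predicted class, forcing the pixel-level stream to agree with the over-segmentation boundaries. The majority-vote rule and function names are illustrative assumptions.

```python
from collections import Counter

def superpixel_guidance(superpixel_ids, predicted_classes):
    """Sketch of superpixel-constrained refinement (assumed mechanism).

    superpixel_ids: flat list giving each pixel's over-segment id.
    predicted_classes: flat list of the segmentation stream's per-pixel
    predictions, same length. Each over-segment votes, and every pixel
    inside it receives the segment's majority predicted class.
    """
    votes = {}
    for sp, cls in zip(superpixel_ids, predicted_classes):
        votes.setdefault(sp, Counter())[cls] += 1
    majority = {sp: c.most_common(1)[0][0] for sp, c in votes.items()}
    return [majority[sp] for sp in superpixel_ids]

# Example: the stray prediction (class 5) inside superpixel 0 is overruled.
refined = superpixel_guidance([0, 0, 0, 1, 1], [2, 2, 5, 7, 7])
```

Feeding such refined maps back as training targets gives the pixel-level stream a self-generated, label-free supervision signal, which is the co-training loop the abstract describes.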

