Reducing the stride of the convolution kernel: a simple and effective strategy to increase the performance of CNN in building extraction from remote sensing image

Author(s):  
Meng Chen ◽  
Jianjun Wu ◽  
Feng Tian

Automatically extracting buildings from remote sensing images (RSI) plays an important role in urban planning, population estimation, disaster emergency response, etc. With the development of deep learning technology, convolutional neural networks (CNN), which perform better than traditional methods, have been widely used in extracting buildings from RSI. However, some problems remain. First, the low-level features extracted by the shallow layers and the abstract features extracted by the deep layers of the network cannot be fully fused, which often makes building extraction inaccurate, especially for buildings with complex structures, irregular shapes, and small sizes. Second, a network has so many parameters to train that it occupies a lot of computing resources and consumes a lot of time during training. By analyzing the structure of the CNN, we found that the abstract features extracted by deep layers with low geospatial resolution contain more semantic information. These abstract features are conducive to determining the category of pixels but are not sensitive to the boundaries of buildings. Since the stride of the convolution kernel and the pooling operation reduce the geospatial resolution of the feature maps, this paper proposes a simple and effective strategy to alleviate the above two bottlenecks: reduce the stride of the convolution kernels in one of the layers and reduce the number of convolution kernels. This strategy was applied to deeplabv3+net and evaluated on both the WHU Building Dataset and the Massachusetts Building Dataset. Compared with the original deeplabv3+net, the results showed better performance: on the WHU Building Dataset, the Intersection over Union (IoU) increased by 1.4% and the F1 score by 0.9%; on the Massachusetts Building Dataset, IoU increased by 3.31% and the F1 score by 2.3%.
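As a minimal sketch of this strategy (illustrative layer shapes, not the authors' code), the effect of the stride on feature-map resolution can be shown in PyTorch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)  # a mid-level feature map (shapes are illustrative)

# Original layer: stride 2 halves the geospatial resolution of the feature map.
conv_s2 = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)
print(conv_s2(x).shape)  # torch.Size([1, 256, 32, 32])

# Proposed strategy: reduce the stride to 1 to keep the resolution, and reduce
# the number of kernels to offset the extra computation.
conv_s1 = nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1)
print(conv_s1(x).shape)  # torch.Size([1, 128, 64, 64])
```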

2021 ◽  
Vol 13 (2) ◽  
pp. 294
Author(s):  
Meng Chen ◽  
Jianjun Wu ◽  
Leizhen Liu ◽  
Wenhui Zhao ◽  
Feng Tian ◽  
...  

At present, convolutional neural networks (CNN) are widely used for building extraction from remote sensing imagery (RSI), but some bottlenecks remain. On the one hand, previous networks with complex structures have so many parameters that they occupy a lot of memory and consume much time during training. On the other hand, the low-level features extracted by shallow layers and the abstract features extracted by deep layers of the network cannot be fully fused, which leads to inaccurate building extraction from RSI. To alleviate these disadvantages, this paper proposes a dense residual neural network (DR-Net). DR-Net uses a deeplabv3+Net encoder/decoder backbone combined with densely connected convolutional network (DCNN) and residual network (ResNet) structures. Compared with deeplabv3+net (about 41 million parameters) and BRRNet (about 17 million parameters), DR-Net contains only about 9 million parameters, a substantial reduction. In experiments on both the WHU Building Dataset and the Massachusetts Building Dataset, DR-Net showed better building extraction performance than the other two state-of-the-art methods: on the WHU Building Dataset, Intersection over Union (IoU) increased by 2.4% and the F1 score by 1.4%; on the Massachusetts Building Dataset, IoU increased by 3.8% and the F1 score by 2.9%.
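A hedged sketch of what such a block could look like, combining DenseNet-style concatenation with a ResNet-style skip connection; the layer sizes below are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """One plausible reading of a DR-Net building block: DenseNet-style
    concatenation inside the block, closed by a residual connection."""
    def __init__(self, channels, growth=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, growth, 3, padding=1)
        self.conv2 = nn.Conv2d(channels + growth, growth, 3, padding=1)
        # 1x1 convolution compresses the concatenated features back to `channels`.
        self.fuse = nn.Conv2d(channels + 2 * growth, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        d1 = self.relu(self.conv1(x))
        d2 = self.relu(self.conv2(torch.cat([x, d1], dim=1)))
        out = self.fuse(torch.cat([x, d1, d2], dim=1))
        return self.relu(out + x)  # residual connection

block = DenseResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```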


2021 ◽  
Vol 13 (23) ◽  
pp. 4743
Author(s):  
Wei Yuan ◽  
Wenbo Xu

The segmentation of remote sensing images by deep learning is the main method of remote sensing image interpretation. However, segmentation models based on convolutional neural networks cannot capture global features well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for this deficiency of the convolutional neural network. Therefore, this paper proposes a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer. First, a Swin Transformer is used as the backbone to encode the input image. Second, the feature maps of different levels are decoded separately. Third, a convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, the channels are adjusted with a 1 × 1 convolution to obtain the final prediction map. Compared with other segmentation network models on the WHU Building Dataset, the evaluation metrics mIoU, F1-score, and accuracy are all improved. The proposed network is a multi-scale adaptive model that pays more attention to global features for remote sensing segmentation.
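A minimal sketch of the decode-and-fuse step, assuming standard Swin Transformer stage channels (96/192/384/768) and 1 × 1 convolutions for fusion; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFusion(nn.Module):
    """Per-level decodings are upsampled to a common resolution, and 1x1
    convolutions let the network learn how to weight each level."""
    def __init__(self, level_channels=(96, 192, 384, 768), num_classes=2):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv2d(c, num_classes, 1) for c in level_channels]
        )
        # 1x1 fusion convolution: learns the weight of each level's prediction.
        self.fuse = nn.Conv2d(num_classes * len(level_channels), num_classes, 1)

    def forward(self, features, out_size):
        decoded = [
            F.interpolate(head(f), size=out_size, mode='bilinear', align_corners=False)
            for head, f in zip(self.heads, features)
        ]
        return self.fuse(torch.cat(decoded, dim=1))

# Feature maps at four Swin Transformer stages (shapes are illustrative).
feats = [torch.randn(1, c, s, s) for c, s in [(96, 56), (192, 28), (384, 14), (768, 7)]]
print(MultiLevelFusion()(feats, out_size=(224, 224)).shape)  # torch.Size([1, 2, 224, 224])
```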


2019 ◽  
Vol 11 (20) ◽  
pp. 2380 ◽  
Author(s):  
Liu ◽  
Luo ◽  
Huang ◽  
Hu ◽  
Sun ◽  
...  

Deep convolutional neural networks have driven significant progress in building extraction from high-resolution remote sensing imagery. While most such work modifies existing image segmentation networks from computer vision, this paper proposes a new network, the Deep Encoding Network (DE-Net), designed specifically for this problem and built on recently introduced image segmentation techniques. DE-Net is constructed from four modules: inception-style downsampling modules combining a striding convolution layer and a max-pooling layer, encoding modules comprising six linear residual blocks with a scaled exponential linear unit (SELU) activation function, compressing modules that reduce the feature channels, and a densely upsampling module that enables the network to encode spatial information inside feature maps. DE-Net achieves state-of-the-art performance on the WHU Building Dataset in recall, F1-score, and intersection over union (IoU) without pretraining, and it also outperforms several segmentation networks on our self-built Suzhou Satellite Building Dataset. The experimental results validate the effectiveness of DE-Net for building extraction from aerial and satellite imagery, and suggest that, given enough training data, designing and training a network from scratch may outperform fine-tuning models pre-trained on datasets unrelated to building extraction.
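The downsampling module is described concretely enough to sketch; the channel counts below are assumptions:

```python
import torch
import torch.nn as nn

class InceptionDownsample(nn.Module):
    """Inception-style downsampling as described above: a strided convolution
    branch and a max-pooling branch are concatenated, halving resolution while
    mixing learned and pooled features."""
    def __init__(self, in_ch, conv_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, conv_ch, kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.act = nn.SELU()  # DE-Net uses SELU activations

    def forward(self, x):
        return self.act(torch.cat([self.conv(x), self.pool(x)], dim=1))

m = InceptionDownsample(64, 64)
print(m(torch.randn(1, 64, 128, 128)).shape)  # torch.Size([1, 128, 64, 64])
```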


2019 ◽  
Vol 11 (16) ◽  
pp. 1897 ◽  
Author(s):  
Yan Zhang ◽  
Weiguo Gong ◽  
Jingxi Sun ◽  
Weihong Li

How to efficiently utilize the vast amount of easily accessed aerial imagery is a critical challenge for researchers, given the proliferation of high-resolution remote sensing sensors and platforms. Recently, the rapid development of deep neural networks (DNN) has been a focus in remote sensing, and such networks have achieved remarkable progress in image classification and segmentation tasks. However, current DNN models inevitably lose local cues during downsampling, and even with skip connections, the upsampling methods cannot properly recover structural information such as edge intersections, parallelism, and symmetry. In this paper, we propose Web-Net, a nested network architecture with hierarchical dense connections, to handle these issues. We design the Ultra-Hierarchical Sampling (UHS) block to absorb and fuse the inter-level feature maps and propagate feature maps among different levels. The position-wise downsampling/upsampling methods in the UHS iteratively change the shape of the inputs while preserving the number of their elements, so that low-level local cues and high-level semantic cues are properly preserved. We verify the effectiveness of the proposed Web-Net on the Inria Aerial Dataset and the WHU Dataset. Web-Net achieves an overall accuracy of 96.97% and an IoU (Intersection over Union) of 80.10% on the Inria Aerial Dataset, surpassing the state-of-the-art SegNet by 1.8% and 9.96%, respectively; the results on the WHU Dataset also support its effectiveness. Additionally, benefitting from the nested architecture and the UHS block, the extracted buildings on the prediction maps are noticeably sharper and more accurately identified, and even building areas covered by shadows can be correctly extracted. These results indicate that the proposed Web-Net is both effective and efficient for building extraction from high-resolution remote sensing images.
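One plausible reading of position-wise sampling that preserves every element is a space-to-depth/depth-to-space rearrangement, sketched here with PyTorch's PixelUnshuffle/PixelShuffle; this is an interpretation, not the paper's published code:

```python
import torch
import torch.nn as nn

# Space-to-depth and depth-to-space rearrangements change the spatial shape
# while keeping every element, so no local cue is discarded.
down = nn.PixelUnshuffle(downscale_factor=2)  # (C, H, W) -> (4C, H/2, W/2)
up = nn.PixelShuffle(upscale_factor=2)        # (4C, H/2, W/2) -> (C, H, W)

x = torch.randn(1, 32, 64, 64)
y = down(x)
print(y.shape)                # torch.Size([1, 128, 32, 32])
print(torch.equal(up(y), x))  # True: the rearrangement is lossless
```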


2019 ◽  
Vol 85 (10) ◽  
pp. 737-752
Author(s):  
Yihua Tan ◽  
Shengzhou Xiong ◽  
Zhi Li ◽  
Jinwen Tian ◽  
Yansheng Li

The analysis of built-up areas has always been a popular research topic in remote sensing applications. However, automatically extracting built-up areas across a wide range of regions remains challenging. In this article, a fully convolutional network (FCN)-based strategy is proposed for built-up area extraction. The proposed algorithm consists of two main steps. First, the remote sensing image is divided into blocks and their deep features are extracted by a lightweight multi-branch convolutional neural network (LMB-CNN). Second, the deep features are rearranged into feature maps that are fed into a well-designed FCN for image segmentation. The FCN is integrated with multi-branch blocks and outputs multi-channel segmentation masks that are used to balance false alarms and missed alarms. Experiments demonstrate that the proposed algorithm achieves an overall classification accuracy of 98.75% on the test data set and processes data faster than existing state-of-the-art algorithms.
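A rough sketch of the block-and-rearrange pipeline under stated assumptions: `blockwise_features` and `toy_cnn` are hypothetical names, the block size of 64 is assumed, and the stand-in CNN replaces the paper's LMB-CNN:

```python
import torch
import torch.nn as nn

def blockwise_features(image, cnn, block=64):
    """Split the image into blocks, extract a deep feature vector per block,
    and rearrange the vectors into a feature map for the downstream FCN."""
    b, c, h, w = image.shape
    # unfold: (B, C*block*block, n_blocks), one column per image block
    patches = nn.functional.unfold(image, kernel_size=block, stride=block)
    n = patches.shape[-1]
    patches = patches.transpose(1, 2).reshape(b * n, c, block, block)
    feats = cnn(patches)  # (B*n, feat_dim)
    grid_h, grid_w = h // block, w // block
    # rearrange per-block vectors back into a (feat_dim, grid_h, grid_w) map
    return feats.reshape(b, grid_h, grid_w, -1).permute(0, 3, 1, 2)

# Stand-in for the lightweight multi-branch CNN (LMB-CNN).
toy_cnn = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
out = blockwise_features(torch.randn(1, 3, 512, 512), toy_cnn)
print(out.shape)  # torch.Size([1, 128, 8, 8])
```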


Sensor Review ◽  
2019 ◽  
Vol 39 (5) ◽  
pp. 629-635 ◽  
Author(s):  
Haiqing He ◽  
Ting Chen ◽  
Minqiang Chen ◽  
Dajun Li ◽  
Penggen Cheng

Purpose: This paper aims to present a novel approach to image super-resolution based on deep-shallow cascaded convolutional neural networks for reconstructing a clear and high-resolution (HR) remote sensing image from a low-resolution (LR) input.

Design/methodology/approach: The proposed approach directly learns the residuals and the mapping between simulated LR and corresponding HR remote sensing images using deep and shallow end-to-end convolutional networks, instead of assuming any specific restoration model. Extra max-pooling and up-sampling are used to build a multiscale space by concatenating low- and high-level feature maps, and the HR image is generated by combining the LR input and the residual image. The model ensures a strong response to spatially local input patterns by using a large filter followed by cascaded small filters, and the authors adopt an epoch-based strategy to update the learning rate and boost convergence speed.

Findings: The proposed deep network is trained to reconstruct high-quality images from low-quality inputs using a simulated dataset generated from Set5, Set14, the Berkeley Segmentation Dataset, and remote sensing images. Experimental results demonstrate that the model considerably enhances remote sensing images in terms of spatial detail and spectral fidelity and outperforms state-of-the-art SR methods in peak signal-to-noise ratio, structural similarity, and visual assessment.

Originality/value: The proposed method can reconstruct an HR remote sensing image from an LR input and significantly improves the quality of remote sensing images in terms of spatial detail and fidelity.
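A minimal sketch of the global residual-learning idea, assuming a plain three-layer body rather than the paper's deep-shallow cascade; all sizes are illustrative:

```python
import torch
import torch.nn as nn

class ResidualSR(nn.Module):
    """The network predicts only the residual (missing high-frequency detail),
    which is added back to the interpolated LR input to form the HR image."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 9, padding=4),  # large first filter
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1),     # cascaded small filters
            nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, lr, scale=2):
        up = nn.functional.interpolate(lr, scale_factor=scale, mode='bicubic',
                                       align_corners=False)
        return up + self.body(up)  # HR = upsampled LR + predicted residual

sr = ResidualSR()
print(sr(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 128, 128])
```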


2020 ◽  
Vol 38 (4A) ◽  
pp. 510-514
Author(s):  
Tay H. Shihab ◽  
Amjed N. Al-Hameedawi ◽  
Ammar M. Hamza

In this paper, to make use of their complementary potential in LULC mapping, spatial data were acquired from Landsat 8 OLI sensor images taken in 2019. The images were rectified, enhanced, and then classified with the random forest (RF) and artificial neural network (ANN) methods. Optical remote sensing images were used to obtain information on the status of the LULC classification and to extract details. The classification of the satellite images was used to extract features and to analyse the LULC of the study area. The results showed that the artificial neural network method outperforms the random forest method. The required processing of the optical remote sensing data for LULC mapping included geometric correction and image enhancement. With the ANN method, the overall accuracy was 0.91 and the kappa coefficient 0.89 for the training data set, while the overall accuracy and kappa coefficient for the test data set were 0.89 and 0.87, respectively.
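For reference, a short sketch of how overall accuracy and the kappa coefficient are computed, using scikit-learn and made-up labels:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Overall accuracy is the share of correctly classified pixels; Cohen's kappa
# corrects that share for chance agreement. The label arrays are stand-ins.
y_true = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]  # reference LULC classes
y_pred = [0, 0, 1, 2, 2, 2, 2, 1, 0, 1]  # classifier output (RF or ANN)

print("overall accuracy:", accuracy_score(y_true, y_pred))  # 0.8
print("kappa:", round(cohen_kappa_score(y_true, y_pred), 2))
```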


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 348
Author(s):  
Choongsang Cho ◽  
Young Han Lee ◽  
Jongyoul Park ◽  
Sangkeun Lee

Semantic image segmentation has a wide range of applications. In medical image segmentation, accuracy is even more important than in other areas, because the results directly inform disease diagnosis, surgical planning, and history monitoring. The state-of-the-art models in medical image segmentation are variants of the encoder-decoder architecture known as U-Net. To effectively reflect spatial features in the feature maps of an encoder-decoder architecture, we propose a spatially adaptive weighting scheme for medical image segmentation. Specifically, a spatial feature is estimated from the feature maps, and the learned weighting parameters are obtained from the computed map, since segmentation results are predicted from the feature map through a convolutional layer. In the proposed networks, the convolutional block for extracting the feature map is replaced with widely used convolutional frameworks: VGG, ResNet, and bottleneck ResNet structures. In addition, a bilinear up-sampling method replaces the up-convolutional layer to increase the resolution of the feature map. For performance evaluation, we used three data sets covering different medical imaging modalities. Experimental results show that the network with the proposed self-spatially adaptive weighting block based on the ResNet framework gave the highest IoU and DICE scores across the three tasks compared with other methods. In particular, the segmentation network combining the proposed self-spatially adaptive block and the ResNet framework recorded the highest improvements, 3.01% in IoU and 2.89% in DICE, on the Nerve data set. We therefore believe that the proposed scheme can be a useful tool for image segmentation tasks based on the encoder-decoder architecture.
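A hedged sketch of a spatially adaptive weighting block, read here as generic spatial attention estimated by a 1 × 1 convolution; this is an assumption, not the paper's exact block:

```python
import torch
import torch.nn as nn

class SpatialWeighting(nn.Module):
    """A spatial map is estimated from the feature maps and used to reweight
    them position-wise before prediction."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolution estimates one weight per spatial position
        self.weight = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.weight(x)  # broadcast over channels

x = torch.randn(1, 64, 56, 56)
print(SpatialWeighting(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```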


Author(s):  
Xuewu Zhang ◽  
Yansheng Gong ◽  
Chen Qiao ◽  
Wenfeng Jing

This article focuses on the most common types of high-speed railway malfunctions in overhead contact systems, namely unstressed droppers, foreign-body invasions, and pole number-plate malfunctions, and establishes a deep-network detection model. By fusing the feature maps of the shallow and deep layers of the pretraining network, global and local features of the malfunction area are combined to enhance the network's ability to identify small objects. Further, to share the fully connected layers of the pretraining network and reduce model complexity, Tucker tensor decomposition is used to extract features from the fused feature map, which greatly reduces training time. In detection experiments on images collected along the Lanxin railway line, the proposed multiview Faster R-CNN based on tensor decomposition had a lower miss probability and higher detection accuracy for the three types of faults. Compared with the object-detection methods YOLOv3, SSD, and the original Faster R-CNN, the average miss probability of the improved Faster R-CNN model is decreased by 37.83%, 51.27%, and 43.79%, respectively, and the average detection accuracy is increased by 3.6%, 9.75%, and 5.9%, respectively.
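A brief sketch of the Tucker compression step using the TensorLy library; the feature shape and ranks are illustrative assumptions:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Tucker decomposition factorizes the fused feature map into a small core
# tensor plus one factor matrix per mode, so far fewer values feed the shared
# fully connected layers.
features = tl.tensor(np.random.rand(512, 14, 14))  # fused feature map (C, H, W)
core, factors = tucker(features, rank=[64, 7, 7])  # compressed representation

print(core.shape)                  # (64, 7, 7): 3,136 core values
print([f.shape for f in factors])  # [(512, 64), (14, 7), (14, 7)]
```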

