A Deep Learning-Based Framework for Automated Extraction of Building Footprint Polygons from Very High-Resolution Aerial Imagery

Accurate building footprint polygons provide essential data for a wide range of urban applications. Although deep learning models have been proposed to extract pixel-based building areas from remote sensing imagery, directly vectorizing pixel-based building maps often yields footprint polygons with irregular shapes that are inconsistent with real building boundaries, which makes them difficult to use in geospatial analysis. In this study, we propose a novel deep learning-based framework for automated extraction of building footprint polygons (DLEBFP) from very high-resolution aerial imagery by combining deep learning models for different tasks. Our approach uses the U-Net, Cascade R-CNN, and Cascade CNN deep learning models to obtain building segmentation maps, building bounding boxes, and building corners, respectively, from very high-resolution remote sensing images. We then use Delaunay triangulation to construct building footprint polygons from the detected building corners, constrained by the building bounding boxes and building segmentation maps. Experiments on the Wuhan University building dataset and the ISPRS Vaihingen dataset indicate that DLEBFP performs well in extracting high-quality building footprint polygons. Compared with other semantic segmentation models and the vector map generalization method, DLEBFP achieves pixel-level mapping accuracies comparable to those of semantic segmentation models while generating building footprint polygons with concise edges and vertices and regular shapes that are close to the reference data. This promising performance indicates that our method has the potential to extract accurate building footprint polygons from remote sensing images for applications in geospatial analysis.
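As a rough illustration of how a bounding box and a segmentation map can constrain detected corners, the sketch below keeps only the corner candidates that fall inside a box and lie next to a building pixel. It is not the DLEBFP procedure (the abstract does not give its details, and no triangulation is performed here); the function name `filter_corners`, the `margin` parameter, and the toy data are assumptions made for this example.

```python
# Hypothetical sketch: constrain corner candidates with a bounding box and a
# segmentation mask. Names and the `margin` rule are illustrative assumptions,
# not taken from the paper.

def filter_corners(corners, bbox, mask, margin=1):
    """Keep (x, y) corners inside bbox and within `margin` pixels of a building pixel."""
    x_min, y_min, x_max, y_max = bbox
    height, width = len(mask), len(mask[0])
    kept = []
    for x, y in corners:
        # Constraint 1: the corner must lie inside the detected bounding box.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Constraint 2: some pixel in the corner's neighbourhood must be building.
        near_building = any(
            mask[r][c]
            for r in range(max(0, y - margin), min(height, y + margin + 1))
            for c in range(max(0, x - margin), min(width, x + margin + 1))
        )
        if near_building:
            kept.append((x, y))
    return kept


# Toy 5x6 segmentation map (1 = building), indexed as mask[row][column].
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
bbox = (0, 0, 5, 4)  # (x_min, y_min, x_max, y_max)
corners = [(1, 1), (3, 1), (3, 3), (1, 3), (5, 0), (7, 2)]

kept = filter_corners(corners, bbox, mask)
print(kept)
```

In this toy case the candidate at (5, 0) is inside the box but far from any building pixel, and (7, 2) is outside the box, so only the four true corners of the square remain.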

A Review of Remote Sensing Applications on Very High-Resolution Imagery Using Deep Learning-Based Semantic Segmentation Techniques

International Journal of Advanced Engineering Research and Science ◽

10.22161/ijaers.88.29 ◽

2021 ◽

Vol 8 (8) ◽

pp. 238-255

Author(s):

Philipe Borba ◽

Edilson de Souza Bias ◽

Nilton Correia da Silva ◽

Henrique Llacer Roig

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Semantic Segmentation ◽

Sensing Applications ◽

High Resolution Imagery ◽

Remote Sensing Applications ◽

Very High Resolution Imagery ◽

Very High

Self-Attention in Reconstruction Bias U-Net for Semantic Segmentation of Building Rooftops in Optical Remote Sensing Images

Remote Sensing ◽

10.3390/rs13132524 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2524

Author(s):

Ziyi Chen ◽

Dilong Li ◽

Wentao Fan ◽

Haiyan Guan ◽

Cheng Wang ◽

...

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Semantic Segmentation ◽

Extraction Methods ◽

The Self ◽

Optical Remote Sensing ◽

Building Extraction ◽

Learning Models ◽

Remote Sensing Images ◽

Segmentation Methods

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote sensing images. In recent research, the self-attention module has attracted wide interest in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of reconstruction bias. Tipping the balance between encoding and decoding capacity, i.e., making the decoding network much more complex than the encoding network, reinforces semantic segmentation ability. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines the two. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% for the WHU and Massachusetts Building datasets, respectively. Compared with several recent, widely used semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.
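For readers unfamiliar with the IoU metric reported above, the following minimal sketch computes intersection over union for two binary masks. The function name `iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not details from the paper.

```python
# Minimal sketch of intersection over union (IoU) for binary masks.
# Masks are nested lists of 0/1 values; this representation is an assumption.

def iou(pred, ref):
    """Return |pred AND ref| / |pred OR ref| for two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [1, 0, 0]]
ref = [[1, 1, 0],
       [0, 1, 0]]

score = iou(pred, ref)
print(f"IoU = {score:.2f}")
```

Here two pixels are labelled as building in both masks and four pixels are labelled as building in at least one, so the score is 2/4.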

Method of Improving Instance Segmentation for Very High Resolution Remote Sensing Imagery Using Deep Learning

Communications in Computer and Information Science - Data Stream Mining & Processing ◽

10.1007/978-3-030-61656-4_21 ◽

2020 ◽

pp. 323-333

Author(s):

Volodymyr Hnatushenko ◽

Vadym Zhernovyi

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Remote Sensing Imagery ◽

Very High ◽

Instance Segmentation

Introducing AIDE: a Software Suite for Annotating Images with Deep and Active Learning Assistance

10.5194/egusphere-egu21-12065 ◽

2021 ◽

Author(s):

Benjamin Kellenberger ◽

Devis Tuia ◽

Dan Morris

Keyword(s):

Deep Learning ◽

Active Learning ◽

Expert Knowledge ◽

Semantic Segmentation ◽

Third Party ◽

Learning Models ◽

Web Browser ◽

Web Based ◽

Model Training ◽

Bounding Boxes

Ecological research such as wildlife censuses increasingly relies on data on the scale of terabytes. For example, modern camera trap datasets contain millions of images that require prohibitive amounts of manual labour to be annotated with species, bounding boxes, and the like. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but involves extensive coding and expert knowledge.

In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. In a first instance, AIDE is a web-based annotation suite for image labelling with support for concurrent access and scalability, up to the cloud. In a second instance, it tightly integrates deep learning models into the annotation process through active learning [7], where models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system where users only need to label what is required, which saves time and decreases errors due to fatigue.

Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.

AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without having to write a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements. To complete the package, AIDE offers both user annotation and model prediction evaluation, access control, customisable model training, and more, all through the web browser. AIDE is fully open source and available under https://github.com/microsoft/aerial_wildlife_detection.
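The selection step of an active learning loop can be sketched as follows. This is a generic uncertainty-sampling example, not AIDE's actual code or API: the abstract does not state which criterion AIDE uses, so ranking by prediction entropy, the function names `entropy` and `select_for_review`, and the file names are all assumptions made for illustration.

```python
import math

# Hypothetical sketch of uncertainty sampling: rank unlabelled images by the
# entropy of the model's class probabilities and send the top k for review.
# This is an assumed criterion, not taken from AIDE.

def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_review(predictions, k):
    """Return the k image names whose predictions are most uncertain."""
    ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
    return ranked[:k]


# Toy model outputs: image name -> predicted class probabilities.
predictions = {
    "img_a.jpg": [0.98, 0.01, 0.01],  # confident prediction
    "img_b.jpg": [0.40, 0.35, 0.25],  # very uncertain prediction
    "img_c.jpg": [0.70, 0.20, 0.10],  # moderately uncertain prediction
}

selected = select_for_review(predictions, k=2)
print(selected)
```

In a full loop, the images returned here would be labelled by a user, added to the training set, and the model retrained before the next round of selection.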

Road Extraction from Very-High-Resolution Remote Sensing Images via a Nested SE-Deeplab Model

Remote Sensing ◽

10.3390/rs12182985 ◽

2020 ◽

Vol 12 (18) ◽

pp. 2985 ◽

Cited By ~ 1

Author(s):

Yeneng Lin ◽

Dongyun Xu ◽

Nan Wang ◽

Zhou Shi ◽

Qiuxiao Chen

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Network Models ◽

Road Extraction ◽

Remote Sensing Images ◽

Proposed Model ◽

Wide Range ◽

Network Modules ◽

Very High

Automatic road extraction from very-high-resolution remote sensing images has become a popular topic in a wide range of fields. Convolutional neural networks are often used for this purpose. However, many network models do not achieve satisfactory extraction results because of the elongated nature and varying sizes of roads in images. To improve the accuracy of road extraction, this paper proposes a deep learning model based on the structure of Deeplab v3. It incorporates a squeeze-and-excitation (SE) module to apply weights to different feature channels, and performs multi-scale upsampling to preserve and fuse shallow and deep information. To address the problems associated with unbalanced road samples in images, different loss functions and backbone network modules are tested in the model’s training process. Compared with cross entropy, dice loss can improve the performance of the model during training and prediction. The SE module is superior to ResNext and ResNet in improving the integrity of the extracted roads. Experimental results obtained using the Massachusetts Roads Dataset show that the proposed model (Nested SE-Deeplab) improves F1-Score by 2.4% and Intersection over Union by 2.0% compared with FC-DenseNet. The proposed model also achieves better segmentation accuracy in road extraction than other mainstream deep learning models, including Deeplab v3, SegNet, and UNet.
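To illustrate why the choice of loss matters for unbalanced road samples, the sketch below compares binary cross entropy with a soft Dice loss on a toy patch where only one pixel in ten is road. The formulas are the standard textbook forms; the function names, the smoothing constant `eps`, and the toy values are assumptions made for this example and are not taken from the paper.

```python
import math

# Sketch comparing binary cross entropy with a soft Dice loss on an
# imbalanced toy example. Standard formulas; names and data are assumptions.

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross entropy over all pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2*|P*T| / (|P| + |T|)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1 - (2 * intersection + eps) / (denominator + eps)


# Ten pixels, one of which is road; the model predicts "background" everywhere.
targets = [1] + [0] * 9
probs = [0.01] * 10

bce_value = binary_cross_entropy(probs, targets)
dice_value = dice_loss(probs, targets)
print(f"cross entropy = {bce_value:.3f}, dice loss = {dice_value:.3f}")
```

Because cross entropy averages over all pixels, the nine correctly classified background pixels dilute the penalty for the missed road pixel, whereas the Dice loss depends on overlap with the road class and stays close to its maximum of 1.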
