Automatic building extraction from oblique aerial images

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during down-sampling, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for this loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses were designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on these multi-scale features, one regression loss was used for predicting the distance-transform map and two classification losses for predicting the segmentation mask and the boundary. Two inter-task consistency-loss functions were constructed to ensure consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method outperformed recent state-of-the-art models.
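The duality between a building mask and its two shape-related maps can be illustrated on ground-truth data. The sketch below is not the paper's implementation: the 4-connected neighbourhood, the city-block distance, and all function names are assumptions chosen for illustration. It derives a distance map and a boundary map from a binary mask, then shows that the mask and the boundary can both be recovered from the distance map, which is the kind of relation a consistency loss could enforce between predictions.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connected neighbourhood (assumed)


def distance_map(mask):
    """City-block distance from every pixel to the nearest background (0) pixel.

    Background pixels get 0. Assumes the mask contains at least one background pixel.
    """
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[y][x] == 0 else None for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x] == 0)
    while queue:  # multi-source breadth-first search outward from the background
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def boundary_map(mask):
    """Mark building pixels that have at least one background 4-neighbour."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1:
                out[y][x] = int(any(
                    0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx] == 0
                    for dy, dx in NEIGHBOURS
                ))
    return out


def mask_from_distance(dist):
    """A pixel belongs to a building exactly when its distance is positive."""
    return [[int(d > 0) for d in row] for row in dist]


def boundary_from_distance(dist):
    """A building pixel lies on the boundary exactly when its distance is 1."""
    return [[int(d == 1) for d in row] for row in dist]


# Toy example: a 3x3 building inside a 5x5 tile.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_map(mask)
```

On this toy tile the centre pixel has distance 2 and is not a boundary pixel, while the surrounding ring of building pixels has distance 1 and forms the boundary.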

Multiscale Semantic Feature Optimization and Fusion Network for Building Extraction Using High-Resolution Aerial Images and LiDAR Data

Remote Sensing, DOI 10.3390/rs13132473, 2021, Vol 13 (13), pp. 2473

Author(s): Qinglie Yuan, Helmi Zulhaidi Mohd Shafri, Aidi Hizami Alias, Shaiful Jahari Hashim

Keyword(s): High Resolution, Large Scale, Spatial Information, Feature Fusion, Aerial Images, Semantic Gap, Superior Performance, Lidar Data, Building Extraction, Hierarchical Features

Automatic building extraction has been applied in many domains. It remains a challenging problem because scenes are complex and buildings appear at multiple scales. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown more robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field are weak at capturing global semantic information. Local features in multiscale subregions cannot model contextual interdependence and correlation, especially for large-scale building areas, which probably causes fragmentary extraction results due to intra-class feature variability. In addition, low-level features carry accurate and fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap between cross-level features hinders feature fusion. To address these problems, this paper proposes an FCN framework based on the residual network and provides a training pattern for multi-modal data that combines the advantages of high-resolution aerial images and LiDAR data for building extraction. Two novel modules are proposed for the optimization and integration of multiscale and cross-level features. In particular, a multiscale context optimization module is designed to adaptively generate feature representations for different subregions and effectively aggregate global context. A semantic-guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via the feature pyramid network. Compared with other state-of-the-art methods, the proposed network achieves superior performance in the experiments, with an intersection over union (IoU) of 93.19 and an overall accuracy (OA) of 97.56 on WHU datasets, and an IoU of 94.72 and an OA of 97.84 on the Boston dataset.
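For readers unfamiliar with the two reported metrics, the following minimal sketch computes IoU and OA for a binary building mask. It uses the standard definitions (IoU = TP / (TP + FP + FN), OA = (TP + TN) / all pixels); the function names and the toy masks are illustrative and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for two binary masks of equal size."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def iou(pred, truth):
    """Intersection over union of the building class."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0  # two empty masks agree perfectly


def overall_accuracy(pred, truth):
    """Fraction of pixels (building and background) labelled correctly."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)


# Toy example: the prediction is the true 2x2 building shifted one pixel right.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
building_iou = iou(pred, truth)               # 2 / 6
building_oa = overall_accuracy(pred, truth)   # 12 / 16
```

The example also shows why the two numbers diverge: background pixels dominate the tile, so OA stays at 0.75 even though only a third of the union is shared.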

Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network

ISPRS Journal of Photogrammetry and Remote Sensing, DOI 10.1016/j.isprsjprs.2019.02.019, 2019, Vol 151, pp. 91-105, Cited By ~ 29

Author(s): Jianfeng Huang, Xinchang Zhang, Qinchuan Xin, Ying Sun, Pengcheng Zhang

Keyword(s): High Resolution, Aerial Images, Lidar Data, Building Extraction

Building extraction from stereoscopic aerial images

Applied Optics, DOI 10.1364/ao.43.000218, 2004, Vol 43 (2), pp. 218, Cited By ~ 8

Author(s): Hélène Oriot, Alain Michel

Keyword(s): Aerial Images, Building Extraction

Automatic high-rise building extraction from aerial images

Fifth World Congress on Intelligent Control and Automation (IEEE Cat. No.04EX788), DOI 10.1109/wcica.2004.1343093, 2004, Cited By ~ 1

Author(s): Liang Tang, Weixin Xie, Jianjun Hang

Keyword(s): Aerial Images, Building Extraction, High Rise, High Rise Building

Semantic Segmentation of Urban Buildings Using a High-Resolution Network (HRNet) with Channel and Spatial Attention Gates

Remote Sensing, DOI 10.3390/rs13163087, 2021, Vol 13 (16), pp. 3087

Author(s): Seonkyeong Seong, Jaewan Choi

Keyword(s): Deep Learning, High Resolution, Spatial Attention, Semantic Segmentation, Aerial Images, Building Extraction, Learning Models, Urban Buildings

In this study, building extraction in aerial images was performed using csAG-HRNet, which combines HRNet-v2 with channel and spatial attention gates. HRNet-v2 consists of transition and fusion processes based on subnetworks operating at different resolutions. The channel and spatial attention gates were applied in the network to learn important features efficiently. A channel attention gate assigns weights according to the importance of each channel, and a spatial attention gate assigns weights according to the importance of each pixel position across all channels. In csAG-HRNet, csAG modules consisting of a channel attention gate and a spatial attention gate were applied to each subnetwork of the stage and fusion modules in the HRNet-v2 network. In experiments using two datasets, it was confirmed that csAG-HRNet could minimize false detections based on the shapes of large buildings and small nonbuilding objects compared to existing deep learning models.
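The difference between the two gate types can be sketched in a few lines. This is a deliberately simplified illustration, not the csAG module itself: real attention gates use learned layers to produce the weights, whereas here the weights are simply a sigmoid of pooled activations, and the order (channel gate first, then spatial gate) is an assumption. Features are stored as nested lists indexed as `[channel][row][column]`.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def channel_gate(features):
    """Scale each channel by one weight derived from its global average."""
    gated = []
    for channel in features:
        pixel_count = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / pixel_count
        weight = sigmoid(mean)  # stand-in for a learned channel-importance score
        gated.append([[v * weight for v in row] for row in channel])
    return gated


def spatial_gate(features):
    """Scale each pixel position by one weight shared across all channels."""
    channels, height, width = len(features), len(features[0]), len(features[0][0])
    weights = [
        [sigmoid(sum(features[c][y][x] for c in range(channels)) / channels)
         for x in range(width)]
        for y in range(height)
    ]  # stand-in for a learned per-position importance map
    return [
        [[features[c][y][x] * weights[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]


def cs_attention(features):
    """Apply the channel gate, then the spatial gate (ordering assumed)."""
    return spatial_gate(channel_gate(features))


# Toy feature map: 2 channels, 2x2 pixels each.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
channel_gated = channel_gate(features)
refined = cs_attention(features)
```

The channel gate produces one scalar per channel, so every pixel of a channel is scaled identically; the spatial gate produces one scalar per position, so all channels are scaled identically at that position.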

Building Extraction from Stereo Pairs of Aerial Images: Accuracy and Productivity Constraint of a Topographic Production Line

Automatic Extraction of Man-Made Objects from Aerial and Space Images, DOI 10.1007/978-3-0348-9242-1_22, 1995, pp. 231-240, Cited By ~ 2

Author(s): Olivier Jamet, Olivier Dissard, Sylvain Airault

Keyword(s): Production Line, Aerial Images, Building Extraction

Building Instance Change Detection from Large-Scale Aerial Images using Convolutional Neural Networks and Simulated Samples

Remote Sensing, DOI 10.3390/rs11111343, 2019, Vol 11 (11), pp. 1343, Cited By ~ 11

Author(s): Shunping Ji, Yanyun Shen, Meng Lu, Yongjun Zhang

Keyword(s): Deep Learning, Change Detection, Urban Areas, Large Scale, Real Life, Aerial Images, Building Extraction, Object Based, Wide Range, Change Map

We present a novel convolutional neural network (CNN)-based change detection framework for locating changed building instances as well as changed building pixels from very high resolution (VHR) aerial images. The distinctive advantage of the framework is its self-training ability, which is highly important for deep-learning-based change detection in practice, as high-quality samples of changes are always lacking for training a successful deep learning model. The framework consists of two parts: a building extraction network to produce a binary building map and a building change detection network to produce a building change map. The building extraction network is implemented with two widely used structures: a Mask R-CNN for object-based instance segmentation, and a multi-scale fully convolutional network for pixel-based semantic segmentation. The building change detection network takes bi-temporal building maps produced by the building extraction network as input and outputs a building change map at the object and pixel levels. By simulating arbitrary building changes and various building parallaxes in the binary building map, the building change detection network is well trained without real-life samples. This greatly reduces the need for labeled changed buildings, and guarantees the algorithm’s robustness to registration errors caused by parallaxes. To evaluate the proposed method, we chose a wide range of urban areas from an open-source dataset as training and testing areas, and both pixel-based and object-based model evaluation measures were used. Experiments demonstrated that our approach was vastly superior: without using any real change samples, it reached 63% average precision (AP) at the object (building instance) level. In contrast, with adequate training samples, other methods (including the most recent CNN-based and generative adversarial network (GAN)-based ones) have only reached 25% AP in their best cases.
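The idea of generating training samples by simulation can be sketched as follows. The abstract does not specify how changes and parallaxes are simulated, so everything here is an assumption for illustration: changes are axis-aligned rectangles that are added or removed, parallax is a uniform horizontal shift, and the names `paste_rect`, `shift_right` and `simulate_pair` are hypothetical. The change label is computed before the shift is applied, so the shift acts as a nuisance the network must learn to ignore.

```python
import random


def paste_rect(mask, top, left, height, width, value):
    """Fill a rectangle in place; value 1 adds a building, value 0 removes one."""
    for y in range(top, min(top + height, len(mask))):
        for x in range(left, min(left + width, len(mask[0]))):
            mask[y][x] = value


def shift_right(mask, offset):
    """Shift the map `offset` pixels right to mimic parallax; vacated pixels become 0."""
    width = len(mask[0])
    offset = max(0, min(offset, width))
    return [[0] * offset + row[: width - offset] for row in mask]


def simulate_pair(before, rng, parallax=1):
    """Return (after, change): a simulated later map and its pixel-level change label."""
    height, width = len(before), len(before[0])
    after = [row[:] for row in before]
    for value in (1, 0):  # one simulated construction, one simulated demolition
        rect_h = rng.randint(1, max(1, height // 3))
        rect_w = rng.randint(1, max(1, width // 3))
        top = rng.randint(0, height - rect_h)
        left = rng.randint(0, width - rect_w)
        paste_rect(after, top, left, rect_h, rect_w, value)
    # Label real changes before applying the parallax shift.
    change = [[int(a != b) for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(before, after)]
    return shift_right(after, parallax), change


# Toy binary building map: one 3x3 building in an 8x8 tile.
before = [[0] * 8 for _ in range(8)]
paste_rect(before, 2, 2, 3, 3, 1)
after, change = simulate_pair(before, random.Random(0), parallax=1)
```

Each call yields a new (before, after, change) training triple from a single building map, which is how simulation can stand in for scarce real change samples.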

Automatic Building Extraction from Aerial Images

Computer Vision and Image Understanding, DOI 10.1006/cviu.1998.0731, 1998, Vol 72 (2), pp. 99-100, Cited By ~ 17

Author(s): A. Gruen, R. Nevatia

Keyword(s): Aerial Images, Building Extraction

Boundary-Aware Refined Network for Automatic Building Extraction in Very High-Resolution Urban Aerial Images

Remote Sensing, DOI 10.3390/rs13040692, 2021, Vol 13 (4), pp. 692

Author(s): Yuwei Jin, Wenbo Xu, Ce Zhang, Xin Luo, Haitao Jia

Keyword(s): High Resolution, Large Scale, Aerial Images, Data Sets, Building Extraction, Visual Interpretation, Urban Scenes, Multi Scale, Spatial Pyramid Pooling, Very High

Convolutional Neural Networks (CNNs), such as U-Net, have shown competitive performance in the automatic extraction of buildings from Very High-Resolution (VHR) aerial images. However, due to unstable multi-scale context aggregation, insufficient combination of multi-level features, and a lack of consideration of the semantic boundary, most existing CNNs produce incomplete segmentation for large-scale buildings and yield predictions with high uncertainty at building boundaries. This paper presents a novel network with a special boundary-aware loss embedded, called the Boundary-Aware Refined Network (BARNet), to address these shortcomings. The unique properties of the proposed BARNet are the gated-attention refined fusion unit, the denser atrous spatial pyramid pooling module, and the boundary-aware loss. The performance of BARNet is tested on two popular data sets that include various urban scenes and diverse patterns of buildings. Experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches in both visual interpretation and quantitative evaluations.
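Atrous (dilated) convolution, the building block of spatial pyramid pooling modules like the one named above, can be illustrated in one dimension. This sketch is not BARNet's denser module: the 1-D setting, the summing kernel, the dilation rates and the function names are all assumptions chosen to show how spacing the kernel taps enlarges the receptive field without adding weights.

```python
def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D cross-correlation with kernel taps spaced `rate` samples apart."""
    span = receptive_field(len(kernel), rate)
    return [
        sum(weight * signal[start + tap * rate] for tap, weight in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def atrous_pyramid(signal, kernel, rates):
    """Run the same kernel at several dilation rates, one branch per rate."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


signal = list(range(10))   # 0, 1, ..., 9
kernel = [1, 1, 1]         # three taps, each with weight 1
branches = atrous_pyramid(signal, kernel, rates=(1, 2, 3))
# With rate 3 each output sums samples 3 apart: 0+3+6, 1+4+7, 2+5+8, 3+6+9.
```

A pyramid module would then combine the branches so that each position sees context at several scales; how the branches are combined is left out of this sketch.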
