Identifying Damaged Buildings in Aerial Images Using the Object Detection Method

2021 ◽  
Vol 13 (21) ◽  
pp. 4213
Author(s):  
Lingfei Shi ◽  
Feng Zhang ◽  
Junshi Xia ◽  
Jibo Xie ◽  
Zhe Zhang ◽  
...  

The collapse of buildings caused by earthquakes seriously threatens human lives and safety, so the quick detection of collapsed buildings from post-earthquake images is essential for disaster relief and damage assessment. Compared with traditional building extraction methods, methods based on convolutional neural networks perform better because they can automatically extract high-dimensional abstract features from images. However, many problems remain for deep learning in the extraction of collapsed buildings. For example, because post-earthquake scenes are complex, collapsed buildings are easily confused with the background, and the multiple features extracted from collapsed buildings are difficult to exploit fully, which makes model training time-consuming and extraction accuracy low. In addition, model training is prone to overfitting, which reduces the model's transferability. This paper proposes an improved version of the classic You Only Look Once model (YOLOv4) to detect collapsed buildings in post-earthquake aerial images. Specifically, the k-means algorithm is used to select the optimal number and sizes of anchors from the images. We replace the residual blocks of the CSPDarknet53 backbone in YOLOv4 with ResNeXt blocks to improve the backbone's feature extraction and classification performance. Furthermore, we replace the loss function of YOLOv4 with the Focal-EIoU loss function. The results show that, compared with the original YOLOv4 model, our proposed method extracts collapsed buildings more accurately: the AP (average precision) increased from 88.23% to 93.76%, and the detection speed reached 32.7 f/s. Our method not only improves accuracy but also speeds up the detection of collapsed buildings, providing a basis for the detection of large-scale collapsed buildings in the future.
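
To make the anchor-selection step concrete, the following is a minimal sketch of IoU-based k-means anchor clustering, a common YOLO-family recipe; the width-height input format and the median cluster update are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, as if all boxes shared one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None] +
             (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=300, seed=0):
    """Cluster (w, h) pairs with distance = 1 - IoU; median cluster update."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # best-matching anchor
        new = np.array([np.median(boxes[assign == i], axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area

# Usage sketch: boxes is an (N, 2) array of box widths/heights from the labels.
# anchors = kmeans_anchors(boxes, k=9)
```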

2021 ◽  
Vol 13 (13) ◽  
pp. 2473
Author(s):  
Qinglie Yuan ◽  
Helmi Zulhaidi Mohd Shafri ◽  
Aidi Hizami Alias ◽  
Shaiful Jahari Hashim

Automatic building extraction has been applied in many domains, yet it remains a challenging problem because of complex scenes and the multiscale nature of buildings. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown more robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field are weak at capturing global semantic information. Local features in multiscale subregions cannot model contextual interdependence and correlation, especially for large-scale building areas, which often causes fragmentary extraction results due to intra-class feature variability. In addition, low-level features carry accurate, fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap between across-level features is not conducive to feature fusion. To address these problems, this paper proposes an FCN framework based on the residual network and provides a training pattern for multi-modal data that combines the advantages of high-resolution aerial images and LiDAR data for building extraction. Two novel modules are proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate feature representations for different subregions and effectively aggregate global context, and a semantic-guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via a feature pyramid network. Compared with other state-of-the-art methods, experimental results demonstrate superior performance, with 93.19% IoU and 97.56% OA on the WHU dataset and 94.72% IoU and 97.84% OA on the Boston dataset, showing that the proposed network improves accuracy and achieves better performance for building extraction.
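
As a rough illustration of the semantic-guided spatial attention idea, the sketch below lets a deep semantic feature map gate shallow features with a spatial attention map; the channel sizes, 1x1/3x3 layout, and residual fusion are assumptions, not the authors' exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidedSpatialAttention(nn.Module):
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.reduce = nn.Conv2d(high_ch, low_ch, kernel_size=1)
        self.attn = nn.Sequential(
            nn.Conv2d(low_ch, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low_feat, high_feat):
        # Upsample the deep semantic features to the shallow feature's size.
        high = F.interpolate(self.reduce(high_feat), size=low_feat.shape[2:],
                             mode='bilinear', align_corners=False)
        # A spatial attention map derived from semantics gates the shallow features.
        gate = self.attn(high)
        return low_feat * gate + high  # refined shallow features fused with semantics

# Usage sketch:
# sga = SemanticGuidedSpatialAttention(low_ch=64, high_ch=512)
# refined = sga(shallow_features, deep_features)
```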


2019 ◽  
Vol 11 (6) ◽  
pp. 696 ◽  
Author(s):  
Zhengxin Zhang ◽  
Yunhong Wang

Automatic extraction of ground objects is fundamental to many remote sensing applications, and it is valuable to extract different kinds of ground objects effectively with a single general method. We propose such a method, JointNet, a novel neural network that meets the extraction requirements of both roads and buildings. The proposed method makes three contributions to road and building extraction: (1) in addition to accurately extracting small objects, it can extract large objects thanks to a wide receptive field, and by switching the loss function, the network can effectively extract multi-type ground objects, from road centerlines to large-scale buildings; (2) its network module combines dense connectivity with atrous convolution layers, maintaining the efficiency of the dense connectivity pattern while reaching a large receptive field; and (3) it utilizes the focal loss function to improve road extraction. Experimental results on three datasets verified the effectiveness of JointNet in extracting both road and building objects.
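
A minimal sketch of the second contribution, combining dense connectivity with atrous convolutions, is given below: each layer sees all earlier feature maps (dense connection) and uses an increasing dilation rate to widen the receptive field. The growth rate and the dilation schedule are illustrative assumptions, not JointNet's published configuration:

```python
import torch
import torch.nn as nn

class DenseAtrousBlock(nn.Module):
    def __init__(self, in_ch, growth=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for d in dilations:
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=d, dilation=d),
            ))
            ch += growth  # dense connectivity: each layer's input accumulates

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # Concatenate all earlier outputs before each atrous convolution.
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```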


2019 ◽  
Vol 11 (11) ◽  
pp. 1343 ◽  
Author(s):  
Shunping Ji ◽  
Yanyun Shen ◽  
Meng Lu ◽  
Yongjun Zhang

We present a novel convolutional neural network (CNN)-based change detection framework for locating changed building instances as well as changed building pixels from very high resolution (VHR) aerial images. The distinctive advantage of the framework is its self-training ability, which is highly important for deep-learning-based change detection in practice, as high-quality samples of changes are always lacking for training a successful deep learning model. The framework consists of two parts: a building extraction network that produces a binary building map, and a building change detection network that produces a building change map. The building extraction network is implemented with two widely used structures: a Mask R-CNN for object-based instance segmentation, and a multi-scale fully convolutional network for pixel-based semantic segmentation. The building change detection network takes bi-temporal building maps produced by the building extraction network as input and outputs a building change map at the object and pixel levels. By simulating arbitrary building changes and various building parallaxes in the binary building map, the building change detection network is well trained without real-life samples. This greatly lowers the requirement for labeled changed buildings and guarantees the algorithm's robustness to registration errors caused by parallax. To evaluate the proposed method, we chose a wide range of urban areas from an open-source dataset as training and testing areas, and both pixel-based and object-based model evaluation measures were used. Experiments demonstrated that our approach was vastly superior: without using any real change samples, it reached 63% average precision (AP) at the object (building instance) level. In contrast, with adequate training samples, other methods, including the most recent CNN-based and generative adversarial network (GAN)-based ones, reached only 25% AP in their best cases.
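
The self-training idea can be sketched as follows: from a single binary building map, synthesize a "time-2" map by deleting some instances (simulated change) and shifting others (simulated parallax, which must not count as change). The deletion probability and shift magnitude are illustrative assumptions, not the paper's parameters:

```python
import numpy as np
from scipy import ndimage

def simulate_change_pair(building_map, p_delete=0.2, max_shift=5, seed=0):
    """building_map: 2-D bool array, the time-1 binary building map."""
    rng = np.random.default_rng(seed)
    labels, n = ndimage.label(building_map)   # split map into building instances
    t2 = np.zeros_like(building_map, dtype=bool)
    change = np.zeros_like(building_map, dtype=bool)
    for i in range(1, n + 1):
        inst = labels == i
        if rng.random() < p_delete:
            change |= inst                     # simulated demolition: a real change
            continue
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # np.roll wraps at borders, which is acceptable for a sketch.
        t2 |= np.roll(inst, (dy, dx), axis=(0, 1))  # parallax shift: not a change
    return building_map, t2, change            # (t1 map, simulated t2 map, change label)
```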


2021 ◽  
Vol 13 (4) ◽  
pp. 692
Author(s):  
Yuwei Jin ◽  
Wenbo Xu ◽  
Ce Zhang ◽  
Xin Luo ◽  
Haitao Jia

Convolutional Neural Networks (CNNs), such as U-Net, have shown competitive performance in the automatic extraction of buildings from Very High-Resolution (VHR) aerial images. However, due to unstable multi-scale context aggregation, insufficient combination of multi-level features, and a lack of consideration of semantic boundaries, most existing CNNs produce incomplete segmentations of large-scale buildings and predictions with high uncertainty at building boundaries. This paper presents a novel network with a special boundary-aware loss embedded, called the Boundary-Aware Refined Network (BARNet), to address these problems. The unique properties of the proposed BARNet are the gated-attention refined fusion unit, the denser atrous spatial pyramid pooling module, and the boundary-aware loss. The performance of BARNet is tested on two popular datasets that cover various urban scenes and diverse patterns of buildings. Experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches in both visual interpretation and quantitative evaluation.
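
One simple way to realize a boundary-aware loss is sketched below, under stated assumptions: pixels in a band around the ground-truth boundary (found with a morphological gradient) receive a higher weight in the cross-entropy term. The kernel size and weight are illustrative; this is not the BARNet formulation itself:

```python
import torch
import torch.nn.functional as F

def boundary_aware_bce(logits, target, boundary_weight=5.0, k=5):
    """logits, target: (N, 1, H, W); target is a float tensor in {0, 1}."""
    pad = k // 2
    # Morphological gradient: dilation minus erosion marks a boundary band.
    dil = F.max_pool2d(target, k, stride=1, padding=pad)
    ero = -F.max_pool2d(-target, k, stride=1, padding=pad)
    boundary = (dil - ero).clamp(0, 1)
    # Up-weight the loss in the boundary band to sharpen predicted edges.
    weights = 1.0 + boundary_weight * boundary
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)
```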


2021 ◽  
Vol 13 (16) ◽  
pp. 3187
Author(s):  
Xinchun Wei ◽  
Xing Li ◽  
Wei Liu ◽  
Lianpeng Zhang ◽  
Dayu Cheng ◽  
...  

Deep learning techniques have greatly improved the efficiency and accuracy of building extraction from remote sensing images. However, producing high-quality building outlines that can be applied in the field of surveying and mapping remains a significant challenge; in practice, most building extraction tasks are still executed manually, so an automated procedure that extracts building outlines with precise positions is required. In this study, we directly used the U2-Net semantic segmentation model to extract building outlines. The extraction results showed that the U2-Net model provides building outlines with better accuracy and more precise positions than other models, based on comparisons with semantic segmentation models (SegNet, U-Net, and FCN) and edge detection models (RCF, HED, and DexiNed) applied to two datasets (Nanjing and Wuhan University (WHU)). We also modified the binary cross-entropy loss function in the U2-Net model into a multiclass cross-entropy loss function to directly generate a binary map containing the building outline and the background. This yielded a further refined building outline, showing that with the modified U2-Net model it is not necessary to use non-maximum suppression as a post-processing step to refine the edge map, as the other edge detection models do. Moreover, the modified model is less affected by the sample imbalance problem. Finally, we created an image-to-image program to further validate the modified U2-Net semantic segmentation model for building outline extraction.
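
The loss swap described above can be sketched as follows: instead of a 1-channel sigmoid head with binary cross-entropy, the head predicts two classes (background, building outline) trained with multiclass cross-entropy, so an argmax over the classes yields a hard outline map with no NMS step. The feature channel count and the class weighting are assumptions:

```python
import torch
import torch.nn as nn

num_classes = 2                      # background vs. building outline
head = nn.Conv2d(64, num_classes, kernel_size=1)  # assumed 64-channel features
# Up-weighting the rare outline class is one way to ease sample imbalance.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))

def step(features, target):
    # features: (N, 64, H, W); target: (N, H, W) int64 labels in {0, 1}
    logits = head(features)
    loss = criterion(logits, target)
    outline = logits.argmax(dim=1)   # hard binary map, no non-maximum suppression
    return loss, outline
```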


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2803
Author(s):  
Rabeea Jaffari ◽  
Manzoor Ahmed Hashmani ◽  
Constantino Carlos Reyes-Aldasoro

The segmentation of power lines (PLs) from aerial images is a crucial task for the safe navigation of unmanned aerial vehicles (UAVs) operating at low altitudes. Despite advances in deep-learning-based approaches to PL segmentation, these models remain vulnerable to the class imbalance present in the data: PLs occupy only a minimal portion (1-5%) of an aerial image compared to the background region (95-99%). Generally, this class imbalance is addressed by PL-specific detectors used in conjunction with the popular balanced binary cross-entropy (BBCE) loss function. However, PL-specific detectors do not work outside their application areas, and the BBCE loss requires non-trivial hyperparameter tuning of the class-wise weights. Moreover, the BBCE loss yields low Dice scores and precision values and thus fails to achieve an optimal trade-off between Dice scores, model accuracy, and precision-recall values. In this work, we propose a generalized focal loss function based on the Matthews correlation coefficient (MCC), or the Phi coefficient, to address the class imbalance problem in PL segmentation while utilizing a generic deep segmentation architecture. We evaluate our loss function on an improved vanilla U-Net model with an additional convolutional auxiliary classifier head (ACU-Net) for better learning and faster model convergence. Evaluation on two PL datasets, namely the Mendeley Power Line Dataset and the Power Line Dataset of Urban Scenes (PLDU), where PLs occupy around 1% and 2% of the aerial image area, respectively, reveals that our proposed loss function outperforms the popular BBCE loss by 16% in PL Dice scores on both datasets, by 19% in precision and false detection rate (FDR) values on the Mendeley PL dataset, and by 15% in precision and FDR values on the PLDU, with a minor degradation in accuracy and recall values. Moreover, our proposed ACU-Net outperforms the baseline vanilla U-Net by 1-10% on the characteristic evaluation parameters for both PL datasets. Thus, our proposed loss function with ACU-Net achieves an optimal trade-off for the characteristic evaluation parameters without any bells and whistles. Our code is available at GitHub.
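
The MCC core of such a loss can be made differentiable by computing soft TP/FP/TN/FN counts from the predicted probabilities, as in the minimal sketch below; the focal-style modulation of the paper is omitted here, so this shows only the MCC term:

```python
import torch

def mcc_loss(probs, target, eps=1e-6):
    """probs, target: float tensors of shape (N, 1, H, W), values in [0, 1]."""
    tp = (probs * target).sum()
    fp = (probs * (1 - target)).sum()
    fn = ((1 - probs) * target).sum()
    tn = ((1 - probs) * (1 - target)).sum()
    num = tp * tn - fp * fn
    den = torch.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return 1 - num / den  # minimizing drives the Matthews correlation toward 1
```

Because MCC balances all four confusion-matrix entries, this loss needs no class-wise weight tuning, which is the practical advantage over BBCE claimed above.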


2012 ◽  
Vol 37 (4) ◽  
pp. 168-171 ◽  
Author(s):  
Birutė Ruzgienė ◽  
Qian Yi Xiang ◽  
Silvija Gečytė

The rectification of high-resolution digital aerial images or satellite imagery for large-scale city mapping is a modern technology that needs well-distributed and accurately defined control points. Digital satellite imagery, obtained using the widely known Google Earth software, can be applied to accurate city map construction. The method of five control points is suggested for imagery rectification, introducing the algorithm offered by Prof. Ruan Wei (Tongji University, Shanghai). Image rectification software created on the basis of this algorithm can correct image deformation with the required accuracy, is reliable, and retains the advantage of flexibility. Experimental research testing the technology was executed using GeoEye imagery from Google Earth over the city of Vilnius. Orthophoto maps at the scales of 1:1000 and 1:500 were generated following the five-control-point methodology. Reference data and rectification results were checked against those obtained by processing digital aerial images with a digital photogrammetry approach. The image rectification process applying the investigated method takes a short time (about 4-5 minutes) and uses only five control points, and the accuracy of the created models satisfies the requirements for large-scale mapping.
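
For orientation, a standard least-squares projective rectification from control points is sketched below: four points determine a homography, so a fifth provides redundancy for an accuracy check. This is the generic DLT formulation, explicitly not Prof. Ruan Wei's specific five-point algorithm:

```python
import numpy as np

def fit_homography(src, dst):
    """src, dst: (n, 2) arrays of matched control points, n >= 4 (here n = 5)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null-space vector of A, via SVD.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Residuals at the five control points indicate whether the rectification
# accuracy meets large-scale mapping tolerances.
```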


2021 ◽  
Vol 13 (14) ◽  
pp. 2656
Author(s):  
Furong Shi ◽  
Tong Zhang

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during down-sampling, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for the loss of shape information, two shape-related auxiliary tasks (boundary prediction and distance estimation) are jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses are designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module is appended to the top of the encoder of a U-shaped network to obtain multi-scale features, and based on these features, one regression loss and two classification losses are used for predicting the distance-transform map, the segmentation mask, and the boundary. Two inter-task consistency-loss functions are constructed to ensure consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image datasets show that our method achieves superior performance over recent state-of-the-art models.
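
A minimal sketch of the distance-estimation side of this setup is given below, under assumptions: the ground-truth distance-transform target is built from the mask, and one possible mask-distance consistency term softly requires that pixels with positive predicted distance also have high mask probability. The truncation radius and soft threshold are illustrative, not the paper's exact losses:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy import ndimage

def distance_target(mask, truncate=20.0):
    """Truncated, normalized distance-to-boundary inside buildings (ground truth)."""
    dist = ndimage.distance_transform_edt(mask)   # mask: 2-D {0, 1} array
    return np.clip(dist / truncate, 0.0, 1.0)

def mask_distance_consistency(mask_probs, dist_pred):
    """Predicted mask and predicted distance map should agree on 'insideness'."""
    # Soft version of 'dist > 0 implies inside a building'.
    inside_from_dist = torch.sigmoid(50.0 * (dist_pred - 0.02))
    return F.mse_loss(mask_probs, inside_from_dist)  # both (N, 1, H, W)
```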


2021 ◽  
Vol 13 (3) ◽  
pp. 364
Author(s):  
Han Gao ◽  
Jinhui Guo ◽  
Peng Guo ◽  
Xiuwan Chen

Recently, deep learning has become the most innovative trend in a variety of high-spatial-resolution remote sensing imaging applications. However, large-scale land cover classification via traditional convolutional neural networks (CNNs) with sliding windows is computationally expensive and produces coarse results. Additionally, although such supervised learning approaches have performed well, collecting and annotating datasets for every task is extremely laborious, especially in fully supervised cases where the pixel-level ground-truth labels are dense. In this work, we propose a new object-oriented deep learning framework that leverages residual networks of different depths to learn adjacent feature representations by embedding a multibranch architecture in the deep learning pipeline. The idea is to exploit limited training data at different neighboring scales to strike a tradeoff between weak semantics and strong feature representations for operational land cover mapping tasks. We draw on established geographic object-based image analysis (GEOBIA) as an auxiliary module to reduce the computational burden of spatial reasoning and to optimize the classification boundaries. We evaluated the proposed approach on two subdecimeter-resolution datasets covering both urban and rural landscapes. It achieved better classification accuracy (88.9%) than traditional object-based deep learning methods and an excellent inference time (11.3 s/ha).
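
The multibranch idea can be sketched as below, under assumptions: each object (e.g., a GEOBIA segment) is classified from crops at several neighboring scales, each passed through a residual network of a different depth, and the branch features are fused for the final label. The backbone choices and fused feature size are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiBranchClassifier(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # One branch per neighboring scale; deeper nets see wider context crops.
        self.branches = nn.ModuleList([
            models.resnet18(num_classes=64),   # small context window
            models.resnet34(num_classes=64),   # medium context window
            models.resnet50(num_classes=64),   # large context window
        ])
        self.fc = nn.Linear(3 * 64, n_classes)

    def forward(self, crops):
        # crops: list of 3 tensors (N, 3, s_i, s_i), one per neighboring scale
        feats = [branch(crop) for branch, crop in zip(self.branches, crops)]
        return self.fc(torch.cat(feats, dim=1))
```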


2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. In recent research, the self-attention module has attracted intense interest in many fields, including building extraction. However, most current deep learning models equipped with the self-attention module still overlook the effectiveness of reconstruction bias: by tipping the balance between encoding and decoding abilities, i.e., making the decoding network much more complex than the encoding network, semantic segmentation ability can be reinforced. To remedy this gap in combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that integrates both. In the encoding part, a self-attention module is added to learn attention weights over the inputs, so the network pays more attention to positions where salient regions may appear. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets, the WHU and Massachusetts Building datasets, achieving IoU scores of 89.39% and 73.49%, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method's results are satisfactory.
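
One common way to add such attention weights to encoder features is a non-local style self-attention block, sketched below; the channel reduction ratio and the learned residual weight are assumptions rather than the authors' exact module:

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // reduction, kernel_size=1)
        self.k = nn.Conv2d(ch, ch // reduction, kernel_size=1)
        self.v = nn.Conv2d(ch, ch, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (n, hw, c')
        k = self.k(x).flatten(2)                   # (n, c', hw)
        attn = torch.softmax(q @ k, dim=-1)        # (n, hw, hw) position affinities
        v = self.v(x).flatten(2).transpose(1, 2)   # (n, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return x + self.gamma * out  # salient positions receive more weight
```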

