Building Instance Change Detection from Large-Scale Aerial Images using Convolutional Neural Networks and Simulated Samples

We present a novel convolutional neural network (CNN)-based change detection framework for locating changed building instances as well as changed building pixels from very high resolution (VHR) aerial images. The distinctive advantage of the framework is its self-training ability, which matters in practical deep-learning-based change detection because high-quality change samples for training a deep learning model are usually scarce. The framework consists of two parts: a building extraction network that produces a binary building map and a building change detection network that produces a building change map. The building extraction network is implemented with two widely used structures: a Mask R-CNN for object-based instance segmentation, and a multi-scale fully convolutional network for pixel-based semantic segmentation. The building change detection network takes bi-temporal building maps produced by the building extraction network as input and outputs a building change map at the object and pixel levels. By simulating arbitrary building changes and various building parallaxes in the binary building map, the building change detection network can be trained without real-life samples. This greatly lowers the requirement for labeled changed buildings and makes the algorithm robust to registration errors caused by parallax. To evaluate the proposed method, we chose a wide range of urban areas from an open-source dataset as training and testing areas, and used both pixel-based and object-based evaluation measures. Experiments showed that our approach was clearly superior: without using any real change samples, it reached 63% average precision (AP) at the object (building instance) level. In contrast, other methods, including the most recent CNN-based and generative adversarial network (GAN)-based ones, reached only 25% AP in their best cases even with adequate training samples.
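
The idea of simulating changes and parallax on a binary building map can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rectangular "change" shape, the shift range, and the choice to label change in the first map's frame are all assumptions made for illustration.

```python
import random

def shift_mask(mask, dx, dy):
    """Translate a binary mask by (dx, dy) pixels, padding with 0.
    Used here as a crude stand-in for a parallax/registration offset."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = mask[sy][sx]
    return out

def simulate_pair(mask, rng, max_shift=2, box=3):
    """Build a synthetic (t1_map, t2_map, change_label) triple from one map.
    All parameters are hypothetical, not taken from the paper."""
    h, w = len(mask), len(mask[0])
    changed = [row[:] for row in mask]
    y0 = rng.randrange(0, h - box + 1)
    x0 = rng.randrange(0, w - box + 1)
    value = rng.choice([0, 1])  # 0 = simulated demolition, 1 = simulated construction
    for y in range(y0, y0 + box):
        for x in range(x0, x0 + box):
            changed[y][x] = value
    # Assumed labeling: true change is where the edited map differs from the original.
    label = [[mask[y][x] ^ changed[y][x] for x in range(w)] for y in range(h)]
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    # Only the second map is shifted, so the offset itself is not labeled as change.
    return mask, shift_mask(changed, dx, dy), label
```

Because the label is computed before the shift is applied, a network trained on such triples would see offsets that are not marked as change, which is one plausible way to encourage robustness to registration errors.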

BUILDING CHANGE DETECTION FROM BITEMPORAL AERIAL IMAGES USING DEEP LEARNING

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-565-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 565-571

Author(s):

S. Su ◽

T. Nawata ◽

T. Fuse

Keyword(s):

Deep Learning ◽

Change Detection ◽

Urban Areas ◽

Network Architecture ◽

Aerial Images ◽

Detection Accuracy ◽

Detection Model ◽

Wide Range ◽

Surface Models ◽

Data Source

Abstract. Automatic building change detection has become a topical issue owing to its wide range of applications, such as updating building maps. However, accurate building change detection remains challenging, particularly in urban areas. Thus far, there has been limited research on using the outdated building map (the building map before the update, referred to herein as the old-map) to increase the accuracy of building change detection. This paper presents a novel deep-learning-based method for building change detection using bitemporal aerial images containing RGB bands, bitemporal digital surface models (DSMs), and an old-map. The aerial images have one of two spatial resolutions, 12.5 cm or 16 cm, and the cell size of the DSMs is 50 cm × 50 cm. The bitemporal aerial images, the height variations calculated from the differences between the bitemporal DSMs, and the old-map were fed into a network architecture to build an automatic building change detection model. The performance of the model was quantitatively and qualitatively evaluated for an urban area that covered approximately 10 km² and contained over 21,000 buildings. The results indicate that it detects building changes more accurately than methods that use only i) bitemporal aerial images, ii) bitemporal aerial images and bitemporal DSMs, or iii) bitemporal aerial images and an old-map. The proposed method achieved recall rates of 89.3%, 88.8%, and 99.5% for new, demolished, and other buildings, respectively. The results also demonstrate that the old-map is an effective data source for increasing building change detection accuracy.
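
As a rough illustration of the height-variation input, the sketch below differences two DSM grids and upsamples the result to the image grid. The abstract does not say how the layers were resampled or combined, so the nearest-neighbour upsampling and both function names are assumptions; a factor of 4 corresponds to 50 cm DSM cells over 12.5 cm imagery.

```python
def height_change(dsm_old, dsm_new):
    """Per-cell height variation (new minus old) between two DSM grids."""
    return [[n - o for o, n in zip(row_old, row_new)]
            for row_old, row_new in zip(dsm_old, dsm_new)]

def upsample_nearest(grid, factor):
    """Repeat each cell factor x factor times (nearest-neighbour upsampling).
    Hypothetical choice: the source does not state the resampling method."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out

# Example: a 1 x 2 DSM pair where the first cell gained 3 m of height.
dh = height_change([[10, 10]], [[13, 10]])
dh_on_image_grid = upsample_nearest(dh, 4)  # 50 cm cells -> 12.5 cm pixels
```

The upsampled height-variation layer could then be stacked with the two RGB images and the rasterized old-map as extra input channels, though the exact channel layout is not given in the abstract.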

Classification of Very-High-Spatial-Resolution Aerial Images Based on Multiscale Features with Limited Semantic Information

Remote Sensing ◽

10.3390/rs13030364 ◽

2021 ◽

Vol 13 (3) ◽

pp. 364

Author(s):

Han Gao ◽

Jinhui Guo ◽

Peng Guo ◽

Xiuwan Chen

Keyword(s):

Deep Learning ◽

Land Cover ◽

Spatial Resolution ◽

Large Scale ◽

High Spatial Resolution ◽

Training Data ◽

Aerial Images ◽

Rural Landscapes ◽

Feature Representations ◽

Object Based

Recently, deep learning has become the most innovative trend for a variety of high-spatial-resolution remote sensing imaging applications. However, large-scale land cover classification via traditional convolutional neural networks (CNNs) with sliding windows is computationally expensive and produces coarse results. Additionally, although such supervised learning approaches have performed well, collecting and annotating datasets for every task is extremely laborious, especially in fully supervised cases where the pixel-level ground-truth labels are dense. In this work, we propose a new object-oriented deep learning framework that leverages residual networks with different depths to learn adjacent feature representations by embedding a multibranch architecture in the deep learning pipeline. The idea is to exploit limited training data at different neighboring scales to make a tradeoff between weak semantics and strong feature representations for operational land cover mapping tasks. We draw on established geographic object-based image analysis (GEOBIA) as an auxiliary module to reduce the computational burden of spatial reasoning and optimize the classification boundaries. We evaluated the proposed approach on two subdecimeter-resolution datasets involving both urban and rural landscapes. It achieved better classification accuracy (88.9%) than traditional object-based deep learning methods and an excellent inference time (11.3 s/ha).

Uncertainty-Aware Deep Learning-Based Cardiac Arrhythmias Classification Model of Electrocardiogram Signals

Computers ◽

10.3390/computers10060082 ◽

2021 ◽

Vol 10 (6) ◽

pp. 82

Author(s):

Ahmad O. Aseeri

Keyword(s):

Deep Learning ◽

Cardiac Arrhythmias ◽

Large Scale ◽

Clinical Decision Making ◽

Probabilistic Approach ◽

Classification Model ◽

Gating Mechanism ◽

Uncertainty Estimates ◽

Wide Range

Deep learning-based methods have emerged as one of the most effective and practical solutions to a wide range of medical problems, including the diagnosis of cardiac arrhythmias. A critical step toward early diagnosis of many heart dysfunctions is the accurate detection and classification of cardiac arrhythmias, which can be achieved via electrocardiograms (ECGs). Motivated by the desire to enhance conventional clinical methods of diagnosing cardiac arrhythmias, we introduce an uncertainty-aware deep learning-based predictive model for accurate large-scale classification of cardiac arrhythmias, trained and evaluated on three benchmark medical datasets. In addition, considering that the quantification of uncertainty estimates is vital for clinical decision-making, our method incorporates a probabilistic approach to capture the model’s uncertainty using a Bayesian-based approximation method without introducing additional parameters or significant changes to the network’s architecture. Although many arrhythmia classification solutions with various ECG feature engineering techniques have been reported in the literature, the AI-based probabilistic method introduced in this paper outperforms existing methods in multiclass classification, with F1 scores of 98.62% and 96.73% on the MIT-BIH dataset (20 annotations), 99.23% and 96.94% on the INCART dataset (eight annotations), and 97.25% and 96.73% on the BIDMC dataset (six annotations) for the deep ensemble and probabilistic modes, respectively. We demonstrate our method’s high performance and statistical reliability in numerical experiments on language modeling using the gating mechanism of recurrent neural networks.

Multiscale Semantic Feature Optimization and Fusion Network for Building Extraction Using High-Resolution Aerial Images and LiDAR Data

Remote Sensing ◽

10.3390/rs13132473 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2473

Author(s):

Qinglie Yuan ◽

Helmi Zulhaidi Mohd Shafri ◽

Aidi Hizami Alias ◽

Shaiful Jahari Hashim

Keyword(s):

High Resolution ◽

Large Scale ◽

Spatial Information ◽

Feature Fusion ◽

Aerial Images ◽

Semantic Gap ◽

Superior Performance ◽

Lidar Data ◽

Building Extraction ◽

Hierarchical Features

Automatic building extraction has been applied in many domains. It remains a challenging problem because scenes are complex and buildings appear at multiple scales. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown more robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field have a weak ability to capture global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which probably causes fragmentary extraction results due to intra-class feature variability. In addition, low-level features carry accurate and fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap between features at different levels is not conducive to feature fusion. To address these problems, this paper proposes an FCN framework based on the residual network and provides a training pattern for multi-modal data that combines the advantages of high-resolution aerial images and LiDAR data for building extraction. Two novel modules are proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate the feature representations for different subregions and effectively aggregate global context. A semantic guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via the feature pyramid network. Compared with other state-of-the-art methods, experimental results demonstrate superior performance with 93.19 IoU and 97.56 OA on the WHU datasets and 94.72 IoU and 97.84 OA on the Boston dataset, which shows that the proposed network can improve accuracy and achieve better performance for building extraction.
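
For readers unfamiliar with the two reported metrics, the following sketch computes intersection over union (IoU) for the building class and overall accuracy (OA) from a predicted and a reference binary mask. The function name is hypothetical and the evaluation code used in the paper is not shown in the abstract; the values are fractions, so multiplying by 100 gives figures on the scale quoted above.

```python
def iou_and_oa(pred, truth):
    """Return (IoU of the positive class, overall accuracy) for two binary masks."""
    tp = fp = fn = tn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1      # building predicted, building in reference
            elif p:
                fp += 1      # building predicted, background in reference
            elif t:
                fn += 1      # background predicted, building in reference
            else:
                tn += 1      # background in both
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # convention: empty vs empty counts as perfect
    oa = (tp + tn) / (tp + fp + fn + tn)
    return iou, oa

iou, oa = iou_and_oa([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```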

Self-Supervised Pre-Training of Transformers for Satellite Image Time Series Classification

10.36227/techrxiv.13025039.v1 ◽

2020 ◽

Author(s):

Yuan Yuan ◽

Lei Lin

Keyword(s):

Time Series ◽

Deep Learning ◽

Large Scale ◽

Temporal Structure ◽

Satellite Image ◽

Fine Tuning ◽

Small Scale ◽

Model Parameters ◽

Learning Approaches ◽

Wide Range

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, with classification accuracy gains of 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
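
The pretext task of predicting contaminated observations can be sketched as a data-preparation step. The contamination rate, the uniform noise model, and the function name below are assumptions for illustration only; the abstract does not specify how observations are corrupted.

```python
import random

def contaminate(series, rng, rate=0.15, noise=0.5):
    """Corrupt a random subset of time steps in one pixel's time series.

    series: list of observations, each a list of band values.
    Returns (corrupted_series, mask), where mask[t] is True for corrupted steps.
    A model would be trained to recover series[t] wherever mask[t] is True.
    """
    corrupted = [obs[:] for obs in series]
    mask = []
    for t, obs in enumerate(series):
        hit = rng.random() < rate
        mask.append(hit)
        if hit:
            # Hypothetical noise model: add uniform noise to every band.
            corrupted[t] = [v + rng.uniform(-noise, noise) for v in obs]
    return corrupted, mask

series = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
corrupted, mask = contaminate(series, random.Random(1), rate=0.5)
```

Since the targets are the original observations themselves, no manual labels are needed at this stage, which is what allows large-scale unlabeled data to be used.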

Detecting Objects from Space: An Evaluation of Deep-Learning Modern Approaches

Electronics ◽

10.3390/electronics9040583 ◽

2020 ◽

Vol 9 (4) ◽

pp. 583 ◽

Cited By ~ 6

Author(s):

Khang Nguyen ◽

Nhut T. Huynh ◽

Phat C. Nguyen ◽

Khanh-Duy Nguyen ◽

Nguyen D. Vo ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Unmanned Aircraft ◽

Aerial Images ◽

Great Success ◽

Single Shot ◽

Convolutional Networks ◽

Image Pyramids ◽

Fully Convolutional Networks ◽

Wide Range

Unmanned aircraft systems, or drones, enable us to capture many scenes from a bird’s-eye view, and they have been rapidly deployed in a wide range of practical domains, e.g., agriculture, aerial photography, fast delivery, and surveillance. Object detection is one of the core steps in understanding videos collected from drones. However, this task is very challenging due to the unconstrained viewpoints and low resolution of the captured videos. While modern deep-learning object detectors have recently achieved great success on general benchmarks, e.g., PASCAL-VOC and MS-COCO, the robustness of these detectors on aerial images captured by drones is not well studied. In this paper, we present an evaluation of state-of-the-art deep-learning detectors including Faster R-CNN (Faster Regional CNN), RFCN (Region-based Fully Convolutional Networks), SNIPER (Scale Normalization for Image Pyramids with Efficient Resampling), Single-Shot Detector (SSD), YOLO (You Only Look Once), RetinaNet, and CenterNet for object detection in videos captured by drones. We conduct experiments on the VisDrone2019 dataset, which contains 96 videos with 39,988 annotated frames, and provide insights into efficient object detectors for aerial images.

An object-based approach for semi-automated landslide change detection and attribution of changes to landslide classes in northern Taiwan

Earth Science Informatics ◽

10.1007/s12145-015-0217-3 ◽

2015 ◽

Vol 8 (2) ◽

pp. 327-335 ◽

Cited By ~ 37

Author(s):

Daniel Hölbling ◽

Barbara Friedl ◽

Clemens Eisank

Keyword(s):

Change Detection ◽

Spatial Resolution ◽

Vegetation Index ◽

Normalized Difference Vegetation Index ◽

Remote Sensing Data ◽

Detection Methods ◽

Detection And Attribution ◽

Object Based ◽

Wide Range ◽

Northern Taiwan

Abstract. Earth observation (EO) data are very useful for the detection of landslides after triggering events, especially if they occur in remote and barely accessible terrain. To fully exploit the potential of the wide range of existing remote sensing data, innovative and reliable landslide (change) detection methods are needed. Recently, object-based image analysis (OBIA) has been employed for EO-based landslide (change) mapping. The proposed object-based approach has been tested for a sub-area of the Baichi catchment in northern Taiwan. The focus is on the mapping of landslides and debris flows/sediment transport areas caused by Typhoons Aere in 2004 and Matsa in 2005. For both events, pre- and post-disaster optical satellite images (SPOT-5 with 2.5 m spatial resolution) were analysed. A Digital Elevation Model (DEM) with 5 m spatial resolution and its derived products, i.e., slope and curvature, were additionally integrated in the analysis to support the semi-automated object-based landslide mapping. Changes were identified by comparing the normalised values of the Normalized Difference Vegetation Index (NDVI) and the Green Normalized Difference Vegetation Index (GNDVI) of segmentation-derived image objects between pre- and post-event images and attributed to landslide classes.
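
The two indices have standard definitions, NDVI = (NIR - Red) / (NIR + Red) and GNDVI = (NIR - Green) / (NIR + Green). The sketch below compares per-object mean index values between a pre-event and a post-event image. Using the object mean, and the function names, are assumptions; the abstract does not give the normalisation or the thresholds used to assign landslide classes.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def pixel_ndvi(pixel):
    green, red, nir = pixel
    return normalized_difference(nir, red)

def pixel_gndvi(pixel):
    green, red, nir = pixel
    return normalized_difference(nir, green)

def mean_index(pixels, index):
    """Mean index value over the pixels of one image object."""
    return sum(index(p) for p in pixels) / len(pixels)

def index_change(pre_pixels, post_pixels, index):
    """Post-event minus pre-event mean index for the same image object."""
    return mean_index(post_pixels, index) - mean_index(pre_pixels, index)

# One object with (green, red, nir) reflectances before and after an event.
pre = [(0.1, 0.1, 0.5)]
post = [(0.3, 0.3, 0.3)]
ndvi_drop = index_change(pre, post, pixel_ndvi)
```

A strongly negative change indicates vegetation loss within the object, which is the kind of signal a landslide scar would be expected to produce.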

Two-Phase Object-Based Deep Learning for Multi-Temporal SAR Image Change Detection

Remote Sensing ◽

10.3390/rs12030548 ◽

2020 ◽

Vol 12 (3) ◽

pp. 548 ◽

Cited By ~ 1

Author(s):

Xinzheng Zhang ◽

Guo Liu ◽

Ce Zhang ◽

Peter M. Atkinson ◽

Xiaoheng Tan ◽

...

Keyword(s):

Deep Learning ◽

Change Detection ◽

Speckle Noise ◽

Sar Image ◽

Phase Object ◽

Sar Images ◽

Two Phase ◽

Object Based ◽

Multi Temporal ◽

Image Change Detection

Change detection is one of the fundamental applications of synthetic aperture radar (SAR) images. However, speckle noise present in SAR images has a negative effect on change detection, leading to frequent false alarms in the mapping products. In this research, a novel two-phase object-based deep learning approach is proposed for multi-temporal SAR image change detection. Compared with traditional methods, the proposed approach brings two main innovations. One is to classify all pixels into three categories rather than two: unchanged pixels, changed pixels caused by strong speckle (false changes), and changed pixels formed by real terrain variation (real changes). The other is to group neighbouring pixels into superpixel objects so as to exploit local spatial context. Two phases are designed in the methodology: (1) Generate objects based on the simple linear iterative clustering (SLIC) algorithm, and discriminate these objects into changed and unchanged classes using fuzzy c-means (FCM) clustering and a deep PCANet. The output of this phase is the set of changed and unchanged superpixels. (2) Apply deep learning to the pixel sets over the changed superpixels only, obtained in the first phase, to discriminate real changes from false changes. SLIC is employed again to obtain new superpixels in the second phase. Low rank and sparse decomposition are applied to these new superpixels to suppress speckle noise significantly. A further clustering step is applied to these new superpixels via FCM. A new PCANet is then trained to classify the two kinds of changed superpixels and produce the final change maps. Numerical experiments demonstrate that, compared with benchmark methods, the proposed approach can distinguish real changes from false changes effectively with significantly reduced false alarm rates, and achieve up to 99.71% change detection accuracy using multi-temporal SAR imagery.

Super-Resolution-Based Snake Model—An Unsupervised Method for Large-Scale Building Extraction Using Airborne LiDAR Data and Optical Image

Remote Sensing ◽

10.3390/rs12111702 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1702 ◽

Cited By ~ 2

Author(s):

Thanh Huy Nguyen ◽

Sylvie Daniel ◽

Didier Guériot ◽

Christophe Sintès ◽

Jean-Marc Le Caillec

Keyword(s):

Large Scale ◽

Active Contour Model ◽

Super Resolution ◽

Airborne Lidar ◽

Lidar Data ◽

Force Model ◽

Building Extraction ◽

Snake Model ◽

Object Based

Automatic extraction of buildings in urban and residential scenes has become a subject of growing interest in the domain of photogrammetry and remote sensing, particularly since the mid-1990s. The active contour model, colloquially known as the snake model, has been studied to extract buildings from aerial and satellite imagery. However, this task is still very challenging due to the complexity of building size, shape, and surrounding environment. This complexity is a major obstacle to reliable large-scale building extraction, since the prior information and assumptions involved, such as building shape, size, and color, cannot be generalized over large areas. This paper presents an efficient snake model to overcome such a challenge, called the Super-Resolution-based Snake Model (SRSM). The SRSM operates on high-resolution Light Detection and Ranging (LiDAR)-based elevation images, called z-images, generated by a super-resolution process applied to LiDAR data. The balloon force model involved is also improved to shrink or inflate adaptively, instead of inflating continuously. This method is applicable at large scales such as city scale and even larger, while having a high level of automation and not requiring any prior knowledge or training data from the urban scenes (hence unsupervised). It achieves high overall accuracy when tested on various datasets. For instance, the proposed SRSM yields an average area-based Quality of 86.57% and object-based Quality of 81.60% on the ISPRS Vaihingen benchmark datasets. Compared to other methods using this benchmark dataset, this level of accuracy is highly desirable even for a supervised method. Similarly desirable outcomes are obtained when applying the proposed SRSM to the whole City of Quebec (total area of 656 km²), yielding an area-based Quality of 62.37% and an object-based Quality of 63.21%.

Deep Learning-Based Sentimental Analysis for Large-Scale Imbalanced Twitter Data

Future Internet ◽

10.3390/fi11090190 ◽

2019 ◽

Vol 11 (9) ◽

pp. 190 ◽

Cited By ~ 3

Author(s):

Jamal ◽

Xianqiao ◽

Aldabbas

Keyword(s):

Deep Learning ◽

Large Scale ◽

State Of The Art ◽

Hybrid Approach ◽

Principal Component ◽

Specific Topic ◽

Weighting Method ◽

Psychological Conditions ◽

Twitter Data ◽

Wide Range

Emotion detection in social media is very effective for measuring the mood of people about a specific topic, news item, or product. It has a wide range of applications, including identifying psychological conditions such as anxiety or depression in users. However, it is a challenging task to distinguish useful emotion features from a large corpus of text because emotions are subjective, with limited fuzzy boundaries that may be expressed in different terminologies and perceptions. To tackle this issue, this paper presents a hybrid deep learning approach based on TensorFlow with Keras for emotion detection on large-scale imbalanced tweet data. First, preprocessing steps are used to get useful features from raw tweets without noisy data. Second, the entropy weighting method is used to compute the importance of each feature. Third, a class balancer is applied to balance each class. Fourth, Principal Component Analysis (PCA) is applied to transform highly correlated features into normalized forms. Finally, the TensorFlow-based deep learning with Keras algorithm is proposed to predict high-quality features for emotion classification. The proposed methodology is analyzed on a dataset of 1,600,000 tweets collected from the website ‘kaggle’. The proposed approach is compared with other state-of-the-art techniques at different training ratios. The results show that the proposed approach outperformed the other techniques.
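
The entropy weighting step can be illustrated with the commonly used formulation of the entropy weight method, in which features whose values vary more across samples receive larger weights. Whether the paper uses exactly this formulation is an assumption, as is the function name; the sketch expects non-negative feature values.

```python
import math

def entropy_weights(rows):
    """Entropy weight method for a samples x features matrix of non-negative values.

    For each feature j: p_ij = x_ij / sum_i x_ij,
    e_j = -(1 / ln n) * sum_i p_ij * ln p_ij, and weight_j is proportional to 1 - e_j.
    """
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        entropy = 0.0
        for value in column:
            if value > 0:
                p = value / total
                entropy -= p * math.log(p)
        entropy /= math.log(n)
        # Clamp to guard against tiny negative values from floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence == 0:
        return [1.0 / m] * m  # every feature is uniform: fall back to equal weights
    return [d / total_divergence for d in divergences]

# The first feature is constant across samples, so it should get (almost) no weight.
weights = entropy_weights([[1, 1], [1, 3], [1, 5]])
```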
