DEEP LEARNING BASED ROOF TYPE CLASSIFICATION USING VERY HIGH RESOLUTION AERIAL IMAGERY

Author(s): M. Buyukdemircioglu, R. Can, S. Kocaman

Abstract. Automatic detection, segmentation and reconstruction of buildings in urban areas from Earth Observation (EO) data are still challenging for many researchers. The roof is one of the most important elements of a building model. Three-dimensional geographic information system (3D GIS) applications generally require the roof type and roof geometry for performing various analyses on the models, such as energy efficiency assessments. Conventional segmentation and classification methods are often based on features like corners, edges and line segments. In parallel with the developments in computer hardware and artificial intelligence (AI) methods, including deep learning (DL), image features can now be extracted automatically. As a DL technique, convolutional neural networks (CNNs) can also be used for image classification tasks, but they require a large amount of high-quality training data to obtain accurate results. The main aim of this study was to generate a roof type dataset from very high-resolution (10 cm) orthophotos of Cesme, Turkey, and to classify the roof types using a shallow CNN architecture. The training dataset consists of 10,000 roof images and their labels. Six roof type classes, i.e. flat, hip, half-hip, gable, pyramid and complex roofs, were used for the classification in the study area. The prediction performance of the shallow CNN model used here was compared with the results obtained from fine-tuning three well-known pre-trained networks, i.e. VGG-16, EfficientNetB4 and ResNet-50. The results show that although our CNN has slightly lower performance in terms of overall accuracy, it is still acceptable for many applications using sparse data.
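As a rough illustration of the shallow architecture described in this abstract, the following PyTorch sketch builds a small six-class roof classifier. The paper does not give layer counts or input size, so the 128 × 128 input patches and filter widths below are assumptions, not the authors' configuration.

```python
# Minimal sketch of a shallow CNN for six-class roof type classification.
# Input size and channel widths are illustrative assumptions.
import torch
import torch.nn as nn

class ShallowRoofCNN(nn.Module):
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64 -> 32
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global pooling
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = ShallowRoofCNN()
logits = model(torch.randn(8, 3, 128, 128))  # batch of 8 roof patches
print(logits.shape)                          # torch.Size([8, 6])
```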

Land, 2018, Vol. 7 (4), pp. 118
Author(s): Myroslava Lesiv, Linda See, Juan Laso Bayas, Tobias Sturn, Dmitry Schepaschenko, et al.

Very high resolution (VHR) satellite imagery from Google Earth and Microsoft Bing Maps is increasingly being used in a variety of applications, from computer science to the arts and humanities. In the field of remote sensing, one use of this imagery is to create reference data sets through visual interpretation, e.g., to complement existing training data or to aid in the validation of land-cover products. Through new applications such as Collect Earth, this imagery is also being used for monitoring purposes in the form of statistical surveys obtained through visual interpretation. However, little is known about where VHR satellite imagery exists globally or about the dates of the imagery. Here we present a global overview of the spatial and temporal distribution of VHR satellite imagery in Google Earth and Microsoft Bing Maps. The results show an uneven availability globally, with biases towards certain areas such as the USA, Europe and India, and with clear discontinuities at political borders. We also show that the availability of VHR imagery is currently not adequate for monitoring protected areas and deforestation, but is better suited to monitoring changes in cropland or urban areas using visual interpretation.


Author(s): A. Wichmann, A. Agoub, M. Kada

Machine learning methods have gained in importance with the latest developments in artificial intelligence and computer hardware. Approaches based on deep learning in particular have been shown to provide state-of-the-art results for various tasks. However, directly applying deep learning methods to improve the results of 3D building reconstruction is often not possible due, for example, to the lack of suitable training data. To address this issue, we present RoofN3D, a new 3D point cloud training dataset that can be used to train machine learning models for different tasks in the context of 3D building reconstruction. It can be used, among other things, to train semantic segmentation networks or to learn the structure of buildings and the construction of geometric models. Further details about RoofN3D and the developed data preparation framework, which enables the automatic derivation of training data, are described in this paper. Furthermore, we provide an overview of other available 3D point cloud training data and of approaches from the current literature that present solutions for applying deep learning to unstructured, non-gridded 3D point cloud data.


2020, Vol. 12 (3), pp. 458
Author(s): Ugur Alganci, Mehmet Soydas, Elif Sertel

Object detection from satellite images has been a challenging problem for many years. With the development of effective deep learning algorithms and advances in hardware systems, higher accuracies have been achieved in the detection of various objects from very high-resolution (VHR) satellite images. This article provides a comparative evaluation of state-of-the-art convolutional neural network (CNN)-based object detection models, namely Faster R-CNN, the Single Shot Multi-box Detector (SSD) and You Only Look Once v3 (YOLO-v3), to cope with the limited number of labeled data and to automatically detect airplanes in VHR satellite images. Data augmentation with rotation, rescaling and cropping was applied to the training images to artificially increase the amount of training data from satellite images. Moreover, a non-maximum suppression (NMS) algorithm was introduced at the end of the SSD and YOLO-v3 flows to eliminate multiple detections of the same object in overlapping areas. The trained networks were applied to five independent VHR test images covering airports and their surroundings to evaluate their performance objectively. Accuracy assessment results for the test regions showed that the Faster R-CNN architecture provided the highest accuracy according to the F1 scores, average precision (AP) metrics and visual inspection of the results. YOLO-v3 ranked second, with slightly lower performance but a balanced trade-off between accuracy and speed. The SSD provided the lowest detection performance, but it was better at object localization. The results were also evaluated with respect to object size, which showed that large- and medium-sized airplanes were detected with higher accuracy.
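The NMS step mentioned above is a standard greedy procedure; the following self-contained NumPy sketch shows the idea: boxes are visited in descending score order, and any remaining box whose IoU with a kept box exceeds a threshold is suppressed. The (x1, y1, x2, y2) box format and the 0.5 threshold are illustrative assumptions.

```python
# Greedy non-maximum suppression over axis-aligned boxes (x1, y1, x2, y2).
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5):
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the kept box against all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thr]   # drop overlapping boxes
    return keep
```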


2019, Vol. 8 (11), pp. 478
Author(s): Songbing Wu, Chun Du, Hao Chen, Yingxiao Xu, Ning Guo, et al.

Road networks play a significant role in modern city management. It is necessary to continually extract the current road structure, as it changes rapidly with the development of cities. Owing to the success of deep learning based semantic segmentation in computer vision applications, extracting road networks from very high resolution (VHR) imagery has become a method for updating geographic databases. The major shortcoming of deep learning methods for road network extraction is that they need a massive amount of high-quality, pixel-wise training data, which are hard to obtain. Meanwhile, a large amount of volunteered geographic information (VGI) data, including road centerlines, has been accumulated in the past few decades. However, most road centerlines in VGI data lack precise width information and therefore cannot be directly applied to conventional supervised deep learning models. In this paper, we propose a novel weakly supervised method to extract road networks from VHR images using only OpenStreetMap (OSM) road centerlines as training data instead of high-quality pixel-wise road width labels. Large amounts of paired Google Earth images and OSM data are used to validate the approach. The results show that the proposed method can extract road networks from VHR images both accurately and effectively without using pixel-wise road training data.
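One simple way to picture the weak supervision used here is to rasterize OSM centerlines into approximate road masks by buffering each polyline with a nominal half-width. The sketch below (plain NumPy, with a hypothetical 2-pixel buffer) only illustrates this label-generation idea; the paper's actual weakly supervised scheme, which must cope with unknown road widths, is more involved.

```python
# Rasterize road centerlines into a weak binary mask by marking every
# pixel within `half_width` of a polyline segment. The buffer width is
# an assumed placeholder for the unknown true road width.
import numpy as np

def rasterize_centerlines(lines, shape, half_width=2.0):
    """lines: list of polylines, each a list of (x, y) pixel coords."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=np.uint8)
    for line in lines:
        for (x0, y0), (x1, y1) in zip(line[:-1], line[1:]):
            dx, dy = x1 - x0, y1 - y0
            seg_len2 = dx * dx + dy * dy or 1e-9   # guard degenerate segments
            # project every pixel onto the segment, clamped to its endpoints
            t = np.clip(((xs - x0) * dx + (ys - y0) * dy) / seg_len2, 0.0, 1.0)
            dist = np.hypot(xs - (x0 + t * dx), ys - (y0 + t * dy))
            mask[dist <= half_width] = 1           # pixel inside the buffer
    return mask

weak_label = rasterize_centerlines([[(5, 5), (60, 40)]], (64, 64))
print(weak_label.sum(), "road pixels")
```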


Author(s): T. Tilak, A. Braun, D. Chandler, N. David, S. Galopin, et al.

Abstract. This paper describes a methodology to produce a seven-class land cover map of urban areas from very high resolution images and limited noisy labeled data. The objective is to make a segmentation map of a large area (a French department) with the following classes: asphalt, bare soil, building, grassland, mineral material (permeable artificialized areas), forest and water, from 20 cm aerial images and a Digital Height Model. We created a training dataset on a few areas of interest by aggregating databases, semi-automatic classification and manual annotation to obtain a complete ground truth for each class. A comparative study of different encoder-decoder architectures (U-Net, U-Net with ResNet encoders, DeepLab v3+) with different loss functions is presented. The final product is a highly valuable land cover map computed from model predictions stitched together, binarized and refined before vectorization.
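For the loss function comparison mentioned above, a common pairing for imbalanced land cover classes is cross-entropy combined with a soft Dice term. The PyTorch sketch below shows one such combination for the seven classes; the equal 0.5/0.5 weighting is an assumption, not the paper's setting.

```python
# Combined cross-entropy + soft Dice loss for multi-class segmentation.
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, num_classes=7, eps=1e-6):
    """logits: (N, C, H, W); target: (N, H, W) integer class map."""
    ce = F.cross_entropy(logits, target)
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(2, 3))
    card = probs.sum(dim=(2, 3)) + onehot.sum(dim=(2, 3))
    dice = 1 - ((2 * inter + eps) / (card + eps)).mean()
    return 0.5 * ce + 0.5 * dice
```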


2018, Vol. 10 (12), pp. 2067
Author(s): Lingcao Huang, Lin Liu, Liming Jiang, Tingjun Zhang

Thawing of ice-rich permafrost causes thermokarst landforms on the ground surface. Obtaining the distribution of thermokarst landforms is a prerequisite for understanding permafrost degradation and carbon exchange at local and regional scales. However, because of their diverse types and characteristics, it is challenging to map thermokarst landforms from remote sensing images. We conducted a case study towards automatically mapping one type of thermokarst landform (thermo-erosion gullies) in a local area of the northeastern Tibetan Plateau from high-resolution images using deep learning. In particular, we applied the DeepLab algorithm (based on convolutional neural networks) to a 0.15 m resolution Digital Orthophoto Map (created from aerial photographs taken by an Unmanned Aerial Vehicle). Here, we document the detailed processing flow, with key steps including preparing training data, fine-tuning, inference and post-processing. Validated against field measurements and manual digitizing results, we obtained an F1 score of 0.74 (precision 0.59, recall 1.0), showing that the proposed method can effectively map small and irregular thermokarst landforms. It is potentially viable to apply the designed method to mapping diverse thermokarst landforms over a larger area where high-resolution images and training data are available.
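A minimal torchvision-based sketch of the fine-tuning step described here is shown below: a pre-trained DeepLabv3 model receives a new two-class head (gully vs. background), optionally with a frozen backbone. The authors' original DeepLab implementation may differ from this torchvision variant.

```python
# Fine-tuning a pre-trained DeepLabv3 for binary gully segmentation.
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT")            # pre-trained weights
model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)  # gully vs. background
# Optionally freeze the backbone and retrain only the new head:
for p in model.backbone.parameters():
    p.requires_grad = False
```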


Author(s): Yi-Quan Li, Hao-Sen Chang, Daw-Tung Lin

In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters by using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process. Three background models are proposed to simulate background noise and the anti-counterfeiting patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet architecture by replacing the fully connected (FC) layer with a global average pooling layer to avoid the overfitting caused by a massive amount of training data; consequently, the number of model parameters was reduced. Finally, we employed the transfer learning technique to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods, demonstrating the effectiveness of the proposed framework, which exhibited a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
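The architectural change described above, replacing the final FC layer with global average pooling, can be sketched as follows in PyTorch: a 1 × 1 convolution produces one feature map per class and GAP reduces each map to a single logit. The 1024-channel backbone output and 7 × 7 spatial size are illustrative assumptions, not the paper's exact dimensions.

```python
# Classification head with global average pooling instead of an FC layer.
import torch
import torch.nn as nn

num_classes = 13070                               # from the abstract
head = nn.Sequential(
    nn.Conv2d(1024, num_classes, kernel_size=1),  # per-class feature maps
    nn.AdaptiveAvgPool2d(1),                      # global average pooling
    nn.Flatten(),                                 # (N, num_classes) logits
)
logits = head(torch.randn(2, 1024, 7, 7))         # assumed backbone output
print(logits.shape)                               # torch.Size([2, 13070])
```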


Diagnostics, 2021, Vol. 11 (6), pp. 1052
Author(s): Leang Sim Nguon, Kangwon Seo, Jung-Hyun Lim, Tae-Jun Song, Sung-Hyun Cho, et al.

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study, we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients at two different hospitals. Data augmentation was used to enhance the size and quality of the training datasets. A fine-tuning training approach was utilized, adopting a pre-trained model via transfer learning while retraining selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network's differentiation performance. The proposed network model achieved up to 82.75% accuracy and an area under the curve (AUC) of 0.88 (95% CI: 0.817–0.930). The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.
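A minimal sketch of the transfer learning setup described here: an ImageNet-pretrained ResNet50 receives a new two-class head (MCN vs. SCN), and only selected layers are retrained. The abstract does not state which layers were retrained, so freezing everything except the final residual block and classifier below is an assumption.

```python
# Transfer learning with a pre-trained ResNet50 for two-class EUS images.
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights="DEFAULT")                # ImageNet weights
model.fc = nn.Linear(model.fc.in_features, 2)      # MCN vs. SCN head
for name, p in model.named_parameters():
    # retrain only the final residual block and the new classifier
    p.requires_grad = name.startswith(("layer4", "fc"))
```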

