Semantic Segmentation of Drone Images Using Deep Learning Models Trained with National Geospatial Data

Author(s):  
Sang Woo Ham ◽  
Im Pyeong Lee
2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. Among recent research, the self-attention module has attracted broad interest in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still lose sight of the effectiveness of reconstruction bias. By tipping the balance between the encoding and decoding abilities, i.e., making the decoding network considerably more complex than the encoding network, the semantic segmentation ability can be reinforced. To remedy the lack of research combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines the two. In the encoding part, a self-attention module is added to learn attention weights over the inputs, so that the network pays more attention to positions where salient regions may occur. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets, achieving IoU scores of 89.39% and 73.49%, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method achieves satisfactory results.
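
The two ingredients this abstract names can be sketched compactly in PyTorch. The following is a minimal, illustrative sketch, not the authors' exact architecture: a spatial self-attention block for the encoder and a deliberately heavier, large-kernel up-sampling block for the decoder (the reconstruction bias). All module names and layer sizes are assumptions.

```python
# Illustrative sketch of the abstract's two ideas: encoder self-attention
# and a heavy ("reconstruction-biased") decoder. Not the authors' code.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Non-local-style self-attention over spatial positions."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.key(x).flatten(2)                     # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class HeavyDecoderBlock(nn.Module):
    """Up-sampling block with large-kernel convolutions to strengthen decoding."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(self.up(x))
```

Placing `SelfAttention2d` at the encoder bottleneck and stacking several `HeavyDecoderBlock`s gives a decoder with more parameters than the encoder, which is the reconstruction-bias idea described above.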


2021 ◽  
Author(s):  
Benjamin Kellenberger ◽  
Devis Tuia ◽  
Dan Morris

Ecological research like wildlife censuses increasingly relies on data on the scale of terabytes. For example, modern camera trap datasets contain millions of images that require prohibitive amounts of manual labour to be annotated with species, bounding boxes, and the like. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but involves extensive coding and expert knowledge.

In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. In the first instance, AIDE is a web-based annotation suite for image labelling with support for concurrent access and scalability, up to the cloud. In the second instance, it tightly integrates deep learning models into the annotation process through active learning [7], where models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system where users only need to label what is required, which saves time and decreases errors due to fatigue.

Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.

AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without having to write a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements. To complete the package, AIDE offers both user annotation and model prediction evaluation, access control, customisable model training, and more, all through the web browser.

AIDE is fully open source and available at https://github.com/microsoft/aerial_wildlife_detection.
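
The active-learning loop described above can be sketched as follows. This is a generic PyTorch sketch, not AIDE's actual API; `model`, `unlabelled_loader`, `request_annotations`, and `retrain` are hypothetical placeholders.

```python
# Hedged sketch of an active-learning iteration of the kind AIDE implements:
# score unlabelled images by model uncertainty, send the top-k for labelling.
import torch

@torch.no_grad()
def rank_by_uncertainty(model, unlabelled_loader, device="cpu", k=100):
    """Score unlabelled images by predictive entropy and return the top-k ids."""
    model.eval()
    scores = []
    for image_ids, images in unlabelled_loader:  # hypothetical loader
        probs = torch.softmax(model(images.to(device)), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
        scores.extend(zip(image_ids, entropy.cpu().tolist()))
    scores.sort(key=lambda s: s[1], reverse=True)
    return [image_id for image_id, _ in scores[:k]]

# One loop iteration: select, annotate, retrain (training code elided).
# selected = rank_by_uncertainty(model, unlabelled_loader, k=100)
# labels = request_annotations(selected)  # human labelling step in the web UI
# retrain(model, labels)                  # fine-tune on the new labels
```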


Author(s):  
S. T. Yekeen ◽  
A.-L. Balogun

Abstract. This study developed a novel deep learning oil spill instance segmentation model using the Mask Region-based Convolutional Neural Network (Mask R-CNN), a state-of-the-art computer vision model. A total of 2882 images containing oil spills, look-alikes, ships, and land areas were acquired after various pre-processing steps. These images were subsequently divided into 88% for training and 12% for testing, equating to 2530 and 352 images, respectively. Model training used transfer learning on a ResNet-101 backbone pre-trained on COCO data, in combination with a Feature Pyramid Network (FPN) architecture for feature extraction, over 30 epochs with a learning rate of 0.001. The model's performance was evaluated using precision, recall, and F1-measure, yielding values of 0.964, 0.969, and 0.968, respectively, higher than those of other existing models. For this specialized task, the study concluded that the developed deep learning instance segmentation model (Mask R-CNN) performs better than conventional machine learning models and semantic segmentation deep learning models in the detection and segmentation of marine oil spills.
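
The transfer-learning setup described can be approximated with torchvision's Mask R-CNN, as in the sketch below. Note this is a hedged approximation: torchvision ships a ResNet-50 FPN backbone (the study used ResNet-101 with COCO pre-training), and the class count and data loader are assumptions.

```python
# Hedged sketch of COCO-pretrained Mask R-CNN transfer learning with
# torchvision (ResNet-50 FPN backbone; the study used ResNet-101).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 5  # background + oil spill, look-alike, ship, land (assumed)

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
# Replace the COCO heads with heads for our classes (transfer learning).
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, NUM_CLASSES)
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# for epoch in range(30):                  # 30 epochs, as in the study
#     for images, targets in train_loader: # hypothetical data loader
#         losses = model(images, targets)  # dict of detection/mask losses
#         optimizer.zero_grad()
#         sum(losses.values()).backward()
#         optimizer.step()
```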


2021 ◽  
Vol 13 (16) ◽  
pp. 3087
Author(s):  
Seonkyeong Seong ◽  
Jaewan Choi

In this study, building extraction in aerial images was performed using csAG-HRNet, which applies HRNet-v2 in combination with channel and spatial attention gates. HRNet-v2 consists of transition and fusion processes based on subnetworks at various resolutions. The channel and spatial attention gates were applied in the network to learn important features efficiently: a channel attention gate assigns weights according to the importance of each channel, and a spatial attention gate assigns weights according to the importance of each pixel position across the entire channel stack. In csAG-HRNet, csAG modules consisting of a channel attention gate and a spatial attention gate were applied to each subnetwork of the stage and fusion modules in the HRNet-v2 network. In experiments using two datasets, it was confirmed that csAG-HRNet minimizes false detections arising from the shapes of large buildings and small nonbuilding objects compared to existing deep learning models.
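
A channel gate and a spatial gate of the kind described can be sketched in PyTorch as follows. This is a generic (CBAM-style) sketch; the exact wiring of csAG modules into HRNet-v2's stage and fusion modules is not reproduced here, and the reduction ratio and kernel size are assumptions.

```python
# Generic channel + spatial attention gate pair, as the abstract describes.
import torch
import torch.nn as nn

class ChannelAttentionGate(nn.Module):
    """Weights each channel by its global importance (squeeze-and-excite style)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class SpatialAttentionGate(nn.Module):
    """Weights each pixel position across the whole channel stack."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CSAGModule(nn.Module):
    """Channel gate followed by spatial gate, applied to a subnetwork output."""
    def __init__(self, channels):
        super().__init__()
        self.channel = ChannelAttentionGate(channels)
        self.spatial = SpatialAttentionGate()

    def forward(self, x):
        return self.spatial(self.channel(x))
```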


Author(s):  
Yang Zhao ◽  
Wei Tian ◽  
Hong Cheng

Abstract. With the fast development of deep learning models in the field of autonomous driving, research on the uncertainty estimation of deep learning models has also prevailed. Herein, a pyramid Bayesian deep learning method is proposed for evaluating the model uncertainty of semantic segmentation. Semantic segmentation is one of the most important perception problems in visual scene understanding and is critical for autonomous driving. This study aims to optimize Bayesian SegNet for uncertainty evaluation: it first simplifies the network structure of Bayesian SegNet by reducing the number of MC-Dropout layers, and then introduces a pyramid pooling module to improve performance. mIoU and mPAvPU are used as evaluation metrics to test the proposed method on the public Cityscapes dataset. The experimental results show that the proposed method improves the sampling effect of Bayesian SegNet, shortens the sampling time, and improves network performance.
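
The MC-Dropout sampling at the heart of this family of methods can be sketched generically: keep dropout active at inference and average several stochastic forward passes. The network and the sample count `T` below are assumptions, not the paper's configuration.

```python
# Hedged sketch of MC-Dropout uncertainty estimation for a segmentation net.
import torch
import torch.nn as nn

def enable_mc_dropout(model):
    """Keep dropout active at inference so each forward pass is a sample."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()

@torch.no_grad()
def mc_dropout_predict(model, image, T=25):
    """Return mean class probabilities and per-pixel predictive entropy."""
    model.eval()
    enable_mc_dropout(model)
    probs = torch.stack(
        [torch.softmax(model(image), dim=1) for _ in range(T)]
    )                                   # (T, B, C, H, W)
    mean = probs.mean(dim=0)            # averaged prediction, (B, C, H, W)
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=1)  # (B, H, W)
    return mean, entropy
```

Reducing the number of dropout layers, as the study does, shrinks the variance introduced per pass and hence the number of samples `T` needed for a stable estimate, which is where the reported sampling-time savings come from.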


2021 ◽  
Author(s):  
Felipe Manfio Barbosa ◽  
Fernando Santos Osório

Computer vision plays an important role in intelligent systems, particularly for autonomous mobile robots and intelligent vehicles. It is essential to the correct operation of such systems, increasing safety for users/passengers and also for other people in the environment. One of its many levels of analysis is semantic segmentation, which provides powerful insights for scene understanding, a task of utmost importance in autonomous navigation. Recent developments have shown the power of deep learning models applied to semantic segmentation. Besides, 3D data offers a richer representation of the world. Although there are many studies comparing the performance of several semantic segmentation models, they mostly consider the task over 2D images, and none of them include recent GAN models in the analysis. In this paper, we carry out the study, implementation, and comparison of recent deep learning models for 3D semantic image segmentation, considering the FCN, SegNet, and Pix2Pix models. The 3D images were captured indoors and gathered in a dataset created for the scope of this project. Our main objective is to evaluate and compare the models' performance and efficiency in detecting obstacles and safe and unsafe zones for autonomous mobile robot navigation. Considering mean IoU, number of parameters, and inference time as metrics, our experiments show that Pix2Pix, a recent conditional generative adversarial network, outperforms the FCN and SegNet models in the evaluated task.
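
The three comparison metrics this study uses (mean IoU, parameter count, and inference time) can be computed with a short sketch like the following; the class count, input shape, and run count are illustrative assumptions.

```python
# Sketch of the study's three comparison metrics for segmentation models.
import time
import torch

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().item()
        union = ((pred == c) | (target == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

def profile(model, input_shape=(1, 3, 256, 256), runs=50):
    """Report parameter count and average inference time in milliseconds."""
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(input_shape)
    model.eval()
    with torch.no_grad():
        model(x)  # warm-up pass before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    ms = (time.perf_counter() - start) / runs * 1000
    return n_params, ms
```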


Author(s):  
Chandra Pal Kushwah

Image segmentation for applications like scene understanding, medical image analysis, robotic vision, video tracking, augmented reality, and image compression is a key subject of image processing and image evaluation. Semantic segmentation is an integral aspect of image comprehension, essential for image processing tasks, and remains a complex problem in computer vision applications. Many techniques have been developed to tackle it, spanning self-driving cars, human interaction, robotics, medical science, agriculture, and more. In a short period, satellite imagery can provide a great deal of large-scale knowledge about the earth's surface, saving time. With the growth and development of satellite image sensors, the resolution of recorded objects has improved alongside advanced image processing techniques. To improve the performance of deep learning models in a broad range of vision applications, important work has recently been carried out evaluating deep learning approaches to image segmentation. This paper provides a detailed overview of image segmentation and describes its techniques, such as region-, edge-, feature-, threshold-, and model-based methods. It also covers semantic segmentation, satellite imagery, and deep learning and its techniques, such as DNNs, CNNs, RNNs, and RBMs. Among these, the CNN is one of the most efficient deep learning techniques and can be used with the U-Net model in further work.
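
As a tiny illustration of the simplest family in this overview, threshold-based segmentation, the following pure-NumPy sketch implements Otsu's method, which picks the threshold maximizing between-class variance; the grayscale input is hypothetical.

```python
# Otsu's method: an example of the threshold-based segmentation family.
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# mask = gray >= otsu_threshold(gray)  # binary foreground/background map
```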


2021 ◽  
Vol 13 (18) ◽  
pp. 3630
Author(s):  
Ziming Li ◽  
Qinchuan Xin ◽  
Ying Sun ◽  
Mengying Cao

Accurate building footprint polygons provide essential data for a wide range of urban applications. While deep learning models have been proposed to extract pixel-based building areas from remote sensing imagery, the direct vectorization of pixel-based building maps often leads to building footprint polygons with irregular shapes that are inconsistent with real building boundaries, making them difficult to use in geospatial analysis. In this study, we propose a novel deep learning-based framework for the automated extraction of building footprint polygons (DLEBFP) from very high-resolution aerial imagery by combining deep learning models for different tasks. Our approach uses the U-Net, Cascade R-CNN, and Cascade CNN deep learning models to obtain building segmentation maps, building bounding boxes, and building corners, respectively, from very high-resolution remote sensing images. We use Delaunay triangulation to construct building footprint polygons from the detected building corners under the constraints of the building bounding boxes and building segmentation maps. Experiments on the Wuhan University building dataset and the ISPRS Vaihingen dataset indicate that DLEBFP performs well in extracting high-quality building footprint polygons. Compared with other semantic segmentation models and a vector map generalization method, DLEBFP achieves comparable pixel-based mapping accuracies while generating building footprint polygons with concise edges and vertices and regular shapes close to the reference data. The promising performance indicates that our method has the potential to extract accurate building footprint polygons from remote sensing images for applications in geospatial analysis.
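
The polygon-construction step can be sketched as follows: triangulate the detected corners with Delaunay triangulation and keep only triangles consistent with the segmentation map and bounding box. This is an illustrative reading of the pipeline using SciPy and Shapely, not the authors' exact DLEBFP implementation; the inputs and filtering rule are assumptions.

```python
# Hedged sketch: build a footprint polygon from detected corners under
# segmentation-mask and bounding-box constraints, as the abstract describes.
import numpy as np
from scipy.spatial import Delaunay
from shapely.geometry import Polygon, box
from shapely.ops import unary_union

def corners_to_footprint(corners, mask, bbox):
    """corners: (N, 2) xy array; mask: HxW binary map; bbox: (xmin, ymin, xmax, ymax)."""
    tri = Delaunay(corners)
    bounds = box(*bbox)
    h, w = mask.shape
    keep = []
    for simplex in tri.simplices:
        triangle = Polygon(corners[simplex])
        # Clamp the centroid to the mask extent before testing it.
        cx = min(max(int(round(triangle.centroid.x)), 0), w - 1)
        cy = min(max(int(round(triangle.centroid.y)), 0), h - 1)
        if mask[cy, cx] > 0 and bounds.contains(triangle.centroid):
            keep.append(triangle)
    return unary_union(keep)  # merged footprint polygon with straight edges
```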

