SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS

This paper describes a deep learning approach to semantic segmentation of very high resolution (aerial) images. Deep neural architectures hold the promise of end-to-end learning from raw images, making heuristic feature design obsolete. Over the last decade this idea has seen a revival, and in recent years deep convolutional neural networks (CNNs) have emerged as the method of choice for a range of image interpretation tasks like visual recognition and object detection. Still, standard CNNs do not lend themselves to per-pixel semantic segmentation, mainly because one of their fundamental principles is to gradually aggregate information over larger and larger image regions, making it hard to disentangle contributions from different pixels. Very recently two extensions of the CNN framework have made it possible to trace the semantic information back to a precise pixel position: deconvolutional network layers undo the spatial downsampling, and Fully Convolutional Networks (FCNs) modify the fully connected classification layers of the network in such a way that the location of individual activations remains explicit. We design an FCN which takes as input intensity and range data and, with the help of aggressive deconvolution and recycling of early network layers, converts them into a pixelwise classification at full resolution. We discuss design choices and intricacies of such a network, and demonstrate that an ensemble of several networks achieves excellent results on challenging data such as the ISPRS semantic labeling benchmark, using only the raw data as input.
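The ensemble step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the networks are combined, so averaging per-pixel class probabilities and taking the argmax is an assumption here, and the function name `ensemble_average` and the toy inputs are hypothetical.

```python
def ensemble_average(prob_maps):
    """Average per-pixel class probabilities from several models.

    prob_maps: one entry per model; each entry is a list of pixels,
    and each pixel is a list of class probabilities.
    Returns (averaged probabilities, per-pixel argmax labels).
    NOTE: averaging is an assumed combination rule, not taken from the paper.
    """
    n_models = len(prob_maps)
    n_pixels = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    averaged, labels = [], []
    for p in range(n_pixels):
        # Mean probability of each class over all models for this pixel.
        mean = [sum(m[p][c] for m in prob_maps) / n_models
                for c in range(n_classes)]
        averaged.append(mean)
        # Predicted label is the class with the highest mean probability.
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return averaged, labels


# Two hypothetical models, two pixels, two classes.
model_a = [[0.9, 0.1], [0.6, 0.4]]
model_b = [[0.7, 0.3], [0.1, 0.9]]
avg, labels = ensemble_average([model_a, model_b])
print(labels)  # [0, 1]
```

On the second pixel the two models disagree, and the averaged scores side with the more confident one.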

ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprsannals-iii-3-473-2016 ◽ 2016 ◽ Vol III-3 ◽ pp. 473-480 ◽ Cited By ~ 93

Author(s): D. Marmanis ◽ J. D. Wegner ◽ S. Galliani ◽ K. Schindler ◽ M. Datcu ◽ ...

Keyword(s): Visual Recognition ◽ Image Interpretation ◽ Semantic Segmentation ◽ Aerial Images ◽ Range Data ◽ Deep Convolutional Neural Networks ◽ Network Layers ◽ Full Resolution ◽ Aggregate Information ◽ Semantic Labeling

Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing ◽ 10.3390/rs13163211 ◽ 2021 ◽ Vol 13 (16) ◽ pp. 3211

Author(s): Tian Tian ◽ Zhengquan Chu ◽ Qian Hu ◽ Li Ma

Keyword(s): Remote Sensing ◽ Image Interpretation ◽ Semantic Segmentation ◽ Remote Sensing Images ◽ Feature Maps ◽ Convolutional Network ◽ Fully Convolutional Network ◽ Semantic Labeling ◽ Benchmark Datasets ◽ Semantic Label

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label to every pixel in the given image. Accurate semantic segmentation is still challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but the segmentation accuracy is still far from expectations. This paper focuses on the importance of class-specific features of different land cover objects, and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) takes the form of an encoder-decoder structure with skip-connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. Specifically, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.
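The remark that group convolutions help control the number of parameters can be made concrete with a little arithmetic. The sketch below counts the parameters of a standard 2D convolution layer with and without grouping; the channel counts, kernel size, and group number are illustrative assumptions, not values from the paper.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=True):
    """Parameter count of a 2D convolution with square kernels.

    With `groups` groups, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


# Hypothetical layer: 64 -> 64 channels, 3x3 kernel.
print(conv_params(64, 64, 3))            # 36928
print(conv_params(64, 64, 3, groups=4))  # 9280
```

The bias term is unaffected by grouping, so the saving applies to the weights only.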

High-Resolution Aerial Imagery Semantic Labeling with Dense Pyramid Network

Sensors ◽ 10.3390/s18113774 ◽ 2018 ◽ Vol 18 (11) ◽ pp. 3774 ◽ Cited By ~ 9

Author(s): Xuran Pan ◽ Lianru Gao ◽ Bing Zhang ◽ Fan Yang ◽ Wenzhi Liao

Keyword(s): High Resolution ◽ Class Imbalance ◽ Semantic Segmentation ◽ Aerial Imagery ◽ Aerial Images ◽ Sensor Data ◽ Median Frequency ◽ Feature Maps ◽ Class Imbalance Problem ◽ Semantic Labeling

Semantic segmentation of high-resolution aerial images is of great importance in certain fields, but the increasing spatial resolution brings large intra-class variance and small inter-class differences that can lead to classification ambiguities. Based on high-level contextual features, the deep convolutional neural network (DCNN) is an effective method to deal with semantic segmentation of high-resolution aerial imagery. In this work, a novel dense pyramid network (DPN) is proposed for semantic segmentation. The network starts with group convolutions that process multi-sensor data channel-wise, extracting feature maps of each channel separately; by doing so, more information from each channel can be preserved. This process is followed by the channel shuffle operation to enhance the representation ability of the network. Then, four densely connected convolutional blocks are utilized to both extract and take full advantage of features. The pyramid pooling module, combined with two convolutional layers, fuses multi-resolution and multi-sensor features through an effective global scenery prior, producing the probability map for each class. Moreover, the median frequency balanced focal loss is proposed to replace the standard cross entropy loss in the training phase to deal with the class imbalance problem. We evaluate the dense pyramid network on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam 2D semantic labeling datasets, and the results demonstrate that the proposed framework performs better than the state-of-the-art baseline.
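One plausible reading of a "median frequency balanced focal loss" is a focal loss whose per-class weight comes from median frequency balancing. The sketch below follows that reading for a single pixel. The exact formulation in the paper may differ, class frequencies are simplified to plain pixel fractions, every class is assumed to occur at least once, and all names and numbers are hypothetical.

```python
import math
from statistics import median


def median_frequency_weights(pixel_counts):
    """Weight each class by median(frequency) / frequency.

    Simplification: frequency is the class's share of all pixels.
    Assumes every class has a non-zero pixel count.
    """
    total = sum(pixel_counts)
    freqs = [count / total for count in pixel_counts]
    med = median(freqs)
    return [med / f for f in freqs]


def focal_loss(probs, target, class_weights, gamma=2.0):
    """Weighted focal loss for one pixel: -w_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical pixel counts for a common, a medium, and a rare class.
weights = median_frequency_weights([800, 150, 50])
# Same predicted probability for the true class, different class rarity.
loss_common = focal_loss([0.6, 0.3, 0.1], 0, weights)
loss_rare = focal_loss([0.1, 0.3, 0.6], 2, weights)
print(loss_rare > loss_common)  # True
```

Rare classes get weights above 1 and frequent classes below 1, while the `(1 - p_t)^gamma` factor down-weights pixels that are already classified confidently. With `gamma=0` the expression reduces to a class-weighted cross entropy.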

Attention-Based Context Aware Network for Semantic Comprehension of Aerial Scenery

Sensors ◽ 10.3390/s21061983 ◽ 2021 ◽ Vol 21 (6) ◽ pp. 1983

Author(s): Weipeng Shi ◽ Wenhu Qin ◽ Zhonghua Yun ◽ Peng Ping ◽ Kaiyang Wu ◽ ...

Keyword(s): High Resolution ◽ Semantic Segmentation ◽ Aerial Images ◽ Aerial Image ◽ Convolutional Network ◽ Convolutional Networks ◽ Fully Convolutional Networks ◽ Semantic Labeling ◽ Autonomous Cars ◽ High Resolution Images

It is essential for researchers to have a proper interpretation of remote sensing images (RSIs) and precise semantic labeling of their component parts. Although FCN (Fully Convolutional Networks)-like deep convolutional network architectures have been widely applied in the perception of autonomous cars, there are still two challenges in the semantic segmentation of RSIs. The first is to identify details in high-resolution images with complex scenes and to solve the class-mismatch issues; the second is to capture the edges of objects finely without being confused by the surroundings. HRNET maintains high-resolution representations by fusing feature information from parallel multi-resolution convolution branches. We adopt HRNET as a backbone and propose to incorporate the Class-Oriented Region Attention Module (CRAM) and Class-Oriented Context Fusion Module (CCFM) to analyze the relationships between classes and patch regions and between classes and local or global pixels, respectively. Thus, the model's ability to perceive fine details in aerial images is enhanced. We leverage these modules to develop an end-to-end semantic segmentation model for aerial images and validate it on the ISPRS Potsdam and Vaihingen datasets. The experimental results show that our model improves the baseline accuracy and outperforms some commonly used CNN architectures.

C3Net: Cross-Modal Feature Recalibrated, Cross-Scale Semantic Aggregated and Compact Network for Semantic Segmentation of Multi-Modal High-Resolution Aerial Images

Remote Sensing ◽ 10.3390/rs13030528 ◽ 2021 ◽ Vol 13 (3) ◽ pp. 528

Author(s): Zhiying Cao ◽ Wenhui Diao ◽ Xian Sun ◽ Xiaode Lyu ◽ Menglong Yan ◽ ...

Keyword(s): Remote Sensing ◽ Image Interpretation ◽ Model Performance ◽ Semantic Segmentation ◽ Aerial Images ◽ Superior Performance ◽ Model Parameters ◽ Complementary Information ◽ Modal Data ◽ The One

Semantic segmentation of multi-modal remote sensing images is an important branch of remote sensing image interpretation. Multi-modal data has been proven to provide rich complementary information to deal with complex scenes. In recent years, semantic segmentation based on deep learning methods has made remarkable achievements. It is common to simply concatenate multi-modal data or use parallel branches to extract multi-modal features separately. However, most existing works ignore the effects of noise and redundant features from different modalities, which may lead to unsatisfactory results. On the one hand, existing networks neither learn the complementary information of different modalities nor suppress the mutual interference between them, which may lead to a decrease in segmentation accuracy. On the other hand, the introduction of multi-modal data greatly increases the running time of the pixel-level dense prediction. In this work, we propose an efficient C3Net that strikes a balance between speed and accuracy. More specifically, C3Net contains several backbones for extracting features of different modalities. Then, a plug-and-play module is designed to effectively recalibrate and aggregate multi-modal features. In order to reduce the number of model parameters while maintaining model performance, we redesign the semantic contextual extraction module based on lightweight convolutional groups. In addition, a multi-level knowledge distillation strategy is proposed to improve the performance of the compact model. Experiments on the ISPRS Vaihingen dataset demonstrate the superior performance of C3Net, with 15× fewer FLOPs than the state-of-the-art baseline network while providing comparable overall accuracy.
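For readers unfamiliar with knowledge distillation, the following sketch shows the commonly used single-level form: the compact (student) model is trained to match the temperature-softened class distribution of a larger (teacher) model. The abstract does not describe C3Net's multi-level strategy in detail, so this is only a generic illustration under that assumption; the function names, logits, and temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


# Hypothetical per-pixel logits for three classes.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.5]
print(distillation_loss(student, teacher) > 0.0)  # True
```

The loss is zero when student and teacher logits coincide and grows as their softened distributions diverge.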

Orchard Mapping with Deep Learning Semantic Segmentation

Sensors ◽ 10.3390/s21113813 ◽ 2021 ◽ Vol 21 (11) ◽ pp. 3813

Author(s): Athanasios Anagnostis ◽ Aristotelis C. Tagarakis ◽ Dimitrios Kateris ◽ Vasileios Moysiadis ◽ Claus Grøn Sørensen ◽ ...

Keyword(s): Neural Network ◽ Deep Learning ◽ Semantic Segmentation ◽ Automated Detection ◽ Aerial Images ◽ Training Dataset ◽ Field Boundary ◽ Different Seasons ◽ Detection And Localization ◽ Different Levels

This study proposes an approach for orchard tree segmentation using aerial images, based on a deep learning convolutional neural network variant, namely the U-net network. The purpose was the automated detection and localization of the canopy of orchard trees under various conditions (i.e., different seasons, different tree ages, different levels of weed coverage). The dataset was composed of images from three different walnut orchards. The variability of the dataset yielded images that fell under seven different use cases. The best-trained model achieved 91%, 90%, and 87% accuracy for training, validation, and testing, respectively. The trained model was also tested on never-before-seen orthomosaic images of orchards based on two methods (oversampling and undersampling) in order to tackle issues with out-of-the-field boundary transparent pixels from the image. Even though the training dataset did not contain orthomosaic images, the model achieved performance levels that reached up to 99%, demonstrating the robustness of the proposed approach.

PCAN—Part-Based Context Attention Network for Thermal Power Plant Detection in Remote Sensing Imagery

Remote Sensing ◽ 10.3390/rs13071243 ◽ 2021 ◽ Vol 13 (7) ◽ pp. 1243

Author(s): Wenxin Yin ◽ Wenhui Diao ◽ Peijin Wang ◽ Xin Gao ◽ Ya Li ◽ ...

Keyword(s): Remote Sensing ◽ Power Plants ◽ State Of The Art ◽ Thermal Power ◽ Image Interpretation ◽ Remote Sensing Image ◽ Thermal Power Plants ◽ Average Precision ◽ Deep Convolutional Neural Networks ◽ Multi Scale

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is a challenging task because, as facility objects, TPPs are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, based on the RetinaNet one-stage detector, a context attention multi-scale feature extraction network is proposed that fuses global spatial attention to strengthen its ability to represent irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms the state-of-the-art methods and can achieve 68.15% mean average precision.

Understanding Rooftop PV Panel Semantic Segmentation of Satellite and Aerial Images for Better Using Machine Learning

Advances in Applied Energy ◽ 10.1016/j.adapen.2021.100057 ◽ 2021 ◽ pp. 100057

Author(s): Peiran Li ◽ Haoran Zhang ◽ Zhiling Guo ◽ Suxing Lyu ◽ Jinyu Chen ◽ ...

Keyword(s): Machine Learning ◽ Semantic Segmentation ◽ Aerial Images ◽ PV Panel

Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images

IEEE Transactions on Geoscience and Remote Sensing ◽ 10.1109/tgrs.2021.3065112 ◽ 2021 ◽ pp. 1-18

Author(s): Ruigang Niu ◽ Xian Sun ◽ Yu Tian ◽ Wenhui Diao ◽ Kaiqiang Chen ◽ ...

Keyword(s): Semantic Segmentation ◽ Aerial Images ◽ Attention Network

Towards Scalable Economic Photovoltaic Potential Analysis Using Aerial Images and Deep Learning

Energies ◽ 10.3390/en14133800 ◽ 2021 ◽ Vol 14 (13) ◽ pp. 3800

Author(s): Sebastian Krapf ◽ Nils Kemmerzell ◽ Syed Khawaja Haseeb Uddin ◽ Manuel Hack Vázquez ◽ Fabian Netzler ◽ ...

Keyword(s): Deep Learning ◽ System Analysis ◽ State Of The Art ◽ Critical Role ◽ Semantic Segmentation ◽ Energy System ◽ Aerial Images ◽ Potential Analysis ◽ 3D Data ◽ Challenges And Opportunities

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and for accelerating new installations. The state of the art uses 3D data to conduct potential analyses with high spatial resolution, limiting the study area to places with available 3D data. Recent advances in deep learning allow the required roof information to be extracted from aerial images. Furthermore, most publications consider the technical photovoltaic potential, and only a few publications determine the economic photovoltaic potential. Therefore, this paper extends the state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper’s methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, this poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.
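The Intersection over Union (IoU) values quoted in the abstract compare a predicted mask with a reference mask. A minimal sketch of the standard binary IoU on flattened masks follows; the function name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something taken from the paper.

```python
def intersection_over_union(pred, truth):
    """Binary IoU between two flattened masks of equal length.

    Pixels are truthy where the class is present. Returns 1.0 when
    both masks are empty (an assumed convention).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return intersection / union if union else 1.0


# Hypothetical five-pixel masks: 2 pixels overlap, 4 are set in either mask.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(intersection_over_union(pred, truth))  # 0.5
```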
