NPALOSS: NEIGHBORING PIXEL AFFINITY LOSS FOR SEMANTIC SEGMENTATION IN HIGH-RESOLUTION AERIAL IMAGERY

Abstract. The performance of semantic segmentation in high-resolution aerial imagery has improved rapidly with the introduction of deep fully convolutional neural networks (FCNs). However, because object shapes and sizes are complex, the labeling accuracy of small objects and object boundaries still needs to be improved. In this paper, we propose a neighboring pixel affinity loss (NPALoss) to improve the segmentation performance on these hard pixels. Specifically, we address two issues: how to determine the classification difficulty of a pixel, and how to obtain a suitable weight margin between well-classified pixels and hard pixels. First, we recast the first problem as the question of whether the pixel categories in a neighborhood are the same or different. Based on this idea, we build a neighboring pixel affinity map by counting the pixel-pair relationships for each pixel in the search region. Second, we investigate different weight transformation strategies for the affinity map to find a suitable weight margin and avoid gradient overflow. The logarithm compression strategy performs better than the normalization strategy, in particular when the common (base-10) logarithm is used. Finally, combining the affinity map with the logarithm compression strategy, we build NPALoss to adaptively assign a different weight to each pixel. Comparative experiments are conducted on the ISPRS Vaihingen dataset with several commonly used state-of-the-art networks. The results show that the proposed approach achieves promising performance.
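The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the square search window, the choice to count label-disagreeing neighbors, the weight form 1 + log10(1 + count), and all function names (affinity_map, log_weights, weighted_cross_entropy) are assumptions made for illustration only.

```python
import math

def affinity_map(labels, radius=1):
    """For each pixel, count neighbors in a (2r+1)x(2r+1) window whose label differs.

    Assumption: a square search region and a simple disagreement count.
    """
    h, w = len(labels), len(labels[0])
    counts = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < h and 0 <= nx < w
                    if (dy or dx) and inside and labels[ny][nx] != labels[y][x]:
                        counts[y][x] += 1
    return counts

def log_weights(counts):
    """Compress counts with the common logarithm (assumed form: 1 + log10(1 + c))."""
    return [[1.0 + math.log10(1.0 + c) for c in row] for row in counts]

def weighted_cross_entropy(probs, labels, weights):
    """Mean per-pixel cross entropy, scaled by the per-pixel weights.

    probs[y][x][k] is the predicted probability of class k at pixel (y, x).
    """
    total, n = 0.0, 0
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            total -= weights[y][x] * math.log(max(probs[y][x][k], 1e-12))
            n += 1
    return total / n
```

In this sketch a pixel surrounded by same-class neighbors keeps weight 1, while an isolated or boundary pixel receives a larger but logarithmically bounded weight, which is one way to keep the margin between easy and hard pixels from growing too large.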

Developing a multi-filter convolutional neural network for semantic segmentation using high-resolution aerial imagery and LiDAR data

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2018.06.005 ◽

2018 ◽

Vol 143 ◽

pp. 3-14 ◽

Cited By ~ 22

Author(s):

Ying Sun ◽

Xinchang Zhang ◽

Qinchuan Xin ◽

Jianfeng Huang

Keyword(s):

Neural Network ◽

High Resolution ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Lidar Data

EFFICIENT SEMANTIC SEGMENTATION OF MAN-MADE SCENES USING FULLY-CONNECTED CONDITIONAL RANDOM FIELD

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b3-633-2016 ◽

2016 ◽

Vol XLI-B3 ◽

pp. 633-640

Author(s):

Weihao Li ◽

Michael Ying Yang

Keyword(s):

Random Field ◽

State Of The Art ◽

Conditional Random Field ◽

Contextual Information ◽

Mean Field ◽

Semantic Segmentation ◽

Gaussian Kernels ◽

Previous State ◽

Fully Connected ◽

Better Than

In this paper, we explore semantic segmentation of man-made scenes using a fully connected conditional random field (CRF). Images of man-made scenes display strong contextual dependencies in their spatial structures. Fully connected CRFs can model long-range connections within an image of a man-made scene and make use of the contextual information in scene structures. The pairwise edge potentials of fully connected CRF models are defined by a linear combination of Gaussian kernels. With a filter-based mean field algorithm, inference is very efficient. Our experimental results demonstrate that the fully connected CRF performs better than previous state-of-the-art approaches on both the eTRIMS and LabelMeFacade datasets.
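For reference, the commonly used fully connected CRF formulation that matches this description can be written as below. This is the standard form, assumed here rather than taken from the paper itself; the symbols (unary term \psi_u, pairwise term \psi_p, label compatibility \mu, kernel weights w^{(m)}, feature vectors \mathbf{f}_i, and precision matrices \Lambda^{(m)}) are notation introduced for illustration.

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j), \qquad
\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{K} w^{(m)} k^{(m)}(\mathbf{f}_i, \mathbf{f}_j), \qquad
k^{(m)}(\mathbf{f}_i, \mathbf{f}_j) = \exp\!\left(-\tfrac{1}{2} (\mathbf{f}_i - \mathbf{f}_j)^{\top} \Lambda^{(m)} (\mathbf{f}_i - \mathbf{f}_j)\right)
```

Because every kernel is Gaussian, the message-passing step of mean field inference reduces to Gaussian filtering in feature space, which is what makes the filter-based algorithm efficient.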

Semantic segmentation for high-resolution aerial imagery using multi-skip network and Markov random fields

2017 IEEE International Conference on Unmanned Systems (ICUS) ◽

10.1109/icus.2017.8278309 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jiankun Li ◽

Wenrui Ding ◽

Hongguang Li ◽

Chunlei Liu

Keyword(s):

High Resolution ◽

Random Fields ◽

Markov Random Fields ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Markov Random

High-Resolution Aerial Imagery Semantic Labeling with Dense Pyramid Network

Sensors ◽

10.3390/s18113774 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3774 ◽

Cited By ~ 9

Author(s):

Xuran Pan ◽

Lianru Gao ◽

Bing Zhang ◽

Fan Yang ◽

Wenzhi Liao

Keyword(s):

High Resolution ◽

Class Imbalance ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Aerial Images ◽

Sensor Data ◽

Median Frequency ◽

Feature Maps ◽

Class Imbalance Problem ◽

Semantic Labeling

Semantic segmentation of high-resolution aerial images is of great importance in certain fields, but increasing spatial resolution brings large intra-class variance and small inter-class differences that can lead to classification ambiguities. Because it builds on high-level contextual features, the deep convolutional neural network (DCNN) is an effective method for semantic segmentation of high-resolution aerial imagery. In this work, a novel dense pyramid network (DPN) is proposed for semantic segmentation. The network starts with group convolutions that process multi-sensor data channel-wise, extracting the feature maps of each channel separately; in this way, more information from each channel is preserved. This step is followed by a channel shuffle operation to enhance the representation ability of the network. Then, four densely connected convolutional blocks are used to both extract features and take full advantage of them. A pyramid pooling module combined with two convolutional layers fuses multi-resolution and multi-sensor features through an effective global scene prior, producing the probability map for each class. Moreover, a median frequency balanced focal loss is proposed to replace the standard cross entropy loss in the training phase and deal with the class imbalance problem. We evaluate the dense pyramid network on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam 2D semantic labeling datasets, and the results demonstrate that the proposed framework performs better than the state-of-the-art baseline.
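As a rough illustration of how a median frequency balanced focal loss might be assembled, the sketch below combines median frequency class weights with the usual focal term. The abstract does not give the exact formula, so the combination shown here, the simplified frequency definition (class pixels over all pixels), and the function names are assumptions.

```python
import math
from statistics import median

def median_frequency_weights(class_counts):
    """Weight each class by median(freq) / freq.

    Assumption: freq is the class pixel count divided by the total pixel
    count, and every class has at least one pixel.
    """
    total = sum(class_counts)
    freqs = [c / total for c in class_counts]
    med = median(freqs)
    return [med / f for f in freqs]

def balanced_focal_loss(p_true, class_weight, gamma=2.0):
    """Focal loss for one pixel, scaled by its class weight.

    p_true is the predicted probability of the ground-truth class.
    """
    return -class_weight * (1.0 - p_true) ** gamma * math.log(p_true)
```

Rare classes receive weights above 1 and frequent classes weights below 1, while the focal factor (1 - p)^gamma further reduces the contribution of pixels that are already classified confidently.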

Circle-U-Net: An Efficient Architecture for Semantic Segmentation

Algorithms ◽

10.3390/a14060159 ◽

2021 ◽

Vol 14 (6) ◽

pp. 159

Author(s):

Feng Sun ◽

Ajith Kumar V ◽

Guanci Yang ◽

Ansi Zhang ◽

Yiyun Zhang

Keyword(s):

State Of The Art ◽

Semantic Segmentation ◽

Proposed Model ◽

Segmentation Methods ◽

Deep Networks ◽

Improved Accuracy ◽

Better Than

State-of-the-art semantic segmentation methods rely too much on complicated deep networks and thus cannot be trained efficiently. This paper introduces a novel Circle-U-Net architecture that exceeds the original U-Net on several standards. The proposed model includes circle connect layers, which are the backbone of the ResUNet-a architecture. The model has a contracting part, with residual bottleneck and circle connect layers that capture context, and expanding paths, with sampling layers and merging layers for pixel-wise localization. The experimental results show that the proposed Circle-U-Net achieves an improved accuracy of 5.6676% and 2.1587% IoU (intersection over union) and can detect 67% classes greater than U-Net, which is better than current results.
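For readers unfamiliar with the metric, intersection over union for a class is the number of pixels that both prediction and ground truth assign to that class, divided by the number of pixels that either assigns to it. A minimal sketch, assuming flat lists of integer labels and a hypothetical helper name iou_per_class:

```python
def iou_per_class(pred, target, num_classes):
    """Return the IoU of each class, or None when the class never occurs."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious
```

Averaging the non-None entries gives the mean IoU usually reported for segmentation benchmarks.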

Correlational Neural Networks

Neural Computation ◽

10.1162/neco_a_00801 ◽

2016 ◽

Vol 28 (2) ◽

pp. 257-285 ◽

Cited By ~ 33

Author(s):

Sarath Chandar ◽

Mitesh M. Khapra ◽

Hugo Larochelle ◽

Balaraman Ravindran

Keyword(s):

Canonical Correlation ◽

State Of The Art ◽

Representation Learning ◽

Advantages And Disadvantages ◽

Common Representation ◽

Series Of Experiments ◽

The Common ◽

Cross Language ◽

Joint Representation ◽

Better Than

Common representation learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)–based approaches and autoencoder (AE)–based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
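The correlation term that such an approach maximizes can be sketched as follows. This is an illustrative sketch only: it assumes the term is the sum of per-dimension Pearson correlations between the two views' projections and that it is subtracted from a reconstruction error with a hypothetical trade-off parameter lam; the function names are not from the paper.

```python
import math

def correlation(hx, hy):
    """Sum over hidden dimensions of the Pearson correlation between two views.

    hx and hy are lists of samples; each sample is a list of hidden activations.
    """
    n, d = len(hx), len(hx[0])
    total = 0.0
    for j in range(d):
        a = [row[j] for row in hx]
        b = [row[j] for row in hy]
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
        va = sum((u - ma) ** 2 for u in a)
        vb = sum((v - mb) ** 2 for v in b)
        # A constant dimension has no defined correlation; treat it as 0.
        total += cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0
    return total

def objective(recon_error, hx, hy, lam=1.0):
    """Quantity to minimize: reconstruction error minus weighted correlation."""
    return recon_error - lam * correlation(hx, hy)
```

Minimizing this objective lowers reconstruction error while pushing the projections of the two views toward high correlation in the common subspace.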

Semantic Segmentation of High-Resolution Aerial Imagery with W-Net Models

Progress in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-30244-3_40 ◽

2019 ◽

pp. 486-498 ◽

Cited By ~ 1

Author(s):

Maria Dias ◽

João Monteiro ◽

Jacinto Estima ◽

Joel Silva ◽

Bruno Martins

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Aerial Imagery

Hourglass-Shape Network Based Semantic Segmentation for High Resolution Aerial Imagery

Remote Sensing ◽

10.3390/rs9060522 ◽

2017 ◽

Vol 9 (6) ◽

pp. 522 ◽

Cited By ~ 55

Author(s):

Yu Liu ◽

Duc Minh Nguyen ◽

Nikos Deligiannis ◽

Wenrui Ding ◽

Adrian Munteanu

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Aerial Imagery

Efficient Multi-Class Semantic Segmentation of High Resolution Aerial Imagery with Dilated LinkNet

IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss.2019.8900281 ◽

2019 ◽

Author(s):

Qingtian Zhu ◽

Yumin Zheng ◽

Yulai Jiang ◽

Junli Yang

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Aerial Imagery

Evaluation of Semantic Segmentation Methods for Land Use with Spectral Imaging Using Sentinel-2 and PNOA Imagery

Remote Sensing ◽

10.3390/rs13122292 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2292

Author(s):

Oscar D. Pedrayes ◽

Darío G. Lema ◽

Daniel F. García ◽

Rubén Usamentiaga ◽

Ángela Alonso

Keyword(s):

Land Use ◽

Satellite Imagery ◽

State Of The Art ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Land Use Classification ◽

Cost Performance ◽

Sampling Distance ◽

Segmentation Methods ◽

Sentinel 2

Land use classification using aerial imagery can be complex. Characteristics such as ground sampling distance, resolution, number of bands and the information these bands convey are the keys to its accuracy. Random Forest is the most widely used approach but better and more modern alternatives do exist. In this paper, state-of-the-art methods are evaluated, consisting of semantic segmentation networks such as UNet and DeepLabV3+. In addition, two datasets based on aircraft and satellite imagery are generated as a new state of the art to test land use classification. These datasets, called UOPNOA and UOS2, are publicly available. In this work, the performance of these networks and the two datasets generated are evaluated. This paper demonstrates that ground sampling distance is the most important factor in obtaining good semantic segmentation results, but a suitable number of bands can be as important. This proves that both aircraft and satellite imagery can produce good results, although for different reasons. Finally, cost performance for an inference prototype is evaluated, comparing various Microsoft Azure architectures. The evaluation concludes that using a GPU is unnecessarily costly for deployment. A GPU need only be used for training.
