NPALOSS: NEIGHBORING PIXEL AFFINITY LOSS FOR SEMANTIC SEGMENTATION IN HIGH-RESOLUTION AERIAL IMAGERY

Author(s):  
Y. Feng ◽  
W. Diao ◽  
X. Sun ◽  
J. Li ◽  
K. Chen ◽  
...  

Abstract. The performance of semantic segmentation in high-resolution aerial imagery has improved rapidly through the introduction of deep fully convolutional neural networks (FCNs). However, due to the complexity of object shapes and sizes, the labeling accuracy of small objects and object boundaries still needs to be improved. In this paper, we propose a neighboring pixel affinity loss (NPALoss) to improve the segmentation performance on these hard pixels. Specifically, we address two issues: how to determine the classification difficulty of a pixel, and how to obtain a suitable weight margin between well-classified pixels and hard pixels. First, we recast the first problem as the question of whether the pixel categories in a neighborhood are the same or different. Based on this idea, we build a neighboring pixel affinity map by counting the pixel-pair relationships for each pixel in the search region. Second, we investigate different weight transformation strategies for the affinity map to find a suitable weight margin and avoid gradient overflow. The logarithm compression strategy performs better than the normalization strategy, the common logarithm in particular. Finally, combining the affinity map and the logarithm compression strategy, we build NPALoss to adaptively assign a different weight to each pixel. Comparative experiments are conducted on the ISPRS Vaihingen dataset with several commonly used state-of-the-art networks. We demonstrate that our proposed approach can achieve promising results.
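The idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the choice to count neighbors with a *different* label, the weight `1 + log10(1 + count)`, and the names `affinity_map`, `log_weights` and `weighted_cross_entropy` are all assumptions made for illustration.

```python
import math

def affinity_map(labels, radius=1):
    """For each pixel, count neighbors in the (2*radius+1)^2 search region
    whose label differs from the center pixel (assumed pair relationship)."""
    h, w = len(labels), len(labels[0])
    counts = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    is_center = dy == 0 and dx == 0
                    if not is_center and 0 <= ny < h and 0 <= nx < w:
                        if labels[ny][nx] != labels[y][x]:
                            counts[y][x] += 1
    return counts

def log_weights(counts, base=10.0):
    """Assumed compression: weight = 1 + log_base(1 + count), so pixels in
    uniform regions get weight 1 and boundary pixels get slightly more."""
    return [[1.0 + math.log(1 + c, base) for c in row] for row in counts]

def weighted_cross_entropy(probs, labels, weights):
    """Mean of per-pixel cross entropy scaled by the per-pixel weight.
    probs[y][x] is a list of class probabilities for that pixel."""
    total, n = 0.0, 0
    for y, row in enumerate(labels):
        for x, cls in enumerate(row):
            total += -weights[y][x] * math.log(probs[y][x][cls])
            n += 1
    return total / n

# A 3x3 label map with a vertical class boundary on the right.
labels = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1]]
counts = affinity_map(labels)
weights = log_weights(counts)
```

In this toy map, pixels in the left column have no differing neighbors and keep weight 1, while pixels next to the boundary receive a larger but logarithmically bounded weight, which is the kind of compressed margin the abstract describes.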

Author(s):  
Weihao Li ◽  
Michael Ying Yang

In this paper we explore semantic segmentation of man-made scenes using a fully connected conditional random field (CRF). Images of man-made scenes display strong contextual dependencies in their spatial structures. Fully connected CRFs can model long-range connections within an image of a man-made scene and make use of the contextual information in scene structures. The pairwise edge potentials of fully connected CRF models are defined by a linear combination of Gaussian kernels. With a filter-based mean field algorithm, inference is very efficient. Our experimental results demonstrate that the fully connected CRF performs better than previous state-of-the-art approaches on both the eTRIMS and LabelMeFacade datasets.
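A pairwise potential built from a linear combination of Gaussian kernels might look like the sketch below. The abstract does not specify the kernels, so this assumes the commonly used pair of an appearance kernel (position and color) and a smoothness kernel (position only) with a Potts label compatibility. The function names, weights and bandwidths are hypothetical values chosen for illustration.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    """exp(-||a - b||^2 / (2 * bandwidth^2)) for two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def pairwise_potential(label_i, label_j, pos_i, pos_j, col_i, col_j,
                       w_appearance=1.0, w_smooth=1.0,
                       theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0):
    """Penalty for giving pixels i and j different labels (assumed form)."""
    if label_i == label_j:
        return 0.0  # Potts compatibility: no penalty for equal labels
    appearance = (gaussian_kernel(pos_i, pos_j, theta_alpha)
                  * gaussian_kernel(col_i, col_j, theta_beta))
    smoothness = gaussian_kernel(pos_i, pos_j, theta_gamma)
    return w_appearance * appearance + w_smooth * smoothness

# Two pixels with identical color: the penalty shrinks with distance.
near = pairwise_potential(0, 1, (0, 0), (1, 0), (120, 80, 60), (120, 80, 60))
far = pairwise_potential(0, 1, (0, 0), (200, 0), (120, 80, 60), (120, 80, 60))
```

Because every pixel pair contributes a term, evaluating this naively is quadratic in the number of pixels; the filter-based mean field algorithm mentioned in the abstract is what makes inference efficient in practice.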


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3774 ◽  
Author(s):  
Xuran Pan ◽  
Lianru Gao ◽  
Bing Zhang ◽  
Fan Yang ◽  
Wenzhi Liao

Semantic segmentation of high-resolution aerial images is of great importance in certain fields, but the increasing spatial resolution brings large intra-class variance and small inter-class differences that can lead to classification ambiguities. Based on high-level contextual features, the deep convolutional neural network (DCNN) is an effective method for semantic segmentation of high-resolution aerial imagery. In this work, a novel dense pyramid network (DPN) is proposed for semantic segmentation. The network starts with group convolutions that process multi-sensor data channel-wise, extracting the feature maps of each channel separately; by doing so, more information from each channel can be preserved. This process is followed by a channel shuffle operation to enhance the representation ability of the network. Then, four densely connected convolutional blocks are used to both extract and take full advantage of features. The pyramid pooling module, combined with two convolutional layers, fuses multi-resolution and multi-sensor features by means of an effective global scene prior, producing the probability map for each class. Moreover, a median frequency balanced focal loss is proposed to replace the standard cross entropy loss in the training phase to deal with the class imbalance problem. We evaluate the dense pyramid network on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam 2D semantic labeling datasets, and the results demonstrate that the proposed framework performs better than the state-of-the-art baseline.
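One plausible reading of a median frequency balanced focal loss is sketched below. The abstract does not state the formula, so this combines two standard ingredients as an assumption: class weights equal to the median class frequency divided by each class frequency, and the focal term `(1 - p_t) ** gamma`. Frequencies are computed over the whole dataset for simplicity, every class is assumed to occur at least once, and the function names and `gamma=2.0` are illustrative.

```python
import math
from statistics import median

def median_frequency_weights(pixel_counts):
    """Class weight = median class frequency / class frequency.
    Assumes every class has a non-zero pixel count."""
    total = sum(pixel_counts)
    freqs = [count / total for count in pixel_counts]
    med = median(freqs)
    return [med / f for f in freqs]

def balanced_focal_loss(probs, target, class_weights, gamma=2.0):
    """Per-pixel loss: -w[target] * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical pixel counts for three classes.
class_weights = median_frequency_weights([50, 30, 20])
easy = balanced_focal_loss([0.9, 0.05, 0.05], 0, class_weights)
hard = balanced_focal_loss([0.3, 0.4, 0.3], 0, class_weights)
```

Rare classes receive weights above 1 and frequent classes weights below 1, while the focal term shrinks the contribution of pixels that are already classified with high confidence.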


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 159
Author(s):  
Feng Sun ◽  
Ajith Kumar V ◽  
Guanci Yang ◽  
Ansi Zhang ◽  
Yiyun Zhang

State-of-the-art semantic segmentation methods rely too much on complicated deep networks and thus cannot be trained efficiently. This paper introduces a novel Circle-U-Net architecture that outperforms the original U-Net on several criteria. The proposed model includes circle connect layers, which are the backbone of the ResUNet-a architecture. The model has a contracting part, with residual bottleneck and circle connect layers that capture context, and an expanding path, with sampling layers and merging layers for pixel-wise localization. The experimental results show that the proposed Circle-U-Net improves accuracy by 5.6676% and IoU (intersection over union) by 2.1587%, and can detect 67% of classes better than U-Net, which is better than current results.
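For reference, the IoU metric quoted above can be computed per class as the number of pixels where prediction and ground truth agree on that class, divided by the number of pixels where either assigns it. The sketch below uses nested lists as label maps; the function names and the convention of skipping classes absent from both maps are illustrative choices, not taken from the paper.

```python
import math

def class_iou(pred, truth, cls):
    """Intersection over union for one class; NaN if the class is absent
    from both the prediction and the ground truth."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            in_pred, in_truth = p == cls, t == cls
            inter += in_pred and in_truth
            union += in_pred or in_truth
    return inter / union if union else float("nan")

def mean_iou(pred, truth, num_classes):
    """Average IoU over the classes that occur in either map."""
    scores = [class_iou(pred, truth, c) for c in range(num_classes)]
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid)

pred = [[0, 0],
        [1, 1]]
truth = [[0, 1],
         [1, 1]]
iou_class0 = class_iou(pred, truth, 0)   # 1 shared pixel out of 2
iou_class1 = class_iou(pred, truth, 1)   # 2 shared pixels out of 3
```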


2016 ◽  
Vol 28 (2) ◽  
pp. 257-285 ◽  
Author(s):  
Sarath Chandar ◽  
Mitesh M. Khapra ◽  
Hugo Larochelle ◽  
Balaraman Ravindran

Common representation learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)–based approaches and autoencoder (AE)–based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
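The correlation term that such an approach maximizes can be sketched as the sum, over the units of the common subspace, of the Pearson correlation between the two views' projections. This is only an illustration of the idea stated in the abstract: the names `summed_correlation` and `corrnet_style_objective`, the weighting factor `lam`, and the way the term is subtracted from a reconstruction error are assumptions, not the paper's exact objective.

```python
import math

def summed_correlation(hx, hy):
    """hx, hy: lists of samples, each a list of k hidden-unit values for
    one view. Returns the sum over units of the Pearson correlation."""
    n, k = len(hx), len(hx[0])
    total = 0.0
    for j in range(k):
        a = [row[j] for row in hx]
        b = [row[j] for row in hy]
        mean_a, mean_b = sum(a) / n, sum(b) / n
        cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
        var_a = sum((u - mean_a) ** 2 for u in a)
        var_b = sum((v - mean_b) ** 2 for v in b)
        if var_a > 0 and var_b > 0:  # skip constant units
            total += cov / math.sqrt(var_a * var_b)
    return total

def corrnet_style_objective(reconstruction_error, hx, hy, lam=1.0):
    """Assumed objective to minimize: reconstruction error minus a
    weighted correlation term, so higher correlation lowers the loss."""
    return reconstruction_error - lam * summed_correlation(hx, hy)

# Projections of three samples from two views into a 2-unit subspace.
hx = [[1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]
hy = [[2.0, 4.0], [4.0, 2.0], [6.0, 0.0]]
corr = summed_correlation(hx, hy)
```

Here both units are perfectly correlated across the views, so the term reaches its maximum of one per unit.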


Author(s):  
Maria Dias ◽  
João Monteiro ◽  
Jacinto Estima ◽  
Joel Silva ◽  
Bruno Martins

2017 ◽  
Vol 9 (6) ◽  
pp. 522 ◽  
Author(s):  
Yu Liu ◽  
Duc Minh Nguyen ◽  
Nikos Deligiannis ◽  
Wenrui Ding ◽  
Adrian Munteanu

2021 ◽  
Vol 13 (12) ◽  
pp. 2292
Author(s):  
Oscar D. Pedrayes ◽  
Darío G. Lema ◽  
Daniel F. García ◽  
Rubén Usamentiaga ◽  
Ángela Alonso

Land use classification using aerial imagery can be complex. Characteristics such as ground sampling distance, resolution, number of bands and the information these bands convey are the keys to its accuracy. Random Forest is the most widely used approach but better and more modern alternatives do exist. In this paper, state-of-the-art methods are evaluated, consisting of semantic segmentation networks such as UNet and DeepLabV3+. In addition, two datasets based on aircraft and satellite imagery are generated as a new state of the art to test land use classification. These datasets, called UOPNOA and UOS2, are publicly available. In this work, the performance of these networks and the two datasets generated are evaluated. This paper demonstrates that ground sampling distance is the most important factor in obtaining good semantic segmentation results, but a suitable number of bands can be as important. This proves that both aircraft and satellite imagery can produce good results, although for different reasons. Finally, cost performance for an inference prototype is evaluated, comparing various Microsoft Azure architectures. The evaluation concludes that using a GPU is unnecessarily costly for deployment. A GPU need only be used for training.

