ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Real-Time Semantic Segmentation Network Based on Lite Reduced Atrous Spatial Pyramid Pooling Module Group

2020 5th International Conference on Control, Robotics and Cybernetics (CRC), 2020

DOI: 10.1109/crc51253.2020.9253492

Author(s): Yangsheng Tian, Fangyuan Chen, Haihui Wang, Shuiping Zhang

Keyword(s): Real Time, Semantic Segmentation, Spatial Pyramid Pooling, Spatial Pyramid

Adaptive Context Encoding Module for Semantic Segmentation

Electronic Imaging, 2020, Vol 2020 (10), pp. 27-1-27-7

DOI: 10.2352/issn.2470-1173.2020.10.ipas-027

Author(s): Congcong Wang, Faouzi Alaya Cheikh, Azeddine Beghdadi, Ole Jakob Elle

Keyword(s): Neural Networks, State Of The Art, Experimental Studies, Semantic Segmentation, Multiple Scale, Context Information, Convolution Operation, Sampling Locations, Spatial Pyramid Pooling, Spatial Pyramid

Object sizes in images are diverse, so capturing multi-scale context information is essential for semantic segmentation. Existing context aggregation methods such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling sizes or atrous rates to capture multi-scale information. However, the pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations in the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, in which the sampling locations of the convolution are learnable. Our ACE module can easily be embedded into other Convolutional Neural Networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although the proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to the state-of-the-art methods.
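
The mIoU metric cited in this abstract can be illustrated with a minimal sketch. It assumes flat lists of integer class labels and averages only over classes that appear in the prediction or the ground truth; the function name `mean_iou` and the toy labels are hypothetical, and benchmark toolkits typically accumulate a confusion matrix over the whole dataset instead.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat label lists (illustrative)."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps, and in at least one of them.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (made-up labels, not data from the paper).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

Per-class IoU here is 1/3, 2/3, and 1/2, so the mean is about 0.5.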

Fully Convolutional Neural Network with Augmented Atrous Spatial Pyramid Pool and Fully Connected Fusion Path for High Resolution Remote Sensing Image Segmentation

Applied Sciences, 2019, Vol 9 (9), pp. 1816

DOI: 10.3390/app9091816

Cited By ~ 12

Author(s): Guangsheng Chen, Chao Li, Wei Wei, Weipeng Jing, Marcin Woźniak, ...

Keyword(s): Neural Network, Remote Sensing, Image Segmentation, High Resolution, Semantic Segmentation, Remote Sensing Image, Dilated Convolution, Segmentation Task, Fully Connected, Spatial Pyramid

Recent developments in Convolutional Neural Networks (CNNs) have enabled solid advances in semantic segmentation of high-resolution remote sensing (HRRS) images. Nevertheless, the poor classification of small objects and the unclear boundaries caused by the characteristics of HRRS image data have not been fully addressed by previous works. To tackle these problems, we propose an improved semantic segmentation neural network that adopts dilated convolution, a fully connected (FC) fusion path, and a pre-trained encoder for the semantic segmentation of HRRS imagery. The network is built on the computationally efficient DeepLabv3 architecture, with added Augmented Atrous Spatial Pyramid Pool and FC Fusion Path layers. Dilated convolution enlarges the receptive field of feature points without decreasing the feature map resolution. The improved architecture enhances HRRS image segmentation, reaching a classification accuracy of 91% and improving the recognition precision for small objects. The applicability of the improved model to the remote sensing image segmentation task is verified.
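
The claim that dilated convolution enlarges the receptive field without reducing resolution can be checked with a small one-dimensional sketch. It is pure Python, uses zero padding and stride 1, and is not the network from the paper; the helper names `dilated_conv1d` and `effective_kernel_size` are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D cross-correlation with zero 'same' padding and a dilation factor."""
    k = len(kernel)
    span = (k - 1) * dilation  # distance between the first and last tap
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    # One output per input sample, so the resolution is preserved.
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def effective_kernel_size(k, dilation):
    """Extent covered by a k-tap kernel whose taps are `dilation` apart."""
    return k + (k - 1) * (dilation - 1)


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
out = dilated_conv1d(impulse, [1, 1, 1], dilation=2)
print(out)                          # taps land 2 samples apart
print(effective_kernel_size(3, 2))  # a 3-tap kernel spans 5 samples
```

The output has the same length as the input, while the three taps now cover five samples instead of three.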

Waterfall Atrous Spatial Pooling Architecture for Efficient Semantic Segmentation

Sensors, 2019, Vol 19 (24), pp. 5361

DOI: 10.3390/s19245361

Cited By ~ 6

Author(s): Bruno Artacho, Andreas Savakis

Keyword(s): Random Fields, Conditional Random Fields, State Of The Art, Semantic Segmentation, Training Time, Network Parameters, Spatial Pooling, Memory Footprint, Accuracy Increase, Spatial Pyramid

We propose a new efficient architecture for semantic segmentation, based on a “Waterfall” Atrous Spatial Pooling architecture, that achieves a considerable accuracy increase while decreasing the number of network parameters and the memory footprint. The proposed Waterfall architecture leverages the efficiency of progressive filtering in the cascade architecture while maintaining multiscale fields-of-view comparable to spatial pyramid configurations. Additionally, our method does not rely on a postprocessing stage with Conditional Random Fields, which further reduces complexity and required training time. We demonstrate that the Waterfall approach with a ResNet backbone is a robust and efficient architecture for semantic segmentation, obtaining state-of-the-art results with a significant reduction in the number of parameters on the Pascal VOC and Cityscapes datasets.
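
The contrast between parallel pyramid branches and a cascade of atrous convolutions can be made concrete with a receptive-field calculation. This is only a sketch under stated assumptions: the atrous rates are hypothetical, all convolutions are 3-tap with stride 1, and the cascade is modelled as each stage feeding the next while every stage's output is kept. The abstract does not give these details.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs applied in sequence.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


rates = [6, 12, 18, 24]  # hypothetical atrous rates, not taken from the paper

# Parallel (pyramid-style): each branch applies one atrous conv to the input.
parallel = [receptive_field([(3, r)]) for r in rates]

# Cascade (waterfall-style): branch i sees the input through i + 1 stacked convs.
cascade = [
    receptive_field([(3, r) for r in rates[: i + 1]])
    for i in range(len(rates))
]

print(parallel)  # [13, 25, 37, 49]
print(cascade)   # [13, 37, 73, 121]
```

Under these assumptions both layouts expose several fields-of-view at once, and the cascade reaches larger ones because each stage reuses the previous stage's output.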

Mixed spatial pyramid pooling for semantic segmentation

Applied Soft Computing, 2020, Vol 91, pp. 106209

DOI: 10.1016/j.asoc.2020.106209

Author(s): Zhengyu Xia, Joohee Kim

Keyword(s): Semantic Segmentation, Spatial Pyramid Pooling, Spatial Pyramid

Knowledge and Spatial Pyramid Distance-Based Gated Graph Attention Network for Remote Sensing Semantic Segmentation

Remote Sensing, 2021, Vol 13 (7), pp. 1312

DOI: 10.3390/rs13071312

Author(s): Wei Cui, Xin He, Meng Yao, Ziwei Wang, Yuanjie Hao, ...

Keyword(s): Remote Sensing, Prior Knowledge, Receptive Fields, Semantic Segmentation, Limited Range, Research Area, Spatial Relationships, Attention Network, Gating Mechanism, Spatial Pyramid

Pixel-based semantic segmentation methods take pixels as recognition units and are restricted by the limited range of their receptive fields, so they cannot carry richer, higher-level semantics. This reduces the accuracy of remote sensing (RS) semantic segmentation to a certain extent. Compared with pixel-based methods, graph neural networks (GNNs) usually use objects as input nodes, so they not only have relatively low computational complexity but can also carry richer semantic information. However, traditional GNNs rely more on the context information of individual samples and lack geographic prior knowledge that reflects the overall situation of the research area. Therefore, these methods may be disturbed by the confusion of “different objects with the same spectrum” or “violating the first law of geography” in some areas. To address these problems, we propose a remote sensing semantic segmentation model called the knowledge and spatial pyramid distance-based gated graph attention network (KSPGAT), which is based on prior knowledge, spatial pyramid distance, and a graph attention network (GAT) with a gating mechanism. The model first uses superpixels (geographical objects) to form the nodes of a graph neural network and then uses a novel spatial pyramid distance recognition algorithm to recognize the spatial relationships. Finally, based on the integration of feature similarity and the spatial relationships of geographic objects, a multi-source attention mechanism and a gating mechanism are designed to control the process of node aggregation. As a result, high-level semantics, spatial relationships, and prior knowledge can be introduced into a remote sensing semantic segmentation network. The experimental results show that our model improves the overall accuracy by 4.43% compared with the U-Net network and by 3.80% compared with the baseline GAT network.

Semantic segmentation using stride spatial pyramid pooling and dual attention decoder

Pattern Recognition, 2020, Vol 107, pp. 107498

DOI: 10.1016/j.patcog.2020.107498

Cited By ~ 1

Author(s): Chengli Peng, Jiayi Ma

Keyword(s): Semantic Segmentation, Spatial Pyramid Pooling, Spatial Pyramid

A Real-Time Image Semantic Segmentation Method Based on Multilabel Classification

Mathematical Problems in Engineering, 2021, Vol 2021, pp. 1-13

DOI: 10.1155/2021/9963974

Author(s): Ran Jin, Xiaozhen Han, Tongrui Yu

Keyword(s): Deep Learning, Real Time, Medical Image Analysis, Semantic Segmentation, Classification Method, Network Parameter, Multilabel Classification, Proposed Model, Multiple Data Sets, Spatial Pyramid

Image semantic segmentation plays a crucial part in intelligent driving, medical image analysis, video surveillance, and AR. However, since scenes require inferring more semantics from video and audio clips and the demand for real-time performance is becoming stricter, neither the single-label classification method commonly used before nor regular manual labeling can meet this need. Given the excellent performance of deep learning algorithms in a wide range of applications, image semantic segmentation based on deep learning frameworks has come under the spotlight. This paper attempts to improve ESPNet (Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation) based on the multilabel classification method through the following steps. First, the standard convolution is replaced by applying Receptive Field in Deep Convolutional Neural Network in the convolution layer, so that every pixel in the covered area contributes to the final feature response. Second, the ASPP (Atrous Spatial Pyramid Pooling) module is improved based on atrous convolution, and the DB-ASPP (Delate Batch Normalization-ASPP) is proposed as a way to reduce the gridding artifacts caused by multilayer atrous convolution, acquire multiscale information, and integrate the feature information related to the image set. Finally, the proposed model and regular models are subjected to extensive tests and comparisons on multiple data sets. Results show that the proposed model achieves good segmentation accuracy, the smallest number of network parameters at 0.3 M, and the fastest segmentation speed at 25 FPS.

Road Extraction by Using Atrous Spatial Pyramid Pooling Integrated Encoder-Decoder Network and Structural Similarity Loss

Remote Sensing, 2019, Vol 11 (9), pp. 1015

DOI: 10.3390/rs11091015

Cited By ~ 23

Author(s): Hao He, Dongfang Yang, Shicheng Wang, Shuyang Wang, Yongfei Li

Keyword(s): Remote Sensing, Traffic Management, Structural Similarity, Semantic Segmentation, Road Extraction, Remote Sensing Images, The Road, Segmentation Methods, Spatial Pyramid Pooling, Spatial Pyramid

The technology used for road extraction from remote sensing images plays an important role in urban planning, traffic management, navigation, and other geographic applications. Although deep learning methods have greatly advanced road extraction in recent years, this technology is still in its infancy. Because the characteristics of road targets are complex, the accuracy of road extraction is still limited. In addition, the ambiguous predictions of semantic segmentation methods make the road extraction results blurry. In this study, we improved the performance of the road extraction network by integrating atrous spatial pyramid pooling (ASPP) with an Encoder-Decoder network. The proposed approach takes advantage of ASPP’s ability to extract multiscale features and the Encoder-Decoder network’s ability to extract detailed features. Therefore, it can achieve accurate and detailed road extraction results. For the first time, we utilized the structural similarity (SSIM) as a loss function for road extraction. As a result, the ambiguous predictions in the extraction results can be removed, and the image quality of the extracted roads can be improved. The experimental results on the Massachusetts Road dataset show that our method achieves an F1-score of 83.5% and an SSIM of 0.893. Compared with the normal U-net, our method improves the F1-score by 2.6% and the SSIM by 0.18. This demonstrates that the proposed approach can extract roads from remote sensing images more effectively and clearly than the other compared methods.
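
Using SSIM as a loss, as this abstract describes, can be sketched as follows. The sketch makes simplifying assumptions: SSIM is computed once over the whole image (flattened to a list) rather than over local windows, pixel values lie in [0, 1], and the loss is taken as 1 - SSIM. The names `ssim_global` and `ssim_loss` are hypothetical; the paper's exact formulation is not given in the abstract.

```python
def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized flat pixel lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


def ssim_loss(pred, target):
    """Loss that falls to 0 as the prediction matches the target structurally."""
    return 1.0 - ssim_global(pred, target)


# Made-up 2x3 road mask (flattened) and a blurry prediction of it.
target = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
blurry = [0.3, 0.6, 0.3, 0.3, 0.6, 0.3]
print(ssim_loss(target, target))
print(ssim_loss(blurry, target))
```

A prediction identical to the mask gives a loss of about zero, while the blurred prediction is penalised, which matches the abstract's motivation for reducing ambiguous predictions.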

Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

DOI: 10.1109/cvpr42600.2020.00897

Cited By ~ 3

Author(s): Xia Li, Yibo Yang, Qijie Zhao, Tiancheng Shen, Zhouchen Lin, ...

Keyword(s): Semantic Segmentation, Spatial Pyramid
