Semantic segmentation using stride spatial pyramid pooling and dual attention decoder

2020 ◽ Vol 107 ◽ pp. 107498 ◽ Author(s): Chengli Peng, Jiayi Ma

2020 ◽ Vol 2020 (10) ◽ pp. 27-1-27-7 ◽ Author(s): Congcong Wang, Faouzi Alaya Cheikh, Azeddine Beghdadi, Ole Jakob Elle

Object sizes in images are diverse; capturing multi-scale context information is therefore essential for semantic segmentation. Existing context aggregation methods such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling sizes or atrous rates so that multi-scale information is captured. However, these pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field of view, and thus the ability to capture object context information adaptively. Following this observation, in this paper we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, whose sampling locations are learnable. Our ACE module can easily be embedded into other convolutional neural networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared with state-of-the-art methods.
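The core mechanism behind such a module is deformable convolution: each kernel tap carries a learned (dy, dx) offset, and the feature map is read at the shifted, fractional locations via bilinear interpolation. The following numpy sketch illustrates this sampling idea for a single 3x3 tap set; the function names and the single-channel simplification are ours, not the ACE paper's implementation, and in practice the offsets would be predicted by a separate convolution branch.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - np.floor(y), x - np.floor(x)
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_conv_point(feat, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).
    offsets has shape (3, 3, 2): a learned (dy, dx) shift per kernel tap,
    which is what makes the field of view adaptive."""
    out = 0.0
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            oy, ox = offsets[i, j]
            out += kernel[i, j] * bilinear_sample(feat, cy + dy + oy, cx + dx + ox)
    return out
```

With all offsets zero this reduces to an ordinary 3x3 convolution; non-zero offsets let the network enlarge or reshape its receptive field per location.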


2020 ◽ Vol 91 ◽ pp. 106209 ◽ Author(s): Zhengyu Xia, Joohee Kim

2019 ◽ Vol 11 (9) ◽ pp. 1015 ◽ Author(s): Hao He, Dongfang Yang, Shicheng Wang, Shuyang Wang, Yongfei Li

The technology used for road extraction from remote sensing images plays an important role in urban planning, traffic management, navigation, and other geographic applications. Although deep learning methods have greatly advanced road extraction in recent years, the technology is still in its infancy. Because road targets have complex characteristics, extraction accuracy is still limited. In addition, the ambiguous predictions of semantic segmentation methods make the extracted roads blurry. In this study, we improved the performance of the road extraction network by integrating atrous spatial pyramid pooling (ASPP) with an encoder-decoder network. The proposed approach takes advantage of ASPP's ability to extract multi-scale features and the encoder-decoder network's ability to extract detailed features, and can therefore achieve accurate and detailed road extraction results. For the first time, we utilized the structural similarity (SSIM) as a loss function for road extraction, which removes ambiguous predictions from the extraction results and improves the image quality of the extracted roads. Experimental results on the Massachusetts Roads dataset show that our method achieves an F1-score of 83.5% and an SSIM of 0.893. Compared with the standard U-Net, our method improves the F1-score by 2.6% and the SSIM by 0.18. It is therefore demonstrated that the proposed approach extracts roads from remote sensing images more effectively and clearly than the other compared methods.
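Using SSIM as a loss means minimizing 1 - SSIM, so that predictions are penalized for structural (not just per-pixel) disagreement with the ground truth. Below is a minimal numpy sketch of the standard global SSIM formula; the function name is ours, and practical implementations (including, presumably, the paper's) compute SSIM over local sliding windows and average, rather than over the whole image at once.

```python
import numpy as np

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM between two images with values in [0, 1].
    c1, c2 are the usual stabilizing constants for a unit dynamic range."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim
```

A perfect prediction gives a loss of exactly 0, and blurry or structurally inconsistent predictions are penalized even when their mean intensity is correct, which is why this loss sharpens the extracted road masks.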


Road extraction from satellite images has several applications, such as geographic information systems (GIS). An accurate and up-to-date road network database facilitates transportation, disaster management, and GPS navigation. The most active field of research for automatic road network extraction is semantic segmentation using convolutional neural networks (CNNs). Although such models can produce accurate results, they typically trade performance for accuracy and vice versa. In this paper, we propose an architecture for semantic segmentation of road networks using atrous spatial pyramid pooling (ASPP). The network contains residual blocks for extracting low-level features. Atrous convolutions with different dilation rates are applied, and spatial pyramid pooling is performed on these features to extract spatial information. The low-level features from the residual blocks are added to the multi-scale context information to produce the final segmentation image. Our proposed model significantly reduces the number of parameters required for training. The proposed model was trained on the Massachusetts Roads dataset, and the results show that it outperforms popular state-of-the-art models.
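The key ASPP ingredient described above is running parallel atrous (dilated) convolutions at several rates and stacking their responses, so the same kernel sees context at several scales without extra parameters. The numpy sketch below illustrates this for a single channel; the function names and the rate choices (6, 12, 18) are illustrative defaults, not this paper's exact configuration, and a full ASPP additionally includes a 1x1 convolution branch and image-level pooling before fusion.

```python
import numpy as np

def atrous_conv2d(feat, kernel, rate):
    """'Same'-padded 3x3 atrous convolution on a single-channel map.
    The taps are spaced `rate` pixels apart, enlarging the receptive
    field to (2*rate + 1) without adding parameters."""
    h, w = feat.shape
    pad = rate
    padded = np.pad(feat, pad)
    out = np.zeros_like(feat)
    for i, dy in enumerate((-rate, 0, rate)):
        for j, dx in enumerate((-rate, 0, rate)):
            out += kernel[i, j] * padded[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
    return out

def aspp(feat, kernels, rates=(6, 12, 18)):
    """Run parallel atrous branches at different rates and stack them;
    a real ASPP head then fuses the stack with a 1x1 convolution."""
    return np.stack([atrous_conv2d(feat, k, r) for k, r in zip(kernels, rates)])
```

Because every branch reuses a plain 3x3 kernel, widening the field of view this way adds no parameters relative to a single-rate convolution, which is consistent with the parameter savings the abstract claims.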


Author(s): Lixiang Ru, Bo Du, Chen Wu

Current weakly-supervised semantic segmentation (WSSS) methods with image-level labels mainly adopt class activation maps (CAM) to generate the initial pseudo labels. However, CAM usually identifies only the most discriminative object extents, because the network does not need to discover the entire object to recognize image-level labels. In this work, to tackle this problem, we propose to simultaneously learn image-level labels and local visual word labels. Specifically, in each forward propagation, the feature maps of the input image are encoded into visual words with a learnable codebook. By enforcing the network to classify the encoded fine-grained visual words, the generated CAM can cover more semantic regions. In addition, we propose a hybrid spatial pyramid pooling module that preserves the local maximum and global average values of the feature maps, so that more object details and less background are considered. Based on the proposed methods, we conducted experiments on the PASCAL VOC 2012 dataset. Our method achieves 67.2% mIoU on the val set and 67.3% mIoU on the test set, outperforming recent state-of-the-art methods.
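The hybrid pooling idea can be illustrated concretely: at each pyramid level, take the local maximum of every grid cell (keeping strong, localized object evidence) and append the global average (keeping overall context). The numpy sketch below is our own minimal reading of that description, with illustrative grid sizes, not the paper's exact module.

```python
import numpy as np

def hybrid_spp(feat, grid_sizes=(1, 2, 4)):
    """Hybrid spatial pyramid pooling sketch for one feature channel.
    For each pyramid level, max-pool every grid cell (local maxima
    preserve object detail); finally append the global average value
    (context that max pooling alone would discard)."""
    h, w = feat.shape
    pooled = []
    for g in grid_sizes:
        ys = np.linspace(0, h, g + 1).astype(int)
        xs = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                pooled.append(feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max())
    pooled.append(feat.mean())
    return np.array(pooled)
```

Mixing the two statistics is the point: max pooling alone over-emphasizes the most discriminative activations (the very CAM failure mode described above), while average pooling alone dilutes object detail with background.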


Author(s): Jiayi Yang, Tianshi Hu, Junli Yang, Zhaoxing Zhang, Yue Pan

2021 ◽ Vol 110 ◽ pp. 107622 ◽ Author(s): Xuhang Lian, Yanwei Pang, Jungong Han, Jing Pan
