Real-Time Semantic Segmentation Network Based on Lite Reduced Atrous Spatial Pyramid Pooling Module Group

Author(s): Yangsheng Tian, Fangyuan Chen, Haihui Wang, Shuiping Zhang
2020, Vol. 2020 (10), pp. 27-1–27-7
Author(s): Congcong Wang, Faouzi Alaya Cheikh, Azeddine Beghdadi, Ole Jakob Elle

Object sizes in images are diverse; therefore, capturing context information at multiple scales is essential for semantic segmentation. Existing context aggregation methods, such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP), employ different pooling sizes or atrous rates so that multiscale information is captured. However, the pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations in the convolution operation can endow the network with a learnable field-of-view, and thus with the ability to capture object context information adaptively. Following this observation, in this paper, we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, in which the sampling locations of the convolution are learnable. Our ACE module can easily be embedded into other Convolutional Neural Networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to the state-of-the-art methods.
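
The idea that learnable sampling locations give a learnable field-of-view can be sketched in a few lines. The example below is a minimal pure-Python sketch, reduced to 1D for readability and not the ACE module itself: each kernel tap is shifted by an offset, and fractional positions are read with linear interpolation. In a real deformable convolution the offsets are predicted by an extra convolution and learned end to end; here they are passed in by hand, and the function names `sample_linear` and `deform_conv1d` are hypothetical.

```python
import math
from typing import List, Sequence


def sample_linear(x: Sequence[float], pos: float) -> float:
    """Read x at a fractional position by linear interpolation.

    Positions outside the signal read as 0 (zero padding).
    """
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deform_conv1d(x: Sequence[float], weights: Sequence[float],
                  offsets: Sequence[Sequence[float]], dilation: int = 1) -> List[float]:
    """1D deformable convolution with output length equal to input length.

    offsets[p][k] is the shift added to the k-th kernel tap at output position p.
    With all offsets zero, this reduces to an ordinary (atrous) convolution.
    """
    center = len(weights) // 2
    out = []
    for p in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            # Regular grid location plus a (hypothetically learned) offset.
            pos = p + (k - center) * dilation + offsets[p][k]
            acc += w * sample_linear(x, pos)
        out.append(acc)
    return out


x = [float(i) for i in range(8)]
w = [1.0, 1.0, 1.0]
zero = [[0.0, 0.0, 0.0] for _ in x]
spread = [[-1.0, 0.0, 1.0] for _ in x]
# Offsets of -1/0/+1 on a rate-1 kernel sample the same locations as rate 2.
print(deform_conv1d(x, w, spread) == deform_conv1d(x, w, zero, dilation=2))
```

The last line shows the point made in the abstract: shifting the taps outward widens the effective field-of-view exactly as a larger atrous rate would, but because offsets are continuous (and, in the real operator, learned per position), the network is not restricted to a hand-picked set of rates.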


2021, Vol. 2021, pp. 1-13
Author(s): Ran Jin, Xiaozhen Han, Tongrui Yu

Image semantic segmentation plays a crucial part in intelligent driving, medical image analysis, video surveillance, and AR. However, as scenes require more semantics to be inferred from video and audio clips and real-time requirements become stricter, neither the previously common single-label classification method nor regular manual labeling can meet these needs. Given the excellent performance of deep learning algorithms in extensive applications, image semantic segmentation algorithms based on deep learning frameworks have come under the spotlight. This paper attempts to improve ESPNet (Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation) based on the multilabel classification method through the following steps. First, the standard convolution in the convolution layer is replaced by applying Receptive Field in Deep Convolutional Neural Network, so that every pixel in the covered area contributes to the final feature response. Second, the ASPP (Atrous Spatial Pyramid Pooling) module is improved based on atrous convolution, and DB-ASPP (Delate Batch Normalization-ASPP) is proposed as a way to reduce the gridding artifacts caused by multilayer atrous convolution, acquire multiscale information, and integrate the feature information of the image set. Finally, the proposed model and regular models are subjected to extensive tests and comparisons on multiple datasets. Results show that the proposed model achieves good segmentation accuracy, the smallest number of network parameters (0.3 M), and the fastest segmentation speed (25 FPS).
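
The gridding artifacts mentioned above can be seen with a small counting exercise: when 3-tap dilated convolutions with the same rate are stacked, only every r-th input position ever reaches a given output pixel. The sketch below (1D, pure Python, hypothetical helper names `contributing_inputs` and `gaps`) lists those holes. It illustrates the general phenomenon only and does not implement DB-ASPP; the mixed rates 1, 2, 3 are an assumed example of a stack without holes.

```python
from typing import List, Set


def contributing_inputs(rates: List[int], kernel_size: int = 3) -> Set[int]:
    """Input offsets (relative to one output pixel) that reach that pixel
    through a stack of 1D dilated convolutions with the given rates."""
    half = kernel_size // 2
    reach = {0}
    for r in rates:
        # Each layer spreads every reachable position to its dilated taps.
        reach = {p + (k - half) * r for p in reach for k in range(kernel_size)}
    return reach


def gaps(reach: Set[int]) -> List[int]:
    """Offsets inside the receptive field that never contribute (the grid holes)."""
    lo, hi = min(reach), max(reach)
    return [i for i in range(lo, hi + 1) if i not in reach]


print(gaps(contributing_inputs([2, 2, 2])))  # odd offsets are never sampled
print(gaps(contributing_inputs([1, 2, 3])))  # mixed rates cover every offset
```

Both stacks have the same receptive-field width (offsets -6 to 6), but with a constant rate of 2 half of the positions inside it are never sampled, which is what produces the checkerboard-like gridding pattern.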


2019, Vol. 11 (9), pp. 1015
Author(s): Hao He, Dongfang Yang, Shicheng Wang, Shuyang Wang, Yongfei Li

The technology used for road extraction from remote sensing images plays an important role in urban planning, traffic management, navigation, and other geographic applications. Although deep learning methods have greatly advanced road extraction in recent years, this technology is still in its infancy. Because the characteristics of road targets are complex, the accuracy of road extraction is still limited. In addition, the ambiguous predictions of semantic segmentation methods make road extraction results blurry. In this study, we improved the performance of the road extraction network by integrating atrous spatial pyramid pooling (ASPP) with an Encoder-Decoder network. The proposed approach takes advantage of ASPP’s ability to extract multiscale features and the Encoder-Decoder network’s ability to extract detailed features; therefore, it can achieve accurate and detailed road extraction results. For the first time, we utilized the structural similarity (SSIM) index as a loss function for road extraction, so that ambiguous predictions in the extraction results can be removed and the image quality of the extracted roads improved. Experimental results on the Massachusetts Roads dataset show that our method achieves an F1-score of 83.5% and an SSIM of 0.893. Compared with the standard U-Net, our method improves the F1-score by 2.6% and the SSIM by 0.18. This demonstrates that the proposed approach can extract roads from remote sensing images more effectively and clearly than the compared methods.
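
To show why an SSIM-based loss penalizes blurry, ambiguous predictions, here is a minimal sketch of an SSIM loss, assuming a single window that covers the whole (flattened) mask and the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2. Standard SSIM averages this quantity over local, often Gaussian, windows, and the abstract does not say how the authors compute or weight the loss, so `global_ssim` and `ssim_loss` are hypothetical illustrations rather than their training code.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """SSIM computed over one window spanning the entire input."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance term times contrast/structure term.
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def ssim_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """0 for a perfect prediction; approaches 1 as structure is lost."""
    return 1.0 - global_ssim(pred, target)


target = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]  # toy road / background mask
blurry = [0.5] * 8  # a maximally ambiguous prediction with the right mean
print(ssim_loss(target, target), ssim_loss(target, blurry))
```

The uniform 0.5 prediction matches the target's mean, so the luminance term of SSIM does not penalize it; the near-1 loss comes entirely from its zero variance and zero covariance with the target. That structure term is what pushes a network away from hedged, blurry outputs.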


Road extraction from satellite images has several applications, such as geographic information systems (GIS). An accurate and up-to-date road network database facilitates transportation, disaster management, and GPS navigation. The most active field of research for automatic road network extraction involves semantic segmentation using convolutional neural networks (CNNs). Although these networks can produce accurate results, the models typically trade performance for accuracy and vice versa. In this paper, we propose an architecture for the semantic segmentation of road networks using Atrous Spatial Pyramid Pooling (ASPP). The network contains residual blocks for extracting low-level features. Atrous convolutions with different dilation rates are applied, and spatial pyramid pooling is performed on these features to extract spatial information. The low-level features from the residual blocks are added to the multiscale context information to produce the final segmentation image. Our proposed model significantly reduces the number of parameters required to train the model. The proposed model was trained on the Massachusetts Roads dataset, and the results show that our model produces better results than popular state-of-the-art models.
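
The fusion described above, parallel atrous convolutions at several dilation rates whose output is combined with low-level features by addition, can be sketched in 1D. This is a minimal illustration under stated assumptions, not the authors' network: the hypothetical `aspp_fuse` averages the branches, whereas the original ASPP concatenates branch outputs and applies a 1x1 convolution, and the rates 1, 2, 3 are arbitrary choices for the example.

```python
from typing import List, Sequence


def atrous_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """Same-length 1D atrous (dilated) convolution with zero padding."""
    half = len(w) // 2
    out = []
    for p in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            i = p + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= i < len(x):
                acc += wk * x[i]
        out.append(acc)
    return out


def aspp_fuse(x: Sequence[float], low_level: Sequence[float],
              branch_weights: Sequence[Sequence[float]],
              rates: Sequence[int]) -> List[float]:
    """Run parallel atrous branches, average them (a simplification assumed
    here), then add the low-level features element-wise as a skip connection."""
    branches = [atrous_conv1d(x, w, r) for w, r in zip(branch_weights, rates)]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [f + l for f, l in zip(fused, low_level)]


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
w = [1.0, 1.0, 1.0]
out = aspp_fuse(impulse, [0.0] * 7, [w, w, w], [1, 2, 3])
print(out)  # every position sees the impulse through at least one branch
```

Each branch stores only `len(w)` weights whatever its rate, which is why atrous convolution enlarges the field-of-view without adding parameters; this is one general reason ASPP-based designs can stay compact.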


Author(s): Lixiang Ru, Bo Du, Chen Wu

Current weakly supervised semantic segmentation (WSSS) methods with image-level labels mainly adopt class activation maps (CAM) to generate the initial pseudo labels. However, CAM usually identifies only the most discriminative object extents, because the network does not need to discover the integral object to recognize image-level labels. In this work, to tackle this problem, we propose to simultaneously learn image-level labels and local visual word labels. Specifically, in each forward propagation, the feature maps of the input image are encoded into visual words with a learnable codebook. By forcing the network to classify the encoded fine-grained visual words, the generated CAM can cover more semantic regions. In addition, we propose a hybrid spatial pyramid pooling module that preserves the local maximum and global average values of feature maps, so that more object details and less background are considered. Based on the proposed methods, we conducted experiments on the PASCAL VOC 2012 dataset. Our proposed method achieved 67.2% mIoU on the val set and 67.3% mIoU on the test set, outperforming recent state-of-the-art methods.
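
A pooling step that keeps both local maxima and the global average can be sketched as follows. The abstract does not specify how the hybrid module combines the two, so this is a hypothetical illustration: `hybrid_spp` max-pools the feature map over a pyramid of grids (1x1 and 2x2 by assumption) and concatenates the result with the global average. Bin edges follow the floor/ceil rule commonly used for adaptive pooling.

```python
from typing import List, Sequence


def grid_max(fmap: Sequence[Sequence[float]], bins: int) -> List[float]:
    """Max-pool an H x W map into bins x bins cells, row-major order."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for by in range(bins):
        y0, y1 = by * h // bins, ((by + 1) * h + bins - 1) // bins  # floor, ceil
        for bx in range(bins):
            x0, x1 = bx * w // bins, ((bx + 1) * w + bins - 1) // bins
            out.append(max(fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)))
    return out


def hybrid_spp(fmap: Sequence[Sequence[float]],
               levels: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate local max-pooled bins at each pyramid level with the
    global average of the whole map (combination rule assumed)."""
    h, w = len(fmap), len(fmap[0])
    feats: List[float] = []
    for b in levels:
        feats.extend(grid_max(fmap, b))
    feats.append(sum(sum(row) for row in fmap) / (h * w))
    return feats


fmap = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]  # values 1..16
print(hybrid_spp(fmap))
```

Max-pooled bins keep strong local responses (object details) that a plain global average would dilute, while the average term summarizes the whole map; this is one way to read the abstract's trade-off between more object detail and less background.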


Author(s): Jiayi Yang, Tianshi Hu, Junli Yang, Zhaoxing Zhang, Yue Pan
