Semantic Segmentation by Multi-Scale Feature Extraction Based on Grouped Dilated Convolution Module

Existing studies have shown that effective extraction of multi-scale information is a crucial factor in improving semantic segmentation performance. Accordingly, various methods for extracting multi-scale information have been developed. However, these methods require additional calculations and vast computing resources. To address these problems, this study proposes a grouped dilated convolution module that combines existing grouped convolution and atrous spatial pyramid pooling techniques. The proposed method can learn multi-scale features more simply and effectively than existing methods. Because each convolution group in the proposed model has a different dilation, the groups have receptive fields of different sizes and can learn features corresponding to these receptive fields. As a result, multi-scale context can be easily extracted. Moreover, optimal hyper-parameters are obtained from an in-depth analysis, yielding excellent segmentation performance. To evaluate the proposed method, two open datasets are used: the Cambridge Driving Labeled Video Database (CamVid) and the Stanford Background Dataset (SBD). The experimental results indicate that the proposed method achieves a mean intersection over union of 73.15% on the CamVid dataset and 72.81% on the SBD, outperforming other state-of-the-art methods.
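The idea of giving each convolution group its own dilation can be illustrated with a small pure-Python sketch in one dimension. This is not the authors' module: the function names (`dilated_conv1d`, `effective_kernel_size`, `grouped_dilated_conv1d`), the 1D setting, the zero padding, and the weight layout are all assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D dilated convolution (cross-correlation) with zero padding."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def grouped_dilated_conv1d(channels, group_weights, dilations):
    """Grouped convolution in which group g uses dilations[g].

    channels: list of C input signals of equal length.
    group_weights[g][o][i]: kernel applied to input channel i of group g
    when computing output channel o of that group (hypothetical layout).
    """
    groups = len(dilations)
    if groups == 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    per_group = len(channels) // groups
    outputs = []
    for g, dilation in enumerate(dilations):
        group_in = channels[g * per_group:(g + 1) * per_group]
        for out_kernels in group_weights[g]:
            acc = [0.0] * len(group_in[0])
            # Sum the responses of the input channels belonging to this group only.
            for signal, kernel in zip(group_in, out_kernels):
                response = dilated_conv1d(signal, kernel, dilation)
                acc = [a + r for a, r in zip(acc, response)]
            outputs.append(acc)
    return outputs
```

With a kernel of size 3, a group with dilation 1 spans 3 samples while a group with dilation 3 spans 7, so the groups respond to context at different scales without adding parameters.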


Road Network Extraction Using Atrous Spatial Pyramid Pooling

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h74590.78919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 31-33

Keyword(s):

Road Network ◽

Spatial Information ◽

Semantic Segmentation ◽

Low Level ◽

Multi Scale ◽

Road Network Extraction ◽

Proposed Model ◽

Spatial Pyramid Pooling ◽

Segmentation Image ◽

Spatial Pyramid

Road extraction from satellite images has several applications, for example in geographic information systems (GIS). An accurate and up-to-date road network database facilitates transportation, disaster management, and GPS navigation. The most active field of research on automatic road network extraction involves semantic segmentation using convolutional neural networks (CNNs). Although these models can produce accurate results, they typically give up performance for accuracy and vice versa. In this paper, we propose an architecture for semantic segmentation of road networks using Atrous Spatial Pyramid Pooling (ASPP). The network contains residual blocks for extracting low-level features. Atrous convolutions with different dilation rates are applied, and spatial pyramid pooling is performed on these features to extract spatial information. The low-level features from the residual blocks are added to the multi-scale context information to produce the final segmentation image. Our proposed model significantly reduces the number of parameters required to train the model. The proposed model was trained on the Massachusetts roads dataset, and the results show that it outperforms popular state-of-the-art models.


Multi-Column Atrous Convolutional Neural Network for Counting Metro Passengers

Symmetry ◽

10.3390/sym12040682 ◽

2020 ◽

Vol 12 (4) ◽

pp. 682

Author(s):

Jun Zhang ◽

Gaoyi Zhu ◽

Zhizhong Wang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Excellent Performance ◽

Multi Scale ◽

Density Maps ◽

Deep Feature ◽

Proposed Model ◽

Spatial Pyramid Pooling ◽

Deep Feature Extraction ◽

Symmetric Method

We propose a symmetric method of accurately estimating the number of metro passengers from an individual image. To this end, we developed a network for metro-passenger counting called MPCNet, which provides a data-driven, deep learning method of understanding highly congested scenes and accurately estimating crowds, as well as presenting high-quality density maps. The proposed MPCNet is composed of two major components: a deep convolutional neural network (CNN) as the front end, for deep feature extraction; and a multi-column atrous CNN as the back end, with atrous spatial pyramid pooling (ASPP) to deliver multi-scale receptive fields. Existing crowd-counting datasets do not adequately cover all the challenging situations considered in our work. Therefore, we collected subway passenger video to compile and label a large new dataset that includes 346 images with 3475 annotated heads. We conducted extensive experiments with this and other datasets to verify the effectiveness of the proposed model. Our results demonstrate the excellent performance of the proposed MPCNet.


Simultaneous Segmentation of Fetal Hearts and Lungs for Medical Ultrasound Images via an Efficient Multi-scale Model Integrated With Attention Mechanism

Ultrasonic Imaging ◽

10.1177/01617346211042526 ◽

2021 ◽

pp. 016173462110425

Author(s):

Jianing Xi ◽

Jiangang Chen ◽

Zhao Wang ◽

Dean Ta ◽

Bing Lu ◽

...

Keyword(s):

Congenital Anomaly ◽

Large Scale ◽

Automatic Segmentation ◽

Receptive Fields ◽

Semantic Segmentation ◽

Attention Mechanism ◽

Scale Model ◽

Ultrasound Images ◽

Multi Scale ◽

Task Irrelevant

Large-scale early scanning of fetuses via ultrasound imaging is widely used to reduce the morbidity and mortality caused by congenital anomalies in fetal hearts and lungs. To reduce the intensive cost of manually recognizing organ regions, many automatic segmentation methods have been proposed. However, the existing methods still encounter three problems that hinder accurate segmentation: a multi-scale problem arising from the wide range of organ receptive fields in images, a resolution problem in the segmentation mask, and interference from task-irrelevant features. To achieve semantic segmentation that (1) extracts multi-scale features from images, (2) compensates for high-resolution information, and (3) eliminates task-irrelevant features, we propose a multi-scale model that integrates a skip connection framework and an attention mechanism. The multi-scale feature extraction modules are combined with additive attention gate units for irrelevant feature elimination, within a U-Net framework with skip connections for information compensation. The performance on fetal heart and lung segmentation indicates the superiority of our method over existing deep learning based approaches. Our method also shows competitive performance stability in semantic segmentation tasks, making a promising contribution to ultrasound-based prognosis of congenital anomalies in early intervention and alleviating the negative effects of congenital anomalies.


Y-Net: Dual-branch Joint Network for Semantic Segmentation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3460940 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-22

Author(s):

Yizhen Chen ◽

Haifeng Hu

Keyword(s):

Feature Vector ◽

State Of The Art ◽

Computational Cost ◽

Receptive Fields ◽

Semantic Segmentation ◽

Global Context ◽

Multi Level ◽

The One ◽

Public Datasets ◽

High Level

Most existing segmentation networks are built upon a “U-shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in improving segmentation performance, there are two main drawbacks. On the one hand, the introduction of low-level features brings a significant increase in calculations without an obvious performance gain. On the other hand, general strategies of feature aggregation such as addition and concatenation fuse features without considering the usefulness of each feature vector, which mixes the useful information with a large amount of noise. In this article, we abandon the traditional “U-shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it aggregates only the low-resolution high-level features and utilizes the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as the combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) to adaptively integrate features from different receptive fields by assigning specific channelwise weights, where the weights are input-dependent. Our Y-Net is capable of breaking through the limit of a single-branch network and attaining higher performance with less computational cost than the “U-shaped” structure. The proposed CSD can better integrate useful information and suppress interfering noise. Comprehensive experiments are carried out on three public datasets to evaluate the effectiveness of our method. Our Y-Net achieves state-of-the-art performance on the PASCAL VOC 2012, PASCAL Person-Part, and ADE20K datasets without pre-training on extra datasets.


Concrete Cracks Detection Based on FCN with Dilated Convolution

Applied Sciences ◽

10.3390/app9132686 ◽

2019 ◽

Vol 9 (13) ◽

pp. 2686 ◽

Cited By ~ 15

Author(s):

Jianming Zhang ◽

Chaoquan Lu ◽

Jin Wang ◽

Lei Wang ◽

Xiao-Guang Yue

Keyword(s):

Crack Detection ◽

Receptive Fields ◽

Semantic Segmentation ◽

Concrete Surface ◽

Input Image ◽

Feature Maps ◽

Test Set ◽

Dilated Convolution ◽

Fully Convolutional Networks ◽

Segmentation Task

In civil engineering, the stability of concrete is of great significance to the safety of people's lives and property, so it is necessary to detect concrete damage effectively. In this paper, we treat crack detection on concrete surfaces as a semantic segmentation task that distinguishes background from crack at the pixel level. Inspired by Fully Convolutional Networks (FCN), we propose a fully convolutional network based on dilated convolution for concrete crack detection, which consists of an encoder and a decoder. Specifically, we first use a residual network to extract the feature maps of the input image, design dilated convolutions with different dilation rates to extract feature maps with different receptive fields, and fuse the extracted features from multiple branches. Then, we use stacked deconvolutions to up-sample the fused feature maps. Finally, we use the SoftMax function to classify the feature maps at the pixel level. To verify the validity of the model, we adopt the commonly used evaluation indicators of semantic segmentation: Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU). The experimental results show that, by introducing dilated convolutions with different dilation rates and a multi-branch fusion strategy, the proposed model converges faster and generalizes better on the test set. Our model has a PA of 96.84%, MPA of 92.55%, MIoU of 86.05%, and FWIoU of 94.22% on the test set, which is superior to other models.
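The four evaluation indicators named above are commonly computed from a class confusion matrix. The following is a minimal sketch using the usual definitions, not the authors' evaluation code; the function name `segmentation_metrics`, the matrix layout, and the choice to skip classes that never occur in the ground truth are assumptions.

```python
def segmentation_metrics(confusion):
    """Compute PA, MPA, MIoU and FWIoU from a confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (assumed layout). Classes with no ground-truth
    pixels are skipped when averaging (an assumption, not a fixed standard).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no pixels")

    true_pos = [confusion[i][i] for i in range(n)]
    gt_count = [sum(confusion[i]) for i in range(n)]  # row sums
    pred_count = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # column sums
    present = [i for i in range(n) if gt_count[i] > 0]

    class_acc = {i: true_pos[i] / gt_count[i] for i in present}
    iou = {
        i: true_pos[i] / (gt_count[i] + pred_count[i] - true_pos[i])
        for i in present
    }

    return {
        "PA": sum(true_pos) / total,
        "MPA": sum(class_acc.values()) / len(present),
        "MIoU": sum(iou.values()) / len(present),
        # Each class IoU is weighted by its share of ground-truth pixels.
        "FWIoU": sum(gt_count[i] / total * iou[i] for i in present),
    }
```

For a two-class problem such as crack versus background, the matrix is 2×2; because background pixels dominate, FWIoU tends to track the background IoU, which is one reason MIoU is usually reported alongside it.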


Multiscale Road Extraction in Remote Sensing Images

Computational Intelligence and Neuroscience ◽

10.1155/2019/2373798 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Aziguli Wulamu ◽

Zuxian Shi ◽

Dezheng Zhang ◽

Zheyu He

Keyword(s):

Remote Sensing ◽

Network Architecture ◽

Semantic Segmentation ◽

Road Extraction ◽

Remote Sensing Images ◽

The Road ◽

Proposed Model ◽

Different Types ◽

Spatial Pyramid Pooling ◽

The One

Recent advances in convolutional neural networks (CNNs) have shown impressive results in semantic segmentation. Among the successful CNN-based methods, U-Net has achieved exciting performance. In this paper, we propose a novel network architecture based on U-Net and atrous spatial pyramid pooling (ASPP) to deal with the road extraction task in the remote sensing field. On the one hand, the U-Net structure can effectively extract valuable features; on the other hand, ASPP is able to utilize multiscale context information in remote sensing images. Compared to the baseline, the proposed model improves the pixelwise mean Intersection over Union (mIoU) by 3 points. Experimental results show that the proposed network architecture can deal with different types of road surface extraction tasks under various terrains in Yinchuan city, solves the road connectivity problem to some extent, and has a certain tolerance to shadows and occlusion.


Adaptive Context Encoding Module for Semantic Segmentation

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-027 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 27-1-27-7

Author(s):

Congcong Wang ◽

Faouzi Alaya Cheikh ◽

Azeddine Beghdadi ◽

Ole Jakob Elle

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Experimental Studies ◽

Semantic Segmentation ◽

Multiple Scale ◽

Context Information ◽

Convolution Operation ◽

Sampling Locations ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Object sizes in images are diverse; therefore, capturing multiple-scale context information is essential for semantic segmentation. Existing context aggregation methods such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling sizes or atrous rates so that multiple-scale information is captured. However, the pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, in this paper, we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, where the sampling locations of the convolution operation are learnable. Our ACE module can be embedded into other Convolutional Neural Networks (CNNs) easily for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to the state-of-the-art methods.
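How learnable sampling locations generalize a fixed atrous rate can be seen in a one-dimensional sketch. This is an illustration of the deformable-sampling idea only, not the ACE module: the names `sample_linear` and `deformable_conv1d` are hypothetical, and the offsets are passed in by hand, whereas in a deformable convolution they would be predicted by a learned layer.

```python
import math


def sample_linear(signal, position):
    """Read `signal` at a fractional position by linear interpolation.

    Positions outside the signal contribute zero (zero padding).
    """
    left = math.floor(position)
    frac = position - left

    def at(index):
        return signal[index] if 0 <= index < len(signal) else 0.0

    return (1.0 - frac) * at(left) + frac * at(left + 1)


def deformable_conv1d(signal, kernel, offsets, dilation=1):
    """1D convolution whose taps are shifted by per-position offsets.

    offsets[i][j] is added to the regular sampling position of tap j when
    computing output i. All-zero offsets reduce this to an ordinary dilated
    convolution with the given dilation.
    """
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            position = i + (j - center) * dilation + offsets[i][j]
            acc += weight * sample_linear(signal, position)
        out.append(acc)
    return out
```

Setting the offsets of a three-tap kernel to -1, 0 and +1 everywhere reproduces a dilation of 2, so a fixed atrous rate is one special case of the offsets; letting the offsets vary per position is what makes the field-of-view adaptive.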


Multi-scale Information Diffusion Prediction with Reinforced Recurrent Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/560 ◽

2019 ◽

Cited By ~ 7

Author(s):

Cheng Yang ◽

Jian Tang ◽

Maosong Sun ◽

Ganqu Cui ◽

Zhiyuan Liu

Keyword(s):

Information Diffusion ◽

State Of The Art ◽

Sequential Data ◽

Recurrent Networks ◽

Multi Scale ◽

Structural Context ◽

Learning Techniques ◽

Proposed Model ◽

Real World Datasets ◽

Diffusion Prediction

Information diffusion prediction is an important task that studies how information items spread among users. With the success of deep learning techniques, recurrent neural networks (RNNs) have shown their powerful capability in modeling information diffusion as sequential data. However, previous works focused on either microscopic diffusion prediction, which aims at guessing the next influenced user, or macroscopic diffusion prediction, which estimates the total number of influenced users during the diffusion process. To the best of our knowledge, no previous work has suggested a unified model for both microscopic and macroscopic scales. In this paper, we propose a novel multi-scale diffusion prediction model based on reinforcement learning (RL). RL incorporates the macroscopic diffusion size information into the RNN-based microscopic diffusion model by addressing the non-differentiable problem. We also employ an effective structural context extraction strategy to utilize the underlying social graph information. Experimental results show that our proposed model outperforms state-of-the-art baseline models on both microscopic and macroscopic diffusion predictions on three real-world datasets.


Semantic Locality-Aware Deformable Network for Clothing Segmentation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/106 ◽

2018 ◽

Cited By ~ 4

Author(s):

Wei Ji ◽

Xi Li ◽

Yueting Zhuang ◽

Omar El Farouk Bourahla ◽

Yixin Ji ◽

...

Keyword(s):

Learning Process ◽

State Of The Art ◽

Semantic Segmentation ◽

Small Sample ◽

Sample Problem ◽

Fine Grained ◽

Domain Specific ◽

Proposed Model ◽

Segmentation Framework ◽

Small Sample Problem

Clothing segmentation is a challenging vision problem typically implemented within a fine-grained semantic segmentation framework. Different from conventional segmentation, clothing segmentation has some domain-specific properties such as texture richness, diverse appearance variations, non-rigid geometry deformations, and small sample learning. To deal with these points, we propose a semantic locality-aware segmentation model, which adaptively attaches an original clothing image with a semantically similar (e.g., appearance or pose) auxiliary exemplar by search. Through considering the interactions of the clothing image and its exemplar, more intrinsic knowledge about the locality manifold structures of clothing images is discovered to make the learning process of small sample problem more stable and tractable. Furthermore, we present a CNN model based on the deformable convolutions to extract the non-rigid geometry-aware features for clothing images. Experimental results demonstrate the effectiveness of the proposed model against the state-of-the-art approaches.


Histopathological Classification of Breast Cancer Images Using a Multi-Scale Input and Multi-Feature Network

Cancers ◽

10.3390/cancers12082031 ◽

2020 ◽

Vol 12 (8) ◽

pp. 2031 ◽

Cited By ~ 2

Author(s):

Taimoor Shakeel Sheikh ◽

Yonghee Lee ◽

Migyung Cho

Keyword(s):

State Of The Art ◽

Texture Features ◽

Feature Maps ◽

Histopathological Classification ◽

Multi Scale ◽

Machine Learning Methods ◽

Proposed Model ◽

Benchmark Datasets ◽

Histopathological Images

Diagnosis of pathologies using histopathological images can be time-consuming when many images with different magnification levels need to be analyzed. State-of-the-art computer vision and machine learning methods can help automate the diagnostic pathology workflow and thus reduce the analysis time. Automated systems can also be more efficient and accurate, and can increase the objectivity of diagnosis by reducing operator variability. We propose a multi-scale input and multi-feature network (MSI-MFNet) model, which can learn the overall structures and texture features of different scale tissues by fusing multi-resolution hierarchical feature maps from the network’s dense connectivity structure. The MSI-MFNet predicts the probability of a disease on the patch and image levels. We evaluated the performance of our proposed model on two public benchmark datasets. Furthermore, through ablation studies of the model, we found that multi-scale input and multi-feature maps play an important role in improving the performance of the model. Our proposed model outperformed the existing state-of-the-art models by demonstrating better accuracy, sensitivity, and specificity.
