Multi-Level and Multi-Scale Feature Aggregation Network for Semantic Segmentation in Vehicle-Mounted Scenes

The main challenges of semantic segmentation in vehicle-mounted scenes are object scale variation and trading off model accuracy and efficiency. Lightweight backbone networks for semantic segmentation usually extract single-scale features layer-by-layer only by using a fixed receptive field. Most modern real-time semantic segmentation networks heavily compromise spatial details when encoding semantics, and sacrifice accuracy for speed. Many improving strategies adopt dilated convolution and add a sub-network, in which either intensive computation or redundant parameters are brought. We propose a multi-level and multi-scale feature aggregation network (MMFANet). A spatial pyramid module is designed by cascading dilated convolutions with different receptive fields to extract multi-scale features layer-by-layer. Subseqently, a lightweight backbone network is built by reducing the feature channel capacity of the module. To improve the accuracy of our network, we design two additional modules to separately capture spatial details and high-level semantics from the backbone network without significantly increasing the computation cost. Comprehensive experimental results show that our model achieves 79.3% MIoU on the Cityscapes test dataset at a speed of 58.5 FPS, and it is more accurate than SwiftNet (75.5% MIoU). Furthermore, the number of parameters of our model is at least 53.38% less than that of other state-of-the-art models.

Download Full-text

Y-Net: Dual-branch Joint Network for Semantic Segmentation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3460940 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-22

Author(s):

Yizhen Chen ◽

Haifeng Hu

Keyword(s):

Feature Vector ◽

State Of The Art ◽

Computational Cost ◽

Receptive Fields ◽

Semantic Segmentation ◽

Global Context ◽

Multi Level ◽

The One ◽

Public Datasets ◽

High Level

Most existing segmentation networks are built upon a “ U -shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in improving segmentation performance, there are two main drawbacks. On the one hand, the introduction of low-level features brings a significant increase in calculations without an obvious performance gain. On the other hand, general strategies of feature aggregation such as addition and concatenation fuse features without considering the usefulness of each feature vector, which mixes the useful information with massive noises. In this article, we abandon the traditional “ U -shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it only aggregates the high-level features with low-resolution and utilizes the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as the combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) to adaptively integrate features from different receptive fields by assigning specific channelwise weights, where the weights are input-dependent. Our Y-Net is capable of breaking through the limit of singe-branch network and attaining higher performance with less computational cost than “ U -shaped” structure. The proposed CSD can better integrate useful information and suppress interference noises. Comprehensive experiments are carried out on three public datasets to evaluate the effectiveness of our method. Eventually, our Y-Net achieves state-of-the-art performance on PASCAL VOC 2012, PASCAL Person-Part, and ADE20K dataset without pre-training on extra datasets.

Download Full-text

Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

IEEE Signal Processing Letters ◽

10.1109/lsp.2017.2713830 ◽

2017 ◽

Vol 24 (8) ◽

pp. 1208-1212 ◽

Cited By ~ 35

Author(s):

Jongpil Lee ◽

Juhan Nam

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Scale Feature ◽

Multi Scale ◽

Feature Aggregation ◽

Multi Level

Download Full-text

Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation

IEEE Robotics and Automation Letters ◽

10.1109/lra.2021.3086419 ◽

2021 ◽

Vol 6 (3) ◽

pp. 5889-5896

Author(s):

Yechao Bai ◽

Ziyuan Huang ◽

Lyuyu Shen ◽

Hongliang Guo ◽

Marcelo H. Ang Jr ◽

...

Keyword(s):

Semantic Segmentation ◽

Scale Feature ◽

Multi Scale ◽

Feature Aggregation

Download Full-text

Salient Object Detection Combining a Self-Attention Module and a Feature Pyramid Network

Electronics ◽

10.3390/electronics9101702 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1702

Author(s):

Guangyu Ren ◽

Tianhong Dai ◽

Panagiotis Barmpoutis ◽

Tania Stathaki

Keyword(s):

Object Detection ◽

Receptive Fields ◽

Semantic Segmentation ◽

Salient Object Detection ◽

Object Localization ◽

Salient Object ◽

Convolutional Networks ◽

Multi Scale ◽

Fully Convolutional Networks ◽

High Level

Salient object detection has achieved great improvements by using the Fully Convolutional Networks (FCNs). However, the FCN-based U-shape architecture may cause dilution problems in the high-level semantic information during the up-sample operations in the top-down pathway. Thus, it can weaken the ability of salient object localization and produce degraded boundaries. To this end, in order to overcome this limitation, we propose a novel pyramid self-attention module (PSAM) and the adoption of an independent feature-complementing strategy. In PSAM, self-attention layers are equipped after multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. In addition, a channel-wise attention module is also employed to reduce the redundant features of the FPN and provide refined results. Experimental analysis demonstrates that the proposed PSAM effectively contributes to the whole model so that it outperforms state-of-the-art results over five challenging datasets. Finally, quantitative results show that PSAM generates accurate predictions and integral salient maps, which can provide further help to other computer vision tasks, such as object detection and semantic segmentation.

Download Full-text

PFMNet: Few-Shot Segmentation with Query Feature Enhancement and Multi-Scale Feature Matching

Information ◽

10.3390/info12100406 ◽

2021 ◽

Vol 12 (10) ◽

pp. 406

Author(s):

Jingyao Li ◽

Lianglun Cheng ◽

Zewen Zheng ◽

Jiahong Chen ◽

Genping Zhao ◽

...

Keyword(s):

Semantic Information ◽

Feature Matching ◽

Semantic Segmentation ◽

Average Score ◽

Scale Feature ◽

Feature Enhancement ◽

Multi Scale ◽

Support Set ◽

Shot Segmentation ◽

High Level

The datasets in the latest semantic segmentation model often need to be manually labeled for each pixel, which is time-consuming and requires much effort. General models are unable to make better predictions, for new categories of information that have never been seen before, than the few-shot segmentation that has emerged. However, the few-shot segmentation is still faced up with two challenges. One is the inadequate exploration of semantic information conveyed in the high-level features, and the other is the inconsistency of segmenting objects at different scales. To solve these two problems, we have proposed a prior feature matching network (PFMNet). It includes two novel modules: (1) the Query Feature Enhancement Module (QFEM), which makes full use of the high-level semantic information in the support set to enhance the query feature, and (2) the multi-scale feature matching module (MSFMM), which increases the matching probability of multi-scales of objects. Our method achieves an intersection over union average score of 61.3% for one-shot segmentation and 63.4% for five-shot segmentation, which surpasses the state-of-the-art results by 0.5% and 1.5%, respectively.

Download Full-text

Multi-FAN: multi-spectral mosaic super-resolution via multi-scale feature aggregation network

Machine Vision and Applications ◽

10.1007/s00138-021-01174-w ◽

2021 ◽

Vol 32 (2) ◽

Author(s):

Mehrdad Sheoiby ◽

Sadegh Aliakbarian ◽

Saeed Anwar ◽

Lars Petersson

Keyword(s):

Super Resolution ◽

Scale Feature ◽

Multi Scale ◽

Feature Aggregation

Download Full-text

MLFNet-Point Cloud Semantic Segmentation Convolution Network Based on Multi-scale Feature Fusion

IEEE Access ◽

10.1109/access.2021.3057612 ◽

2021 ◽

pp. 1-1

Author(s):

Jingfang Yang ◽

Bochang Zou ◽

Huadong Qiu ◽

Zhi Li

Keyword(s):

Point Cloud ◽

Feature Fusion ◽

Semantic Segmentation ◽

Scale Feature ◽

Multi Scale

Download Full-text

Real-time Semantic Segmentation Based on Multi-scale Feature Map Joint Pyramid Upsamping

10.1109/aeeca52519.2021.9574190 ◽

2021 ◽

Author(s):

Liang Chao ◽

Wang Xiaoyu ◽

Song Yu ◽

Jiang Changhong

Keyword(s):

Real Time ◽

Semantic Segmentation ◽

Feature Map ◽

Scale Feature ◽

Multi Scale

Download Full-text

Simultaneous Segmentation of Fetal Hearts and Lungs for Medical Ultrasound Images via an Efficient Multi-scale Model Integrated With Attention Mechanism

Ultrasonic Imaging ◽

10.1177/01617346211042526 ◽

2021 ◽

pp. 016173462110425

Author(s):

Jianing Xi ◽

Jiangang Chen ◽

Zhao Wang ◽

Dean Ta ◽

Bing Lu ◽

...

Keyword(s):

Congenital Anomaly ◽

Large Scale ◽

Automatic Segmentation ◽

Receptive Fields ◽

Semantic Segmentation ◽

Attention Mechanism ◽

Scale Model ◽

Ultrasound Images ◽

Multi Scale ◽

Task Irrelevant

Large scale early scanning of fetuses via ultrasound imaging is widely used to alleviate the morbidity or mortality caused by congenital anomalies in fetal hearts and lungs. To reduce the intensive cost during manual recognition of organ regions, many automatic segmentation methods have been proposed. However, the existing methods still encounter multi-scale problem at a larger range of receptive fields of organs in images, resolution problem of segmentation mask, and interference problem of task-irrelevant features, obscuring the attainment of accurate segmentations. To achieve semantic segmentation with functions of (1) extracting multi-scale features from images, (2) compensating information of high resolution, and (3) eliminating the task-irrelevant features, we propose a multi-scale model with skip connection framework and attention mechanism integrated. The multi-scale feature extraction modules are incorporated with additive attention gate units for irrelevant feature elimination, through a U-Net framework with skip connections for information compensation. The performance of fetal heart and lung segmentation indicates the superiority of our method over the existing deep learning based approaches. Our method also shows competitive performance stability during the task of semantic segmentations, showing a promising contribution on ultrasound based prognosis of congenital anomaly in the early intervention, and alleviating the negative effects caused by congenital anomaly.

Download Full-text