Feature fusion network based on attention mechanism for 3D semantic segmentation of point clouds

Contextual information is a key factor affecting semantic segmentation. Recently, many methods have used the self-attention mechanism to capture more contextual information, but these methods are computationally expensive. To address this problem, a novel self-attention network called FFANet is designed to capture contextual information efficiently; it reduces the amount of computation through strip pooling and linear layers. The network introduces a feature fusion (FF) module that computes an affinity matrix capturing the relationships between pixels. The affinity matrix is then multiplied with the feature map, which selectively increases the weight of regions of interest. Extensive experiments on the public datasets PASCAL VOC2012 and Cityscapes and on the remote sensing dataset DLRSD achieved mean IoU scores of 74.5%, 70.3%, and 63.9%, respectively. Compared with current typical algorithms, the proposed method achieves excellent performance.
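The affinity step described in this abstract can be illustrated with a minimal sketch. It assumes a plain dot-product affinity with a row-wise softmax over a flattened feature map; this is the generic dense N x N form, not FFANet's FF module, and the strip pooling and linear layers mentioned in the abstract are not reproduced. The function names `affinity_matrix` and `apply_affinity` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def affinity_matrix(features):
    """Return an N x N matrix; row i holds pixel i's normalized similarity to every pixel.

    Assumption: similarity is a dot product between per-pixel feature vectors.
    """
    return [
        softmax([sum(a * b for a, b in zip(f_i, f_j)) for f_j in features])
        for f_i in features
    ]

def apply_affinity(features):
    """Multiply the affinity matrix with the flattened feature map (N x C)."""
    aff = affinity_matrix(features)
    channels = len(features[0])
    return [
        [sum(w * f[c] for w, f in zip(row, features)) for c in range(channels)]
        for row in aff
    ]

# Three "pixels" with two channels each; pixels 0 and 2 are identical.
pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
aff = affinity_matrix(pixels)
out = apply_affinity(pixels)
```

In this toy input, pixel 0 attends more strongly to itself and to pixel 2 than to pixel 1, so its output is pulled toward the shared feature. The dense matrix costs memory and time quadratic in the number of pixels, which is the cost the abstract says the network is designed to reduce.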

HCNET: A Point Cloud Object Detection Network Based on Height and Channel Attention

Remote Sensing ◽ 10.3390/rs13245071 ◽ 2021 ◽ Vol 13 (24) ◽ pp. 5071

Author(s): Jing Zhang ◽ Jiajun Wang ◽ Da Xu ◽ Yunsong Li

Keyword(s): Object Detection ◽ Point Cloud ◽ Feature Fusion ◽ Three Dimensional ◽ Point Clouds ◽ Autonomous Driving ◽ Attention Mechanism ◽ Uneven Distribution ◽ Adaptive Adjustment ◽ High Level

The use of LiDAR point clouds for accurate three-dimensional perception is crucial for realizing high-level autonomous driving systems. Considering the drawbacks of current point cloud object-detection algorithms, this paper proposes HCNet, an algorithm that combines an attention mechanism with adaptive adjustment, starting from feature fusion and overcoming the sparse and uneven distribution of point clouds. Inspired by the basic idea of the attention mechanism, a feature-fusion structure, the HC module, with height attention and channel attention weighted in parallel, is proposed to fuse features from multiple pseudo images. The use of several weighting mechanisms enhances the expressive power of the features. Additionally, we designed an adaptively adjusted detection head that also overcomes the sparsity of the point cloud from the perspective of original information fusion, and it reduces the interference caused by the uneven distribution of the point cloud from the perspective of adaptive adjustment. The results show that HCNet achieves better accuracy than other one-stage networks, and even than two-stage RCNN networks, under some detection evaluation metrics. It also reaches a detection rate of 30 FPS. For hard samples in particular, the algorithm in this paper detects better than many existing algorithms.

JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i07.6994 ◽ 2020 ◽ Vol 34 (07) ◽ pp. 12951-12958 ◽ Cited By ~ 3

Author(s): Lin Zhao ◽ Wenbing Tao

Keyword(s): Point Cloud ◽ Large Scale ◽ Feature Fusion ◽ Mean Shift ◽ Semantic Segmentation ◽ Point Clouds ◽ Semantic Features ◽ Backbone Network ◽ 3D Point Clouds ◽ Instance Segmentation

In this paper, we propose a novel joint instance and semantic segmentation approach, called JSNet, to address the instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the features from different layers of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into the instance embedding space, and the transformed features are then fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into the semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying a simple mean-shift clustering to the instance embeddings. We evaluate the proposed JSNet on the large-scale 3D indoor point cloud dataset S3DIS and the part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.
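The final step, mean-shift clustering on instance embeddings, can be sketched as follows. This is a minimal flat-kernel mean shift in pure Python, not the JSNet implementation; the bandwidth, the merge radius of half the bandwidth, and the function name `mean_shift` are assumptions made for illustration, since the abstract gives no such details.

```python
import math

def mean_shift(points, bandwidth, max_iter=50, tol=1e-6):
    """Cluster points by shifting each one to the mean of its neighbours.

    Flat kernel: every point within `bandwidth` of the current position
    contributes equally. Returns (labels, centers).
    """
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        largest_move = 0.0
        shifted = []
        for mode in modes:
            neighbours = [p for p in points if math.dist(mode, p) <= bandwidth]
            center = [sum(axis) / len(neighbours) for axis in zip(*neighbours)]
            largest_move = max(largest_move, math.dist(mode, center))
            shifted.append(center)
        modes = shifted
        if largest_move < tol:
            break

    # Merge modes that converged to (almost) the same location.
    labels, centers = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if math.dist(mode, center) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers

# Toy 2D "embeddings" forming two well-separated instances.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centers = mean_shift(embeddings, bandwidth=1.0)
```

Points that converge to the same mode receive the same instance label, so the number of instances does not have to be fixed in advance.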

Enet Semantic Segmentation Combined with Attention Mechanism

10.21203/rs.3.rs-425438/v1 ◽ 2021

Author(s): Wei Bai

Keyword(s): Feature Fusion ◽ Receptive Fields ◽ Design Feature ◽ Semantic Segmentation ◽ Attention Mechanism ◽ Segmentation Algorithm ◽ Feature Maps ◽ Model Learning ◽ Simple Method ◽ Feature Map

Image semantic segmentation is one of the core tasks of computer vision. It is widely used in fields such as unmanned driving, medical image processing, geographic information systems, and intelligent robots. Existing semantic segmentation algorithms ignore the differences between channel and location features of the feature map and fuse feature maps with overly simple methods; to address this, the paper designs a semantic segmentation algorithm that incorporates the attention mechanism. Firstly, dilated convolution and a smaller downsampling factor are used to maintain the resolution of the image and preserve its detailed information. Secondly, an attention mechanism module is introduced to assign weights to different parts of the feature map, which reduces the accuracy loss. The designed feature fusion module assigns weights to the feature maps of different receptive fields obtained by the two paths and merges them to obtain the final segmentation result. Finally, the method is verified experimentally on the CamVid, Cityscapes, and PASCAL VOC2012 datasets, with mean intersection over union (MIoU) and mean pixel accuracy (MPA) as metrics. The method in this paper can make up for the loss of accuracy caused by downsampling while preserving the receptive field and improving the resolution, which better guides model learning. In addition, the proposed feature fusion module better integrates the features of different receptive fields. Therefore, the proposed method can significantly improve segmentation performance compared with the traditional method.
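The two metrics named in this abstract, MIoU and MPA, are commonly computed from a confusion matrix. The sketch below uses that standard definition on flattened label lists; it is not taken from the paper, the helper names are hypothetical, and classes that never occur are skipped when averaging, which is one common convention among several.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

def mean_pixel_accuracy(cm):
    """Average of per-class recall, TP / (TP + FN), over classes present in y_true."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c])]
    return sum(accs) / len(accs)

# Four pixels, two classes; one class-0 pixel is mislabelled as class 1.
truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
cm = confusion_matrix(truth, pred, num_classes=2)
miou = mean_iou(cm)
mpa = mean_pixel_accuracy(cm)
```

For this toy input the per-class IoUs are 1/2 and 2/3, and the per-class accuracies are 1/2 and 1, so MIoU is 7/12 and MPA is 0.75.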

Point Cloud Semantic Segmentation Network Based on Multi-Scale Feature Fusion

Sensors ◽ 10.3390/s21051625 ◽ 2021 ◽ Vol 21 (5) ◽ pp. 1625

Author(s): Jing Du ◽ Zuning Jiang ◽ Shangfeng Huang ◽ Zongyue Wang ◽ Jinhe Su ◽ ...

Keyword(s): Network Optimization ◽ Point Cloud ◽ Semantic Information ◽ Feature Fusion ◽ Semantic Segmentation ◽ Point Clouds ◽ Scale Feature ◽ Multi Scale ◽ Sensing Applications ◽ Remote Sensing Applications

The semantic segmentation of small objects in point clouds is currently one of the most demanding tasks in photogrammetry and remote sensing applications. Multi-resolution feature extraction and fusion can significantly enhance object classification and segmentation, so it is widely used in the image field. Motivated by this, we propose a point cloud semantic segmentation network based on multi-scale feature fusion (MSSCN) to aggregate the features of point clouds with different densities and improve the performance of semantic segmentation. In our method, random downsampling is first applied to obtain point clouds of different densities. A Spatial Aggregation Net (SAN) is then employed as the backbone network to extract local features from these point clouds, followed by concatenation of the extracted feature descriptors at different scales. Finally, a loss function is used to combine the different semantic information from point clouds of different densities for network optimization. Experiments were conducted on the S3DIS and ScanNet datasets, on which MSSCN achieved accuracies of 89.80% and 86.3%, respectively. Our method showed better performance than the recent methods PointNet, PointNet++, PointCNN, PointSIFT, and SAN.
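The first step of the method, random downsampling to obtain point clouds of different densities, can be sketched in a few lines. The keep ratios, the fixed seed, and the function name `multi_density_clouds` are assumptions for illustration; the abstract does not state which ratios MSSCN uses, and the SAN backbone and loss are not shown.

```python
import random

def multi_density_clouds(points, ratios, seed=0):
    """Return one randomly downsampled copy of `points` per keep ratio.

    Sampling is without replacement, so each copy is a subset of the input.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    clouds = []
    for ratio in ratios:
        keep = max(1, int(len(points) * ratio))
        clouds.append(rng.sample(points, keep))
    return clouds

# A toy cloud of 100 points (x, y, z) and three hypothetical keep ratios.
cloud = [(float(i), float(i % 7), float(i % 3)) for i in range(100)]
clouds = multi_density_clouds(cloud, ratios=[1.0, 0.5, 0.25])
sizes = [len(c) for c in clouds]
```

Each downsampled cloud would then be passed through the backbone separately, and the features extracted at the different densities concatenated, as the abstract describes.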

Deep learning-based tool wear prediction and its application for machining process using multi-scale feature fusion and channel attention mechanism

Measurement ◽ 10.1016/j.measurement.2021.109254 ◽ 2021 ◽ Vol 177 ◽ pp. 109254

Author(s): Xingwei Xu ◽ Jianwen Wang ◽ Bingfu Zhong ◽ Weiwei Ming ◽ Ming Chen

Keyword(s): Deep Learning ◽ Tool Wear ◽ Feature Fusion ◽ Attention Mechanism ◽ Machining Process ◽ Wear Prediction ◽ Scale Feature ◽ Multi Scale ◽ Tool Wear Prediction

Semantic Segmentation of High Resolution Remote Sensing Images with Extra Context Attention Mechanism

2020 IEEE 20th International Conference on Communication Technology (ICCT) ◽ 10.1109/icct50939.2020.9295814 ◽ 2020

Author(s): Weifu Fu ◽ Qing Peng ◽ Yanxiang Gong ◽ Mei Xie ◽ Shicheng Wang ◽ ...

Keyword(s): Remote Sensing ◽ High Resolution ◽ Semantic Segmentation ◽ Attention Mechanism ◽ Remote Sensing Images

RSNet: Rail semantic segmentation network for extracting aerial railroad images

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-210349 ◽ 2021 ◽ pp. 1-18

Author(s): R.S. Rampriya ◽ Sabarinathan ◽ R. Suganya

Keyword(s): Real Time ◽ Visual Processing ◽ Feature Fusion ◽ Semantic Segmentation ◽ Vital Role ◽ Obstacle Detection ◽ Aerial Images ◽ Computationally Efficient ◽ Fusion Algorithm ◽ UAV Images

In the near future, the combination of UAVs (unmanned aerial vehicles) and computer vision will play a vital role in periodically monitoring the condition of railroads to ensure passenger safety. The most significant module in railroad visual processing is obstacle detection, which raises a caution when an obstacle has fallen near the track, inside or outside the gage. This makes it important to detect and segment the railroad into three key regions: gage inside, rails, and background. Traditional railroad segmentation methods depend on either manual feature selection or expensive dedicated devices such as LiDAR, which are typically less reliable in railroad semantic segmentation. In addition, cameras mounted on moving vehicles such as drones can produce high-resolution images, but segmenting precise pixel information from those aerial images has been challenging because of the cluttered railroad surroundings. RSNet is a multi-level feature fusion algorithm for segmenting railroad aerial images captured by UAVs. It proposes an attention-based efficient convolutional encoder for feature extraction, which is robust and computationally efficient, and a modified residual decoder for segmentation, which considers only essential features and produces less overhead with higher performance, even on real-time railroad drone imagery. The network is trained and tested on a railroad scenic view segmentation dataset (RSSD), which we built from real-time UAV images, and achieves a Dice coefficient of 0.973 and a Jaccard index of 0.94 on test data, better results than existing approaches such as a residual unit and residual squeeze net.
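The two scores reported here, the Dice coefficient and the Jaccard index, can be computed for a pair of binary masks as in the sketch below. It uses the standard set-overlap definitions on flattened 0/1 lists; how the paper aggregates scores over classes and images is not stated in the abstract, and treating two empty masks as a perfect match is an assumed convention.

```python
def dice_and_jaccard(pred, target):
    """Return (dice, jaccard) for two equal-length binary masks (lists of 0/1)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: assumed convention is a perfect score.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    jaccard = intersection / union
    return dice, jaccard

# Flattened 2 x 2 masks that overlap in exactly one pixel.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
dice, jaccard = dice_and_jaccard(predicted, reference)
```

For a single pair of masks the two scores are tied by J = D / (2 - D), so they rank predictions identically; here D = 0.5 and J = 1/3.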

An Inverse Node Graph-Based Method for the Urban Scene Segmentation of 3D Point Clouds

Remote Sensing ◽ 10.3390/rs13153021 ◽ 2021 ◽ Vol 13 (15) ◽ pp. 3021

Author(s): Bufan Zhao ◽ Xianghong Hua ◽ Kegen Yu ◽ Xiaoxing He ◽ Weixing Xue ◽ ...

Keyword(s): Semantic Segmentation ◽ Point Clouds ◽ Intelligent Vehicles ◽ Critical Data ◽ Multi Scale ◽ 3D Point Clouds ◽ Cluster Optimization ◽ Urban Scene ◽ Processing Steps

Urban object segmentation and classification are critical data processing steps in scene understanding, intelligent vehicles, and 3D high-precision maps. Semantic segmentation of 3D point clouds is the foundational step in object recognition. To identify intersecting objects and improve classification accuracy, this paper proposes a segment-based classification method for 3D point clouds. The method first divides points into multi-scale supervoxels and groups them using the proposed inverse node graph (IN-Graph) construction, which does not require prior information about the nodes; instead, it divides supervoxels by judging the connection state of the edges between them. The method reaches minimum global energy by graph cutting, obtains the structural segments as completely as possible, and retains boundaries at the same time. Then, a random forest classifier is used for supervised classification. To deal with the mislabeling of scattered fragments, a higher-order CRF with small-label cluster optimization is proposed to refine the classification results. Experiments were carried out on a mobile laser scanning (MLS) point dataset and a terrestrial laser scanning (TLS) point dataset, and the results show overall accuracies of 97.57% and 96.39%, respectively. The boundaries of objects were retained well, and the method achieved good results in the classification of cars and motorcycles. Further experimental analyses verified the advantages of the proposed method and demonstrated its practicability and versatility.

A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽ 10.3390/rs13101950 ◽ 2021 ◽ Vol 13 (10) ◽ pp. 1950

Author(s): Cuiping Shi ◽ Xin Zhao ◽ Liguo Wang

Keyword(s): Remote Sensing ◽ Feature Extraction ◽ Classification Accuracy ◽ Feature Fusion ◽ State Of The Art ◽ Rapid Development ◽ Remote Sensing Image ◽ Classification Performance ◽ Attention Mechanism ◽ Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract deeper features, thereby increasing the complexity of the model. To address this growth in complexity, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by several convolutions working together. Then, the weights of the features are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are used to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.
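The parameter saving attributed to separable convolution can be checked with simple arithmetic. The sketch below compares a standard k x k convolution with the usual depthwise separable factorization (a k x k depthwise step followed by a 1 x 1 pointwise step), ignoring bias terms. The layer sizes are hypothetical and are not taken from AMB-CNN; the asymmetric convolution is not modelled because the abstract does not describe its layout.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k filter per (input, output) channel pair."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k step plus a 1 x 1 pointwise step."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
```

For this layer the count drops from 73,728 weights to 8,768, a ratio of 1/c_out + 1/k^2, or roughly 12% of the original.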
