Semantic Segmentation of Large-Scale Outdoor Point Clouds by Encoder–Decoder Shared MLPs with Multiple Losses

Semantic segmentation of large-scale outdoor 3D LiDAR point clouds has become essential for understanding the scene environment in applications such as geometry mapping and autonomous driving. Although 3D LiDAR point clouds have the advantage of being a 3D metric space, they pose a challenge for deep learning approaches because they are unstructured, unordered, irregular, and large in scale. This paper therefore presents an encoder–decoder shared multi-layer perceptron (MLP) with multiple losses to address this semantic segmentation problem. The challenge raises a trade-off between efficiency and effectiveness. To balance this trade-off, we propose simple yet effective mechanisms: a random point sampling layer, an attention-based pooling layer, and a summation of multiple losses, all integrated with the encoder–decoder shared MLPs for semantic segmentation of large-scale outdoor point clouds. We conducted experiments on two large-scale benchmark datasets, Toronto-3D and DALES. Our method achieved an overall accuracy (OA) and a mean intersection over union (mIoU) of 83.60% and 71.03% on the Toronto-3D dataset, and 76.43% and 59.52% on the DALES dataset, respectively. Additionally, the proposed model has few parameters and runs about three times faster than PointNet++ during inference.
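
The two metrics reported in this abstract, overall accuracy (OA) and mean intersection over union (mIoU), can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions; the function names and the toy labels are hypothetical and are not taken from the paper, whose evaluation code is not shown here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count points per (true class, predicted class) pair."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA: fraction of all points whose predicted label is correct."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes.

    Assumption: classes absent from both labels and predictions are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical classes and six points.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(overall_accuracy(cm), mean_iou(cm))
```

OA weights every point equally, so it is dominated by frequent classes such as ground, while mIoU weights every class equally, which is why the two figures can differ substantially on the same dataset.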

Ground-distance segmentation of 3D LiDAR point cloud toward autonomous driving

APSIPA Transactions on Signal and Information Processing, DOI 10.1017/atsip.2020.21, 2020, Vol 9

Author(s): Jian Wu, Qingxiong Yang

Keyword(s): Point Cloud, Large Scale, Ground Plane, Semantic Segmentation, Point Clouds, Autonomous Driving, Urban Environments, Cloud Data, Dense Point, 3D Lidar

In this paper, we study the semantic segmentation of 3D LiDAR point cloud data in urban environments for autonomous driving and propose a method that utilizes the surface information of the ground plane. In practice, the resolution of a LiDAR sensor installed in a self-driving vehicle is relatively low, and thus the acquired point cloud is quite sparse. While recent work on dense point cloud segmentation has achieved promising results, its performance is relatively low when directly applied to sparse point clouds. This paper focuses on semantic segmentation of the sparse point clouds obtained from a 32-channel LiDAR sensor with deep neural networks. The main contribution is the integration of ground information, which is used to group ground points that are far away from each other. Qualitative and quantitative experiments on two large-scale point cloud datasets show that the proposed method outperforms the current state-of-the-art.

Go Wider: An Efficient Neural Network for Point Cloud Analysis via Group Convolutions

Applied Sciences, DOI 10.3390/app10072391, 2020, Vol 10 (7), pp. 2391

Author(s): Can Chen, Luca Zanotti Fragonara, Antonios Tsourdos

Keyword(s): Neural Network, Neural Networks, Point Cloud, Large Scale, Semantic Segmentation, Point Clouds, Autonomous Driving, Fine Grained, Point Cloud Analysis, Cloud Analysis

To achieve better performance in point cloud analysis, many researchers apply deep neural networks using stacked Multi-Layer-Perceptron (MLP) convolutions over an irregular point cloud. However, applying these dense MLP convolutions over a large number of points (e.g., in autonomous driving applications) is limited by computation and memory capabilities. To achieve higher performance while decreasing the computational complexity, we propose a deep-wide neural network, named ShufflePointNet, which can exploit fine-grained local features while also reducing redundancies using group convolution and channel shuffle operations. Unlike conventional operations that directly apply MLPs on the high-dimensional features of a point cloud, our model goes “wider” by splitting features into groups with smaller depth in advance and applying the respective MLP computations only to a single group, which can significantly reduce complexity and computation. At the same time, we allow communication between groups by shuffling the feature channels to capture fine-grained features. We further discuss how the multi-branch method for wider neural networks is also beneficial to feature extraction for point clouds. We present extensive experiments for shape classification tasks on the ModelNet40 dataset and semantic segmentation tasks on the large-scale datasets ShapeNet part, S3DIS, and KITTI. Finally, we carry out an ablation study and compare our model to other state-of-the-art algorithms to show its efficiency in terms of complexity and accuracy.
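
The group convolution and channel shuffle operations named in this abstract can be sketched as follows. This is a minimal illustration of the generic technique (as popularized by ShuffleNet), not the ShufflePointNet implementation; the function names and the layer sizes in the usage lines are hypothetical.

```python
def grouped_weight_count(c_in, c_out, groups=1):
    """Weights in a pointwise (1x1) layer whose channels are split into groups.

    Each group maps c_in/groups inputs to c_out/groups outputs, so the total
    is c_in * c_out / groups (biases ignored).
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * (c_out // groups) * groups


def channel_shuffle(channels, groups):
    """Interleave channels across groups so the next grouped layer mixes them.

    Equivalent to viewing the channel axis as (groups, per_group),
    transposing to (per_group, groups), and flattening again.
    """
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Hypothetical sizes: 64 -> 128 channels, dense versus 4 groups.
print(grouped_weight_count(64, 128), grouped_weight_count(64, 128, groups=4))
# Six channels in two groups: [0, 1, 2 | 3, 4, 5] becomes interleaved.
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

Grouping divides the weight count by the number of groups but isolates the groups from one another; the shuffle restores cross-group information flow at no parameter cost.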

Towards a Meaningful 3D Map Using a 3D Lidar and a Camera

Sensors, DOI 10.3390/s18082571, 2018, Vol 18 (8), pp. 2571, Cited By ~ 8

Author(s): Jongmin Jeong, Tae Yoon, Jin Park

Keyword(s): Large Scale, Semantic Information, Semantic Segmentation, Point Clouds, Semantic Mapping, Alignment Error, Semantic Labeling, Moving Vehicles, 3D Lidar

Semantic 3D maps are required for various applications including robot navigation and surveying, and their importance has significantly increased. Generally, existing studies on semantic mapping were camera-based approaches that could not be operated in large-scale environments owing to their computational burden. Recently, a method of combining a 3D Lidar with a camera was introduced to address this problem, and a 3D Lidar and a camera were also utilized for semantic 3D mapping. In this study, our algorithm consists of semantic mapping and map refinement. In the semantic mapping, a GPS and an IMU are integrated to estimate the odometry of the system, and subsequently, the point clouds measured from a 3D Lidar are registered by using this information. Furthermore, we use the latest CNN-based semantic segmentation to obtain semantic information on the surrounding environment. To integrate the point cloud with semantic information, we developed incremental semantic labeling including coordinate alignment, error minimization, and semantic information fusion. Additionally, to improve the quality of the generated semantic map, the map refinement is processed in a batch. It enhances the spatial distribution of labels and removes traces produced by moving vehicles effectively. We conduct experiments on challenging sequences to demonstrate that our algorithm outperforms state-of-the-art methods in terms of accuracy and intersection over union.

Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

Remote Sensing, DOI 10.3390/rs13163065, 2021, Vol 13 (16), pp. 3065

Author(s): Libo Wang, Rui Li, Dongzhi Wang, Chenxi Duan, Teng Wang, ...

Keyword(s): Large Scale, Texture Features, Semantic Segmentation, Autonomous Driving, Research Field, Learning Approaches, Fine Grained, Urban Scene, Fine Resolution, With Memory

Semantic segmentation of very fine resolution (VFR) urban scene images plays a significant role in several application scenarios, including autonomous driving, land cover classification, and urban planning. However, the tremendous detail contained in VFR images, especially the considerable variation in the scale and appearance of objects, severely limits the potential of existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, which paves the way for scene-level landscape pattern analysis and decision making. In this paper, we propose a Bilateral Awareness Network (BANet) which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on stacked convolution operations. In addition, using the linear attention mechanism, a feature aggregation module is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, the ISPRS Potsdam dataset, and the UAVid dataset, demonstrate the effectiveness of our BANet. Specifically, a 64.6% mIoU is achieved on the UAVid dataset.

BushNet: Effective semantic segmentation of bush in large-scale point clouds

Computers and Electronics in Agriculture, DOI 10.1016/j.compag.2021.106653, 2022, Vol 193, pp. 106653

Author(s): Hejun Wei, Enyong Xu, Jinlai Zhang, Yanmei Meng, Jin Wei, ...

Keyword(s): Large Scale, Semantic Segmentation, Point Clouds, Scale Point

TUM-MLS-2016: An Annotated Mobile LiDAR Dataset of the TUM City Campus for Semantic Point Cloud Interpretation in Urban Areas

Remote Sensing, DOI 10.3390/rs12111875, 2020, Vol 12 (11), pp. 1875, Cited By ~ 1

Author(s): Jingwei Zhu, Joachim Gehrung, Rong Huang, Björn Borgmann, Zhenghao Sun, ...

Keyword(s): Test Data, Urban Areas, Point Cloud, Large Scale, Point Clouds, Semantic Interpretation, 3D Point Clouds, Semantic Labeling, Benchmark Datasets, Semantic Point

In the past decade, a vast number of strategies, methods, and algorithms have been developed to explore the semantic interpretation of 3D point clouds for extracting desirable information. To assess the performance of the developed algorithms or methods, public standard benchmark datasets should be introduced and used, serving as a yardstick for evaluation and comparison. In this work, we present large-scale Mobile LiDAR point clouds acquired at the city campus of the Technical University of Munich, which have been manually annotated and can be used to evaluate algorithms and methods for semantic point cloud interpretation. We created three datasets from a measurement campaign conducted in April 2016: a benchmark dataset for semantic labeling, test data for instance segmentation, and test data for annotated single 360° laser scans. These datasets cover an urban area with approximately 1 km of roadways and include more than 40 million annotated points with eight classes of objects labeled. Moreover, experiments were carried out in which results from several baseline methods were compared and analyzed, revealing the quality of this dataset and its effectiveness for performance evaluation.

Edge-Convolution Point Net for Semantic Segmentation of Large-Scale Point Clouds

IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, DOI 10.1109/igarss.2019.8899303, 2019, Cited By ~ 1

Author(s): Jhonatan Contreras, Joachim Denzler

Keyword(s): Large Scale, Semantic Segmentation, Point Clouds, Scale Point

Real-Time Semantic Segmentation with Dual Encoder and Self-Attention Mechanism for Autonomous Driving

Sensors, DOI 10.3390/s21238072, 2021, Vol 21 (23), pp. 8072

Author(s): Yu-Bang Chang, Chieh Tsai, Chang-Hong Lin, Poki Chen

Keyword(s): Deep Learning, Real Time, Network Architecture, Semantic Segmentation, Autonomous Driving, Attention Mechanism, Trade Off, Segmentation Methods, General Semantic, Deep Learning Model

As the techniques of autonomous driving become increasingly valued and universal, real-time semantic segmentation has become very popular and challenging in the field of deep learning and computer vision in recent years. However, in order to apply the deep learning model to edge devices accompanying sensors on vehicles, we need to design a structure that has the best trade-off between accuracy and inference time. In previous works, several methods sacrificed accuracy to obtain a faster inference time, while others aimed to find the best accuracy under the condition of real time. Nevertheless, the accuracies of previous real-time semantic segmentation methods still have a large gap compared to general semantic segmentation methods. As a result, we propose a network architecture based on a dual encoder and a self-attention mechanism. Compared with preceding works, we achieved a 78.6% mIoU with a speed of 39.4 FPS with a 1024 × 2048 resolution on a Cityscapes test submission.
