Spatial Aggregation Net: Point Cloud Semantic Segmentation Based on Multi-Directional Convolution

Semantic segmentation of 3D point clouds plays a vital role in applications such as autonomous driving, 3D maps, and smart cities. Recent work such as PointSIFT shows that spatial structure information can improve the performance of semantic segmentation. Motivated by this observation, we propose the Spatial Aggregation Net (SAN) for point cloud semantic segmentation. SAN is based on a multi-directional convolution scheme that exploits the spatial structure information of the point cloud. First, Octant-Search is employed to capture the neighboring points around each sampled point. Second, multi-directional convolution is used to extract information from the different directions around each sampled point. Finally, max-pooling is used to aggregate the information from the different directions. Experimental results on the ScanNet dataset show that the proposed SAN achieves results comparable to state-of-the-art algorithms such as PointNet, PointNet++, and PointSIFT. In particular, our method performs better on flat, small objects and on the edge areas that connect objects. Moreover, our model offers a good trade-off between segmentation accuracy and time complexity.
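
The three steps named in the abstract above (octant search, per-direction feature extraction, max-pooling) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the search radius, and the use of relative offsets as a stand-in for the learned multi-directional convolution are all assumptions made for illustration.

```python
import math

def octant_index(center, point):
    """Return 0..7: one bit per axis, set when the point lies on the non-negative side."""
    return sum((1 << axis) for axis in range(3) if point[axis] >= center[axis])

def octant_search(center, points, radius):
    """For each of the 8 octants around `center`, keep the nearest point within `radius`.

    Returns a list of 8 entries; an entry is None when the octant is empty.
    """
    nearest = [None] * 8
    best = [math.inf] * 8
    for p in points:
        d = math.dist(center, p)
        if d == 0.0 or d > radius:
            continue  # skip the center itself and points outside the ball
        k = octant_index(center, p)
        if d < best[k]:
            best[k], nearest[k] = d, p
    return nearest

def max_pool(feature_vectors):
    """Channel-wise maximum over a non-empty list of equal-length feature vectors."""
    return [max(channel) for channel in zip(*feature_vectors)]

def aggregate(center, points, radius):
    """Placeholder pipeline: octant search -> per-direction feature -> max-pooling."""
    neighbors = octant_search(center, points, radius)
    # Assumed stand-in for the learned per-direction convolution: the relative
    # offset of the neighbor; empty octants contribute a zero offset.
    feats = [
        [n[a] - center[a] for a in range(3)] if n is not None else [0.0, 0.0, 0.0]
        for n in neighbors
    ]
    return max_pool(feats)
```

In a trained network the placeholder offsets would be replaced by learned features per direction; only the grouping and pooling structure is meant to carry over.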

Semantic Segmentation of 3D Point Cloud Based on Spatial Eight-Quadrant Kernel Convolution

Remote Sensing ◽

10.3390/rs13163140 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3140

Author(s):

Liman Liu ◽

Jinjin Yu ◽

Longyu Tan ◽

Wanjuan Su ◽

Lin Zhao ◽

...

Keyword(s):

Point Cloud ◽

Poor Performance ◽

Semantic Segmentation ◽

Point Clouds ◽

Small Object ◽

Fine Grained ◽

3D Point Clouds ◽

Kernel Convolution ◽

Segmentation Accuracy ◽

Indoor Scenes

To address the problem that some existing semantic segmentation networks for 3D point clouds perform poorly on small objects, a Spatial Eight-Quadrant Kernel Convolution (SEQKC) algorithm is proposed to enhance the ability of the network to extract fine-grained features from 3D point clouds. As a result, the semantic segmentation accuracy of small objects in indoor scenes can be improved. Specifically, in the spherical space of the point cloud neighborhoods, a kernel point with attached weights is constructed in each octant, the distances between the kernel point and the points in its neighborhood are calculated, and the distances and the kernel points' weights are used together to weight the point cloud features in the neighborhood space. In this way, the relationship between points is modeled, so that the local fine-grained features of the point clouds can be extracted by the SEQKC. Based on the SEQKC, we design a downsampling module for point clouds and embed it into classical semantic segmentation networks (PointNet++, PointSIFT and PointConv). Experimental results on the benchmark dataset ScanNet V2 show that SEQKC-based PointNet++, PointSIFT and PointConv outperform the original networks by about 1.35–2.12% in terms of MIoU, and that they effectively improve the semantic segmentation performance of the networks on small objects in indoor scenes; e.g., the segmentation accuracy for the small object class "picture" improves from 0.70% with PointNet++ to 10.37% with SEQKC-PointNet++.
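
The weighting idea described above (one weighted kernel point per octant, with neighbor features scaled by their distance to each kernel point) can be sketched roughly as follows. The abstract does not give the exact formula, so the kernel-point placement, the linear influence function `max(0, 1 - d / radius)`, and the names `octant_kernel_points` and `seqkc_like_feature` are assumptions of this sketch, not the published SEQKC definition.

```python
import math
from itertools import product

def octant_kernel_points(radius):
    """Eight kernel points, one per octant, placed on the cube diagonals at
    half the neighborhood radius from the center (an assumed layout)."""
    s = radius / (2.0 * math.sqrt(3.0))
    return [(sx * s, sy * s, sz * s) for sx, sy, sz in product((-1.0, 1.0), repeat=3)]

def seqkc_like_feature(neighbors, features, kernel_weights, radius):
    """Weight scalar neighbor features by their distance to each kernel point.

    neighbors      : list of (x, y, z) offsets relative to the center point
    features       : list of scalar features, one per neighbor
    kernel_weights : list of 8 scalars, one per kernel point (learned in a real network)
    The influence function max(0, 1 - d / radius) is an assumption of this sketch.
    """
    kernels = octant_kernel_points(radius)
    out = 0.0
    for kp, w in zip(kernels, kernel_weights):
        for p, f in zip(neighbors, features):
            influence = max(0.0, 1.0 - math.dist(kp, p) / radius)
            out += w * influence * f
    return out
```

A neighbor sitting exactly on a kernel point receives that kernel's full weight, and neighbors farther than `radius` from a kernel point contribute nothing to it.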

Point Cloud Semantic Segmentation Using a Deep Learning Framework for Cultural Heritage

Remote Sensing ◽

10.3390/rs12061005 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1005 ◽

Cited By ~ 7

Author(s):

Roberto Pierdicca ◽

Marina Paolanti ◽

Francesca Matrone ◽

Massimo Martini ◽

Christian Morbidoni ◽

...

Keyword(s):

Deep Learning ◽

Cultural Heritage ◽

Point Cloud ◽

Semantic Segmentation ◽

Point Clouds ◽

Information Modeling ◽

Dynamic Graph ◽

Historical Building ◽

Architectural Elements ◽

3D Point Clouds

In the Digital Cultural Heritage (DCH) domain, the semantic segmentation of 3D Point Clouds with Deep Learning (DL) techniques can help to recognize historical architectural elements at an adequate level of detail, and thus speed up the modeling of historical buildings for developing BIM models from survey data, referred to as HBIM (Historical Building Information Modeling). In this paper, we propose a DL framework for Point Cloud segmentation, which employs an improved DGCNN (Dynamic Graph Convolutional Neural Network) that adds meaningful features such as normals and colour. The approach has been applied to a newly collected DCH dataset which is publicly available: the ArCH (Architectural Cultural Heritage) Dataset. This dataset comprises 11 labeled point clouds, derived from the union of several single scans or from the integration of the latter with photogrammetric surveys. The scenes involved are both indoor and outdoor, with churches, chapels, cloisters, porticoes and loggias covered by a variety of vaults and supported by many different types of columns. They belong to different historical periods and different styles, in order to make the dataset as little uniform and homogeneous as possible (in the repetition of the architectural elements) and the results as general as possible. The experiments yield high accuracy, demonstrating the effectiveness and suitability of the proposed approach.
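
As a rough illustration of how a dynamic-graph network consumes per-point features, the sketch below builds a k-nearest-neighbor graph and EdgeConv-style edge inputs of the form (x_i, x_j - x_i). How the paper actually injects normals and colour is not stated in the abstract; here it is simply assumed that they are appended to each point's feature tuple. The function names are hypothetical.

```python
import math

def knn_indices(features, k):
    """For every point, return the indices of its k nearest neighbors
    (Euclidean distance in feature space, excluding the point itself)."""
    result = []
    for i, fi in enumerate(features):
        order = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(fi, features[j]),
        )
        result.append(order[:k])
    return result

def edge_features(features, k):
    """Build EdgeConv-style inputs: for each edge (i, j), concatenate the
    center feature x_i with the offset x_j - x_i."""
    neighbors = knn_indices(features, k)
    edges = []
    for i, idx in enumerate(neighbors):
        xi = features[i]
        edges.append([list(xi) + [b - a for a, b in zip(xi, features[j])] for j in idx])
    return edges
```

Under the stated assumption, each feature tuple could hold nine values (x, y, z, normal components, r, g, b); the functions do not depend on the dimension.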

JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6994 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12951-12958 ◽

Cited By ~ 3

Author(s):

Lin Zhao ◽

Wenbing Tao

Keyword(s):

Point Cloud ◽

Large Scale ◽

Feature Fusion ◽

Mean Shift ◽

Semantic Segmentation ◽

Point Clouds ◽

Semantic Features ◽

Backbone Network ◽

3D Point Clouds ◽

Instance Segmentation

In this paper, we propose a novel joint instance and semantic segmentation approach, called JSNet, to address the instance and semantic segmentation of 3D point clouds simultaneously. First, we build an effective backbone network to extract robust features from the raw point clouds. Second, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the features of different layers of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into the instance embedding space, and the transformed features are then fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into the semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying a simple mean-shift clustering on the instance embeddings. We evaluate the proposed JSNet on the large-scale 3D indoor point cloud dataset S3DIS and the part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.
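
The final step mentioned above, mean-shift clustering of instance embeddings, can be illustrated with a small flat-kernel implementation. This is a generic sketch and not the code from the JSNet repository; the bandwidth, the merge threshold of half the bandwidth, and the function name are assumptions.

```python
import math

def mean_shift(points, bandwidth, max_iter=50, tol=1e-6):
    """Flat-kernel mean shift: repeatedly move each mode to the mean of the
    input points within `bandwidth`, then merge modes closer than bandwidth / 2.

    Returns one integer cluster label per input point.
    """
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        moved = 0.0
        for m in modes:
            window = [p for p in points if math.dist(m, p) <= bandwidth]
            if not window:
                continue  # defensive guard; a mode normally keeps at least one point in range
            mean = [sum(c) / len(window) for c in zip(*window)]
            moved = max(moved, math.dist(m, mean))
            m[:] = mean
        if moved < tol:
            break
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            # no existing center is close enough: start a new cluster
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

Points whose embeddings converge to the same mode receive the same label, which plays the role of an instance id.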

GENERATION OF GROUND TRUTH DATASETS FOR THE ANALYSIS OF 3D POINT CLOUDS IN URBAN SCENES ACQUIRED VIA DIFFERENT SENSORS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-2009-2018 ◽

2018 ◽

Vol XLII-3 ◽

pp. 2009-2015 ◽

Cited By ~ 1

Author(s):

Y. Xu ◽

Z. Sun ◽

R. Boerner ◽

T. Koch ◽

L. Hoegner ◽

...

Keyword(s):

Point Cloud ◽

Spatial Information ◽

Ground Truth ◽

Semantic Segmentation ◽

Point Clouds ◽

Urban Scenes ◽

3D Space ◽

3D Point Clouds ◽

Testing Site ◽

Voxel Grid

In this work, we report a novel way of generating ground truth datasets for analyzing point clouds from different sensors and for validating algorithms. Instead of directly labeling a large number of 3D points, which requires time-consuming manual work, a multi-resolution 3D voxel grid of the testing site is generated. Then, with the help of a set of basic labeled points from the reference dataset, we can generate a 3D labeled space of the entire testing site at different resolutions. Specifically, an octree-based voxel structure is applied to voxelize the annotated reference point cloud, by which all the points are organized in 3D grids of multiple resolutions. When automatically annotating new testing point clouds, a voting-based approach is applied to the labeled points within the multi-resolution voxels in order to assign a semantic label to the 3D space represented by each voxel. Lastly, robust line- and plane-based fast registration methods are developed for aligning point clouds obtained via various sensors. Benefiting from the labeled 3D spatial information, we can easily create new annotated 3D point clouds from different sensors of the same scene directly by considering the labels of the 3D space in which the points are located, which is convenient for the validation and evaluation of algorithms related to point cloud interpretation and semantic segmentation.
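
The voxel-voting idea above can be sketched as follows: reference labels are majority-voted per voxel at several resolutions, and a new point takes the label of the finest voxel that has one. A plain dictionary per voxel size stands in for the octree, and the function names and fine-to-coarse fallback order are assumptions of this sketch rather than details from the paper.

```python
from collections import Counter, defaultdict
import math

def voxel_key(point, size):
    """Integer grid cell containing `point` for a cubic voxel of edge `size`."""
    return tuple(math.floor(c / size) for c in point)

def build_label_grids(ref_points, ref_labels, sizes):
    """For each voxel size, majority-vote the reference labels falling in each voxel."""
    grids = []
    for size in sizes:
        votes = defaultdict(Counter)
        for p, lab in zip(ref_points, ref_labels):
            votes[voxel_key(p, size)][lab] += 1
        grids.append({key: c.most_common(1)[0][0] for key, c in votes.items()})
    return grids

def annotate(points, grids, sizes):
    """Label new points from the finest voxel that holds a label; None if no voxel matches."""
    labels = []
    for p in points:
        label = None
        for size, grid in zip(sizes, grids):  # sizes are ordered fine -> coarse
            label = grid.get(voxel_key(p, size))
            if label is not None:
                break
        labels.append(label)
    return labels
```

The new point cloud is assumed to be already registered to the reference frame, which the abstract handles with its line- and plane-based registration step.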

TESSERAE3D: A BENCHMARK FOR TESSERAE SEMANTIC SEGMENTATION IN 3D POINT CLOUDS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2021-121-2021 ◽

2021 ◽

Vol V-2-2021 ◽

pp. 121-128

Author(s):

A. Kharroubi ◽

L. Van Wersch ◽

R. Billen ◽

F. Poux

Keyword(s):

Point Cloud ◽

Semantic Segmentation ◽

Point Clouds ◽

3D Point Cloud ◽

Learning Models ◽

Semantic Classes ◽

3D Point Clouds ◽

Wide Range ◽

Extraction Pattern ◽

Automated Processes

3D point clouds of mosaic tesserae are used by heritage researchers, restorers and archaeologists for digital investigations. Information extraction, pattern analysis and semantic assignment are necessary to complement the geometric information. Automated processes that can speed up the task are highly sought after, especially new supervised approaches. However, the availability of the labelled data needed to train supervised learning models is a significant constraint. This paper introduces Tesserae3D, a 3D point cloud benchmark dataset for training and evaluating machine learning models, applied to mosaic tesserae segmentation. It is a publicly available, very high density, coloured dataset, accompanied by a standard multi-class semantic segmentation baseline. It consists of about 502 million points and contains 11 semantic classes covering a wide range of tesserae types. We propose a semantic segmentation baseline building on radiometric and covariance features fed to ensemble learning methods. The results indicate an achievable F1-score of 89% and are made available at https://github.com/akharroubi/Tesserae3D, which provides a simple interface for improving the score based on feedback from the research community.

Review: Deep Learning on 3D Point Clouds

Remote Sensing ◽

10.3390/rs12111729 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1729 ◽

Cited By ~ 4

Author(s):

Saifullahi Aminu Bello ◽

Shangshu Yu ◽

Cheng Wang ◽

Jibril Muhmmad Adam ◽

Jonathan Li

Keyword(s):

Deep Learning ◽

Point Cloud ◽

General Structure ◽

Point Clouds ◽

Autonomous Driving ◽

Point Cloud Data ◽

3D Vision ◽

Cloud Data ◽

3D Point Clouds ◽

Learning Techniques

A point cloud is a set of points defined in a 3D metric space. Point clouds have become one of the most significant data formats for 3D representation and are gaining popularity as a result of the increased availability of acquisition devices, as well as seeing increased application in areas such as robotics, autonomous driving, and augmented and virtual reality. Deep learning is now the most powerful tool for data processing in computer vision and is becoming the preferred technique for tasks such as classification, segmentation, and detection. While deep learning techniques are mainly applied to data with a structured grid, the point cloud, on the other hand, is unstructured. The unstructured nature of point clouds makes the use of deep learning for their direct processing very challenging. This paper reviews recent state-of-the-art deep learning techniques, focusing mainly on raw point cloud data. The initial work on deep learning directly with raw point cloud data did not model local regions; subsequent approaches therefore model local regions through sampling and grouping. More recently, several approaches have been proposed that not only model the local regions but also explore the correlation between points in the local regions. From the survey, we conclude that approaches that model local regions and take into account the correlation between points in the local regions perform better. In contrast to existing reviews, this paper provides a general structure for learning with raw point clouds, and various methods are compared based on this general structure. This work also introduces the popular 3D point cloud benchmark datasets and discusses the application of deep learning in popular 3D vision tasks, including classification, segmentation, and detection.
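
The "sampling and grouping" step mentioned above is commonly realized with farthest point sampling followed by a ball query, as in PointNet++-style networks. The sketch below shows that combination under simple assumptions: a deterministic start at index 0 and a fixed radius. The function names are illustrative and not taken from the review.

```python
import math

def farthest_point_sampling(points, m):
    """Pick m indices so that each new pick is the point farthest from those
    already chosen. Starts from index 0 for determinism (an assumption;
    implementations often start from a random point)."""
    chosen = [0]
    min_dist = [math.dist(points[0], p) for p in points]
    while len(chosen) < min(m, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(points[nxt], p)) for d, p in zip(min_dist, points)]
    return chosen

def ball_query(points, center_idx, radius):
    """Indices of all points within `radius` of the given center (center included)."""
    c = points[center_idx]
    return [i for i, p in enumerate(points) if math.dist(c, p) <= radius]

def sample_and_group(points, m, radius):
    """Map each sampled center index to the indices of its local region."""
    return {i: ball_query(points, i, radius) for i in farthest_point_sampling(points, m)}
```

Each local region returned by `sample_and_group` is what a per-region feature extractor would then process.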

Multi-Scale Attentive Aggregation for LiDAR Point Cloud Segmentation

Remote Sensing ◽

10.3390/rs13040691 ◽

2021 ◽

Vol 13 (4) ◽

pp. 691

Author(s):

Xiaoxiao Geng ◽

Shunping Ji ◽

Meng Lu ◽

Lingli Zhao

Keyword(s):

Point Cloud ◽

Semantic Segmentation ◽

Point Clouds ◽

Feature Representation ◽

Channel Structure ◽

Structure Information ◽

Multi Scale ◽

Point Cloud Segmentation ◽

Global Consistency ◽

Decoder Architecture

Semantic segmentation of LiDAR point clouds has applications in self-driving, robotics, and augmented reality, among others. In this paper, we propose a Multi-Scale Attentive Aggregation Network (MSAAN) to achieve global consistency of the point cloud feature representation and superior segmentation performance. First, building on a baseline encoder-decoder architecture for point cloud segmentation, namely RandLA-Net, an attentive skip connection is proposed to replace the commonly used concatenation and balance the encoder and decoder features of the same scale. Second, a channel attentive enhancement module is introduced into the local attention enhancement module to boost local feature discriminability and aggregate the local channel structure information. Third, we develop a multi-scale feature aggregation method to capture the global structure of a point cloud from both the encoder and the decoder. The experimental results show that our MSAAN significantly outperforms state-of-the-art methods, with an mIoU improvement of at least 15.3% for scene-2 of the CSPC dataset, 5.2% for scene-5 of the CSPC dataset, and 6.6% for the Toronto3D dataset.
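
The improvements above are reported in mIoU (mean intersection-over-union). A minimal sketch of the metric is given below for reference; it averages over classes that occur in either the ground truth or the prediction, which is one common convention. Benchmarks differ in how they treat absent classes, so this handling is an assumption of the sketch.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes that occur in y_true or y_pred."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes absent from both label sets
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.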

Ground-distance segmentation of 3D LiDAR point cloud toward autonomous driving

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2020.21 ◽

2020 ◽

Vol 9 ◽

Author(s):

Jian Wu ◽

Qingxiong Yang

Keyword(s):

Point Cloud ◽

Large Scale ◽

Ground Plane ◽

Semantic Segmentation ◽

Point Clouds ◽

Autonomous Driving ◽

Urban Environments ◽

Cloud Data ◽

Dense Point ◽

3D Lidar

In this paper, we study the semantic segmentation of 3D LiDAR point cloud data in urban environments for autonomous driving and propose a method that utilizes the surface information of the ground plane. In practice, the resolution of a LiDAR sensor installed in a self-driving vehicle is relatively low, and thus the acquired point cloud is quite sparse. While recent work on dense point cloud segmentation has achieved promising results, the performance is relatively low when such methods are applied directly to sparse point clouds. This paper focuses on the semantic segmentation of sparse point clouds obtained from a 32-channel LiDAR sensor with deep neural networks. The main contribution is the integration of ground information, which is used to group ground points that are far away from each other. Qualitative and quantitative experiments on two large-scale point cloud datasets show that the proposed method outperforms the current state of the art.
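
The abstract does not specify how the ground information is computed. One simple way to relate sparse, distant points to a common ground surface is the signed distance to an estimated ground plane, sketched below. The plane coefficients, the threshold, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math

def plane_distance(point, plane):
    """Signed distance from `point` to the plane a*x + b*y + c*z + d = 0."""
    a, b, c, d = plane
    x, y, z = point
    return (a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)

def ground_mask(points, plane, threshold):
    """True for points whose absolute distance to the plane is at most `threshold`."""
    return [abs(plane_distance(p, plane)) <= threshold for p in points]
```

With a sensor assumed to sit 1.8 m above a flat road, the plane `(0, 0, 1, 1.8)` describes the surface z = -1.8 in sensor coordinates, so points near that height are flagged as ground regardless of how far apart they are horizontally.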

DEEP LEARNING FOR SEMANTIC SEGMENTATION OF 3D POINT CLOUD

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w15-735-2019 ◽

2019 ◽

Vol XLII-2/W15 ◽

pp. 735-742 ◽

Cited By ~ 6

Author(s):

E. S. Malinverni ◽

R. Pierdicca ◽

M. Paolanti ◽

M. Martini ◽

C. Morbidoni ◽

...

Keyword(s):

Deep Learning ◽

Cultural Heritage ◽

Point Cloud ◽

Cultural Landscapes ◽

Three Dimensional ◽

Semantic Segmentation ◽

Point Clouds ◽

Training Data ◽

Historical Building ◽

3D Point Clouds

Cultural Heritage is a testimony of past human activity, and, as such, its objects exhibit great variety in their nature, size and complexity: from small artefacts and museum items to cultural landscapes, from historical buildings and ancient monuments to city centers and archaeological sites. Cultural Heritage around the globe suffers from wars, natural disasters and human negligence. The importance of digital documentation is well recognized, and there is increasing pressure to document our heritage both nationally and internationally. For this reason, the three-dimensional scanning and modeling of sites and artifacts of cultural heritage have increased remarkably in recent years. The semantic segmentation of point clouds is an essential step of the entire pipeline; in fact, it allows complex architectures to be decomposed into single elements, which are then enriched with meaningful information within Building Information Modelling software. Nevertheless, this step is very time consuming and entrusted entirely to the manual work of domain experts, far from being automated. This work describes a method to label and cluster a point cloud automatically, based on a supervised Deep Learning approach that uses a state-of-the-art Neural Network called PointNet++. Although other methods are known, we chose PointNet++ because it has achieved significant results in classifying and segmenting 3D point clouds. PointNet++ has been tested and improved by training the network with annotated point clouds coming from a real survey, and we evaluate how performance changes according to the input training data. The work can be of great interest to the research community dealing with point cloud semantic segmentation, since it makes public a labelled dataset of CH elements for further tests.

HYBRID GEOREFERENCING, ENHANCEMENT AND CLASSIFICATION OF ULTRA-HIGH RESOLUTION UAV LIDAR AND IMAGE POINT CLOUDS FOR MONITORING APPLICATIONS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-727-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 727-734

Author(s):

N. Haala ◽

M. Kölle ◽

M. Cramer ◽

D. Laupheimer ◽

G. Mandlburger ◽

...

Keyword(s):

Point Cloud ◽

Image Data ◽

Semantic Segmentation ◽

Point Clouds ◽

Airborne Lidar ◽

Image Texture ◽

Image Point ◽

Joint Orientation ◽

3D Point Clouds ◽

3D Data

This paper presents a study on the potential of ultra-high-accuracy UAV-based 3D data capture that combines imagery and LiDAR data. Our work is motivated by a project aiming at the monitoring of subsidence in an area of mixed use. It thus covers built-up regions in a village, with a ship lock as the main object of interest, as well as regions of agricultural use. In order to monitor potential subsidence in the order of 10 mm/year, we aim at sub-centimeter accuracies of the respective 3D point clouds. We show that hybrid georeferencing helps to increase the accuracy of the adjusted LiDAR point cloud by integrating results from photogrammetric block adjustment to improve the time-dependent trajectory corrections. As our main contribution, we demonstrate that joint orientation of laser scans and images in a hybrid adjustment framework significantly improves the relative and absolute height accuracies. By these means, accuracies corresponding to the GSD of the integrated imagery can be achieved. Image data can also help to enhance the LiDAR point clouds. As an example, integrating results from Multi-View Stereo potentially increases the point density from airborne LiDAR. Furthermore, image texture can support 3D point cloud classification. This semantic segmentation, discussed in the final part of the paper, is a prerequisite for further enhancement and analysis of the captured point cloud.
