Point Cloud Semantic Segmentation Using a Deep Learning Framework for Cultural Heritage

2020 ◽  
Vol 12 (6) ◽  
pp. 1005 ◽  
Author(s):  
Roberto Pierdicca ◽  
Marina Paolanti ◽  
Francesca Matrone ◽  
Massimo Martini ◽  
Christian Morbidoni ◽  
...  

In the Digital Cultural Heritage (DCH) domain, the semantic segmentation of 3D point clouds with Deep Learning (DL) techniques can help to recognize historical architectural elements at an adequate level of detail, and thus speed up the modeling of historical buildings for developing BIM models from survey data, referred to as HBIM (Historical Building Information Modeling). In this paper, we propose a DL framework for point cloud segmentation, which employs an improved DGCNN (Dynamic Graph Convolutional Neural Network) enriched with meaningful features such as normals and colour. The approach has been applied to a newly collected, publicly available DCH dataset: the ArCH (Architectural Cultural Heritage) dataset. This dataset comprises 11 labeled point clouds, derived from the union of several single scans or from the integration of the latter with photogrammetric surveys. The scenes involved are both indoor and outdoor, with churches, chapels, cloisters, porticoes and loggias covered by a variety of vaults and borne by many different types of columns. They belong to different historical periods and styles, in order to make the dataset as little uniform and homogeneous as possible (in the repetition of architectural elements) and the results as general as possible. The experiments yield high accuracy, demonstrating the effectiveness and suitability of the proposed approach.
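The feature enrichment described above — concatenating normals and colour to the raw coordinates before feeding the modified DGCNN — can be sketched in a few lines of numpy. This is a minimal illustration with an attribute layout we assume for the example, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

xyz = rng.uniform(0.0, 1.0, (n, 3))        # point coordinates
normals = rng.normal(size=(n, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)  # unit surface normals
rgb = rng.uniform(0.0, 1.0, (n, 3))        # colour rescaled to [0, 1]

# One 9-D feature vector per point: geometry + normal + colour.
features = np.concatenate([xyz, normals, rgb], axis=1)
print(features.shape)  # (100, 9)
```

In practice the normals would come from the scanner or from a local plane fit, and the colour from the scan or photogrammetric texture.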

Author(s):  
E. S. Malinverni ◽  
R. Pierdicca ◽  
M. Paolanti ◽  
M. Martini ◽  
C. Morbidoni ◽  
...  

Abstract. Cultural Heritage is a testimony of past human activity and, as such, its objects exhibit great variety in their nature, size and complexity: from small artefacts and museum items to cultural landscapes, from historical buildings and ancient monuments to city centers and archaeological sites. Cultural Heritage around the globe suffers from wars, natural disasters and human negligence. The importance of digital documentation is well recognized, and there is increasing pressure to document our heritage both nationally and internationally. For this reason, the three-dimensional scanning and modeling of sites and artifacts of cultural heritage have increased remarkably in recent years. The semantic segmentation of point clouds is an essential step of the entire pipeline; in fact, it makes it possible to decompose complex architectures into single elements, which are then enriched with meaningful information within Building Information Modelling software. Nevertheless, this step is very time consuming, is entirely entrusted to the manual work of domain experts, and is far from automated. This work describes a method to automatically label and cluster a point cloud based on a supervised Deep Learning approach, using a state-of-the-art Neural Network called PointNet++. Although other methods are known, we chose PointNet++ because it has achieved significant results in classifying and segmenting 3D point clouds. PointNet++ has been tested and improved by training the network with annotated point clouds from a real survey, in order to evaluate how performance changes according to the input training data. This work can be of great interest to the research community dealing with point cloud semantic segmentation, since it makes public a labelled dataset of CH elements for further tests.


Author(s):  
Massimo Martini ◽  
Roberto Pierdicca ◽  
Marina Paolanti ◽  
Ramona Quattrini ◽  
Eva Savina Malinverni ◽  
...  

In the Cultural Heritage (CH) domain, the semantic segmentation of 3D point clouds with Deep Learning (DL) techniques makes it possible to recognize historical architectural elements at a suitable level of detail, and hence expedites the modelling of historical buildings for the development of BIM models from survey data. However, it is difficult to collect a balanced dataset of labelled architectural elements for training a network: CH objects are unique, and it is challenging for a network to recognize this kind of data. In recent years, Generative Networks have proven suitable for generating new data. Starting from these premises, in this paper Generative Networks are used to augment a CH dataset. In particular, the performances of three state-of-the-art Generative Networks, PointGrow, PointFlow and PointGMM, are compared in terms of the Jensen-Shannon Divergence (JSD), the Minimum Matching Distance-Chamfer Distance (MMD-CD) and the Minimum Matching Distance-Earth Mover's Distance (MMD-EMD). The generated objects have been used to augment two classes of the ArCH dataset, columns and windows. A DGCNN-Mod network was then trained and tested for the semantic segmentation task, comparing performance on the ArCH dataset with and without augmentation.
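Of the three reported metrics, MMD-CD is the most straightforward to illustrate: it matches each reference cloud to its closest generated cloud under the Chamfer distance. Below is a brute-force numpy sketch for small clouds (the function names and aggregation details are ours, not the papers'):

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def mmd_cd(generated: list, reference: list) -> float:
    """Minimum Matching Distance: for each reference cloud, the Chamfer
    distance to its closest generated cloud, averaged over the reference set."""
    return float(np.mean([min(chamfer_distance(g, r) for g in generated)
                          for r in reference]))
```

A cloud compared against itself gives zero, and the value grows as the generated set drifts away from the reference set.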


2020 ◽  
Vol 9 (9) ◽  
pp. 535
Author(s):  
Francesca Matrone ◽  
Eleonora Grilli ◽  
Massimo Martini ◽  
Marina Paolanti ◽  
Roberto Pierdicca ◽  
...  

In recent years, the semantic segmentation of 3D point clouds has been a topic involving many fields of application. Cultural heritage scenarios have become the subject of this study mainly thanks to the development of photogrammetry and laser scanning techniques. Classification algorithms based on machine and deep learning methods make it possible to process huge amounts of data such as 3D point clouds. In this context, the aim of this paper is to compare machine and deep learning methods for the classification of large 3D cultural heritage point clouds. Then, considering the best performances of both techniques, it proposes an architecture named DGCNN-Mod+3Dfeat that combines the advantages of these two methodologies for the semantic segmentation of cultural heritage point clouds. To demonstrate the validity of our idea, several experiments on the ArCH benchmark are reported and discussed.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1228
Author(s):  
Ting On Chan ◽  
Linyuan Xia ◽  
Yimin Chen ◽  
Wei Lang ◽  
Tingting Chen ◽  
...  

Ancient pagodas are often major tourist attractions in many oriental countries due to their unique historical backgrounds. They are usually polygonal structures comprising multiple floors separated by eaves. In this paper, we propose a new method to investigate both the rotational and reflectional symmetry of such polygonal pagodas by developing novel geometric models fitted to 3D point clouds obtained from photogrammetric reconstruction. The geometric model consists of multiple polygonal pyramid/prism models sharing a common central axis. The method was verified on four datasets collected by an unmanned aerial vehicle (UAV) and a hand-held digital camera. The results indicate that the models fit the pagodas' point clouds accurately. The symmetry analysis was realized by rotating and reflecting the pagodas' point clouds after a complete leveling of the point cloud was achieved using the estimated central axes. The results show RMSEs of 5.04 cm and 5.20 cm from the perfect (theoretical) rotational and reflectional symmetries, respectively. This indicates that the examined pagodas are highly symmetric, both rotationally and reflectionally. The concept presented in the paper not only works for polygonal pagodas, but can also be readily transformed and implemented for other pagoda-like objects such as transmission towers.
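The rotational-symmetry check can be illustrated by rotating a levelled cloud by one polygonal step about the vertical axis and measuring the nearest-neighbour RMSE against the original. This is a simplified numpy sketch of that idea, not the authors' model-fitting procedure, and it assumes the central axis has already been aligned with z:

```python
import numpy as np

def rotational_symmetry_rmse(points: np.ndarray, k: int) -> float:
    """RMSE between a levelled point cloud and itself rotated by 2*pi/k
    about the vertical (z) axis: near zero for a k-fold symmetric object."""
    theta = 2.0 * np.pi / k
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    rotated = points @ rot.T
    # Brute-force nearest-neighbour residuals (fine for small clouds).
    d2 = np.sum((rotated[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return float(np.sqrt(d2.min(axis=1).mean()))

# A perfect octagonal ring at z = 0 is exactly 8-fold symmetric.
ang = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
octagon = np.stack([np.cos(ang), np.sin(ang), np.zeros(8)], axis=1)
print(rotational_symmetry_rmse(octagon, 8))  # ~0
```

For real pagoda clouds, a k-d tree would replace the brute-force distance matrix, and the residual (here in model units) corresponds to the centimetre-level RMSEs reported in the abstract.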


2021 ◽  
Vol 11 (19) ◽  
pp. 8996
Author(s):  
Yuwei Cao ◽  
Marco Scaioni

In current research, fully supervised Deep Learning (DL) techniques are employed to train segmentation networks applied to point clouds of buildings. However, training such networks requires large amounts of finely labeled building point-cloud data, which presents a major challenge in practice because such data are difficult to obtain. Consequently, the application of fully supervised DL for the semantic segmentation of building point clouds at the LoD3 level is severely limited. In order to reduce the number of required annotated labels, we propose a novel label-efficient DL network, named 3DLEB-Net, that obtains per-point semantic labels of LoD3 building point clouds with limited supervision. It consists of two steps. The first step (Autoencoder, AE) is composed of a Dynamic Graph Convolutional Neural Network (DGCNN) encoder and a folding-based decoder. It is designed to extract discriminative global and local features from input point clouds by faithfully reconstructing them, without any labels. The second step is the semantic segmentation network. By supplying a small amount of task-specific supervision, a segmentation network is proposed for semantically segmenting the encoded features acquired from the pre-trained AE. Experimentally, we evaluated our approach on the Architectural Cultural Heritage (ArCH) dataset. Compared to fully supervised DL methods, our model achieved state-of-the-art results on unseen scenes using only 10% of the labeled training data required by fully supervised methods. Moreover, we conducted a series of ablation studies to show the effectiveness of the design choices of our model.


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2161 ◽  
Author(s):  
Arnadi Murtiyoso ◽  
Pierre Grussenmeyer

3D heritage documentation has seen a surge in the past decade due to developments in reality-based 3D recording techniques. Methods such as photogrammetry and laser scanning are becoming ubiquitous amongst architects, archaeologists, surveyors, and conservators. The main result of these methods is a 3D representation of the object in the form of point clouds. However, a solely geometric point cloud is often insufficient for further analysis, monitoring, and model prediction of the heritage object. The semantic annotation of point clouds remains an interesting research topic, since it traditionally requires manual labeling and therefore a lot of time and resources. This paper proposes an automated pipeline to segment and classify multi-scalar point clouds of heritage objects. This is done in order to perform multi-level segmentation, from the scale of a historical neighborhood down to that of architectural elements, specifically pillars and beams. The proposed workflow involves an algorithmic approach in the form of a toolbox which includes various functions covering the semantic segmentation of large point clouds into smaller, more manageable and semantically labeled clusters. The first part of the workflow explains the segmentation and semantic labeling of heritage complexes into individual buildings, while the second part discusses the use of the same toolbox to segment the resulting buildings further into architectural elements. The toolbox was tested on several historical buildings and showed promising results. The ultimate intention of the project is to assist manual point cloud labeling, especially when confronted with the large training data requirements of machine learning-based algorithms.


Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3568 ◽  
Author(s):  
Takayuki Shinohara ◽  
Haoyi Xiu ◽  
Masashi Matsuoka

In the computer vision field, many 3D deep learning models that directly process 3D point clouds (proposed after PointNet) have been published. Moreover, deep learning-based techniques have demonstrated state-of-the-art performance in supervised learning tasks on 3D point cloud data, such as classification and segmentation on open competition datasets. Furthermore, many researchers have attempted to apply these deep learning-based techniques to 3D point clouds observed by aerial laser scanners (ALSs). However, most of these studies were developed for 3D point clouds without radiometric information. In this paper, we investigate the possibility of using a deep learning method to solve the semantic segmentation task for airborne full-waveform light detection and ranging (lidar) data, which consist of geometric information and radiometric waveform data. We propose a data-driven semantic segmentation model called the full-waveform network (FWNet), which handles full-waveform lidar data without any conversion process, such as projection onto a 2D grid or the calculation of handcrafted features. FWNet is based on a PointNet-style architecture, which extracts the local and global features of each input waveform along with its corresponding geographical coordinates. The classifier then consists of 1D convolutional layers, which predict the class vector corresponding to the input waveform from the extracted local and global features. Our trained FWNet achieved higher recall, precision, and F1 scores on unseen test data than previously proposed methods in the full-waveform lidar data analysis domain: a mean recall of 0.73, a mean precision of 0.81, and a mean F1 score of 0.76. We further performed an ablation study to assess the effectiveness of our proposed method on the above-mentioned metrics. Moreover, we investigated the effectiveness of our PointNet-based local and global feature extraction by visualizing the feature vectors. In this way, we have shown that our network for local and global feature extraction allows training for semantic segmentation without requiring expert knowledge of full-waveform lidar data or translation into 2D images or voxels.
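The PointNet-style local/global fusion that FWNet builds on can be sketched in numpy, with a single shared layer standing in for the full MLP and convolution stacks. All weights, shapes and names here are illustrative, not taken from the paper:

```python
import numpy as np

def fwnet_forward(waveforms: np.ndarray,
                  w_local: np.ndarray,
                  w_cls: np.ndarray) -> np.ndarray:
    """PointNet-style forward pass: a shared per-point layer, a global
    max-pool, then a classifier on concatenated local + global features."""
    # waveforms: (N, T) -- one recorded waveform per lidar point.
    local = np.maximum(waveforms @ w_local, 0.0)     # shared layer + ReLU, (N, F)
    global_feat = local.max(axis=0)                  # max-pooled global feature, (F,)
    fused = np.concatenate(
        [local, np.tile(global_feat, (local.shape[0], 1))], axis=1)  # (N, 2F)
    return fused @ w_cls                             # per-point class scores, (N, C)

rng = np.random.default_rng(1)
scores = fwnet_forward(rng.normal(size=(5, 32)),    # 5 points, 32 waveform bins
                       rng.normal(size=(32, 16)),   # shared-layer weights
                       rng.normal(size=(32, 4)))    # classifier weights (2F -> 4 classes)
print(scores.shape)  # (5, 4)
```

The max-pool makes the global feature invariant to point ordering, which is the property that lets such networks consume raw, unordered point sets.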


2020 ◽  
Vol 12 (1) ◽  
pp. 178 ◽  
Author(s):  
Jinming Zhang ◽  
Xiangyun Hu ◽  
Hengming Dai ◽  
ShenRun Qu

It is difficult to extract a digital elevation model (DEM) from an airborne laser scanning (ALS) point cloud in a forest area because of the irregular and uneven distribution of ground and vegetation points. Machine learning, especially deep learning methods, has shown powerful feature extraction capability for point cloud classification. However, most existing deep learning frameworks, such as PointNet, the dynamic graph convolutional neural network (DGCNN), and SparseConvNet, cannot account for the particularities of ALS point clouds. For large-scene laser point clouds, current data preprocessing methods are mostly based on random sampling, which is not suitable for DEM extraction tasks. In this study, we propose a novel data sampling algorithm named T-Sampling for the data preparation of patch-based training and classification. T-Sampling uses the set of the lowest points in a certain area as basic points, supplemented with other points, which guarantees the integrity of the terrain in the sampling area. In the learning part, we propose a new terrain-based convolution model named Tin-EdgeConv that fully considers the spatial relationship between ground and non-ground points when constructing a directed graph. We design a new network based on Tin-EdgeConv to extract local features and use the PointNet architecture to extract global context information. Finally, we combine this information effectively with a designed attention fusion module. These aspects are important in achieving high classification accuracy. We evaluate the proposed method on large-scale data from forest areas. Results show that our method is more accurate than existing algorithms.
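The terrain-preserving intuition behind T-Sampling — keep the lowest point of each local area so the ground surface survives the sampling — can be sketched with a simple grid version. The cell-based scheme is our simplification, not the paper's exact algorithm:

```python
import numpy as np

def lowest_point_sampling(points: np.ndarray, cell: float) -> np.ndarray:
    """Keep the lowest point of each XY grid cell, so a terrain sample
    survives under dense vegetation (grid-based simplification)."""
    keys = np.floor(points[:, :2] / cell).astype(np.int64)
    # Sort by cell, then by height, so each cell's lowest point comes first.
    order = np.lexsort((points[:, 2], keys[:, 1], keys[:, 0]))
    sorted_keys = keys[order]
    first = np.ones(len(points), dtype=bool)
    first[1:] = np.any(sorted_keys[1:] != sorted_keys[:-1], axis=1)
    return points[order][first]

pts = np.array([[0.1, 0.1, 5.0],   # vegetation point above ...
                [0.2, 0.2, 1.0],   # ... a ground point in the same cell
                [3.0, 3.0, 2.0]])  # a point in another cell
print(lowest_point_sampling(pts, 1.0))  # keeps z = 1.0 and z = 2.0
```

In the paper these lowest points serve as basic points and are then supplemented with other points to fill each training patch.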


2019 ◽  
Vol 8 (5) ◽  
pp. 213 ◽  
Author(s):  
Florent Poux ◽  
Roland Billen

Automation in point cloud data processing is central to knowledge discovery within decision-making systems. The definition of relevant features is often key for segmentation and classification, with automated workflows presenting the main challenges. In this paper, we propose a voxel-based feature engineering approach that better characterizes point clusters and provides strong support for supervised or unsupervised classification. We provide different feature generalization levels to permit interoperable frameworks. First, we recommend a shape-based feature set (SF1) that only leverages the raw X, Y, Z attributes of any point cloud. Afterwards, we derive relationships and topology between voxel entities to obtain a three-dimensional (3D) structural connectivity feature set (SF2). Finally, we provide a knowledge-based decision tree to permit infrastructure-related classification. We study the SF1/SF2 synergy in a new semantic segmentation framework for the constitution of a higher semantic representation of point clouds in relevant clusters. Finally, we benchmark the approach against novel and best-performing deep learning methods on the full S3DIS dataset. We highlight good performance, easy integration, and high F1 scores (>85%) for planar-dominant classes, comparable to state-of-the-art deep learning.
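Shape-based features of the SF1 kind are commonly built from the eigenvalues of a cluster's covariance matrix. The following numpy sketch shows the standard eigenvalue descriptors; the exact formulas used in the paper may differ:

```python
import numpy as np

def shape_features(points: np.ndarray) -> dict:
    """Eigenvalue-based shape descriptors of a point cluster (N, 3)."""
    cov = np.cov(points.T)                          # 3x3 covariance of the cluster
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # l1 >= l2 >= l3
    l1, l2, l3 = evals / evals.sum()                # normalized eigenvalues
    return {
        "linearity": (l1 - l2) / l1,   # high for line-like clusters
        "planarity": (l2 - l3) / l1,   # high for plane-like clusters
        "sphericity": l3 / l1,         # high for volumetric clusters
    }

rng = np.random.default_rng(2)
plane = np.zeros((1000, 3))
plane[:, :2] = rng.uniform(size=(1000, 2))  # flat patch at z = 0
print(shape_features(plane))  # planarity dominates
```

Such descriptors explain why planar-dominant classes (walls, floors, ceilings) are the ones where a feature-based approach remains competitive with deep learning.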


Author(s):  
E. Widyaningrum ◽  
M. K. Fajari ◽  
R. C. Lindenbergh ◽  
M. Hahn

Abstract. The automation of 3D LiDAR point cloud processing is expected to increase the production rate of many applications, including automatic map generation. Fast development of high-end hardware has boosted the expansion of deep learning research for 3D classification and segmentation. However, deep learning requires large amounts of high-quality training samples. The generation of training samples for accurate classification results, especially for airborne point cloud data, is still problematic. Moreover, it is still unclear which customized features are best for segmenting airborne point cloud data. This paper proposes semi-automatic point cloud labelling and examines the potential of combining different tailor-made features for the pointwise semantic segmentation of an airborne point cloud. We implement a Dynamic Graph CNN (DGCNN) approach to classify airborne point cloud data into four land cover classes: bare land, trees, buildings and roads. The DGCNN architecture is chosen because this network relates two approaches, PointNet and graph CNNs, to exploit the geometric relationships between points. For the experiments, we train DGCNN on an airborne point cloud and a co-aligned orthophoto of the Surabaya city area of Indonesia, using three different tailor-made feature combinations: points with RGB (Red, Green, Blue) colour; points with original LiDAR features (Intensity, Return number, Number of returns), so-called IRN; and points with two spectral colours and intensity (Red, Green, Intensity), so-called RGI. The overall accuracy on the testing area indicates that using RGB information gives the best segmentation result of 81.05%, while IRN and RGI give accuracy values of 76.13% and 79.81%, respectively.
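Assembling the three tested input combinations amounts to stacking the chosen per-point attributes next to the coordinates. A minimal sketch follows; the attribute names and dictionary layout are our assumptions for the example:

```python
import numpy as np

# The three attribute triples compared in the paper.
COMBOS = {
    "RGB": ["red", "green", "blue"],
    "IRN": ["intensity", "return_number", "number_of_returns"],
    "RGI": ["red", "green", "intensity"],
}

def make_input(xyz: np.ndarray, attrs: dict, combo: str) -> np.ndarray:
    """Concatenate XYZ with the chosen per-point attribute triple,
    giving the (N, 6) input fed to the network."""
    extra = np.stack([attrs[name] for name in COMBOS[combo]], axis=1)
    return np.concatenate([xyz, extra], axis=1)

n = 4
xyz = np.zeros((n, 3))
attrs = {name: np.full(n, float(i)) for i, name in enumerate(
    ["red", "green", "blue", "intensity", "return_number", "number_of_returns"])}
print(make_input(xyz, attrs, "RGI").shape)  # (4, 6)
```

Swapping the `combo` key reproduces the three experimental conditions without touching the rest of the pipeline, which is what makes this kind of comparison cheap to run.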

