Learning Rotations

Many problems in computer vision today are solved via deep learning. Tasks like pose estimation from images, pose estimation from point clouds or structure from motion can all be formulated as a regression on rotations. However, there is no unique way of parametrizing rotations mathematically: matrices, quaternions, axis-angle representation or Euler angles are all commonly used in the field. Some of them, however, present intrinsic limitations, including discontinuities, gimbal lock or antipodal symmetry. These limitations may make the learning of rotations via neural networks a challenging problem, potentially introducing large errors. Following recent literature, we propose three case studies: a sanity check, a pose estimation from 3D point clouds and an inverse kinematic problem. We do so by employing a full geometric algebra (GA) description of rotations. We compare the GA formulation with a 6D continuous representation previously presented in the literature in terms of regression error and reconstruction accuracy. We empirically demonstrate that parametrizing rotations as bivectors outperforms the 6D representation. The GA approach overcomes the continuity issue of representations as the 6D representation does, but it also needs fewer parameters to be learned and offers an enhanced robustness to noise. GA hence provides a broader framework for describing rotations in a simple and compact way that is suitable for regression tasks via deep learning, showing high regression accuracy and good generalizability in realistic high-noise scenarios.

Download Full-text

Deep learning based pose estimation method using 3D point clouds

10.1109/m2vip49856.2021.9665142 ◽

2021 ◽

Author(s):

Haowen Wang ◽

Shangyou Ai ◽

Chungang Zhuang ◽

Zhenhua Xiong

Keyword(s):

Deep Learning ◽

Pose Estimation ◽

Estimation Method ◽

Point Clouds ◽

3D Point Clouds

Download Full-text

A Coarse-to-Fine Method for Estimating the Axis Pose Based on 3D Point Clouds in Robotic Cylindrical Shaft-in-Hole Assembly

Sensors ◽

10.3390/s21124064 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4064

Author(s):

Can Li ◽

Ping Chen ◽

Xin Xu ◽

Xinyu Wang ◽

Aijun Yin

Keyword(s):

Pose Estimation ◽

Point Clouds ◽

Geometric Constraints ◽

Admittance Control ◽

3D Vision ◽

Axis Orientation ◽

3D Point Clouds ◽

Object Pose Estimation ◽

Traditional Approaches ◽

Coarse To Fine

In this work, we propose a novel coarse-to-fine method for object pose estimation coupled with admittance control to promote robotic shaft-in-hole assembly. Considering that traditional approaches to locate the hole by force sensing are time-consuming, we employ 3D vision to estimate the axis pose of the hole. Thus, robots can locate the target hole in both position and orientation and enable the shaft to move into the hole along the axis orientation. In our method, first, the raw point cloud of a hole is processed to acquire the keypoints. Then, a coarse axis is extracted according to the geometric constraints between the surface normals and axis. Lastly, axis refinement is performed on the coarse axis to achieve higher precision. Practical experiments verified the effectiveness of the axis pose estimation. The assembly strategy composed of axis pose estimation and admittance control was effectively applied to the robotic shaft-in-hole assembly.

Download Full-text

Parsing of Urban Facades from 3D Point Clouds Based on a Novel Multi-View Domain

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.4.283 ◽

2021 ◽

Vol 87 (4) ◽

pp. 283-293

Author(s):

Wei Wang ◽

Yuan Xu ◽

Yingchao Ren ◽

Gang Wang

Keyword(s):

Deep Learning ◽

Prior Knowledge ◽

Performance Improvement ◽

Data Distribution ◽

Point Clouds ◽

Learning Models ◽

Data Set ◽

3D Point Clouds ◽

Segmentation Accuracy ◽

The Mean

Recently, performance improvement in facade parsing from 3D point clouds has been brought about by designing more complex network structures, which cost huge computing resources and do not take full advantage of prior knowledge of facade structure. Instead, from the perspective of data distribution, we construct a new hierarchical mesh multi-view data domain based on the characteristics of facade objects to achieve fusion of deep-learning models and prior knowledge, thereby significantly improving segmentation accuracy. We comprehensively evaluate the current mainstream method on the RueMonge 2014 data set and demonstrate the superiority of our method. The mean intersection-over-union index on the facade-parsing task reached 76.41%, which is 2.75% higher than the current best result. In addition, through comparative experiments, the reasons for the performance improvement of the proposed method are further analyzed.

Download Full-text

FWNet: Semantic Segmentation for Full-Waveform LiDAR Data Using Deep Learning

Sensors ◽

10.3390/s20123568 ◽

2020 ◽

Vol 20 (12) ◽

pp. 3568 ◽

Cited By ~ 2

Author(s):

Takayuki Shinohara ◽

Haoyi Xiu ◽

Masashi Matsuoka

Keyword(s):

Deep Learning ◽

Semantic Segmentation ◽

Point Clouds ◽

Lidar Data ◽

Global Features ◽

Waveform Data ◽

Full Waveform ◽

3D Point Clouds ◽

Waveform Lidar ◽

Full Waveform Lidar

In the computer vision field, many 3D deep learning models that directly manage 3D point clouds (proposed after PointNet) have been published. Moreover, deep learning-based-techniques have demonstrated state-of-the-art performance for supervised learning tasks on 3D point cloud data, such as classification and segmentation tasks for open datasets in competitions. Furthermore, many researchers have attempted to apply these deep learning-based techniques to 3D point clouds observed by aerial laser scanners (ALSs). However, most of these studies were developed for 3D point clouds without radiometric information. In this paper, we investigate the possibility of using a deep learning method to solve the semantic segmentation task of airborne full-waveform light detection and ranging (lidar) data that consists of geometric information and radiometric waveform data. Thus, we propose a data-driven semantic segmentation model called the full-waveform network (FWNet), which handles the waveform of full-waveform lidar data without any conversion process, such as projection onto a 2D grid or calculating handcrafted features. Our FWNet is based on a PointNet-based architecture, which can extract the local and global features of each input waveform data, along with its corresponding geographical coordinates. Subsequently, the classifier consists of 1D convolutional operational layers, which predict the class vector corresponding to the input waveform from the extracted local and global features. Our trained FWNet achieved higher scores in its recall, precision, and F1 score for unseen test data—higher scores than those of previously proposed methods in full-waveform lidar data analysis domain. Specifically, our FWNet achieved a mean recall of 0.73, a mean precision of 0.81, and a mean F1 score of 0.76. We further performed an ablation study, that is assessing the effectiveness of our proposed method, of the above-mentioned metric. Moreover, we investigated the effectiveness of our PointNet based local and global feature extraction method using the visualization of the feature vector. In this way, we have shown that our network for local and global feature extraction allows training with semantic segmentation without requiring expert knowledge on full-waveform lidar data or translation into 2D images or voxels.

Download Full-text

An analytical and a Deep Learning model for solving the inverse kinematic problem of an industrial parallel robot

Computers & Industrial Engineering ◽

10.1016/j.cie.2020.106682 ◽

2020 ◽

pp. 106682

Author(s):

Juan S. Toquica ◽

Patrícia S. Oliveira ◽

Witenberg S.R. Souza ◽

José Maurício S.T. Motta ◽

Díbio L. Borges

Keyword(s):

Deep Learning ◽

Parallel Robot ◽

Learning Model ◽

Inverse Kinematic ◽

Inverse Kinematic Problem ◽

Deep Learning Model

Download Full-text

Comparing Machine and Deep Learning Methods for Large 3D Heritage Semantic Segmentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090535 ◽

2020 ◽

Vol 9 (9) ◽

pp. 535

Author(s):

Francesca Matrone ◽

Eleonora Grilli ◽

Massimo Martini ◽

Marina Paolanti ◽

Roberto Pierdicca ◽

...

Keyword(s):

Deep Learning ◽

Cultural Heritage ◽

Laser Scanning ◽

Semantic Segmentation ◽

Point Clouds ◽

Classification Algorithms ◽

Learning Methods ◽

3D Point Clouds ◽

The Subject

In recent years semantic segmentation of 3D point clouds has been an argument that involves different fields of application. Cultural heritage scenarios have become the subject of this study mainly thanks to the development of photogrammetry and laser scanning techniques. Classification algorithms based on machine and deep learning methods allow to process huge amounts of data as 3D point clouds. In this context, the aim of this paper is to make a comparison between machine and deep learning methods for large 3D cultural heritage classification. Then, considering the best performances of both techniques, it proposes an architecture named DGCNN-Mod+3Dfeat that combines the positive aspects and advantages of these two methodologies for semantic segmentation of cultural heritage point clouds. To demonstrate the validity of our idea, several experiments from the ArCH benchmark are reported and commented.

Download Full-text

Deep Learning Applied to Vegetation Identification and Removal Using Multidimensional Aerial Data

Sensors ◽

10.3390/s20216187 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6187

Author(s):

Milena F. Pinto ◽

Aurelio G. Melo ◽

Leonardo M. Honório ◽

André L. M. Marcato ◽

André G. S. Conceição ◽

...

Keyword(s):

Deep Learning ◽

Three Dimensional ◽

Point Clouds ◽

Color Filter ◽

Structural Problems ◽

3D Point Clouds ◽

Common Resource ◽

Complete Inspection ◽

Colored Point ◽

Covering Vegetation

When performing structural inspection, the generation of three-dimensional (3D) point clouds is a common resource. Those are usually generated from photogrammetry or through laser scan techniques. However, a significant drawback for complete inspection is the presence of covering vegetation, hiding possible structural problems, and making difficult the acquisition of proper object surfaces in order to provide a reliable diagnostic. Therefore, this research’s main contribution is developing an effective vegetation removal methodology through the use of a deep learning structure that is capable of identifying and extracting covering vegetation in 3D point clouds. The proposed approach uses pre and post-processing filtering stages that take advantage of colored point clouds, if they are available, or operate independently. The results showed high classification accuracy and good effectiveness when compared with similar methods in the literature. After this step, if color is available, then a color filter is applied, enhancing the results obtained. Besides, the results are analyzed in light of real Structure From Motion (SFM) reconstruction data, which further validates the proposed method. This research also presented a colored point cloud library of bushes built for the work used by other studies in the field.

Download Full-text

Fast template matching and pose estimation in 3D point clouds

Computers & Graphics ◽

10.1016/j.cag.2018.12.007 ◽

2019 ◽

Vol 79 ◽

pp. 36-45 ◽

Cited By ~ 3

Author(s):

Richard Vock ◽

Alexander Dieckmann ◽

Sebastian Ochmann ◽

Reinhard Klein

Keyword(s):

Pose Estimation ◽

Template Matching ◽

Point Clouds ◽

3D Point Clouds

Download Full-text

Point Cloud Semantic Segmentation Using a Deep Learning Framework for Cultural Heritage

Remote Sensing ◽

10.3390/rs12061005 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1005 ◽

Cited By ~ 7

Author(s):

Roberto Pierdicca ◽

Marina Paolanti ◽

Francesca Matrone ◽

Massimo Martini ◽

Christian Morbidoni ◽

...

Keyword(s):

Deep Learning ◽

Cultural Heritage ◽

Point Cloud ◽

Semantic Segmentation ◽

Point Clouds ◽

Information Modeling ◽

Dynamic Graph ◽

Historical Building ◽

Architectural Elements ◽

3D Point Clouds

In the Digital Cultural Heritage (DCH) domain, the semantic segmentation of 3D Point Clouds with Deep Learning (DL) techniques can help to recognize historical architectural elements, at an adequate level of detail, and thus speed up the process of modeling of historical buildings for developing BIM models from survey data, referred to as HBIM (Historical Building Information Modeling). In this paper, we propose a DL framework for Point Cloud segmentation, which employs an improved DGCNN (Dynamic Graph Convolutional Neural Network) by adding meaningful features such as normal and colour. The approach has been applied to a newly collected DCH Dataset which is publicy available: ArCH (Architectural Cultural Heritage) Dataset. This dataset comprises 11 labeled points clouds, derived from the union of several single scans or from the integration of the latter with photogrammetric surveys. The involved scenes are both indoor and outdoor, with churches, chapels, cloisters, porticoes and loggias covered by a variety of vaults and beared by many different types of columns. They belong to different historical periods and different styles, in order to make the dataset the least possible uniform and homogeneous (in the repetition of the architectural elements) and the results as general as possible. The experiments yield high accuracy, demonstrating the effectiveness and suitability of the proposed approach.

Download Full-text

Simulation and deep learning on point clouds for robot grasping

Assembly Automation ◽

10.1108/aa-07-2020-0096 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Zhengtuo Wang ◽

Yuetong Xu ◽

Guanhua Xu ◽

Jianzhong Fu ◽

Jiongyan Yu ◽

...

Keyword(s):

Deep Learning ◽

Pose Estimation ◽

Point Clouds ◽

Estimation Algorithm ◽

Training Data ◽

Learning Method ◽

Data Set ◽

Content Type ◽

Experimental Platform ◽

Robot Grasping

Purpose In this work, the authors aim to provide a set of convenient methods for generating training data, and then develop a deep learning method based on point clouds to estimate the pose of target for robot grasping. Design/methodology/approach This work presents a deep learning method PointSimGrasp on point clouds for robot grasping. In PointSimGrasp, a point cloud emulator is introduced to generate training data and a pose estimation algorithm, which, based on deep learning, is designed. After trained with the emulation data set, the pose estimation algorithm could estimate the pose of target. Findings In experiment part, an experimental platform is built, which contains a six-axis industrial robot, a binocular structured-light sensor and a base platform with adjustable inclination. A data set that contains three subsets is set up on the experimental platform. After trained with the emulation data set, the PointSimGrasp is tested on the experimental data set, and an average translation error of about 2–3 mm and an average rotation error of about 2–5 degrees are obtained. Originality/value The contributions are as follows: first, a deep learning method on point clouds is proposed to estimate 6D pose of target; second, a convenient training method for pose estimation algorithm is presented and a point cloud emulator is introduced to generate training data; finally, an experimental platform is built, and the PointSimGrasp is tested on the platform.

Download Full-text