The Fusion Strategy of 2D and 3D Information Based on Deep Learning: A Review

Jianghong Zhao; Yinrui Wang; Yuee Cao; Ming Guo; Xianfeng Huang; Ruiju Zhang; Xintong Dou; Xinyu Niu; Yuanyuan Cui; Jun Wang

doi:10.3390/rs13204029

Remote Sensing ◽

10.3390/rs13204029 ◽

2021 ◽

Vol 13 (20) ◽

pp. 4029

Author(s):

Jianghong Zhao ◽

Yinrui Wang ◽

Yuee Cao ◽

Ming Guo ◽

Xianfeng Huang ◽

...

Keyword(s):

Deep Learning ◽

Information Fusion ◽

Information Integration ◽

Point Clouds ◽

Future Research ◽

Accuracy Improvement ◽

3D Point Clouds ◽

3D Information ◽

2D And 3D ◽

Research Domain

Recently, researchers have achieved a number of results with deep-learning-based neural networks for segmentation and detection on 2D images, 3D point clouds, and other data. Fusing 2D and 3D information, so that the two sources compensate for each other and accuracy improves, has become a hot research topic. However, there are no critical reviews focusing on strategies for fusing 2D and 3D information from various data for segmentation and detection, which are basic tasks of computer vision. To boost the development of this research domain, this paper collects, introduces, categorizes, and summarizes the existing representative fusion strategies. In addition, the general structures of the different kinds of fusion strategies are abstracted and categorized for the first time, which may inspire researchers. Moreover, in the methods covered by this paper, the 2D and 3D information comes from various kinds of data. Accordingly, suitable datasets are introduced and comparatively summarized to support related research. Last but not least, we put forward some open challenges and promising directions for future research.


Generating Bird’s Eye View from Egocentric RGB Videos

Wireless Communications and Mobile Computing ◽

10.1155/2021/7479473 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Vanita Jain ◽

Qiming Wu ◽

Shivam Grover ◽

Kshitij Sidana ◽

Gopal Chaudhary ◽

...

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Point Clouds ◽

Two Dimensions ◽

The Other ◽

Future Research ◽

3D Point Clouds ◽

Image Translation ◽

Rgb Images ◽

Path Prediction

In this paper, we present a method for generating bird’s eye video from egocentric RGB videos. Working with egocentric views is tricky since such a view is highly warped and prone to occlusions. A bird’s eye view, on the other hand, has consistent scaling in at least the two dimensions it shows. Moreover, most state-of-the-art systems for tasks such as path prediction are built for bird’s eye views of the subjects. We present a deep learning-based approach that translates egocentric RGB images captured by a car’s dashcam to a bird’s eye view. This is a view translation task, and we perform two experiments: the first uses an image-to-image translation method, and the second uses video-to-video translation. We compare our results with a homographic transformation; our SSIM values are better by a margin of 77% and 14.4%, and the RMSE errors are lower by 40% and 14.6%, for image-to-image translation and video-to-video translation, respectively. We also visually show the efficacy and limitations of each method, with helpful insights for future research. Compared to previous works that use homography and LIDAR for 3D point clouds, our work is more generalizable and does not require any expensive equipment.
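
The RMSE comparison reported above can be illustrated with a minimal Python sketch. It assumes grayscale images stored as nested lists and assumes that "lower by 40%" means a relative reduction against the homography baseline; the function names and the toy pixel values are hypothetical and are not taken from the paper.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized grayscale images (nested lists)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("images must have the same shape")
    squared = [
        (p - t) ** 2
        for row_p, row_t in zip(pred, target)
        for p, t in zip(row_p, row_t)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_error, method_error):
    """Fraction by which method_error is lower than baseline_error (assumed definition)."""
    return (baseline_error - method_error) / baseline_error


# Toy 2x2 "images" with made-up values, not data from the paper.
target = [[0, 0], [0, 0]]
baseline = [[10, 10], [10, 10]]  # stand-in for a homography output
learned = [[6, 6], [6, 6]]       # stand-in for a learned translation output

reduction = relative_reduction(rmse(baseline, target), rmse(learned, target))
```

With these toy values the baseline RMSE is 10 and the learned RMSE is 6, so the relative reduction is 0.4.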


Comparison of Depth Camera and Terrestrial Laser Scanner in Monitoring Structural Deflections

Sensors ◽

10.3390/s21010201 ◽

2020 ◽

Vol 21 (1) ◽

pp. 201

Author(s):

Michael Bekele Maru ◽

Donghwan Lee ◽

Kassahun Demissie Tola ◽

Seunghee Park

Keyword(s):

Optical Sensors ◽

Point Cloud ◽

Three Dimensional ◽

Laser Scanner ◽

Point Clouds ◽

Depth Camera ◽

Terrestrial Laser Scanner ◽

Cloud Data ◽

3D Point Clouds ◽

3D Information

Modeling a structure in the virtual world using three-dimensional (3D) information enhances our understanding of how the structure reacts to a disturbance and aids in visualizing that reaction. Generally, 3D point clouds are used to determine changes in structural behavior. Light detection and ranging (LiDAR) is one of the main ways of generating a 3D point cloud dataset. Additionally, 3D cameras are commonly used to produce a point cloud containing many points on the external surfaces of surrounding objects. The main objective of this study was to compare the performance of two optical sensors, a depth camera (DC) and a terrestrial laser scanner (TLS), in estimating structural deflection. We also applied bilateral filtering techniques, which are commonly used in image processing, to the point cloud data to enhance their accuracy and increase the application prospects of these sensors in structural health monitoring. The results from these sensors were validated by comparing them with the outputs of a linear variable differential transformer sensor, which was mounted on the beam during an indoor experiment. The results showed that the datasets obtained from both sensors were acceptable for nominal deflections of 3 mm and above because the error range was less than ±10%. However, the results obtained from the TLS were better than those obtained from the DC.
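
The ±10% acceptance check against the reference sensor can be sketched in a few lines of Python. This is only an illustration: it assumes the error is a signed percentage of the reference (LVDT) reading, and the function names and the sample deflections are hypothetical.

```python
def percent_error(measured_mm, reference_mm):
    """Signed error of a sensor reading as a percentage of the reference reading."""
    if reference_mm == 0:
        raise ValueError("reference deflection must be non-zero")
    return 100.0 * (measured_mm - reference_mm) / reference_mm


def within_tolerance(measured_mm, reference_mm, tolerance_pct=10.0):
    """True if the absolute percentage error is below the tolerance (assumed +/-10%)."""
    return abs(percent_error(measured_mm, reference_mm)) < tolerance_pct


# Made-up readings for a nominal 3 mm deflection, not values from the study.
ok = within_tolerance(3.2, 3.0)        # about 6.7% error
too_far = within_tolerance(3.5, 3.0)   # about 16.7% error
```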


Parsing of Urban Facades from 3D Point Clouds Based on a Novel Multi-View Domain

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.4.283 ◽

2021 ◽

Vol 87 (4) ◽

pp. 283-293

Author(s):

Wei Wang ◽

Yuan Xu ◽

Yingchao Ren ◽

Gang Wang

Keyword(s):

Deep Learning ◽

Prior Knowledge ◽

Performance Improvement ◽

Data Distribution ◽

Point Clouds ◽

Learning Models ◽

Data Set ◽

3D Point Clouds ◽

Segmentation Accuracy ◽

The Mean

Recently, performance improvements in facade parsing from 3D point clouds have come from designing more complex network structures, which consume substantial computing resources and do not take full advantage of prior knowledge of facade structure. Instead, from the perspective of data distribution, we construct a new hierarchical mesh multi-view data domain based on the characteristics of facade objects to fuse deep-learning models with prior knowledge, thereby significantly improving segmentation accuracy. We comprehensively evaluate the current mainstream methods on the RueMonge 2014 data set and demonstrate the superiority of our method. The mean intersection-over-union index on the facade-parsing task reached 76.41%, which is 2.75% higher than the current best result. In addition, comparative experiments are used to further analyze the reasons for the performance improvement of the proposed method.
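
For readers unfamiliar with the metric, mean intersection-over-union can be computed as in the following Python sketch. It assumes integer class labels per point and averages only over classes that occur in the ground truth or the prediction; the function name and the toy labels are hypothetical, and the evaluation protocol used in the paper may differ.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Unweighted mean of per-class intersection-over-union for flat label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both lists
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in either label list")
    return sum(ious) / len(ious)


# Toy labels: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```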


FWNet: Semantic Segmentation for Full-Waveform LiDAR Data Using Deep Learning

Sensors ◽

10.3390/s20123568 ◽

2020 ◽

Vol 20 (12) ◽

pp. 3568 ◽

Cited By ~ 2

Author(s):

Takayuki Shinohara ◽

Haoyi Xiu ◽

Masashi Matsuoka

Keyword(s):

Deep Learning ◽

Semantic Segmentation ◽

Point Clouds ◽

Lidar Data ◽

Global Features ◽

Waveform Data ◽

Full Waveform ◽

3D Point Clouds ◽

Waveform Lidar ◽

Full Waveform Lidar

In the computer vision field, many 3D deep learning models that directly handle 3D point clouds have been published since PointNet was proposed. Moreover, deep learning-based techniques have demonstrated state-of-the-art performance for supervised learning tasks on 3D point cloud data, such as classification and segmentation tasks on open datasets in competitions. Furthermore, many researchers have attempted to apply these deep learning-based techniques to 3D point clouds observed by aerial laser scanners (ALSs). However, most of these studies were developed for 3D point clouds without radiometric information. In this paper, we investigate the possibility of using a deep learning method to solve the semantic segmentation task for airborne full-waveform light detection and ranging (lidar) data, which consist of geometric information and radiometric waveform data. We propose a data-driven semantic segmentation model called the full-waveform network (FWNet), which handles the waveform of full-waveform lidar data without any conversion process, such as projection onto a 2D grid or calculation of handcrafted features. FWNet is based on a PointNet-based architecture, which can extract the local and global features of each input waveform along with its corresponding geographical coordinates. The classifier then consists of 1D convolutional layers, which predict the class vector corresponding to the input waveform from the extracted local and global features. On unseen test data, our trained FWNet achieved higher recall, precision, and F1 scores than previously proposed methods in the full-waveform lidar data analysis domain. Specifically, FWNet achieved a mean recall of 0.73, a mean precision of 0.81, and a mean F1 score of 0.76. We further performed an ablation study to assess the effectiveness of our proposed method in terms of the above-mentioned metrics. Moreover, we investigated the effectiveness of our PointNet-based local and global feature extraction method by visualizing the feature vector. In this way, we have shown that our network for local and global feature extraction allows semantic segmentation to be trained without requiring expert knowledge of full-waveform lidar data or translation into 2D images or voxels.
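
The mean recall, precision, and F1 figures can be read as class-averaged scores. The Python sketch below shows one common way to compute them, assuming an unweighted (macro) average over classes; the abstract does not state the averaging scheme, so this is an assumption, and the function name and toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Return (mean precision, mean recall, mean F1), averaged over classes with equal weight."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (
        sum(precisions) / num_classes,
        sum(recalls) / num_classes,
        sum(f1s) / num_classes,
    )


# Toy labels, not data from the paper.
mean_precision, mean_recall, mean_f1 = macro_scores([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Under this averaging, the mean F1 is the average of the per-class F1 values, so it generally differs from the harmonic mean of the mean precision and the mean recall.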


Comparing Machine and Deep Learning Methods for Large 3D Heritage Semantic Segmentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090535 ◽

2020 ◽

Vol 9 (9) ◽

pp. 535

Author(s):

Francesca Matrone ◽

Eleonora Grilli ◽

Massimo Martini ◽

Marina Paolanti ◽

Roberto Pierdicca ◽

...

Keyword(s):

Deep Learning ◽

Cultural Heritage ◽

Laser Scanning ◽

Semantic Segmentation ◽

Point Clouds ◽

Classification Algorithms ◽

Learning Methods ◽

3D Point Clouds ◽

The Subject

In recent years, semantic segmentation of 3D point clouds has become a topic that involves many fields of application. Cultural heritage scenarios have become the subject of this study mainly thanks to the development of photogrammetry and laser scanning techniques. Classification algorithms based on machine and deep learning methods make it possible to process huge amounts of data such as 3D point clouds. In this context, the aim of this paper is to compare machine and deep learning methods for the classification of large 3D cultural heritage datasets. Then, considering the best performances of both techniques, it proposes an architecture named DGCNN-Mod+3Dfeat that combines the advantages of the two methodologies for semantic segmentation of cultural heritage point clouds. To demonstrate the validity of our idea, several experiments on the ArCH benchmark are reported and discussed.


Deep Learning Applied to Vegetation Identification and Removal Using Multidimensional Aerial Data

Sensors ◽

10.3390/s20216187 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6187

Author(s):

Milena F. Pinto ◽

Aurelio G. Melo ◽

Leonardo M. Honório ◽

André L. M. Marcato ◽

André G. S. Conceição ◽

...

Keyword(s):

Deep Learning ◽

Three Dimensional ◽

Point Clouds ◽

Color Filter ◽

Structural Problems ◽

3D Point Clouds ◽

Common Resource ◽

Complete Inspection ◽

Colored Point ◽

Covering Vegetation

In structural inspection, three-dimensional (3D) point clouds are a common resource. They are usually generated by photogrammetry or by laser scanning. However, a significant drawback for complete inspection is the presence of covering vegetation, which hides possible structural problems and makes it difficult to acquire the object surfaces needed for a reliable diagnosis. Therefore, this research’s main contribution is an effective vegetation removal methodology that uses a deep learning structure capable of identifying and extracting covering vegetation in 3D point clouds. The proposed approach uses pre- and post-processing filtering stages that take advantage of colored point clouds, if they are available, or operate independently. The results showed high classification accuracy and good effectiveness when compared with similar methods in the literature. After this step, if color is available, a color filter is applied, enhancing the results obtained. In addition, the results are analyzed in light of real Structure From Motion (SFM) reconstruction data, which further validates the proposed method. This research also presents a colored point cloud library of bushes, built for this work, that other studies in the field can use.


AN INTEGRATED VISION SYSTEM FOR ALV NAVIGATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001400000593 ◽

2000 ◽

Vol 14 (07) ◽

pp. 929-940 ◽

Cited By ~ 1

Author(s):

XIUQING YE ◽

JILIN LIU ◽

WEIKANG GU

Keyword(s):

Information Fusion ◽

Vision System ◽

Feasible Region ◽

Vehicle Navigation ◽

3D Vision ◽

Land Vehicle ◽

3D Information ◽

2D And 3D

In this paper, an integrated vision system for an autonomous land vehicle is described. The vision system includes 2D and 3D vision modules and an information fusion module. The task of the 2D vision module is to provide the physical and geometric information of the road, and the task of the 3D vision module is to detect obstacles in the surroundings. The fusion module combines the 2D and 3D information to generate a feasible region for vehicle navigation.
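
The fusion step described above can be pictured with a small Python sketch. It is not the authors' method: it assumes, purely for illustration, that the 2D module yields a road mask and the 3D module yields an obstacle mask, both already registered to the same ground-plane grid, and that a cell is feasible when it is road and free of obstacles. All names are hypothetical.

```python
def feasible_region(road_mask, obstacle_mask):
    """Cell-wise fusion: feasible where the 2D module sees road and the 3D module sees no obstacle."""
    if len(road_mask) != len(obstacle_mask) or any(
        len(r) != len(o) for r, o in zip(road_mask, obstacle_mask)
    ):
        raise ValueError("masks must have the same shape")
    return [
        [bool(road) and not bool(obstacle) for road, obstacle in zip(road_row, obstacle_row)]
        for road_row, obstacle_row in zip(road_mask, obstacle_mask)
    ]


# Toy 2x3 grids (1 = road / obstacle present), invented for illustration.
road = [[1, 1, 0],
        [1, 1, 1]]
obstacles = [[0, 1, 0],
             [0, 0, 0]]
region = feasible_region(road, obstacles)
```

A real system would also have to handle calibration between the two modules and uncertainty in each mask, which this sketch ignores.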


EVALUATING UNMANNED AERIAL PLATFORMS FOR CULTURAL HERITAGE LARGE SCALE MAPPING

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xli-b5-355-2016 ◽

2016 ◽

Vol XLI-B5 ◽

pp. 355-362 ◽

Cited By ~ 3

Author(s):

A. Georgopoulos ◽

C. Oikonomou ◽

E. Adamopoulos ◽

E. K. Stathopoulou

Keyword(s):

Cultural Heritage ◽

Optical Sensors ◽

Large Scale ◽

Point Clouds ◽

Short Review ◽

Data Set ◽

Heritage Sites ◽

3D Point Clouds ◽

Aerial Platforms ◽

3D Information

When it comes to large scale mapping of limited areas, especially cultural heritage sites, the task becomes critical. Optical and non-optical sensors, e.g. LiDAR units, are now built at sizes and weights that unmanned aerial platforms can lift. At the same time, there is increasing emphasis on solutions that give users faster and cheaper access to 3D information. Considering the multitude of platforms and cameras, and the advancement of algorithms in conjunction with the increase in available computing power, this challenge should be, and indeed is, further investigated. In this paper, a short review of current UAS technologies is attempted. A discussion follows as to their applicability and advantages, depending on their specifications, which vary immensely. The available on-board cameras are also compared and evaluated for large scale mapping. Furthermore, different software implementations of Structure from Motion and Multiple View Stereo algorithms, able to process such dense and mostly unordered sequences of digital images, are thoroughly analyzed, reviewed, and tested. As test data, we use a rich optical and thermal data set acquired from both fixed wing and multi-rotor platforms, using different cameras, over an archaeological excavation with adverse height variations. Dense 3D point clouds, digital terrain models and orthophotos have been produced and evaluated for their radiometric as well as metric qualities.


Deformation monitoring using laser scanned point clouds and BIM

MATEC Web of Conferences ◽

10.1051/matecconf/201824501002 ◽

2018 ◽

Vol 245 ◽

pp. 01002 ◽

Cited By ~ 4

Author(s):

Vladimir Badenko ◽

Dmitry Volgin ◽

Sergey Lytkin

Keyword(s):

Laser Scanning ◽

Point Clouds ◽

Deformation Monitoring ◽

Future Research ◽

Cloud Processing ◽

3D Point Clouds ◽

Point Cloud Processing ◽

Informational Model ◽

Technology Gaps

Laser scanning is an essential method for monitoring the operation of buildings or structures. It involves creating an as-is BIM from point clouds obtained by laser scanning. In this article, we present our workflow for generating an information model from 3D point clouds of concrete tetrapod blocks on navigable structure C-1. A point cloud processing method for building an informational model for long-term monitoring is described. As a result of the research, a BIM model containing each tetrapod was created for deformation monitoring by comparison with the following year's model. Finally, we identify and discuss technology gaps that need to be addressed in future research.


Point Cloud Semantic Segmentation Using a Deep Learning Framework for Cultural Heritage

Remote Sensing ◽

10.3390/rs12061005 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1005 ◽

Cited By ~ 7

Author(s):

Roberto Pierdicca ◽

Marina Paolanti ◽

Francesca Matrone ◽

Massimo Martini ◽

Christian Morbidoni ◽

...

Keyword(s):

Deep Learning ◽

Cultural Heritage ◽

Point Cloud ◽

Semantic Segmentation ◽

Point Clouds ◽

Information Modeling ◽

Dynamic Graph ◽

Historical Building ◽

Architectural Elements ◽

3D Point Clouds

In the Digital Cultural Heritage (DCH) domain, the semantic segmentation of 3D Point Clouds with Deep Learning (DL) techniques can help to recognize historical architectural elements at an adequate level of detail, and thus speed up the modeling of historical buildings when developing BIM models from survey data, referred to as HBIM (Historical Building Information Modeling). In this paper, we propose a DL framework for Point Cloud segmentation, which employs an improved DGCNN (Dynamic Graph Convolutional Neural Network) by adding meaningful features such as normal and colour. The approach has been applied to a newly collected DCH Dataset which is publicly available: the ArCH (Architectural Cultural Heritage) Dataset. This dataset comprises 11 labeled point clouds, derived from the union of several single scans or from the integration of the latter with photogrammetric surveys. The scenes involved are both indoor and outdoor, with churches, chapels, cloisters, porticoes and loggias covered by a variety of vaults and supported by many different types of columns. They belong to different historical periods and different styles, in order to make the dataset as varied as possible (in the repetition of the architectural elements) and the results as general as possible. The experiments yield high accuracy, demonstrating the effectiveness and suitability of the proposed approach.
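
How extra per-point features such as normal and colour can enter a DGCNN-style network is sketched below in plain Python. This is an illustration under assumptions, not the authors' implementation: neighbours are found by k-nearest-neighbour search on the xyz coordinates only, and each edge input is the EdgeConv-style concatenation of a point's feature vector with the difference to its neighbour's feature vector. The function names and the toy data are hypothetical.

```python
def knn_indices(points, k):
    """For each point, indices of its k nearest other points by squared Euclidean distance."""
    neighbours = []
    for i, p in enumerate(points):
        distances = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbours.append([j for _, j in distances[:k]])
    return neighbours


def edge_features(features, neighbours):
    """For each point i and neighbour j, build the vector concat(f_i, f_j - f_i)."""
    edges = []
    for i, neighbour_ids in enumerate(neighbours):
        f_i = features[i]
        edges.append(
            [f_i + [b - a for a, b in zip(f_i, features[j])] for j in neighbour_ids]
        )
    return edges


# Toy data: xyz coordinates plus one extra channel standing in for colour or normal.
xyz = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
extra = [[0.5], [0.75], [0.25]]
features = [p + e for p, e in zip(xyz, extra)]  # per-point input vector

neighbours = knn_indices(xyz, k=1)
edges = edge_features(features, neighbours)
```

In an actual network these edge vectors would be passed through shared learned layers and pooled over the neighbours; only the input construction is shown here.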
