Progress of visual depth estimation and point cloud mapping

2021 ◽  
Vol 36 (6) ◽  
pp. 896-911
Author(s):  
Yuan-feng CHEN ◽  
1993 ◽  
Vol 31 (2) ◽  
pp. 125-127 ◽  
Author(s):  
Pieter Jan Stappers ◽  
Patrick E. Waller

2020 ◽  
Vol 34 (07) ◽  
pp. 11856-11864
Author(s):  
Quang-Hieu Pham ◽  
Mikaela Angelina Uy ◽  
Binh-Son Hua ◽  
Duc Thanh Nguyen ◽  
Gemma Roig ◽  
...  

In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D inputs into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in the 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting approximately 1.4 million 2D-3D correspondences under various lighting conditions and settings from publicly available RGB-D scenes. Our descriptor is evaluated in three main experiments: 2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation. Experimental results confirm the robustness of our approach as well as its competitive performance, not only on cross-domain tasks but also in generalizing to single-domain 2D and 3D tasks. Our dataset and code are released publicly at https://hkust-vgd.github.io/lcd.
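The abstract does not reproduce the network details, but the core idea of a dual auto-encoder with a shared latent space can be sketched. The following minimal PyTorch sketch is illustrative only: the layer sizes, the 64x64 patch and 1024-point input shapes, and the simple L2 cross-domain term are assumptions, not the paper's actual architecture or loss (which may, for instance, use a triplet loss instead).

```python
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    """Auto-encoder for 2D image patches (assumed 64x64 RGB)."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

class PointBranch(nn.Module):
    """Auto-encoder for local 3D point patches (N x 3), PointNet-style encoder."""
    def __init__(self, latent_dim=256, num_points=1024):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # symmetric pooling over points
            nn.Flatten(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

def loss_fn(img, pts, img_ae, pt_ae, alpha=1.0):
    """Reconstruction losses plus a cross-domain term pulling the
    embeddings of a matching 2D-3D pair together (assumed L2 here)."""
    z_img = img_ae.encoder(img)
    z_pts = pt_ae.encoder(pts.transpose(1, 2))  # (B, 3, N) for Conv1d
    rec_img = img_ae.decoder(z_img)
    rec_pts = pt_ae.decoder(z_pts).view(pts.shape)
    l_rec = nn.functional.mse_loss(rec_img, img) + nn.functional.mse_loss(rec_pts, pts)
    l_cross = nn.functional.mse_loss(z_img, z_pts)  # shared-latent constraint
    return l_rec + alpha * l_cross
```

The key constraint is that a matching 2D patch and 3D patch must encode to nearby latent vectors, so either encoder can later be used on its own to produce a descriptor that is comparable across domains.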


2021 ◽  
Vol 13 (22) ◽  
pp. 4569
Author(s):  
Liyang Zhou ◽  
Zhuang Zhang ◽  
Hanqing Jiang ◽  
Han Sun ◽  
Hujun Bao ◽  
...  

This paper presents an accurate and robust dense 3D reconstruction system, named DP-MVS, for detail-preserving surface modeling of large-scale scenes from multi-view images. Our system performs high-quality large-scale dense reconstruction that preserves geometric details for thin structures, especially linear objects. The framework begins with a sparse reconstruction carried out by incremental Structure-from-Motion. Based on the reconstructed sparse map, a novel detail-preserving PatchMatch approach is applied to estimate a depth map for each image view. The estimated depth maps of multiple views are then fused into a dense point cloud in a memory-efficient way, followed by a detail-aware surface meshing method that extracts the final surface mesh of the captured scene. Experiments on the ETH3D benchmark show that the proposed method outperforms other state-of-the-art methods in F1-score while running more than four times faster. Further experiments on large-scale photo collections demonstrate the effectiveness of the proposed framework for large-scale scene reconstruction in terms of accuracy, completeness, memory saving, and time efficiency.
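The fusion stage described above (merging per-view depth maps into one dense point cloud) rests on standard back-projection geometry. Below is a generic NumPy sketch of that step, not DP-MVS's actual memory-efficient fusion: the voxel-hash deduplication stands in for the paper's scheme, and multi-view consistency filtering is omitted for brevity.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a per-view depth map to world-space 3D points.

    depth:        (H, W) metric depth, 0 where invalid
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) camera pose
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    # Pixel -> camera-space point via the pinhole model
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # (M, 4)
    return (cam_to_world @ pts_cam.T).T[:, :3]

def fuse_views(views, voxel=0.01):
    """Fuse (depth, K, pose) triples into one cloud; a coarse voxel
    hash keeps one point per cell so memory stays bounded."""
    occupied = {}
    for depth, K, pose in views:
        for p in backproject_depth(depth, K, pose):
            key = tuple(np.floor(p / voxel).astype(np.int64))
            occupied.setdefault(key, p)
    return np.array(list(occupied.values()))
```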


2021 ◽  
Vol 10 (3) ◽  
pp. 144
Author(s):  
Asmamaw A Gebrehiwot ◽  
Leila Hashemi-Beni

Flood occurrence is increasing due to the expansion of urbanization and extreme weather such as hurricanes; hence, research on inundation monitoring and mapping methods has grown in order to reduce the severe impacts of flood disasters. This research studies and compares two methods for inundation depth estimation using UAV images and topographic data. The methods consist of three main stages: (1) extracting flooded areas and creating 2D inundation polygons using deep learning; (2) reconstructing the 3D water surface from the polygons and topographic data; and (3) deriving a water depth map from the reconstructed 3D water surface and a pre-flood DEM. The two methods differ in how they reconstruct the 3D water surface (stage 2). The first method uses structure from motion (SfM) to create a point cloud of the area from overlapping UAV images, and the water polygons resulting from stage 1 are applied to classify the water points in the cloud. The second method reconstructs the water surface by intersecting the water polygons with a pre-flood DEM created from pre-flood LiDAR data. We evaluate the proposed methods for inundation depth mapping over the Town of Princeville during a flooding event caused by Hurricane Matthew. The methods are compared and validated against USGS gauge water level data acquired during the flood event. The RMSEs of the estimated water depth were 0.34 m for the SfM method and 0.26 m for the integrated method based on deep learning and the DEM.
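Stage 3 reduces to a per-pixel subtraction of the pre-flood terrain from the reconstructed water surface. A minimal NumPy sketch, assuming both rasters share the same grid and that the stage-1 polygons have been rasterized to a boolean mask (all names here are hypothetical):

```python
import numpy as np

def water_depth_map(water_surface, dem, flood_mask):
    """Derive per-pixel inundation depth (stage 3).

    water_surface: (H, W) reconstructed water-surface elevations, metres
    dem:           (H, W) pre-flood ground elevations on the same grid
    flood_mask:    (H, W) boolean raster of the stage-1 inundation polygons
    """
    depth = water_surface - dem       # depth = water surface minus terrain
    depth[~flood_mask] = np.nan       # only defined inside flooded areas
    return np.clip(depth, 0.0, None)  # small negatives are elevation noise

# Toy 2x2 example: a flat 3.0 m water surface over sloping ground
surface = np.full((2, 2), 3.0)
dem = np.array([[2.0, 2.5], [3.2, 1.0]])
mask = np.array([[True, True], [True, False]])
print(water_depth_map(surface, dem, mask))
# [[1.  0.5]
#  [0.  nan]]
```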


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Jianzhong Yuan ◽  
Wujie Zhou ◽  
Sijia Lv ◽  
Yuzhen Chen

In order to obtain the distances between the vehicle and the surrounding objects in the traffic scene ahead, a monocular visual depth estimation method based on a depthwise separable convolutional neural network is proposed in this study. First, features containing shallow depth information are extracted from the RGB images using convolution layers and maximum pooling layers, with subsampling also performed at this stage. Subsequently, features containing advanced depth information are extracted using a block based on an ensemble of convolution layers and a block based on depthwise separable convolution layers, and the outputs of all blocks are combined. Finally, transposed convolution layers upsample the feature maps to the same size as the original RGB image. During upsampling, skip connections merge in the shallow depth features obtained from the convolution operations through the depthwise separable convolution layers. The depthwise separable convolution layers provide more accurate depth-information features for monocular visual depth estimation while requiring less computation and fewer parameters at a similar (or slightly better) level of performance. Integrating multiple simple convolutions into a block not only increases the overall depth of the network but also enables more accurate extraction of advanced features, and combining the outputs of multiple blocks prevents the loss of features containing important depth information. The testing results show that the depthwise separable convolutional neural network outperforms other monocular visual depth estimation methods. Therefore, applying depthwise separable convolution layers in the neural network is a more effective and accurate approach for estimating visual depth.
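The building block underlying this design factorizes a standard convolution into a per-channel (depthwise) spatial filter followed by a 1x1 pointwise convolution that mixes channels. A minimal PyTorch sketch of such a block, with channel counts chosen only to illustrate the parameter savings the abstract refers to:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv (one filter per channel via groups=in_ch)
    followed by a 1x1 pointwise conv that mixes channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.pointwise(self.depthwise(x))))

# Parameter comparison against a standard 3x3 convolution
std = nn.Conv2d(128, 256, 3, padding=1, bias=False)
sep = DepthwiseSeparableConv(128, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std))  # 294912 (128*256*3*3)
print(count(sep))  # 34432  (128*3*3 depthwise + 128*256 pointwise + BN)
```

The factorization cuts the parameter count here by roughly 8.5x, which is the source of the reduced computational cost the abstract claims.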


Author(s):  
Cem Karaoguz ◽  
Thomas H Weisswange ◽  
Tobias Rodemann ◽  
Britta Wrede ◽  
Constantin A Rothkopf
