View Synthesis: LiDAR Camera versus Depth Estimation

Author(s):  
Yupeng Xie ◽  
Sarah Fachada ◽  
Daniele Bonatto ◽  
Mehrdad Teratani ◽  
Gauthier Lafruit

Depth-Image-Based Rendering (DIBR) can synthesize a virtual view image from a set of multiview images and corresponding depth maps. However, this requires an accurate depth map estimation that incurs a high computational cost of several minutes per frame in DERS (MPEG-I's Depth Estimation Reference Software), even on a high-end computer. LiDAR cameras can thus be an alternative to DERS in real-time DIBR applications. We compare the quality of a low-cost LiDAR camera, the Intel RealSense LiDAR L515, calibrated and configured adequately, against DERS, using MPEG-I's Reference View Synthesizer (RVS). In IV-PSNR, the LiDAR camera reaches 32.2 dB view synthesis quality with a 15 cm camera baseline and 40.3 dB with a 2 cm baseline. Though DERS outperforms the LiDAR camera by 4.2 dB, the latter provides a better quality-performance trade-off. Moreover, visual inspection shows that LiDAR's virtual views have even slightly higher quality than DERS's in most tested low-texture scene areas, except at object borders. Overall, we highly recommend using LiDAR cameras over advanced depth estimation methods (like DERS) in real-time DIBR applications. Nevertheless, this requires delicate calibration with multiple tools, further detailed in the paper.
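As a rough illustration of how such view-synthesis scores are computed, the sketch below measures plain PSNR between a synthesized view and a captured reference view. Note that IV-PSNR, the metric quoted above, is MPEG's immersive-video variant that additionally tolerates small pixel shifts and color offsets; only the classic formulation is shown here, with placeholder arrays standing in for real views.

```python
import numpy as np

def psnr(reference: np.ndarray, synthesized: np.ndarray, peak: float = 255.0) -> float:
    """Classic PSNR between a reference view and a DIBR-synthesized view.
    IV-PSNR (used in the paper) is more forgiving of small shifts; this
    baseline simply compares pixels at identical coordinates."""
    mse = np.mean((reference.astype(np.float64) - synthesized.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Placeholder data; in practice these would be the RVS output and the
# ground-truth camera view at the same pose.
ref = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
syn = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
print(f"PSNR: {psnr(ref, syn):.1f} dB")
```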

2020 ◽  
Vol 10 (5) ◽  
pp. 1562 ◽  
Author(s):  
Xiaodong Chen ◽  
Haitao Liang ◽  
Huaiyuan Xu ◽  
Siyu Ren ◽  
Huaiyu Cai ◽  
...  

Depth image-based rendering (DIBR) plays an important role in 3D video and free-viewpoint video synthesis. However, artifacts may occur in the synthesized view due to viewpoint changes and stereo depth estimation errors. Holes usually appear in out-of-field regions and disocclusions, and filling them appropriately is a challenge. In this paper, a virtual view synthesis approach based on asymmetric bidirectional DIBR is proposed. A depth image preprocessing method is applied to detect and correct unreliable depth values around the foreground edges. For the primary view, all pixels are warped to the virtual view by the modified DIBR method. For the auxiliary view, only selected regions are warped, namely those containing content that is not visible in the primary view. This approach reduces the computational cost and prevents irrelevant foreground pixels from being warped into the holes. During the merging process, a color correction approach is introduced to make the result appear more natural. In addition, a depth-guided inpainting method is proposed to handle the remaining holes in the merged image. Experimental results show that, compared with bidirectional DIBR, the proposed rendering method reduces rendering time by about 37% and fills about 97% of the holes. In terms of visual quality and objective evaluation, our approach outperforms previous methods.
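The merging step described above can be sketched as follows; this is a simplified reconstruction, not the authors' code, and it assumes a global per-channel gain for the color correction (the paper's correction may be more local). Pixels covered by neither view are left for the depth-guided inpainting stage.

```python
import numpy as np

def merge_asymmetric(primary_warp, primary_mask, aux_warp, aux_mask):
    """Merge a fully warped primary view with a partially warped auxiliary
    view. *_warp: HxWx3 float images in the virtual viewpoint; *_mask: HxW
    booleans, True where a warped pixel landed."""
    out = primary_warp.copy()
    holes = ~primary_mask & aux_mask          # auxiliary fills primary's holes
    overlap = primary_mask & aux_mask         # region used for color matching

    for c in range(3):
        gain = 1.0
        if overlap.any():                     # assumed global per-channel gain
            gain = primary_warp[..., c][overlap].mean() / \
                   max(aux_warp[..., c][overlap].mean(), 1e-6)
        out[..., c][holes] = np.clip(aux_warp[..., c][holes] * gain, 0, 255)

    remaining = ~(primary_mask | aux_mask)    # handled later by inpainting
    return out, remaining
```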


Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 906
Author(s):  
Hui-Yu Huang ◽  
Shao-Yu Huang

The recent emergence of three-dimensional (3D) movies and 3D television (TV) indicates an increasing interest in 3D content. Stereoscopic displays have enabled enhanced visual experiences, allowing the world to be viewed in 3D. Virtual view synthesis is the key technology for presenting 3D content, and depth image-based rendering (DIBR) is a classic virtual view synthesis method. With a texture image and its corresponding depth map, a virtual view can be generated using the DIBR technique. The depth and camera parameters are used to project every pixel in the image to the 3D world coordinate system. The results in world coordinates are then reprojected into the virtual view, based on 3D warping. However, these projections result in cracks (holes). Hence, we herein propose a new DIBR method for free-viewpoint videos to solve the hole problem caused by these projection processes. First, the depth map is preprocessed to reduce the number of holes without producing large-scale geometric distortions; subsequently, improved 3D warping projection is performed to create the virtual view. A median filter is used to filter the hole regions in the virtual view, followed by 3D inverse warping blending to remove the holes. Next, brightness adjustment and adaptive image blending are performed. Finally, the synthesized virtual view is obtained using an inpainting method. Experimental results verify that our proposed method produces pleasing visual quality in the synthesized virtual view, maintains a high peak signal-to-noise ratio (PSNR), and efficiently decreases execution time compared with state-of-the-art methods.
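The projection chain described above (pixel to world coordinates, then world coordinates to the virtual view) can be written compactly. The sketch below is a minimal vectorized version, assuming metric depth, pinhole intrinsics K_src/K_dst, and a relative pose (R, t) from the source to the virtual camera; real pipelines add z-buffering and the crack/hole handling this paper focuses on.

```python
import numpy as np

def warp_to_virtual_view(depth, K_src, K_dst, R, t):
    """3D warping core of DIBR: back-project each source pixel using its
    depth and the source intrinsics, then re-project into the virtual
    camera. Returns destination coordinates (u', v') per source pixel."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)

    cam = np.linalg.inv(K_src) @ pix * depth.reshape(1, -1)  # source camera coords
    cam_virtual = R @ cam + t.reshape(3, 1)                  # virtual camera coords
    proj = K_dst @ cam_virtual
    uv = proj[:2] / proj[2:3]                                # perspective divide
    return uv.reshape(2, h, w)
```

Because the warped coordinates are generally non-integer, neighboring source pixels can land one destination pixel apart, which is exactly how the cracks mentioned above arise.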


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 546
Author(s):  
Zhenni Li ◽  
Haoyi Sun ◽  
Yuliang Gao ◽  
Jiao Wang

Depth maps obtained from sensors are often unsatisfactory because of their low resolution and noise interference. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the preprocessing stage; the proposed algorithm achieves real-time processing speeds above 30 fps. Furthermore, an FPGA design and implementation for depth sensing is also introduced. In this FPGA design, the intensity image and depth image are captured by a dual-camera synchronous acquisition system and fed to the neural network as input. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN, and DDTF algorithms on standard datasets and achieves better depth map super-resolution. Our FPGA system tests confirm that the data throughput of the acquisition system's USB 3.0 interface is stable at 226 Mbps and supports both cameras at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
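A minimal sketch of such a dual-channel residual design is shown below in PyTorch; the exact layer counts, feature widths, and fusion scheme of the paper are not reproduced here and should be treated as assumptions. The key ideas it illustrates are the separate depth and intensity branches and the residual connection that refines rather than replaces the input depth.

```python
import torch
import torch.nn as nn

class DualBranchDepthEnhancer(nn.Module):
    """Assumed layout: one branch for the noisy depth map, one for the
    intensity (guidance) map; fused features predict a residual that is
    added back to the input depth."""
    def __init__(self, feat=32):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.intensity_branch = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1))

    def forward(self, depth, intensity):
        f = torch.cat([self.depth_branch(depth),
                       self.intensity_branch(intensity)], dim=1)
        return depth + self.fuse(f)           # residual learning

# Hypothetical 328x248 inputs matching the acquisition format quoted above.
net = DualBranchDepthEnhancer()
out = net(torch.rand(1, 1, 248, 328), torch.rand(1, 1, 248, 328))
print(out.shape)  # torch.Size([1, 1, 248, 328])
```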


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 15
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often incompatible with low-power embedded systems. Therefore, in this paper, we thoroughly investigate both issues, showing how each can be addressed by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. To further support this evidence, we report experimental results on real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
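As a toy version of the depth-aware image blurring demo mentioned above (not the authors' implementation), the sketch below keeps pixels near a chosen focus depth sharp and blurs the rest with a single global Gaussian; a real bokeh effect would scale the blur radius with the distance from the focal plane.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def depth_aware_blur(image, depth, focus_depth, tolerance=0.1, sigma=5.0):
    """image: HxWx3 float array in [0, 1]; depth: HxW array in the same
    units as focus_depth (e.g. normalized inverse depth from a monocular
    network). Pixels within `tolerance` of the focus depth stay sharp."""
    blurred = np.stack([gaussian_filter(image[..., c], sigma)
                        for c in range(image.shape[-1])], axis=-1)
    in_focus = np.abs(depth - focus_depth) < tolerance   # HxW boolean mask
    mask = in_focus[..., None].astype(image.dtype)
    return image * mask + blurred * (1.0 - mask)
```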


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4434 ◽  
Author(s):  
Sangwon Kim ◽  
Jaeyeal Nam ◽  
Byoungchul Ko

Depth estimation is a crucial and fundamental problem in the computer vision field. Conventional methods reconstruct scenes using feature points extracted from multiple images; however, these approaches require multiple images and thus are not easily implemented in various real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods for estimating depth from a single image using machine learning or deep learning are emerging as new alternatives. In this paper, we propose an algorithm that generates a depth map in real time using a single image and an optimized lightweight efficient neural network (L-ENet) algorithm instead of physical equipment, such as an infrared sensor or multi-view camera. Because depth values have a continuous nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth range classification was applied in this study. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by reducing the number of network layers. Using the proposed L-ENet algorithm, an accurate depth map can be generated from a single image quickly, producing depth values close to the ground truth with small errors. Experiments confirmed that the proposed L-ENet achieves significantly improved estimation performance over state-of-the-art single-image depth estimation algorithms.
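Ordinal depth range classification can be sketched as follows: continuous depth is discretized into ordered bins (a log-space, spacing-increasing discretization is assumed here, as in other ordinal-regression depth methods; L-ENet's exact binning may differ), and the network's per-bin outputs are decoded back to a continuous value by counting the bins it rates as "passed".

```python
import numpy as np

def depth_to_ordinal_labels(depth, d_min, d_max, num_bins):
    """Assign each pixel an ordinal bin index using log-space binning
    (assumption; finer near the camera, coarser far away)."""
    t = np.log(depth / d_min) / np.log(d_max / d_min)    # 0..1 in log space
    return np.clip((t * num_bins).astype(int), 0, num_bins - 1)

def ordinal_decode(probs, d_min, d_max):
    """probs: K x H x W, where probs[k] estimates P(depth > bin k).
    Count the bins each pixel 'passes', then invert the binning."""
    num_bins = probs.shape[0]
    rank = (probs > 0.5).sum(axis=0)                     # ordinal rank per pixel
    t = (rank + 0.5) / num_bins
    return d_min * (d_max / d_min) ** t
```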


