Semi-Automatic Depth Map Generation In Unconstrained Images And Video Sequences For 2D To Stereoscopic 3D Conversion

2021
Author(s): Raymond Phan

In this work, we describe a system for accurately estimating depth through synthetic depth maps in unconstrained conventional monocular images and video sequences, to semi-automatically convert these into their stereoscopic 3D counterparts. In current accepted industry practice, this conversion is performed either automatically in a black-box fashion, or manually by human operators, known as rotoscopers, who extract features and objects on a frame-by-frame basis. Automatic conversion is the least labour intensive, but allows little to no user intervention, and error correction can be difficult. Manual conversion is the most accurate and provides the most control, but is very time consuming and prohibitively expensive for all but the largest production studios. Noting the merits and disadvantages of these two methods, a semi-automatic method blends the two, allowing for faster yet accurate conversion while decreasing the time to release 3D content. Semi-automatic methods require the user to place strokes over the image, or over several keyframes in the case of video, corresponding to rough estimates of the depths in the scene at those strokes. Afterwards, the remaining depths are determined, creating depth maps from which stereoscopic 3D content is generated, with Depth Image Based Rendering employed to synthesize the artificial views. Here, depth map estimation can be considered a multi-label image segmentation problem: each class is a depth value. Additionally, for video, we allow the option of labeling only the first frame, and the strokes are propagated using one of two techniques: a modified computer-vision object tracking algorithm, or edge-aware temporally consistent optical flow.

Fundamentally, this work combines the merits of two well-respected segmentation algorithms: Graph Cuts and Random Walks. The diffusion of depths with smooth gradients from Random Walks, combined with the edge-preserving properties of Graph Cuts, can create the best possible result. To demonstrate that the proposed framework generates good-quality stereoscopic content with minimal effort, we generate results and compare them against the current best-known semi-automatic conversion framework. We also show that our results are more suitable for human perception in comparison to this framework.
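As an illustration of the Random Walks half of this hybrid, the sketch below propagates sparse stroke depths across an image by solving the corresponding Dirichlet problem on the pixel graph. It is a minimal, generic sketch assuming NumPy/SciPy; the function name, 4-connected weighting, and beta parameter are illustrative, not details taken from this work, and the Graph Cuts stage is omitted.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def random_walks_depth(gray, stroke_mask, stroke_depth, beta=90.0):
    """Propagate sparse stroke depths over an image via the Random Walks
    Dirichlet problem. gray: HxW in [0, 1]; stroke_mask: True where the
    user drew; stroke_depth: depth values at those pixels (hypothetical
    interface)."""
    h, w = gray.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)

    # 4-connected edges, weighted by a Gaussian on intensity difference.
    edges, weights = [], []
    for pi, pj in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        diff = (gray.ravel()[pi.ravel()] - gray.ravel()[pj.ravel()]) ** 2
        edges.append(np.stack([pi.ravel(), pj.ravel()], axis=1))
        weights.append(np.exp(-beta * diff) + 1e-6)
    edges = np.vstack(edges)
    weights = np.concatenate(weights)

    # Graph Laplacian L = D - W.
    W = sparse.coo_matrix((weights, (edges[:, 0], edges[:, 1])), shape=(n, n))
    W = W + W.T
    L = sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W

    seeded = stroke_mask.ravel()
    free = ~seeded
    # Harmonic interpolation: solve L_ff x = -L_fs d_s for unlabeled pixels.
    b = -L[free][:, seeded] @ stroke_depth.ravel()[seeded]
    x = spsolve(L[free][:, free].tocsc(), b)

    depth = stroke_depth.astype(float).ravel().copy()
    depth[free] = x
    return depth.reshape(h, w)
```

The harmonic solve is what gives Random Walks its smooth depth gradients; the edge-preserving behaviour attributed to Graph Cuts would come from a separate labeling pass.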


Entropy, 2021, Vol. 23 (5), pp. 546
Author(s): Zhenni Li, Haoyi Sun, Yuliang Gao, Jiao Wang

Depth maps obtained through sensors are often unsatisfactory because of their low resolution and noise interference. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the preprocessing step; the proposed algorithm achieves real-time processing speeds above 30 fps. Furthermore, an FPGA design and implementation for depth sensing is also introduced. In this design, the intensity image and depth image are captured by a dual-camera synchronous acquisition system as the input to the neural network. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN and DDTF algorithms on standard datasets and achieves better depth map super-resolution. System tests confirm that the data throughput of the acquisition system's USB 3.0 interface is stable at 226 Mbps and supports both cameras working at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
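A minimal sketch of the dual-channel residual idea, assuming PyTorch; the layer sizes, module names, and fusion scheme are illustrative stand-ins, and the paper's actual architecture, training, and FPGA mapping are not reproduced here.

```python
import torch
import torch.nn as nn

class DualChannelEnhancer(nn.Module):
    """Toy two-branch residual enhancer: one branch encodes the degraded
    depth map, the other the registered intensity image; fused features
    predict a residual that is added back onto the input depth."""
    def __init__(self, feats=32):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feats, 3, padding=1), nn.ReLU(inplace=True))
        self.intensity_branch = nn.Sequential(
            nn.Conv2d(1, feats, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, 1, 3, padding=1))

    def forward(self, depth, intensity):
        f = torch.cat([self.depth_branch(depth),
                       self.intensity_branch(intensity)], dim=1)
        # Residual learning: refine the depth rather than regress it anew.
        return depth + self.fuse(f)

# Example: enhance a 240x320 depth map guided by its intensity image.
net = DualChannelEnhancer()
out = net(torch.rand(1, 1, 240, 320), torch.rand(1, 1, 240, 320))
```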


2019, Vol. 11 (10), pp. 204
Author(s): Dogan, Haddad, Ekmekcioglu, Kondoz

When it comes to evaluating the perceptual quality of digital media for overall quality-of-experience assessment in immersive video applications, two main approaches stand out: subjective and objective quality evaluation. On one hand, subjective quality evaluation offers the best representation of perceived video quality as assessed by real viewers. On the other hand, it consumes a significant amount of time and effort, due to the involvement of real users in lengthy and laborious assessment procedures. Thus, it is essential that an objective quality evaluation model be developed. The speed advantage offered by an objective model that can predict the quality of rendered virtual views based on the depth maps used in the rendering process allows for faster quality assessments in immersive video applications. This is particularly important given the lack of a suitable reference or ground truth for comparing the available depth maps, especially when such applications offer live content services. This paper presents a no-reference depth map quality evaluation model based on a proposed depth map edge confidence measurement technique, to assist with accurately estimating the quality of rendered (virtual) views in immersive multi-view video content. The model is applied to depth image-based rendering in the multi-view video format, providing evaluation results comparable to those existing in the literature, and often exceeding their performance.
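The paper's edge confidence measurement is its own technique; as a rough illustrative stand-in, the sketch below scores how well depth-map edges align with edges in the corresponding texture view, assuming OpenCV, uint8 inputs, and arbitrary Canny thresholds.

```python
import numpy as np
import cv2

def depth_edge_confidence(depth, texture, tol=2):
    """Fraction of depth-edge pixels that lie within `tol` pixels of a
    texture edge. Misaligned depth edges are a common cause of artifacts
    in rendered virtual views, so a low score flags a suspect depth map."""
    d_edges = cv2.Canny(depth, 50, 150) > 0
    t_edges = cv2.Canny(cv2.cvtColor(texture, cv2.COLOR_BGR2GRAY), 50, 150) > 0

    # Dilate texture edges so "close enough" counts as a match.
    kernel = np.ones((2 * tol + 1, 2 * tol + 1), np.uint8)
    t_near = cv2.dilate(t_edges.astype(np.uint8), kernel) > 0

    if not d_edges.any():
        return 1.0
    return float((d_edges & t_near).sum()) / float(d_edges.sum())
```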


2021
Author(s): Mohammad Fawaz

This thesis proposes an adaptive method for 2D to 3D conversion of images using a user-aided process based on Graph Cuts and Random Walks. Given user-defined labels that correspond to a rough estimate of depth, the system produces a depth map which, combined with the 2D image, can be used to synthesize a stereoscopic image pair. The work presented here extends previous work combining the popular Graph Cuts and Random Walks image segmentation algorithms. Specifically, the previous approach has been made adaptive by removing empirically determined constants, and the quality of the results has been improved. This is achieved by feeding information from the Graph Cuts result into the Random Walks process in two different ways, and by using edge and spatial information to adapt various weights. This thesis also presents a practical application that allows a user to go through the entire process of 2D to 3D conversion using the proposed method. The application is written in MATLAB, lets a user generate and edit depth maps intuitively, and synthesizes additional views of the image for display on 3D-capable devices.
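To make the depth-map-to-stereo-pair step concrete, here is a naive DIBR-style warp that derives a right-eye view from the input image and its depth map. It is a generic sketch with integer disparities and far-to-near z-ordering, not the thesis's MATLAB implementation.

```python
import numpy as np

def synthesize_right_view(left, depth, max_disp=24):
    """Treat the input as the left view and shift each pixel horizontally
    by a disparity proportional to its depth (larger depth value = nearer).
    Pixels are written far-to-near so foreground wins; disoccluded holes
    are returned in a mask for later filling."""
    h, w = depth.shape
    disp = np.rint(depth / max(float(depth.max()), 1e-6) * max_disp).astype(int)
    right = np.zeros_like(left)
    hole = np.ones((h, w), dtype=bool)

    # Visit disparity layers from farthest (0) to nearest (max_disp).
    for d in range(max_disp + 1):
        ys, xs = np.nonzero(disp == d)
        xt = xs - d                      # shift left to form the right eye
        ok = xt >= 0
        right[ys[ok], xt[ok]] = left[ys[ok], xs[ok]]
        hole[ys[ok], xt[ok]] = False
    return right, hole
```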


Electronics, 2020, Vol. 9 (6), pp. 906
Author(s): Hui-Yu Huang, Shao-Yu Huang

The recent emergence of three-dimensional (3D) movies and 3D television (TV) indicates an increasing interest in 3D content. Stereoscopic displays have enabled enhanced visual experiences, allowing the world to be viewed in 3D. Virtual view synthesis is the key technology for presenting 3D content, and depth image-based rendering (DIBR) is a classic virtual view synthesis method. With a texture image and its corresponding depth map, a virtual view can be generated using the DIBR technique. The depth and camera parameters are used to project every pixel in the image into the 3D world coordinate system. The points in world coordinates are then reprojected into the virtual view based on 3D warping. However, these projections produce cracks (holes). Hence, we propose a new DIBR method for free-viewpoint videos that solves the hole problem caused by these projection processes. First, the depth map is preprocessed to reduce the number of holes without producing large-scale geometric distortions; subsequently, an improved 3D warping projection is performed to create the virtual view. A median filter is used to filter the hole regions in the virtual view, followed by 3D inverse warping blending to remove the holes. Next, brightness adjustment and adaptive image blending are performed. Finally, the synthesized virtual view is obtained using an inpainting method. Experimental results verify that, compared with state-of-the-art methods, our proposed method produces pleasing visual quality in the synthesized virtual view, maintains a high peak signal-to-noise ratio (PSNR), and efficiently decreases execution time.
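The post-warping stages can be approximated as below, assuming OpenCV and uint8 images: a median filter over the crack pixels, then inpainting for the larger disocclusions that remain. `cv2.inpaint` with the Telea method stands in for the paper's own inpainting stage, and the mask bookkeeping is an assumption.

```python
import numpy as np
import cv2

def fill_warp_holes(virtual_view, hole_mask):
    """virtual_view: HxWx3 uint8 warped image; hole_mask: HxW bool marking
    pixels no source pixel landed on. Small cracks get a local median,
    larger holes get inpainted."""
    # Small cracks: replace hole pixels with the local median.
    median = cv2.medianBlur(virtual_view, 5)
    out = virtual_view.copy()
    out[hole_mask] = median[hole_mask]

    # Remaining large holes: Telea inpainting over a refreshed mask.
    still_hole = (hole_mask & (out == 0).all(axis=2)).astype(np.uint8)
    return cv2.inpaint(out, still_hole, 3, cv2.INPAINT_TELEA)
```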


2020, Vol. 2020 (2), pp. 140-1-140-6
Author(s): Yuzhong Jiao, Kayton Wai Keung Cheung, Mark Ping Chan Mok, Yiu Kei Li

Computer generated 2D plus Depth (2D+Z) images are common input data for 3D displays using the depth image-based rendering (DIBR) technique. Because of their simplicity, linear interpolation methods are usually used to convert low-resolution images into high-resolution images, for both depth maps and 2D RGB images. However, linear methods suffer from zigzag artifacts in both the depth map and the RGB images, which severely affect the 3D visual experience. In this paper, a spatial distance-based interpolation algorithm for computer generated 2D+Z images is proposed. The method interpolates RGB images with the help of depth and edge information from the depth maps. The spatial distance from the interpolated pixel to the surrounding available pixels is used to obtain the weight factors of the surrounding pixels. Experimental results show that such spatial distance-based interpolation achieves sharp edges and fewer artifacts in the 2D RGB images, and thus naturally improves the performance of 3D display. Since bilinear interpolation is used in homogeneous areas, the proposed algorithm keeps computational complexity low.
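A toy version of the distance-weighted idea, with hypothetical values: neighbours lying across a depth edge receive zero weight so edges stay sharp, while equal weights in homogeneous areas recover a plain (bilinear-like) average.

```python
import numpy as np

def idw_pixel(pos, neighbor_pos, neighbor_vals, same_side):
    """Interpolate one RGB pixel from surrounding available pixels.
    same_side is 1.0 for neighbours on the same side of the depth edge
    as the target pixel and 0.0 otherwise (illustrative interface)."""
    d = np.linalg.norm(neighbor_pos - pos, axis=1)
    w = same_side / np.maximum(d, 1e-6)    # inverse spatial distance
    if w.sum() == 0:                       # no valid neighbour: use all
        w = 1.0 / np.maximum(d, 1e-6)
    return (w[:, None] * neighbor_vals).sum(axis=0) / w.sum()

# Centre of a 2x2 block whose bottom pair lies across a depth edge:
pos = np.array([0.5, 0.5])
npos = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
vals = np.array([[200.0, 10.0, 10.0], [200.0, 10.0, 10.0],
                 [30.0, 30.0, 200.0], [30.0, 30.0, 200.0]])
print(idw_pixel(pos, npos, vals, np.array([1.0, 1.0, 0.0, 0.0])))
# -> [200. 10. 10.]: the edge-crossing neighbours do not bleed in.
```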



Author(s): Takuya Matsuo, Naoki Kodera, Norishige Fukushima, Yutaka Ishibashi

In this paper, we propose a refinement filter for depth maps. The filter convolves an image and a depth map with a cross-computed kernel; we call it the joint trilateral filter. The main advantages of the proposed method are that the filter fits the outlines of objects in the depth map to silhouettes in the image, and reduces Gaussian noise in other areas. These effects reduce rendering artifacts when a free-viewpoint image is generated by point cloud rendering and depth image based rendering techniques. Additionally, its computational cost is independent of the depth range. Thus, we can obtain accurate depth maps at lower cost than conventional approaches, which require Markov random field based optimization methods. Experimental results show that the accuracy of the depth map in edge areas improves while the running time decreases. In addition, the filter improves the accuracy of edges in depth maps from the Kinect sensor. As a result, the quality of the rendered image is improved.
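A direct, unoptimized NumPy sketch of such a joint trilateral filter; the parameter names and sigma values are illustrative.

```python
import numpy as np

def joint_trilateral_filter(depth, guide, radius=3,
                            sigma_s=2.0, sigma_g=10.0, sigma_d=10.0):
    """Each kernel weight is the product of a spatial Gaussian, a range
    Gaussian on the guide image, and a range Gaussian on the depth, so
    depth edges snap to guide-image silhouettes while flat regions are
    denoised."""
    h, w = depth.shape
    p = radius
    dp = np.pad(depth.astype(np.float64), p, mode='edge')
    gp = np.pad(guide.astype(np.float64), p, mode='edge')
    dc, gc = dp[p:-p, p:-p], gp[p:-p, p:-p]     # centre (unshifted) views

    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ds = dp[p + dy:p + dy + h, p + dx:p + dx + w]
            gs = gp[p + dy:p + dy + h, p + dx:p + dx + w]
            wgt = (np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                   * np.exp(-(gs - gc) ** 2 / (2 * sigma_g ** 2))
                   * np.exp(-(ds - dc) ** 2 / (2 * sigma_d ** 2)))
            num += wgt * ds
            den += wgt
    return num / den
```

The cost grows with the kernel radius but, as the paper notes for its filter, is independent of the depth range.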


2021, Vol. 11 (6), pp. 2666
Author(s): Hafiz Muhammad Usama Hassan Alvi, Muhammad Shahid Farid, Muhammad Hassan Khan, Marcin Grzegorzek

Emerging 3D-related technologies such as augmented reality, virtual reality, mixed reality, and stereoscopy have seen remarkable growth due to their numerous applications in the entertainment, gaming, and electromedical industries. In particular, 3D television (3DTV) and free-viewpoint television (FTV) enhance viewers' television experience by providing immersion. They need an infinite number of views to provide full parallax to the viewer, which is not practical due to various financial and technological constraints. Therefore, novel 3D views are generated from a set of available views and their depth maps using depth-image-based rendering (DIBR) techniques. The quality of a DIBR-synthesized image may be compromised for several reasons, e.g., inaccurate depth estimation. Since depth is important in this application, inaccuracies in depth maps lead to textural and structural distortions that degrade the quality of the generated image and result in a poor quality of experience (QoE). Therefore, quality assessment of DIBR-generated images is essential to guarantee a satisfactory QoE. This paper estimates the quality of DIBR-synthesized images and proposes a novel 3D objective image quality metric. The proposed algorithm measures textural and structural distortions in the DIBR image by exploiting contrast sensitivity and the Hausdorff distance, respectively. The two measures are combined to estimate an overall quality score. Experimental evaluations performed on the benchmark MCL-3D dataset show that the proposed metric is reliable and accurate, and performs better than existing 2D and 3D quality assessment metrics.
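As an illustration of the structural term, the sketch below computes a symmetric Hausdorff distance between the edge sets of a reference view and a DIBR-synthesized view, assuming OpenCV and SciPy with arbitrary Canny thresholds; the contrast-sensitivity textural term and the exact combination rule are not reproduced.

```python
import numpy as np
import cv2
from scipy.spatial.distance import directed_hausdorff

def structural_distortion(reference, synthesized):
    """Hausdorff distance between the edge maps of two grayscale uint8
    images. Large values flag structural/geometric warping of the kind
    DIBR introduces around disocclusions."""
    p1 = np.column_stack(np.nonzero(cv2.Canny(reference, 100, 200)))
    p2 = np.column_stack(np.nonzero(cv2.Canny(synthesized, 100, 200)))
    if len(p1) == 0 or len(p2) == 0:
        return 0.0
    # Symmetric Hausdorff = max of the two directed distances.
    return max(directed_hausdorff(p1, p2)[0], directed_hausdorff(p2, p1)[0])
```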

