Merging Static and Dynamic Depth Cues with Optical-Flow Recovery for Creating Stereo Videos

2013 ◽  
Vol 2013 ◽  
pp. 1-12
Author(s):  
Fang-Hsuan Cheng ◽  
Tze-Yun Sung

A method is proposed for estimating the depth information of a general monocular image sequence and then creating a 3D stereo video from it. Foreground and background can be distinguished without additional information, and foreground pixels are then shifted to create the binocular image. The proposed depth estimation method follows a coarse-to-fine strategy. By applying the CID method in the spatial domain, the distance of each region can be estimated from its color together with the sharpness and contrast of the image, yielding a coarse depth map. An optical-flow method based on temporal information is then used to search and compare block motion between the previous and current frames, and the distance of each block is estimated from the amount of its motion. Finally, the static and motion depth information is integrated to create the fine depth map. By shifting foreground pixels according to the depth information, a binocular image pair can be created, and a 3D stereo effect can be perceived without glasses on an autostereoscopic 3D display.
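As an illustration of the final rendering step, the following is a minimal numpy sketch of shifting foreground pixels by a depth-dependent disparity to synthesize the second view. The depth normalization, maximum disparity, and nearest-pixel hole filling are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def synthesize_right_view(left, depth, max_disparity=16):
    """left: HxWx3 uint8 image; depth: HxW in [0, 1], 1 = nearest."""
    h, w, _ = left.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    # Near (foreground) pixels are shifted farther than distant ones.
    shift = np.round(depth * max_disparity).astype(int)
    for y in range(h):
        for x in range(w):
            xt = x - shift[y, x]
            if 0 <= xt < w:
                right[y, xt] = left[y, x]
                filled[y, xt] = True
    # Naive hole filling: reuse the last filled pixel along each row.
    for y in range(h):
        for x in range(1, w):
            if not filled[y, x]:
                right[y, x] = right[y, x - 1]
    return right
```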

Author(s):  
Binglin Niu ◽  
Mengxia Tang ◽  
Xuelin Chen

Perceiving the three-dimensional structure of the surrounding environment and analyzing it for autonomous movement is indispensable for robots operating in real scenes. Recovering depth information and three-dimensional spatial structure from monocular images is a basic task of computer vision, and it is inherently ambiguous: many different scenes can produce the same image. This paper proposes a supervised end-to-end network that performs depth estimation without relying on any subsequent processing operations, such as probabilistic graphical models or other extra refinement steps. It uses an encoder-decoder structure with a feature pyramid to predict dense depth maps. The encoder adopts a ResNeXt-50 network to extract the main features from the original image; the feature pyramid structure merges high- and low-level information so that feature information is not lost. The decoder connects transposed convolutional and convolutional layers as an up-sampling structure to expand the resolution of the output. Applied to the indoor dataset NYU Depth v2, the proposed structure obtains better predictions than competing methods: it achieves the best results on five indicators and the second-best result on one indicator.
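A hedged PyTorch sketch of such an architecture follows: a ResNeXt-50 encoder, lateral 1x1 projections fused top-down in feature-pyramid style, and transposed convolutions for upsampling. The channel widths and fusion scheme are assumptions, not the authors' published configuration; the output here is at 1/4 input resolution.

```python
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d

class PyramidDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnext50_32x4d(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        # Lateral 1x1 convs project each encoder stage to a common width.
        self.lat = nn.ModuleList([nn.Conv2d(c, 256, 1)
                                  for c in (256, 512, 1024, 2048)])
        # Each transposed conv doubles the spatial resolution.
        self.up = nn.ModuleList([nn.ConvTranspose2d(256, 256, 2, stride=2)
                                 for _ in range(3)])
        self.head = nn.Conv2d(256, 1, 3, padding=1)

    def forward(self, x):                  # x: (B, 3, H, W), H, W % 32 == 0
        c1 = self.layer1(self.stem(x))
        c2 = self.layer2(c1)
        c3 = self.layer3(c2)
        c4 = self.layer4(c3)
        # Top-down pyramid fusion: upsample and add the lateral feature.
        p = self.lat[3](c4)
        p = self.up[0](p) + self.lat[2](c3)
        p = self.up[1](p) + self.lat[1](c2)
        p = self.up[2](p) + self.lat[0](c1)
        return self.head(p)                # dense depth at 1/4 resolution
```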


Sensors ◽  
2018 ◽  
Vol 19 (1) ◽  
pp. 53 ◽  
Author(s):  
Abiel Aguilar-González ◽  
Miguel Arias-Estrada ◽  
François Berry

Applications such as autonomous navigation, robot vision, and autonomous flying require depth map information of a scene. Depth can be estimated using a single moving camera (depth from motion), but traditional depth-from-motion algorithms have low processing speeds and high hardware requirements that limit embedded use. In this work, we propose a hardware architecture for depth from motion that consists of a flow/depth transformation and a new optical flow algorithm. Our optical flow formulation is an extension of the stereo matching problem: a pixel-parallel/window-parallel approach computes the optical flow with a correlation function based on the sum of absolute differences (SAD). Further, to improve the SAD, we propose the curl of the intensity gradient as a preprocessing step. Experimental results demonstrate higher accuracy (90%) than previous Field Programmable Gate Array (FPGA)-based optical flow algorithms. For depth estimation, our algorithm delivers dense maps with motion and depth information at all image pixels, with a processing speed up to 128 times faster than that of previous work, making high performance possible in embedded applications.
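The following is a minimal software sketch of the SAD-based pixel/window matching that the hardware parallelizes: for each pixel, a small window is compared against displaced windows in the next frame, and the displacement with the lowest sum of absolute differences is kept. The window and search sizes are illustrative, and the curl-of-gradient preprocessing is omitted.

```python
import numpy as np

def sad_flow(frame0, frame1, win=3, search=4):
    """frame0, frame1: HxW float arrays; returns HxWx2 integer flow (dx, dy)."""
    h, w = frame0.shape
    flow = np.zeros((h, w, 2), dtype=int)
    pad = win + search
    f0 = np.pad(frame0, pad, mode='edge')
    f1 = np.pad(frame1, pad, mode='edge')
    for y in range(h):
        for x in range(w):
            yy, xx = y + pad, x + pad
            ref = f0[yy - win:yy + win + 1, xx - win:xx + win + 1]
            best, best_uv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cand = f1[yy + dy - win:yy + dy + win + 1,
                              xx + dx - win:xx + dx + win + 1]
                    cost = np.abs(ref - cand).sum()   # SAD correlation
                    if cost < best:
                        best, best_uv = cost, (dx, dy)
            flow[y, x] = best_uv
    return flow
```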


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1708 ◽  
Author(s):  
Daniel Stanley Tan ◽  
Chih-Yuan Yao ◽  
Conrado Ruiz ◽  
Kai-Lung Hua

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential tasks for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating per-pixel depth from a single image. Inspired by recent work on generative neural network models, we formulate depth estimation as a generative task in which we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network that has an encoder-decoder type generator with residual transposed convolution blocks trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several existing depth estimation methods.
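A hedged sketch of one adversarial training step for this kind of formulation is shown below: a generator maps RGB to a depth map, and a discriminator scores (image, depth) pairs. The network bodies, the two-argument discriminator, the balance weight `lam`, and the added L1 reconstruction term are assumptions for illustration, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, rgb, depth_gt, lam=10.0):
    # Discriminator: distinguish real pairs from generated pairs.
    fake = gen(rgb)
    d_real = disc(rgb, depth_gt)
    d_fake = disc(rgb, fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool the discriminator plus a reconstruction term.
    d_fake = disc(rgb, fake)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + lam * F.l1_loss(fake, depth_gt))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```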


2021 ◽  
Vol 38 (5) ◽  
pp. 1485-1493
Author(s):  
Yasasvy Tadepalli ◽  
Meenakshi Kollati ◽  
Swaraja Kuraparthi ◽  
Padmavathi Kora

Monocular depth estimation is a hot research topic in autonomous car driving. In the proposed work, deep convolutional neural networks (DCNNs) comprising an encoder and a decoder, with transfer learning, are exploited for monocular depth map estimation from two-dimensional images. CNN features extracted in the initial stages are later upsampled using a sequence of bilinear upsampling and convolution layers to reconstruct the depth map. The encoder forms the feature extraction part, and the decoder forms the image reconstruction part. EfficientNet-B0, a recent architecture, is used with pretrained weights as the encoder; it achieves higher efficiency with fewer model parameters than state-of-the-art pretrained networks. EfficientNet-B0 is compared with two other pretrained networks, DenseNet-121 and ResNet-50. Each of the three models is used in the encoding stage for feature extraction, followed by bilinear upsampling in the decoder. Monocular depth estimation is an ill-posed problem and is therefore treated as a regression problem, so the metrics used in the proposed work are the F1-score, the Jaccard score, and the Mean Absolute Error (MAE) between the original and the reconstructed image. The results show that EfficientNet-B0 outperforms the DenseNet-121 and ResNet-50 models in validation loss, F1-score, and Jaccard score.
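A hedged Keras sketch of such an encoder-decoder follows, using a pretrained EfficientNet-B0 as feature extractor and a decoder of bilinear UpSampling2D plus convolution blocks. The number of decoder stages and channel widths are illustrative assumptions, not the authors' exact network.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import EfficientNetB0

def build_depth_model(input_shape=(224, 224, 3)):
    # Encoder: pretrained EfficientNet-B0 without its classification head.
    encoder = EfficientNetB0(include_top=False, weights='imagenet',
                             input_shape=input_shape)
    x = encoder.output                      # 7x7 feature map for 224 input
    # Decoder: four 2x bilinear upsamplings, each followed by a conv.
    for filters in (256, 128, 64, 32):
        x = layers.UpSampling2D(size=2, interpolation='bilinear')(x)
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    depth = layers.Conv2D(1, 3, padding='same')(x)  # 112x112 depth map
    return Model(encoder.input, depth)
```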


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 95 ◽  
Author(s):  
Jinhwan Kim ◽  
Kyung Taek Lim ◽  
Kilyoung Ko ◽  
Eunbie Ko ◽  
Gyuseong Cho

Obtaining the depth information of radioactive contaminants is crucial for determining the most cost-effective decommissioning strategy. The main limitations of burial depth analysis lie in the assumptions that foreknowledge of the buried radioisotopes present at a site is always available and that only a single radioisotope is present. We present an advanced depth estimation method using Bayesian inference that does not rely on those assumptions: we identify low-level radioactive contaminants buried in a substance and then estimate their depths and activities. To evaluate the performance of the proposed method, several spectra were obtained using a 3 × 3 inch hand-held NaI(Tl) detector exposed to Cs-137, Co-60, Na-22, Am-241, Eu-152, and Eu-154 sources (less than 1 μCi) buried in a sandbox at depths of up to 15 cm. The experimental results show that the method correctly detects not only a single radioisotope but also multiple radioisotopes buried in sand. Furthermore, it provides a good approximation of the burial depth and activity of the identified sources, in terms of the mean and a 95% credible interval, from a single measurement. Lastly, we demonstrate that the proposed technique is largely insensitive to short acquisition times and gain-shift effects.
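To make the Bayesian idea concrete, here is a minimal sketch under strongly simplified physics: counts from a buried source are modeled as Poisson with a rate that decays exponentially with burial depth, and a grid posterior over depth yields a mean and 95% credible interval. The attenuation model, prior, and constants are illustrative assumptions, not the paper's full spectral model.

```python
import numpy as np
from scipy.stats import poisson

def depth_posterior(observed_counts, mu=0.1, rate_at_surface=5000.0):
    """mu: assumed effective attenuation coefficient of sand (1/cm)."""
    depths = np.linspace(0.0, 15.0, 151)          # candidate depths (cm)
    rates = rate_at_surface * np.exp(-mu * depths)
    prior = np.ones_like(depths) / depths.size    # uniform prior on depth
    like = poisson.pmf(observed_counts, rates)    # likelihood at each depth
    post = like * prior
    post /= post.sum()                            # normalize the posterior
    mean = (depths * post).sum()
    cdf = np.cumsum(post)
    lo, hi = depths[np.searchsorted(cdf, [0.025, 0.975])]
    return depths, post, mean, (lo, hi)           # mean and 95% interval
```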


Information ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 320 ◽  
Author(s):  
Vanel Lazcano

Optical flow is defined as the motion field of pixels between two consecutive images. Traditionally, to estimate this pixel motion field (the optical flow), an energy model is proposed, composed of (i) a data term and (ii) a regularization term: the data term measures the optical flow error, and the regularization term imposes spatial smoothness. Traditional variational models linearize the data term; this linearized version fails when the displacement of an object is larger than its own size. Recently, the precision of optical flow methods has increased thanks to additional information obtained from correspondences computed between the two images by methods such as SIFT, deep-matching, and exhaustive search. This work presents an empirical study that evaluates different strategies for locating exhaustive correspondences to improve flow estimation. We considered different matching locations: random locations, uniform locations, and locations of maximum gradient magnitude. Additionally, we tested combining large and medium gradients with uniform locations. We evaluated our methodology on the MPI-Sintel database, one of the state-of-the-art evaluation databases. Our results on MPI-Sintel show that our proposal outperforms classical methods such as Horn-Schunck, TV-L1, and LDOF, and performs similarly to MDP-Flow.
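As a minimal sketch of one seeding strategy evaluated above, the following selects exhaustive-correspondence locations at maxima of the image gradient magnitude. The fraction of points kept is an illustrative parameter.

```python
import numpy as np

def gradient_seed_points(image, keep_fraction=0.01):
    """image: HxW float array; returns (row, col) indices of seed points."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)                       # gradient magnitude
    k = max(1, int(keep_fraction * mag.size))
    idx = np.argpartition(mag.ravel(), -k)[-k:]  # k largest magnitudes
    return np.unravel_index(idx, mag.shape)
```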


Author(s):  
Shaocheng Jia ◽  
Xin Pei ◽  
Zi Yang ◽  
Shan Tian ◽  
Yun Yue

Depth information from still 2D images plays an important role in automated driving, driving safety, and robotics. Monocular depth estimation is generally considered an ill-posed and inherently ambiguous problem, and a key issue is how to obtain global information efficiently, since pure convolutional neural networks (CNNs) extract only local information. To this end, some previous works utilized conditional random fields (CRFs) to obtain global information, but CRFs are notoriously difficult to optimize. In this paper, a novel hybrid neural network is proposed to solve this problem and predict a dense depth map from a single monocular image. Specifically, a deep residual network is first utilized to obtain multi-scale local information, and feature correlation (FCL) blocks are then used to correlate these features. Finally, a feature selection attention-based mechanism is adopted to fuse the multi-layer features, and multi-layer recurrent neural networks (RNNs) with a bidirectional long short-term memory (Bi-LSTM) unit are utilized as the output layer. Furthermore, a novel logarithm exponential average error (LEAE) is proposed to overcome the over-weighting problem. The multi-scale feature correlation network (MFCN) is evaluated on the large-scale KITTI benchmark (LKT), a subset of the KITTI raw dataset, and on NYU Depth v2. The experiments indicate that the proposed unified network outperforms existing methods and updates the state-of-the-art performance on the LKT dataset. Importantly, this depth estimation method can be widely used for collision risk assessment and avoidance in driving assistance or automated pilot systems, achieving safety in a more economical and convenient way.
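A hedged PyTorch sketch of the Bi-LSTM fusion idea follows: features from several network layers are treated as a short sequence and passed through a bidirectional LSTM whose output feeds the prediction head. The dimensions and the pooled-vector input are assumptions; the paper's FCL blocks, attention-based selection, and LEAE loss are not reproduced here.

```python
import torch
import torch.nn as nn

class BiLSTMFusionHead(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, layer_feats):
        # layer_feats: list of (B, feat_dim) vectors, one per network layer,
        # e.g. globally pooled multi-scale CNN features.
        seq = torch.stack(layer_feats, dim=1)   # (B, num_layers, feat_dim)
        fused, _ = self.rnn(seq)                # (B, num_layers, 2*hidden)
        return self.out(fused[:, -1])           # scalar depth-related output
```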


2013 ◽  
Vol 479-480 ◽  
pp. 839-843
Author(s):  
Fang Hsuan Cheng ◽  
Yu Pang Chang

This paper proposes an idea for refining a depth map obtained from local stereo matching. Energy is calculated over the entire image, an energy minimization concept is adopted, and regions obtained by a color segmentation algorithm are used: the lower the energy of an image, the better the generated depth quality. The color features and depth values of regions and their neighboring regions define the relation between smooth and occluded regions in the energy function, and the region energy is recalculated repeatedly until the change is insignificant or the iteration limit is reached. The rectified left and right views are first used for local stereo matching to get an initial depth estimate. Color segmentation is performed on the left view, and the segmented regions and initial depth estimate are used to calculate the disparity-plane parameters for each region. This process is performed iteratively on the disparity planes, yielding a more reasonable depth map as the energy cost is minimized. Experimental results show that the refined depth map exhibits better object shape and smoother regions than the initial depth map.
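The following is a minimal sketch of one step described above: fitting a disparity plane d(x, y) = a*x + b*y + c to the initial matching disparities inside one color-segmented region via least squares. Robustness measures and the surrounding energy iteration are omitted.

```python
import numpy as np

def fit_disparity_plane(xs, ys, ds):
    """xs, ys: pixel coordinates in one segment; ds: initial disparities."""
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)
    (a, b, c), *_ = np.linalg.lstsq(A, ds.astype(float), rcond=None)
    return a, b, c   # disparity-plane parameters for this region
```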


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yunzhang Du ◽  
Qian Zhang ◽  
Dingkang Hua ◽  
Jiaqi Hou ◽  
Bin Wang ◽  
...  

The light field is an important way to record the spatial information of a target scene. The purpose of this paper is to obtain depth information by processing light field information and thereby provide a basis for intelligent medical treatment. We first design an attention module that extracts features from the light field images and connects them into a feature map to generate an attention map. The attention map is then integrated into the convolution layers of the neural network as weights, enhancing the weight of the sub-aperture viewpoints that are most meaningful for depth estimation. Finally, the initial depth results are optimized. The experimental results show that the MSE, PSNR, and SSIM of the depth map obtained by this method improve by about 13%, 10 dB, and 4%, respectively, in some scenarios with good performance.
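A hedged PyTorch sketch of the attention idea follows: per-view features from the light field's sub-aperture images are scored by a small attention module, and the resulting weights rescale each view's features before fusion. The shapes and scoring network are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class SubApertureAttention(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # (B*V, C, 1, 1)
            nn.Flatten(),                      # (B*V, C)
            nn.Linear(channels, 1))            # one score per view

    def forward(self, feats):
        # feats: (B, V, C, H, W) features for V sub-aperture views.
        b, v, c, h, w = feats.shape
        s = self.score(feats.reshape(b * v, c, h, w)).reshape(b, v)
        weights = torch.softmax(s, dim=1)      # attention over views
        # Weighted sum fuses the views into one feature map.
        return (feats * weights.reshape(b, v, 1, 1, 1)).sum(dim=1)
```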

