Domain gap in adapting self-supervised depth estimation methods for stereo-endoscopy

2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Lalith Sharan ◽  
Lukas Burger ◽  
Georgii Kostiuchik ◽  
Ivo Wolf ◽  
Matthias Karck ◽  
...  

In endoscopy, depth estimation is a task that can help quantify visual information for better scene understanding. A plethora of depth estimation algorithms have been proposed in the computer vision community. The endoscopic domain, however, differs from the typical depth estimation scenario due to differences in the setup and the nature of the scene. Furthermore, it is infeasible to obtain ground-truth depth information, owing to the unsuitable detection range of off-the-shelf depth sensors and the difficulty of setting up a depth sensor in a surgical environment. In this paper, an existing self-supervised approach, called Monodepth [1], from the field of autonomous driving is applied to a novel dataset of stereo-endoscopic images from reconstructive mitral valve surgery. While it is already known that endoscopic scenes are more challenging than outdoor driving scenes, this paper performs experiments to quantify the comparison and describes the domain gap and the challenges involved in transferring these methods.
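Monodepth trains by reconstructing one stereo view from the other and penalizing the photometric difference. A minimal sketch of that appearance-matching term, using the SSIM/L1 mix with alpha = 0.85 commonly cited in the Monodepth line of work (defaults, not values reported in this paper):

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    # Simplified SSIM over 3x3 neighborhoods, as typically used in
    # Monodepth-style photometric losses.
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp(num / den, 0, 1)

def appearance_loss(target, reconstructed, alpha=0.85):
    # Weighted mix of SSIM and L1: the standard photometric term
    # minimized in self-supervised stereo training.
    ssim_term = (1 - ssim(target, reconstructed)) / 2
    l1_term = torch.abs(target - reconstructed)
    return (alpha * ssim_term + (1 - alpha) * l1_term).mean()
```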

Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1179 ◽  
Author(s):  
Tao Huang ◽  
Shuanfeng Zhao ◽  
Longlong Geng ◽  
Qian Xu

To take full advantage of the information in images captured by drones, and given that most existing monocular depth estimation methods based on supervised learning require vast quantities of ground-truth depth data for training, we propose an unsupervised monocular depth estimation model for drones based on a residual neural network with coarse-refined feature extraction. By introducing a virtual camera through a deep residual convolutional neural network with coarse-refined feature extraction, inspired by the principle of binocular depth estimation, unsupervised monocular depth estimation becomes an image reconstruction problem. To improve the performance of our model, the following innovations are proposed. First, pyramid processing of the input image builds a topological relationship between the resolution of the input image and its depth, which improves sensitivity to depth information in a single image and reduces the impact of input resolution on depth estimation. Second, the residual neural network with coarse-refined feature extraction for image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict highly detailed output depth maps, long skip connections are added between corresponding layers of the coarse-feature-extraction network and the refined-feature-extraction deconvolution network. Third, an image reconstruction loss based on the structural similarity index (SSIM), an approximate disparity smoothness loss, and a depth map loss are united into a novel training loss. Experimental results show that our model outperforms state-of-the-art monocular depth estimation methods on the KITTI dataset (composed of corresponding left and right views) and the Make3D dataset (composed of images and corresponding ground-truth depth maps), and, when trained on KITTI, largely meets the depth-information requirements of images captured by drones.
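Of the three united loss terms, the disparity smoothness term is the least obvious to write down. A hedged sketch of a common edge-aware formulation, with illustrative weights (the paper's exact form and weighting may differ):

```python
import torch

def disparity_smoothness(disp, img):
    # Edge-aware smoothness: penalize disparity gradients, downweighted
    # where the image itself has strong gradients (an assumed common
    # formulation, not taken verbatim from the paper).
    dd_x = torch.abs(disp[:, :, :, :-1] - disp[:, :, :, 1:])
    dd_y = torch.abs(disp[:, :, :-1, :] - disp[:, :, 1:, :])
    di_x = torch.mean(torch.abs(img[:, :, :, :-1] - img[:, :, :, 1:]), 1, keepdim=True)
    di_y = torch.mean(torch.abs(img[:, :, :-1, :] - img[:, :, 1:, :]), 1, keepdim=True)
    return (dd_x * torch.exp(-di_x)).mean() + (dd_y * torch.exp(-di_y)).mean()

def total_loss(recon_loss, smooth_loss, depth_loss, w_r=1.0, w_s=0.1, w_d=1.0):
    # United training loss; the weights here are illustrative placeholders.
    return w_r * recon_loss + w_s * smooth_loss + w_d * depth_loss
```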


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2567
Author(s):  
Dong-hoon Kwak ◽  
Seung-ho Lee

Modern image processing techniques use three-dimensional (3D) images, which contain spatial information such as depth and scale in addition to visual information. These images are indispensable in virtual reality, augmented reality (AR), and autonomous driving applications. In this paper, we propose a novel method for estimating monocular depth using a cycle generative adversarial network (GAN) combined with segmentation. It uses three processes: segmentation and depth estimation, adversarial loss calculation, and cycle consistency loss calculation. The cycle consistency loss calculation evaluates the similarity of two images when they are restored to their original forms after being estimated separately from the two adversarial losses. To evaluate the objective reliability of the proposed method, we compared it with other monocular depth estimation (MDE) methods on the NYU Depth Dataset V2. Our results show that our method achieves better benchmark values than the other methods, demonstrating its effectiveness for depth estimation.
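A sketch of the cycle consistency term described above, in the usual CycleGAN form: each image is mapped to the other domain and back, and the round trip is compared to the original. The generator names and the weight lam are illustrative assumptions, not the authors' exact formulation:

```python
import torch

def cycle_consistency_loss(real_rgb, real_depth, G_rgb2depth, G_depth2rgb, lam=10.0):
    # Map RGB -> depth -> RGB and depth -> RGB -> depth, then penalize
    # the L1 distance between each original and its reconstruction.
    rgb_cycled = G_depth2rgb(G_rgb2depth(real_rgb))
    depth_cycled = G_rgb2depth(G_depth2rgb(real_depth))
    return lam * (torch.abs(real_rgb - rgb_cycled).mean()
                  + torch.abs(real_depth - depth_cycled).mean())
```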


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 15
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often incompatible with low-power embedded systems. Therefore, in this paper, we investigate both issues, showing how they can be addressed by adopting appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature for tackling the extremely varied contexts faced in real applications. To further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
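As an illustration of the depth-aware image blurring application mentioned in the results, here is a minimal sketch that keeps pixels near a chosen focus depth sharp and blurs the rest; the thresholding scheme and all parameters are assumptions, not the authors' implementation:

```python
import cv2
import numpy as np

def depth_aware_blur(image_bgr, depth, focus_depth, tolerance=0.1, ksize=21):
    # Blur the whole frame once, then composite: pixels whose depth is
    # within `tolerance` of the focus plane stay sharp, the rest are
    # taken from the blurred frame.
    blurred = cv2.GaussianBlur(image_bgr, (ksize, ksize), 0)
    in_focus = np.abs(depth - focus_depth) < tolerance
    mask = in_focus[..., None].astype(np.float32)
    return (mask * image_bgr + (1 - mask) * blurred).astype(np.uint8)
```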


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2272 ◽  
Author(s):  
Faisal Khan ◽  
Saqib Salahuddin ◽  
Hossein Javidnia

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied, ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNNs). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics, and autonomous driving. This survey provides a comprehensive overview of the topic, including the problem formulation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep-learning-based approaches for monocular depth estimation are reviewed, evaluated, and discussed. We conclude with a perspective on open challenges in monocular depth estimation that require further investigation.
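The surveyed approaches are typically compared with a shared set of error and accuracy measures. A small sketch of three of the standard metrics (absolute relative error, RMSE, and the delta < 1.25 threshold accuracy):

```python
import numpy as np

def depth_metrics(pred, gt):
    # Standard monocular depth evaluation metrics, computed over pixels
    # with valid (positive) ground-truth depth.
    pred, gt = pred.flatten(), gt.flatten()
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)          # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))          # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                     # threshold accuracy
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}
```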


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1708 ◽  
Author(s):  
Daniel Stanley Tan ◽  
Chih-Yuan Yao ◽  
Conrado Ruiz ◽  
Kai-Lung Hua

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential for developing smart homes and smart cities. However, not all applications have the luxury of depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating per-pixel depth from a single image. Inspired by recent work on generative neural network models, we formulate depth estimation as a generative task in which we synthesize the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network with an encoder-decoder generator built from residual transposed convolution blocks and trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several prior depth estimation methods.
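A plausible sketch of the decoder building block named above: a residual block wrapped around a transposed convolution that doubles spatial resolution. The layer layout is an assumption for illustration, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class ResidualUpBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Transposed convolution with k=4, s=2, p=1 exactly doubles H and W.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        up = self.up(x)                       # upsample 2x
        return self.act(up + self.conv(up))   # residual connection around the convs
```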


2020 ◽  
Vol 7 (2) ◽  
pp. 4-7
Author(s):  
Shadi Saleh ◽  
Shanmugapriyan Manoharan ◽  
Wolfram Hardt

Depth is a vital prerequisite for tasks such as perception, navigation, and planning. Estimating depth from a single image is challenging, since no analytic mapping exists between an intensity image and its depth, and the contextual cues needed to infer depth are often absent from a single image. Furthermore, most current research relies on supervised learning to handle depth estimation, so recorded ground-truth depth is required at training time, which is tricky and costly to obtain. This study presents two approaches (unsupervised learning and semi-supervised learning) to learn depth information from a single RGB image. The main objective of depth estimation is to extract a representation of the spatial structure of the environment and to recover the 3D shape and visual appearance of objects in imagery.
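The core trick that lets such approaches do without recorded ground truth is differentiable warping: predict a disparity, resample one view to reconstruct the other, and train on the reconstruction error. A generic sketch under that assumption (not this study's exact pipeline):

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right_img, disparity):
    # Reconstruct the left view by sampling the right image at positions
    # shifted left by the predicted per-pixel disparity (in pixels).
    b, _, h, w = right_img.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=right_img.device),
        torch.linspace(-1, 1, w, device=right_img.device),
        indexing="ij",
    )
    # Convert the pixel disparity to the normalized [-1, 1] coordinate range.
    xs = xs.unsqueeze(0) - 2 * disparity.squeeze(1) / (w - 1)
    grid = torch.stack((xs, ys.unsqueeze(0).expand_as(xs)), dim=-1)
    return F.grid_sample(right_img, grid, align_corners=True)
```

The training loss then compares the warped result against the real left image, turning depth estimation into the image reconstruction problem described above.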


2020 ◽  
Vol 6 ◽  
pp. e317
Author(s):  
Dmitrii Maslov ◽  
Ilya Makarov

Autonomous driving depends heavily on depth information for safe driving. Recently, major improvements have been made to both supervised and self-supervised methods for depth reconstruction. However, most current approaches focus on single-frame depth estimation, where the quality limit is hard to beat due to the general limitations of supervised learning with deep neural networks. One way to improve the quality of existing methods is to exploit temporal information from frame sequences. In this paper, we study ways of integrating recurrent blocks into a common supervised depth estimation pipeline. We propose a novel method that takes advantage of the convolutional gated recurrent unit (convGRU) and convolutional long short-term memory (convLSTM). We compare the use of convGRU and convLSTM blocks and determine the best model for the real-time depth estimation task. We carefully study training strategies and provide new deep neural network architectures for depth estimation from monocular video that use information from past frames via an attention mechanism. We demonstrate the benefit of exploiting temporal information by comparing our best recurrent method with existing image-based and video-based solutions for monocular depth reconstruction.
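A convGRU block is an ordinary GRU whose gate computations are replaced by convolutions, so the hidden state retains its spatial layout from frame to frame. A generic cell in this spirit (not necessarily the paper's exact variant):

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        # One conv produces both the update and reset gates; another the candidate.
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)

    def forward(self, x, h):
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                  # update and reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde           # new spatial hidden state
```

Running the cell over a frame sequence carries depth-relevant evidence from past frames into the current prediction, which is the benefit the paper measures against single-frame baselines.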


2020 ◽  
Vol 2020 (14) ◽  
pp. 391-1-391-7
Author(s):  
Jieyu Li ◽  
Robert L. Stevenson

This paper presents an algorithm for indoor layout estimation and reconstruction through the fusion of a sequence of captured images and LiDAR data. In the proposed system, a movable platform collects both intensity images and 2D LiDAR scans. Pose estimation and semantic segmentation are computed jointly by aligning the LiDAR points to line segments extracted from the images. For indoor scenes with walls orthogonal to the floor, the alignment problem decouples into a top-down view projection and a 2D similarity transformation estimation, solved with the recursive random sample consensus (R-RANSAC) algorithm. Hypotheses are generated, evaluated, and optimized by integrating new scans as the platform moves through the environment. The proposed method avoids the need for extensive prior training and the cuboid layout assumption, making it more effective and practical than most previous indoor layout estimation methods. Multi-sensor fusion enables accurate depth estimation alongside high-resolution visual information.
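The 2D similarity estimation step can be sketched with off-the-shelf tools; here plain RANSAC from scikit-image stands in for the recursive R-RANSAC variant the paper uses, and the threshold is an illustrative value:

```python
import numpy as np
from skimage.measure import ransac
from skimage.transform import SimilarityTransform

def align_scan_to_lines(lidar_xy, image_xy, thresh=0.05):
    # Robustly estimate the 2D similarity transform (scale, rotation,
    # translation) mapping top-down-projected LiDAR points onto points
    # derived from image line segments. Inputs are (N, 2) arrays of
    # corresponding points; two correspondences determine a similarity.
    model, inliers = ransac(
        (lidar_xy, image_xy),
        SimilarityTransform,
        min_samples=2,
        residual_threshold=thresh,
        max_trials=1000,
    )
    return model, inliers
```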


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1156
Author(s):  
Eu-Tteum Baek ◽  
Hyung-Jeong Yang ◽  
Soo-Hyung Kim ◽  
Gueesang Lee ◽  
Hieyong Jeong

A distance map captured using a time-of-flight (ToF) depth sensor has fundamental problems, such as ambiguous depth information on shiny or dark surfaces, optical noise, and mismatched boundaries. Severe depth errors occur on shiny and dark surfaces owing to excess reflection and excess absorption of light, respectively. Dealing with this problem has been challenging due to the inherent hardware limitations of ToF, which measures distance using the number of reflected photons. This study proposes a distance error correction method that uses three ToF sensors set to different integration times to address the ambiguity in depth information. First, the three ToF depth sensors are installed horizontally and capture distance maps at their respective integration times. After the amplitude maps are used to estimate error regions based on the amount of received light, the estimated error regions are refined by exploiting accurate depth information from the neighboring sensors operating at different integration times. Moreover, we propose a new optical noise reduction filter that accounts for depth distributions biased toward one side. Experimental results verify that the proposed method overcomes the drawbacks of ToF cameras and provides enhanced distance maps.
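A hedged sketch of the selection idea behind such a correction: pixels whose amplitude indicates too little light (dark surfaces) or saturation (shiny surfaces) are treated as unreliable and filled from a sensor with a different integration time. The thresholds and the selection rule are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

def fuse_tof_depths(depths, amplitudes, amp_low=0.02, amp_high=0.9):
    # depths, amplitudes: lists of three float arrays, one per ToF sensor,
    # captured at different integration times and aligned to a common view.
    fused = np.full_like(depths[0], np.nan)
    for depth, amp in zip(depths, amplitudes):
        # Accept a pixel only if its amplitude is neither too weak nor
        # saturated, and no earlier sensor has already supplied it.
        valid = (amp > amp_low) & (amp < amp_high) & np.isnan(fused)
        fused[valid] = depth[valid]
    return fused
```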


2021 ◽  
Vol 2 (5) ◽  
Author(s):  
Róbert-Adrian Rill ◽  
Kinga Bettina Faragó

Autonomous driving technologies, including monocular vision-based approaches, are at the forefront of industrial and research communities, since they are expected to have a significant impact on the economy and society. However, they have limitations in terms of crash avoidance because of the rarity of labeled data for collisions in everyday traffic, as well as the complexity of driving situations. In this work, we propose a simple method based solely on monocular vision to overcome the data scarcity problem and to advance forward collision avoidance systems. We exploit state-of-the-art deep-learning-based optical flow and monocular depth estimation methods, as well as object detection, to estimate the speed of the ego-vehicle and to identify the lead vehicle, respectively. The proposed method uses car stop situations as collision surrogates to obtain data for time-to-collision estimation. We evaluate this approach on our own driving videos, collected using a spherical camera and smart glasses. Our results indicate that similar accuracy can be achieved on both video sources: the external road view from the car's perspective and the ego-centric view from the driver's perspective. Additionally, we set forth the possibility of using spherical cameras instead of traditional cameras for vision-based automotive sensing.
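Once the distance to the lead vehicle (from monocular depth) and the speeds (from optical flow and detection) are available, a first-order time-to-collision estimate follows from a constant-velocity model; the paper's surrogate-based estimator is more involved than this sketch:

```python
def time_to_collision(distance_m, ego_speed_mps, lead_speed_mps):
    # Constant-velocity TTC: time until the gap closes at the current
    # closing speed. Inputs are meters and meters per second.
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return float("inf")  # not closing in; no collision predicted
    return distance_m / closing_speed
```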

