Fast Depth Estimation in a Single Image Using Lightweight Efficient Neural Network

Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4434 ◽  
Author(s):  
Sangwon Kim ◽  
Jaeyeal Nam ◽  
Byoungchul Ko

Depth estimation is a crucial and fundamental problem in the computer vision field. Conventional methods reconstruct scenes using feature points extracted from multiple images; however, these approaches require multiple images and thus are not easily implemented in various real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods for estimating depth from a single image using machine learning or deep learning are emerging as new alternatives. In this paper, we propose an algorithm that generates a depth map in real time from a single image using an optimized lightweight efficient neural network (L-ENet) instead of physical equipment such as an infrared sensor or multi-view camera. Because depth values are continuous in nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth range classification was applied in this study. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by reducing the number of network layers. Using the proposed L-ENet, an accurate depth map can be generated from a single image quickly, with depth values close to the ground truth and only small errors. Experiments confirmed that the proposed L-ENet achieves significantly improved estimation performance over state-of-the-art algorithms for single-image depth estimation.
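
The ordinal depth range classification mentioned above replaces continuous depth regression with pixel-wise classification over discrete depth bins. Below is a minimal PyTorch sketch of such a discretization; the log-spaced bin scheme, depth range, and function names are illustrative assumptions, not L-ENet's actual implementation.

    import math
    import torch

    def depth_to_ordinal_labels(depth, d_min=0.5, d_max=80.0, num_bins=64):
        # Space-increasing (log-spaced) discretization: bins are finer at
        # close range, where small depth errors matter most.
        t = torch.log(depth.clamp(d_min, d_max) / d_min) / math.log(d_max / d_min)
        return (t * num_bins).long().clamp_(0, num_bins - 1)

    def ordinal_labels_to_depth(labels, d_min=0.5, d_max=80.0, num_bins=64):
        # Recover a continuous depth map from predicted bin indices by
        # inverting the discretization at the bin centres.
        t = (labels.float() + 0.5) / num_bins
        return d_min * (d_max / d_min) ** t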

Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1179 ◽  
Author(s):  
Tao Huang ◽  
Shuanfeng Zhao ◽  
Longlong Geng ◽  
Qian Xu

To take full advantage of the information in images captured by drones, and given that most existing monocular depth estimation methods based on supervised learning require vast quantities of ground-truth depth data for training, we propose an unsupervised monocular depth estimation model for drones based on a residual neural network with coarse–refined feature extraction. By introducing a virtual camera through a deep residual convolutional neural network with coarse–refined feature extraction, inspired by the principle of binocular depth estimation, unsupervised monocular depth estimation becomes an image-reconstruction problem. To improve the performance of our model, the following innovations are proposed. First, pyramid processing of the input image builds a topological relationship between the resolution of the input image and its depth, which improves sensitivity to depth information in a single image and reduces the impact of input resolution on depth estimation. Second, the residual neural network with coarse–refined feature extraction for image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict high-detail output depth maps, long skip connections are added between corresponding layers of the coarse feature-extraction network and the deconvolutional refined feature-extraction network. Third, a novel training loss unites an image-reconstruction loss based on the structural similarity index (SSIM), an approximate disparity-smoothness loss, and a depth-map loss. Experimental results show that our model outperforms state-of-the-art monocular depth estimation methods on the KITTI dataset (composed of corresponding left and right views) and the Make3D dataset (composed of images and corresponding ground-truth depth maps), and that, when trained on KITTI, it largely meets the depth-information requirements of images captured by drones.
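
The abstract names an SSIM-based reconstruction loss and a disparity-smoothness loss. The PyTorch sketch below shows how such terms are commonly combined; the 3x3 SSIM window, the weights, and the omission of the paper's separate depth-map loss term are assumptions, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
        # Local SSIM over 3x3 windows, computed with average pooling.
        mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
        sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
        sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
        sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
        num = (2 * mu_x * mu_y + C1) * (2 * sig_xy + C2)
        den = (mu_x ** 2 + mu_y ** 2 + C1) * (sig_x + sig_y + C2)
        return num / den

    def smoothness_loss(disp, img):
        # Edge-aware smoothness: penalize disparity gradients except at
        # image edges, where depth discontinuities are expected.
        dx = (disp[..., :, 1:] - disp[..., :, :-1]).abs()
        dy = (disp[..., 1:, :] - disp[..., :-1, :]).abs()
        wx = torch.exp(-(img[..., :, 1:] - img[..., :, :-1]).abs().mean(1, True))
        wy = torch.exp(-(img[..., 1:, :] - img[..., :-1, :]).abs().mean(1, True))
        return (dx * wx).mean() + (dy * wy).mean()

    def total_loss(recon, target, disp, w_ssim=0.85, w_smooth=0.1):
        # SSIM + L1 reconstruction plus smoothness; weights are guesses,
        # and the paper's additional depth-map loss term is not shown.
        l_rec = (w_ssim * (1 - ssim(recon, target)) / 2
                 + (1 - w_ssim) * (recon - target).abs()).mean()
        return l_rec + w_smooth * smoothness_loss(disp, target)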


Sensors ◽  
2021 ◽ 
Vol 21 (1) ◽ 
pp. 15 ◽ 
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often incompatible with low-power embedded systems. Therefore, in this paper we investigate both issues thoroughly, showing that they can be addressed by adopting appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
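
Mapping a trained network onto a handheld device typically means exporting the frozen graph to a mobile runtime. Below is a generic TorchScript export sketch under that assumption; the placeholder model, input resolution, and file name are illustrative, and the authors' actual deployment pipeline may use different tooling.

    import torch
    import torch.nn as nn

    # Stand-in for a trained lightweight depth network; the paper's
    # actual architecture is not reproduced here.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    ).eval()

    example = torch.randn(1, 3, 192, 320)       # low resolution keeps inference fast
    scripted = torch.jit.trace(model, example)  # freeze the graph via TorchScript
    scripted.save("depth_mobile.pt")            # loadable from PyTorch Mobile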


Author(s):  
Muhammad Tariq Mahmood ◽  
Tae-Sun Choi

Three-dimensional (3D) shape reconstruction is a fundamental problem in machine vision applications. Shape from focus (SFF) is a passive optical method for 3D shape recovery that uses the degree of focus as a cue to estimate 3D shape. In this approach, a single focus measure operator is usually applied to measure the focus quality of each pixel in the image sequence. However, a single focus measure is of limited use for accurately estimating depth maps of diverse types of real objects. To address this problem, we introduce the development of an optimal composite depth (OCD) function through genetic programming (GP) for accurate depth estimation. The OCD function is developed by optimally combining primary information extracted using one focus measure (homogeneous features) or several (heterogeneous features). The genetically developed composite function is then used to compute the optimal depth map of objects. Its performance is investigated using both synthetic and real-world image sequences. Experimental results demonstrate that the proposed estimator is more accurate than existing SFF methods; furthermore, the heterogeneous function is found to be more effective than the homogeneous one.
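
To make the SFF pipeline concrete, the sketch below combines two classic focus measures over a focus stack and takes the per-pixel argmax as the depth index. The fixed weighted sum stands in for the GP-evolved OCD function, which is not reproduced here; the operators and weights are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    def modified_laplacian(img):
        # Sum-modified-Laplacian: a classic SFF focus measure.
        lx = np.abs(ndimage.convolve1d(img, [-1.0, 2.0, -1.0], axis=1))
        ly = np.abs(ndimage.convolve1d(img, [-1.0, 2.0, -1.0], axis=0))
        return lx + ly

    def tenengrad(img):
        # Squared gradient magnitude (Tenengrad) focus measure.
        return ndimage.sobel(img, axis=1) ** 2 + ndimage.sobel(img, axis=0) ** 2

    def depth_from_stack(stack, composite=lambda a, b: 0.6 * a + 0.4 * b):
        # `composite` stands in for the GP-evolved OCD function; the fixed
        # weighted sum here is only a placeholder for that expression.
        focus = np.stack([composite(modified_laplacian(f), tenengrad(f))
                          for f in stack])
        return focus.argmax(axis=0)  # per-pixel index of best-focused frame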


Author(s):  
Hon-Sing Tong ◽  
Yui-Lun Ng ◽  
Zhiyu Liu ◽  
Justin D. L. Ho ◽  
Po-Ling Chan ◽  
...  

Purpose: Surgical annotation promotes effective communication between medical personnel during surgical procedures. However, existing approaches to 2D annotation are mostly static with respect to the display. In this work, we propose a method to achieve 3D annotations that anchor rigidly and stably to target structures upon camera movement in a transnasal endoscopic surgery setting.
Methods: This is accomplished through intra-operative endoscope tracking and monocular depth estimation. A virtual endoscopic environment is utilized to train a supervised depth estimation network. An adversarial network transfers the style of the real endoscopic view to a synthetic-like view, which is fed to the depth estimation network so that framewise depth can be obtained in real time.
Results: (1) Accuracy: framewise depth was predicted from images captured within a nasal airway phantom and compared with ground truth, achieving an SSIM value of 0.8310 ± 0.0655. (2) Stability: the mean absolute error (MAE) between the reference and predicted depth of a target point was 1.1330 ± 0.9957 mm.
Conclusion: Both the accuracy and stability evaluations demonstrate the feasibility and practicality of our proposed method for achieving 3D annotations.
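
The accuracy figure above is an SSIM score between predicted and reference depth maps. A minimal sketch of computing such a score with scikit-image follows; the paper's exact evaluation settings (window size, normalization) are not specified here and the helper name is hypothetical.

    import numpy as np
    from skimage.metrics import structural_similarity

    def depth_ssim(pred, gt):
        # SSIM between predicted and reference depth maps;
        # inputs are 2D float arrays of identical shape.
        data_range = float(gt.max() - gt.min())
        return structural_similarity(pred, gt, data_range=data_range)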


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1708 ◽  
Author(s):  
Daniel Stanley Tan ◽  
Chih-Yuan Yao ◽  
Conrado Ruiz ◽  
Kai-Lung Hua

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating per-pixel depth from a single image. Inspired by recent work on generative neural network models, we formulate depth estimation as a generative task in which we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network whose encoder-decoder generator is built from residual transposed convolution blocks and trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several existing depth estimation methods.
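
The decoder's residual transposed convolution blocks can be sketched as below in PyTorch: a transposed convolution doubles the spatial resolution and a residual path refines the result. Channel sizes, kernel sizes, and activations are assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class ResidualTransposedBlock(nn.Module):
        # Decoder block pairing a transposed convolution (2x upsampling)
        # with a residual refinement path.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.up = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
            self.body = nn.Sequential(
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1))
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            x = self.up(x)                     # upsample feature map 2x
            return self.act(x + self.body(x))  # residual connection

    # e.g. (1, 256, 8, 8) -> (1, 128, 16, 16)
    out = ResidualTransposedBlock(256, 128)(torch.randn(1, 256, 8, 8))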


2020 ◽  
Vol 7 (2) ◽  
pp. 4-7 ◽ 
Author(s):  
Shadi Saleh ◽  
Shanmugapriyan Manoharan ◽  
Wolfram Hardt

Depth is a vital prerequisite for the fulfillment of various tasks such as perception, navigation, and planning. Estimating depth from only a single image is challenging, since no analytic mapping exists between an intensity image and its depth, and the contextual feature cues are usually absent in a single image. Furthermore, most current research relies on supervised learning to handle depth estimation, so recorded ground-truth depth is required at training time, which is tricky and costly to obtain. This study presents two approaches, unsupervised learning and semi-supervised learning, for learning depth information from only a single RGB image. The main objective of depth estimation is to extract a representation of the spatial structure of the environment and to restore the 3D shape and visual appearance of objects in imagery.
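
In the unsupervised setting, the supervisory signal typically comes from view synthesis: predicted disparities warp one stereo view into the other, and the photometric reconstruction error drives training. A minimal PyTorch sketch under that assumption follows; the sign convention and width-normalized disparity are illustrative choices, not taken from this study.

    import torch
    import torch.nn.functional as F

    def warp_right_to_left(right, disp):
        # Synthesize the left view by sampling the right image at pixel
        # positions shifted by the predicted disparity (assumed here to
        # be normalized by image width); the reconstruction error then
        # serves as the unsupervised training signal.
        b, _, h, w = right.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), -1).unsqueeze(0).repeat(b, 1, 1, 1)
        grid[..., 0] = grid[..., 0] - 2 * disp.squeeze(1)  # shift x-coords
        return F.grid_sample(right, grid, align_corners=True)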

