Fast Depth Estimation in a Single Image Using Lightweight Efficient Neural Network

Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4434 ◽  
Author(s):  
Sangwon Kim ◽  
Jaeyeal Nam ◽  
Byoungchul Ko

Depth estimation is a crucial and fundamental problem in the computer vision field. Conventional methods reconstruct scenes using feature points extracted from multiple images; however, these approaches require multiple images and thus are not easily implemented in various real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods for estimating depth from a single image using machine learning or deep learning are emerging as new alternatives. In this paper, we propose an algorithm that generates a depth map in real time from a single image using an optimized lightweight efficient neural network (L-ENet) instead of physical equipment such as an infrared sensor or multi-view camera. Because depth values are continuous in nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth range classification was applied in this study. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by reducing the number of network layers. Using the proposed L-ENet, an accurate depth map can be generated from a single image quickly, with depth values close to the ground truth and only small errors. Experiments confirmed that the proposed L-ENet achieves significantly improved estimation performance over state-of-the-art algorithms for single-image depth estimation.
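
The ordinal depth range classification mentioned above replaces continuous depth regression with pixel-wise classification over discrete depth bins. Below is a minimal PyTorch sketch of such a discretization; the log-spaced bin scheme, depth range, and function names are illustrative assumptions, not L-ENet's actual implementation.

    import math
    import torch

    def depth_to_ordinal_labels(depth, d_min=0.5, d_max=80.0, num_bins=64):
        # Space-increasing (log-spaced) discretization: bins are finer at
        # close range, where small depth errors matter most.
        t = torch.log(depth.clamp(d_min, d_max) / d_min) / math.log(d_max / d_min)
        return (t * num_bins).long().clamp_(0, num_bins - 1)

    def ordinal_labels_to_depth(labels, d_min=0.5, d_max=80.0, num_bins=64):
        # Recover a continuous depth map from predicted bin indices by
        # inverting the discretization at the bin centres.
        t = (labels.float() + 0.5) / num_bins
        return d_min * (d_max / d_min) ** t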

Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1179 ◽  
Author(s):  
Tao Huang ◽  
Shuanfeng Zhao ◽  
Longlong Geng ◽  
Qian Xu

To take full advantage of the information in images captured by drones, and given that most existing monocular depth estimation methods based on supervised learning require vast quantities of ground-truth depth data for training, we propose an unsupervised monocular depth estimation model for drones based on a residual neural network with coarse–refined feature extraction. By introducing a virtual camera through a deep residual convolutional neural network with coarse–refined feature extraction, inspired by the principle of binocular depth estimation, unsupervised monocular depth estimation becomes an image-reconstruction problem. To improve the performance of our model, the following innovations are proposed. First, pyramid processing of the input image builds a topological relationship between the resolution of the input image and its depth, which improves sensitivity to depth information in a single image and reduces the impact of input resolution on depth estimation. Second, the residual neural network with coarse–refined feature extraction for image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict high-detail output depth maps, long skip connections are added between corresponding layers of the coarse feature-extraction network and the deconvolutional refined feature-extraction network. Third, a novel training loss unites an image-reconstruction loss based on the structural similarity index (SSIM), an approximate disparity-smoothness loss, and a depth-map loss. Experimental results show that our model outperforms state-of-the-art monocular depth estimation methods on the KITTI dataset (composed of corresponding left and right views) and the Make3D dataset (composed of images and corresponding ground-truth depth maps), and that, when trained on KITTI, it largely meets the depth-information requirements of images captured by drones.
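
The abstract names an SSIM-based reconstruction loss and a disparity-smoothness loss. The PyTorch sketch below shows how such terms are commonly combined; the 3x3 SSIM window, the weights, and the omission of the paper's separate depth-map loss term are assumptions, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
        # Local SSIM over 3x3 windows, computed with average pooling.
        mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
        sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
        sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
        sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
        num = (2 * mu_x * mu_y + C1) * (2 * sig_xy + C2)
        den = (mu_x ** 2 + mu_y ** 2 + C1) * (sig_x + sig_y + C2)
        return num / den

    def smoothness_loss(disp, img):
        # Edge-aware smoothness: penalize disparity gradients except at
        # image edges, where depth discontinuities are expected.
        dx = (disp[..., :, 1:] - disp[..., :, :-1]).abs()
        dy = (disp[..., 1:, :] - disp[..., :-1, :]).abs()
        wx = torch.exp(-(img[..., :, 1:] - img[..., :, :-1]).abs().mean(1, True))
        wy = torch.exp(-(img[..., 1:, :] - img[..., :-1, :]).abs().mean(1, True))
        return (dx * wx).mean() + (dy * wy).mean()

    def total_loss(recon, target, disp, w_ssim=0.85, w_smooth=0.1):
        # SSIM + L1 reconstruction plus smoothness; weights are guesses,
        # and the paper's additional depth-map loss term is not shown.
        l_rec = (w_ssim * (1 - ssim(recon, target)) / 2
                 + (1 - w_ssim) * (recon - target).abs()).mean()
        return l_rec + w_smooth * smoothness_loss(disp, target)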


Sensors ◽  
2021 ◽ 
Vol 21 (1) ◽ 
pp. 15 ◽ 
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often incompatible with low-power embedded systems. Therefore, in this paper we investigate both issues thoroughly, showing that they can be addressed by adopting appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
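
Mapping a trained network onto a handheld device typically means exporting the frozen graph to a mobile runtime. Below is a generic TorchScript export sketch under that assumption; the placeholder model, input resolution, and file name are illustrative, and the authors' actual deployment pipeline may use different tooling.

    import torch
    import torch.nn as nn

    # Stand-in for a trained lightweight depth network; the paper's
    # actual architecture is not reproduced here.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    ).eval()

    example = torch.randn(1, 3, 192, 320)       # low resolution keeps inference fast
    scripted = torch.jit.trace(model, example)  # freeze the graph via TorchScript
    scripted.save("depth_mobile.pt")            # loadable from PyTorch Mobile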


Author(s):  
Muhammad Tariq Mahmood ◽  
Tae-Sun Choi

Three-dimensional (3D) shape reconstruction is a fundamental problem in machine vision applications. Shape from focus (SFF) is a passive optical method for 3D shape recovery that uses the degree of focus as a cue to estimate 3D shape. In this approach, a single focus measure operator is usually applied to measure the focus quality of each pixel in the image sequence. However, a single focus measure is of limited use for accurately estimating depth maps of diverse types of real objects. To address this problem, we introduce the development of an optimal composite depth (OCD) function through genetic programming (GP) for accurate depth estimation. The OCD function is developed by optimally combining primary information extracted using one focus measure (homogeneous features) or several (heterogeneous features). The genetically developed composite function is then used to compute the optimal depth map of objects. Its performance is investigated using both synthetic and real-world image sequences. Experimental results demonstrate that the proposed estimator is more accurate than existing SFF methods; furthermore, the heterogeneous function is found to be more effective than the homogeneous one.
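
To make the SFF pipeline concrete, the sketch below combines two classic focus measures over a focus stack and takes the per-pixel argmax as the depth index. The fixed weighted sum stands in for the GP-evolved OCD function, which is not reproduced here; the operators and weights are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    def modified_laplacian(img):
        # Sum-modified-Laplacian: a classic SFF focus measure.
        lx = np.abs(ndimage.convolve1d(img, [-1.0, 2.0, -1.0], axis=1))
        ly = np.abs(ndimage.convolve1d(img, [-1.0, 2.0, -1.0], axis=0))
        return lx + ly

    def tenengrad(img):
        # Squared gradient magnitude (Tenengrad) focus measure.
        return ndimage.sobel(img, axis=1) ** 2 + ndimage.sobel(img, axis=0) ** 2

    def depth_from_stack(stack, composite=lambda a, b: 0.6 * a + 0.4 * b):
        # `composite` stands in for the GP-evolved OCD function; the fixed
        # weighted sum here is only a placeholder for that expression.
        focus = np.stack([composite(modified_laplacian(f), tenengrad(f))
                          for f in stack])
        return focus.argmax(axis=0)  # per-pixel index of best-focused frame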


Author(s):  
Hon-Sing Tong ◽  
Yui-Lun Ng ◽  
Zhiyu Liu ◽  
Justin D. L. Ho ◽  
Po-Ling Chan ◽  
...  

Purpose: Surgical annotation promotes effective communication between medical personnel during surgical procedures. However, existing approaches to 2D annotation are mostly static with respect to the display. In this work, we propose a method to achieve 3D annotations that anchor rigidly and stably to target structures upon camera movement in a transnasal endoscopic surgery setting.
Methods: This is accomplished through intra-operative endoscope tracking and monocular depth estimation. A virtual endoscopic environment is utilized to train a supervised depth estimation network. An adversarial network transfers the style of the real endoscopic view to a synthetic-like view, which is fed to the depth estimation network so that framewise depth can be obtained in real time.
Results: (1) Accuracy: framewise depth was predicted from images captured within a nasal airway phantom and compared with ground truth, achieving an SSIM value of 0.8310 ± 0.0655. (2) Stability: the mean absolute error (MAE) between the reference and predicted depth of a target point was 1.1330 ± 0.9957 mm.
Conclusion: Both the accuracy and stability evaluations demonstrate the feasibility and practicality of our proposed method for achieving 3D annotations.
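
The accuracy figure above is an SSIM score between predicted and reference depth maps. A minimal sketch of computing such a score with scikit-image follows; the paper's exact evaluation settings (window size, normalization) are not specified here and the helper name is hypothetical.

    import numpy as np
    from skimage.metrics import structural_similarity

    def depth_ssim(pred, gt):
        # SSIM between predicted and reference depth maps;
        # inputs are 2D float arrays of identical shape.
        data_range = float(gt.max() - gt.min())
        return structural_similarity(pred, gt, data_range=data_range)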


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1708 ◽  
Author(s):  
Daniel Stanley Tan ◽  
Chih-Yuan Yao ◽  
Conrado Ruiz ◽  
Kai-Lung Hua

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating per-pixel depth from a single image. Inspired by recent work on generative neural network models, we formulate depth estimation as a generative task in which we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network whose encoder-decoder generator is built from residual transposed convolution blocks and trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several existing depth estimation methods.
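
The decoder's residual transposed convolution blocks can be sketched as below in PyTorch: a transposed convolution doubles the spatial resolution and a residual path refines the result. Channel sizes, kernel sizes, and activations are assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class ResidualTransposedBlock(nn.Module):
        # Decoder block pairing a transposed convolution (2x upsampling)
        # with a residual refinement path.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.up = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
            self.body = nn.Sequential(
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1))
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            x = self.up(x)                     # upsample feature map 2x
            return self.act(x + self.body(x))  # residual connection

    # e.g. (1, 256, 8, 8) -> (1, 128, 16, 16)
    out = ResidualTransposedBlock(256, 128)(torch.randn(1, 256, 8, 8))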


2020 ◽  
Vol 7 (2) ◽  
pp. 4-7 ◽ 
Author(s):  
Shadi Saleh ◽  
Shanmugapriyan Manoharan ◽  
Wolfram Hardt

Depth is a vital prerequisite for the fulfillment of various tasks such as perception, navigation, and planning. Estimating depth from only a single image is challenging, since no analytic mapping exists between an intensity image and its depth, and the contextual feature cues are usually absent in a single image. Furthermore, most current research relies on supervised learning to handle depth estimation, so recorded ground-truth depth is required at training time, which is tricky and costly to obtain. This study presents two approaches, unsupervised learning and semi-supervised learning, for learning depth information from only a single RGB image. The main objective of depth estimation is to extract a representation of the spatial structure of the environment and to restore the 3D shape and visual appearance of objects in imagery.
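
In the unsupervised setting, the supervisory signal typically comes from view synthesis: predicted disparities warp one stereo view into the other, and the photometric reconstruction error drives training. A minimal PyTorch sketch under that assumption follows; the sign convention and width-normalized disparity are illustrative choices, not taken from this study.

    import torch
    import torch.nn.functional as F

    def warp_right_to_left(right, disp):
        # Synthesize the left view by sampling the right image at pixel
        # positions shifted by the predicted disparity (assumed here to
        # be normalized by image width); the reconstruction error then
        # serves as the unsupervised training signal.
        b, _, h, w = right.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), -1).unsqueeze(0).repeat(b, 1, 1, 1)
        grid[..., 0] = grid[..., 0] - 2 * disp.squeeze(1)  # shift x-coords
        return F.grid_sample(right, grid, align_corners=True)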

