Real-Time Single Image Depth Perception in the Wild with Handheld Devices

Sensors, 2020, Vol. 21(1), p. 15
Author(s): Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi, Fabio Tosi, et al.

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) resource requirements for real-time performance that are often incompatible with low-power embedded systems. In this paper, we investigate both issues in depth, showing that they can be addressed through appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature for tackling the extremely varied contexts faced in real applications. To further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
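
As a rough illustration of the deployment path the paper describes, the sketch below traces a small encoder-decoder depth network with TorchScript, the kind of export step used to run such models on handheld devices. The `TinyDepthNet` architecture is a hypothetical stand-in, not the authors' network.

```python
# Minimal sketch (not the authors' code): timing a compact encoder-decoder
# depth network of the kind targeted for handheld deployment.
import time
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Hypothetical lightweight encoder-decoder for single-image depth."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyDepthNet().eval()
x = torch.randn(1, 3, 192, 320)           # low input resolution for speed
with torch.no_grad():
    scripted = torch.jit.trace(model, x)  # TorchScript export, e.g. for mobile
    t0 = time.time()
    depth = scripted(x)
print(f"1 frame in {time.time() - t0:.4f}s, output {tuple(depth.shape)}")
```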

Sensors, 2020, Vol. 20(8), p. 2272
Author(s): Faisal Khan, Saqib Salahuddin, Hossein Javidnia

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics, and autonomous driving. This survey provides a comprehensive overview of the research topic, including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated, and discussed. We conclude with a perspective on open challenges and directions for future research in monocular depth estimation.
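
Surveys of this kind typically compare methods with a standard metric set. As a generic illustration (not code from the paper), the sketch below computes three of the most common monocular depth metrics: absolute relative error, RMSE, and the δ < 1.25 threshold accuracy.

```python
# Standard monocular-depth evaluation metrics; a generic sketch,
# not the paper's code.
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """pred, gt: arrays of positive depths over valid pixels."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    abs_rel = np.mean(np.abs(pred - gt) / (gt + eps))
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / (gt + eps), gt / (pred + eps))
    delta1 = np.mean(ratio < 1.25)   # fraction within 25% of ground truth
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta<1.25": delta1}

print(depth_metrics([2.0, 4.1, 9.5], [2.2, 4.0, 10.0]))
```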


Author(s): Wael Farag

In this paper, a real-time road-object detection and tracking (LR_ODT) method for autonomous driving is proposed. The method fuses measurement data from lidar and radar sensors mounted on the ego car, employing a customized Unscented Kalman Filter (UKF) for data fusion. The proposed fusion approach combines the merits of both devices to provide accurate pose and velocity information for objects moving on the road around the ego car. Unlike other detection and tracking approaches, the main contribution of this work is the balanced treatment of pose estimation accuracy and real-time performance. The proposed technique is implemented in the high-performance language C++ and utilizes highly optimized math and optimization libraries for the best real-time performance. Simulation studies have been carried out to evaluate the performance of the LR_ODT for tracking bicycles, cars, and pedestrians. Moreover, the performance of the UKF fusion is compared to that of Extended Kalman Filter (EKF) fusion, showing its superiority: the UKF outperformed the EKF on all test cases and all state variables (-24% average RMSE). The employed fusion technique also yields an outstanding improvement in tracking performance over the use of a single device (-29% RMSE versus lidar alone and -38% RMSE versus radar alone).
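
For readers unfamiliar with the UKF predict/update cycle at the core of such fusion, the sketch below shows a minimal constant-velocity UKF in Python using the filterpy library, fusing lidar-style position measurements. It is a simplified illustration, not the paper's C++ implementation; the motion model, noise values, and measurements are assumptions.

```python
# Minimal UKF predict/update cycle with filterpy; a constant-velocity
# model fusing position-only (lidar-style) measurements. Illustrative
# values throughout, not the paper's configuration.
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

dt = 0.1  # sensor period [s]

def fx(x, dt):
    """Constant-velocity motion model: state = [px, py, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    return F @ x

def hx(x):
    """Lidar-style measurement: observe position only."""
    return x[:2]

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=-1.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=dt, fx=fx, hx=hx, points=points)
ukf.x = np.array([0.0, 0.0, 1.0, 0.5])    # initial state guess
ukf.R = np.diag([0.15**2, 0.15**2])       # lidar position noise (assumed)
ukf.Q = np.eye(4) * 1e-3                  # process noise (tuning assumption)

for z in [np.array([0.11, 0.05]), np.array([0.21, 0.09])]:
    ukf.predict()
    ukf.update(z)
    print("pos:", ukf.x[:2], "vel:", ukf.x[2:])
```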


Sensors, 2019, Vol. 19(20), p. 4434
Author(s): Sangwon Kim, Jaeyeal Nam, Byoungchul Ko

Depth estimation is a crucial and fundamental problem in computer vision. Conventional methods reconstruct scenes using feature points extracted from multiple images; the need for multiple images, however, makes these approaches hard to deploy in many real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods that estimate depth from a single image using machine learning or deep learning are emerging as alternatives. In this paper, we propose an algorithm that generates a depth map in real time from a single image using an optimized lightweight efficient neural network (L-ENet) instead of physical equipment, such as an infrared sensor or multi-view camera. Because depth values are continuous in nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth-range classification was applied in this study. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by reducing the network layers. Using the proposed L-ENet algorithm, an accurate depth map can be generated quickly from a single image, with predicted depth values close to the ground truth and small errors. Experiments confirmed that the proposed L-ENet achieves significantly improved performance over state-of-the-art algorithms for depth estimation from a single image.
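
The ordinal depth-range classification idea can be illustrated with a small sketch: discretize the depth range into ordinal bins, predict per-pixel "depth exceeds threshold" probabilities, and decode depth by counting positive bins. The discretization and decoding below are a generic example in that spirit, not L-ENet's exact scheme.

```python
# Generic sketch of ordinal depth-range classification: K ordinal
# thresholds over [d_min, d_max], per-pixel "depth > threshold" targets,
# and decoding by counting positive bins. Not L-ENet's exact scheme.
import numpy as np

K, d_min, d_max = 80, 1.0, 80.0
# Spacing-increasing discretization: thresholds denser at near depths.
thresholds = d_min * (d_max / d_min) ** (np.arange(1, K + 1) / K)

def encode(depth):
    """Per-pixel ordinal targets: 1 where depth exceeds each threshold."""
    return (depth[..., None] > thresholds).astype(np.float32)

def decode(probs):
    """Decode depth via the threshold indexed by the count of p > 0.5."""
    k = np.clip((probs > 0.5).sum(axis=-1), 1, K)
    return thresholds[k - 1]

gt = np.array([[2.5, 10.0], [35.0, 70.0]])
probs = encode(gt)        # perfect predictions, for illustration
print(decode(probs))      # approximately recovers gt (bin-quantized)
```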


Leonardo, 2014, Vol. 47(4), pp. 325-336
Author(s): Patrick Lichty

From ARToolkit's emergence in the 1990s to the emergence of augmented reality (AR) as an art medium in the 2010s, AR has developed across a number of evidential sites. As an extension of virtual media, it merges real-time pattern recognition with goggles (finally realizing William Gibson's sci-fi fantasy) or handheld devices. This creates a welding of real-time media and virtual reality, or an optically registered simulation overlaid upon an actual spatial environment. Commercial applications are numerous, including entertainment, sales, and navigation. Even though AR-based works can be traced back to the late 1990s, early AR work required some understanding of coding and tethered imaging equipment. It was not until marker-based AR, affording lower barriers to entry, as well as geo-locational AR-based media using handheld devices and tablets, that augmented reality would propagate as an art medium. While one can argue that AR-based art is a convergence of handheld device art and virtual reality, there are intrinsic gestures specific to augmented reality that make it unique. The author looks at some historical examples of AR as well as critical issues of AR-based gestures such as compounding the gaze, problematizing the retinal, and the representational issues of informatic overlays. This generates four gestural vectors, analogous to those defined in "The Translation of Art in Virtual Worlds," which are examined through case studies. From this, a visual theory of augmentation is proposed.


2020, Vol. 6(1)
Author(s): Lalith Sharan, Lukas Burger, Georgii Kostiuchik, Ivo Wolf, Matthias Karck, et al.

In endoscopy, depth estimation is a task that can help quantify visual information for better scene understanding. A plethora of depth estimation algorithms have been proposed in the computer vision community. The endoscopic domain, however, differs from the typical depth estimation scenario due to differences in the setup and the nature of the scene. Furthermore, obtaining ground-truth depth information is infeasible owing to the unsuitable detection range of off-the-shelf depth sensors and the difficulty of setting up a depth sensor in a surgical environment. In this paper, an existing self-supervised approach, called Monodepth [1], from the field of autonomous driving is applied to a novel dataset of stereo-endoscopic images from reconstructive mitral valve surgery. While it is already known that endoscopic scenes are more challenging than outdoor driving scenes, the paper performs experiments to quantify this gap and describes the domain differences and challenges involved in transferring these methods.
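
At the heart of Monodepth-style self-supervision is a photometric reconstruction loss, a weighted combination of SSIM and L1 between a view and its reconstruction warped from the other stereo image. The sketch below shows that loss in PyTorch with the commonly used weighting α = 0.85; the random inputs are placeholders standing in for a real image and its warped reconstruction.

```python
# Minimal sketch of the Monodepth-style photometric loss:
# alpha * (1 - SSIM)/2 + (1 - alpha) * L1, with alpha = 0.85.
# Placeholder tensors stand in for a real image and its reconstruction.
import torch
import torch.nn.functional as F

def dssim(x, y, C1=0.01**2, C2=0.03**2):
    """Structural dissimilarity (1 - SSIM)/2 over 3x3 neighborhoods."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, reconstructed, alpha=0.85):
    l1 = torch.abs(target - reconstructed)
    return (alpha * dssim(target, reconstructed) + (1 - alpha) * l1).mean()

left = torch.rand(1, 3, 128, 256)
recon_left = torch.rand(1, 3, 128, 256)  # would come from warping the right view
print(photometric_loss(left, recon_left).item())
```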


2021, Vol. 2(5)
Author(s): Róbert-Adrian Rill, Kinga Bettina Faragó

Autonomous driving technologies, including monocular vision-based approaches, are at the forefront of industrial and research communities, since they are expected to have a significant impact on the economy and society. However, they have limitations in terms of crash avoidance, because labeled data for collisions in everyday traffic are rare and driving situations are complex. In this work, we propose a simple method based solely on monocular vision to overcome the data scarcity problem and to advance forward collision avoidance systems. We exploit state-of-the-art deep learning-based optical flow and monocular depth estimation methods, as well as object detection, to estimate the speed of the ego-vehicle and to identify the lead vehicle, respectively. The proposed method utilizes car stop situations as collision surrogates to obtain data for time-to-collision estimation. We evaluate this approach on our own driving videos, collected using a spherical camera and smart glasses. Our results indicate that similar accuracy can be achieved with both video sources: the external road view from the car and the ego-centric view from the driver's perspective. Additionally, we set forth the possibility of using spherical cameras instead of traditional cameras for vision-based automotive sensing.
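
The key quantity in such a system is the time to collision, the distance to the lead vehicle divided by the closing speed. As a toy illustration (the depth readings below are made up, not from the paper's videos), the sketch estimates TTC from a short sequence of per-frame monocular depth measurements of the lead vehicle.

```python
# Toy sketch of time-to-collision (TTC) from per-frame monocular depth
# readings of a lead vehicle: TTC = distance / closing speed.
# The depth sequence is illustrative data, not from the paper.
import numpy as np

fps = 30.0
depths = np.array([20.0, 19.6, 19.2, 18.8])   # metres to lead vehicle per frame

closing_speed = -(np.diff(depths) * fps).mean()   # m/s, positive when closing
ttc = depths[-1] / closing_speed if closing_speed > 0 else np.inf
print(f"closing speed {closing_speed:.2f} m/s, TTC {ttc:.1f} s")
```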


2021, Vol. 2(4), pp. 211-219
Author(s): Vinothkanna R

Motion planning is one of the challenging tasks in autonomous driving. During motion planning, the predicted trajectory is computed by Gaussian propagation, and localization uncertainty is estimated within a Gaussian framework. This estimation suffers from real-time constraints on the distribution of Global Positioning System (GPS) error. This research article compares novel motion planning methods and identifies the estimation algorithm best suited to each of two different real-time traffic conditions: realistic unusual traffic, and a complex target. A real-time platform is used to measure the several estimation methods for motion planning. The contribution of this article is the comparison of novel estimation methods in the two real-time environments and the identification of the better estimation method for each. The suggested approach estimates the autonomous vehicle's uncertainty with a modified version of action-based coarse trajectory planning, permitting the planner to avoid complex and unusual traffic (uncertainty conditions) efficiently. The proposed case studies help in choosing an effective framework for complex surrounding environments.
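
The Gaussian propagation referred to here can be illustrated in miniature: under a linear motion model, a pose covariance Σ propagates as Σ' = FΣFᵀ + Q. The sketch below shows position uncertainty growing over a few prediction steps for a constant-velocity model; the model and noise values are assumptions, not the paper's planner.

```python
# Minimal sketch of Gaussian uncertainty propagation through a linear
# constant-velocity motion model: Sigma' = F Sigma F^T + Q.
# Generic illustration; values are assumptions, not the paper's planner.
import numpy as np

dt = 0.5
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
Sigma = np.diag([1.0, 1.0, 0.25, 0.25])   # GPS-like position/velocity variance
Q = np.eye(4) * 0.01                      # process noise (assumed)

for step in range(3):
    Sigma = F @ Sigma @ F.T + Q
    print(f"step {step + 1}: position std "
          f"({np.sqrt(Sigma[0, 0]):.2f}, {np.sqrt(Sigma[1, 1]):.2f}) m")
```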


2004, Vol. 04(04), pp. 627-651
Author(s): Ruigang Yang, Marc Pollefeys, Hua Yang, Greg Welch

We present a new method for using commodity graphics hardware to achieve real-time, on-line, 2D view synthesis or 3D depth estimation from two or more calibrated cameras. Our method combines a 3D plane-sweeping approach with 2D multi-resolution color consistency tests. We project camera imagery onto each plane, compute measures of color consistency throughout the plane at multiple resolutions, and then choose the color or depth (corresponding plane) that is most consistent. The key to achieving real-time performance is our use of the advanced features included with recent commodity computer graphics hardware to implement the computations simultaneously (in parallel) across all reference image pixels on a plane. Our method is relatively simple to implement, and flexible in terms of the number and placement of cameras. With two cameras and an NVIDIA GeForce4 graphics card we can achieve 50–70 M disparity evaluations per second, including image download and read-back overhead. This performance matches the fastest available commercial software-only implementation of correlation-based stereo algorithms, while freeing up the CPU for other uses.
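
For rectified two-view stereo, sweeping fronto-parallel planes reduces to testing a set of horizontal shifts (disparities). The didactic NumPy sketch below runs that color-consistency test per pixel on the CPU; the paper's contribution is performing the same computation in parallel on graphics hardware, with general plane homographies and multi-resolution tests.

```python
# CPU sketch of the plane-sweep idea for rectified two-view stereo:
# each depth plane corresponds to a disparity (horizontal shift), and the
# per-pixel color-consistency score picks the best plane. Didactic NumPy
# version, not the authors' GPU implementation.
import numpy as np

def plane_sweep_depth(left, right, disparities):
    """left, right: (H, W) grayscale images; returns per-pixel best disparity."""
    H, W = left.shape
    best_cost = np.full((H, W), np.inf)
    best_disp = np.zeros((H, W), int)
    for d in disparities:
        shifted = np.roll(right, d, axis=1)   # "project" onto plane d
        cost = (left - shifted) ** 2          # color-consistency test
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_disp[better] = d
    return best_disp

rng = np.random.default_rng(0)
right = rng.random((32, 64))
left = np.roll(right, 5, axis=1)              # true disparity of 5 pixels
disp = plane_sweep_depth(left, right, range(0, 9))
print("median disparity:", int(np.median(disp)))   # ~5
```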

