Real-Time Single Image Depth Perception in the Wild with Handheld Devices

Sensors, 2020, Vol. 21(1), p. 15
Author(s): Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi, Fabio Tosi, et al.

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) resource requirements for real-time performance that are often incompatible with low-power embedded systems. In this paper, we investigate both issues in depth, showing that they can be addressed through appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature for tackling the extremely varied contexts faced in real applications. To further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
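
As a rough illustration of the deployment path the paper describes, the sketch below traces a small encoder-decoder depth network with TorchScript, the kind of export step used to run such models on handheld devices. The `TinyDepthNet` architecture is a hypothetical stand-in, not the authors' network.

```python
# Minimal sketch (not the authors' code): timing a compact encoder-decoder
# depth network of the kind targeted for handheld deployment.
import time
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Hypothetical lightweight encoder-decoder for single-image depth."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyDepthNet().eval()
x = torch.randn(1, 3, 192, 320)           # low input resolution for speed
with torch.no_grad():
    scripted = torch.jit.trace(model, x)  # TorchScript export, e.g. for mobile
    t0 = time.time()
    depth = scripted(x)
print(f"1 frame in {time.time() - t0:.4f}s, output {tuple(depth.shape)}")
```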

Sensors, 2020, Vol. 20(8), p. 2272
Author(s): Faisal Khan, Saqib Salahuddin, Hossein Javidnia

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics, and autonomous driving. This survey provides a comprehensive overview of the research topic, including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated, and discussed. We conclude with a perspective on open challenges and directions for future research in monocular depth estimation.
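
Surveys of this kind typically compare methods with a standard metric set. As a generic illustration (not code from the paper), the sketch below computes three of the most common monocular depth metrics: absolute relative error, RMSE, and the δ < 1.25 threshold accuracy.

```python
# Standard monocular-depth evaluation metrics; a generic sketch,
# not the paper's code.
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """pred, gt: arrays of positive depths over valid pixels."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    abs_rel = np.mean(np.abs(pred - gt) / (gt + eps))
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / (gt + eps), gt / (pred + eps))
    delta1 = np.mean(ratio < 1.25)   # fraction within 25% of ground truth
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta<1.25": delta1}

print(depth_metrics([2.0, 4.1, 9.5], [2.2, 4.0, 10.0]))
```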


Author(s): Wael Farag

In this paper, a real-time road-object detection and tracking (LR_ODT) method for autonomous driving is proposed. The method fuses measurement data from lidar and radar sensors mounted on the ego car, employing a customized Unscented Kalman Filter (UKF) for data fusion. The proposed fusion approach combines the merits of both devices to provide accurate pose and velocity information for objects moving on the road around the ego car. Unlike other detection and tracking approaches, the main contribution of this work is the balanced treatment of pose estimation accuracy and real-time performance. The proposed technique is implemented in the high-performance language C++ and utilizes highly optimized math and optimization libraries for the best real-time performance. Simulation studies have been carried out to evaluate the performance of the LR_ODT for tracking bicycles, cars, and pedestrians. Moreover, the performance of the UKF fusion is compared to that of Extended Kalman Filter (EKF) fusion, showing its superiority: the UKF outperformed the EKF on all test cases and all state variables (-24% average RMSE). The employed fusion technique also yields an outstanding improvement in tracking performance over the use of a single device (-29% RMSE versus lidar alone and -38% RMSE versus radar alone).
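
For readers unfamiliar with the UKF predict/update cycle at the core of such fusion, the sketch below shows a minimal constant-velocity UKF in Python using the filterpy library, fusing lidar-style position measurements. It is a simplified illustration, not the paper's C++ implementation; the motion model, noise values, and measurements are assumptions.

```python
# Minimal UKF predict/update cycle with filterpy; a constant-velocity
# model fusing position-only (lidar-style) measurements. Illustrative
# values throughout, not the paper's configuration.
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

dt = 0.1  # sensor period [s]

def fx(x, dt):
    """Constant-velocity motion model: state = [px, py, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    return F @ x

def hx(x):
    """Lidar-style measurement: observe position only."""
    return x[:2]

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=-1.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=dt, fx=fx, hx=hx, points=points)
ukf.x = np.array([0.0, 0.0, 1.0, 0.5])    # initial state guess
ukf.R = np.diag([0.15**2, 0.15**2])       # lidar position noise (assumed)
ukf.Q = np.eye(4) * 1e-3                  # process noise (tuning assumption)

for z in [np.array([0.11, 0.05]), np.array([0.21, 0.09])]:
    ukf.predict()
    ukf.update(z)
    print("pos:", ukf.x[:2], "vel:", ukf.x[2:])
```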


Sensors, 2019, Vol. 19(20), p. 4434
Author(s): Sangwon Kim, Jaeyeal Nam, Byoungchul Ko

Depth estimation is a crucial and fundamental problem in computer vision. Conventional methods reconstruct scenes using feature points extracted from multiple images; the need for multiple images, however, makes these approaches hard to deploy in many real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods that estimate depth from a single image using machine learning or deep learning are emerging as alternatives. In this paper, we propose an algorithm that generates a depth map in real time from a single image using an optimized lightweight efficient neural network (L-ENet) instead of physical equipment, such as an infrared sensor or multi-view camera. Because depth values are continuous in nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth-range classification was applied in this study. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by reducing the network layers. Using the proposed L-ENet algorithm, an accurate depth map can be generated quickly from a single image, with predicted depth values close to the ground truth and small errors. Experiments confirmed that the proposed L-ENet achieves significantly improved performance over state-of-the-art algorithms for depth estimation from a single image.
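
The ordinal depth-range classification idea can be illustrated with a small sketch: discretize the depth range into ordinal bins, predict per-pixel "depth exceeds threshold" probabilities, and decode depth by counting positive bins. The discretization and decoding below are a generic example in that spirit, not L-ENet's exact scheme.

```python
# Generic sketch of ordinal depth-range classification: K ordinal
# thresholds over [d_min, d_max], per-pixel "depth > threshold" targets,
# and decoding by counting positive bins. Not L-ENet's exact scheme.
import numpy as np

K, d_min, d_max = 80, 1.0, 80.0
# Spacing-increasing discretization: thresholds denser at near depths.
thresholds = d_min * (d_max / d_min) ** (np.arange(1, K + 1) / K)

def encode(depth):
    """Per-pixel ordinal targets: 1 where depth exceeds each threshold."""
    return (depth[..., None] > thresholds).astype(np.float32)

def decode(probs):
    """Decode depth via the threshold indexed by the count of p > 0.5."""
    k = np.clip((probs > 0.5).sum(axis=-1), 1, K)
    return thresholds[k - 1]

gt = np.array([[2.5, 10.0], [35.0, 70.0]])
probs = encode(gt)        # perfect predictions, for illustration
print(decode(probs))      # approximately recovers gt (bin-quantized)
```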


Leonardo, 2014, Vol. 47(4), pp. 325-336
Author(s): Patrick Lichty

From ARToolkit's emergence in the 1990s to the emergence of augmented reality (AR) as an art medium in the 2010s, AR has developed across a number of evidential sites. As an extension of virtual media, it merges real-time pattern recognition with goggles (finally realizing William Gibson's sci-fi fantasy) or handheld devices. This creates a welding of real-time media and virtual reality, or an optically registered simulation overlaid upon an actual spatial environment. Commercial applications are numerous, including entertainment, sales, and navigation. Even though AR-based works can be traced back to the late 1990s, early AR work required some understanding of coding and tethered imaging equipment. It was not until marker-based AR, affording lower barriers to entry, as well as geo-locational AR-based media using handheld devices and tablets, that augmented reality would propagate as an art medium. While one can argue that AR-based art is a convergence of handheld device art and virtual reality, there are intrinsic gestures specific to augmented reality that make it unique. The author looks at some historical examples of AR as well as critical issues of AR-based gestures such as compounding the gaze, problematizing the retinal, and the representational issues of informatic overlays. This generates four gestural vectors, analogous to those defined in "The Translation of Art in Virtual Worlds," which are examined through case studies. From this, a visual theory of augmentation is proposed.


2020, Vol. 6(1)
Author(s): Lalith Sharan, Lukas Burger, Georgii Kostiuchik, Ivo Wolf, Matthias Karck, et al.

In endoscopy, depth estimation is a task that can help quantify visual information for better scene understanding. A plethora of depth estimation algorithms have been proposed in the computer vision community. The endoscopic domain, however, differs from the typical depth estimation scenario due to differences in the setup and the nature of the scene. Furthermore, obtaining ground-truth depth information is infeasible owing to the unsuitable detection range of off-the-shelf depth sensors and the difficulty of setting up a depth sensor in a surgical environment. In this paper, an existing self-supervised approach, called Monodepth [1], from the field of autonomous driving is applied to a novel dataset of stereo-endoscopic images from reconstructive mitral valve surgery. While it is already known that endoscopic scenes are more challenging than outdoor driving scenes, the paper performs experiments to quantify this gap and describes the domain differences and challenges involved in transferring these methods.
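
At the heart of Monodepth-style self-supervision is a photometric reconstruction loss, a weighted combination of SSIM and L1 between a view and its reconstruction warped from the other stereo image. The sketch below shows that loss in PyTorch with the commonly used weighting α = 0.85; the random inputs are placeholders standing in for a real image and its warped reconstruction.

```python
# Minimal sketch of the Monodepth-style photometric loss:
# alpha * (1 - SSIM)/2 + (1 - alpha) * L1, with alpha = 0.85.
# Placeholder tensors stand in for a real image and its reconstruction.
import torch
import torch.nn.functional as F

def dssim(x, y, C1=0.01**2, C2=0.03**2):
    """Structural dissimilarity (1 - SSIM)/2 over 3x3 neighborhoods."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, reconstructed, alpha=0.85):
    l1 = torch.abs(target - reconstructed)
    return (alpha * dssim(target, reconstructed) + (1 - alpha) * l1).mean()

left = torch.rand(1, 3, 128, 256)
recon_left = torch.rand(1, 3, 128, 256)  # would come from warping the right view
print(photometric_loss(left, recon_left).item())
```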


2021, Vol. 2(5)
Author(s): Róbert-Adrian Rill, Kinga Bettina Faragó

Autonomous driving technologies, including monocular vision-based approaches, are at the forefront of industrial and research communities, since they are expected to have a significant impact on the economy and society. However, they have limitations in terms of crash avoidance, because labeled data for collisions in everyday traffic are rare and driving situations are complex. In this work, we propose a simple method based solely on monocular vision to overcome the data scarcity problem and to advance forward collision avoidance systems. We exploit state-of-the-art deep learning-based optical flow and monocular depth estimation methods, as well as object detection, to estimate the speed of the ego-vehicle and to identify the lead vehicle, respectively. The proposed method utilizes car stop situations as collision surrogates to obtain data for time-to-collision estimation. We evaluate this approach on our own driving videos, collected using a spherical camera and smart glasses. Our results indicate that similar accuracy can be achieved with both video sources: the external road view from the car and the ego-centric view from the driver's perspective. Additionally, we set forth the possibility of using spherical cameras instead of traditional cameras for vision-based automotive sensing.
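
The key quantity in such a system is the time to collision, the distance to the lead vehicle divided by the closing speed. As a toy illustration (the depth readings below are made up, not from the paper's videos), the sketch estimates TTC from a short sequence of per-frame monocular depth measurements of the lead vehicle.

```python
# Toy sketch of time-to-collision (TTC) from per-frame monocular depth
# readings of a lead vehicle: TTC = distance / closing speed.
# The depth sequence is illustrative data, not from the paper.
import numpy as np

fps = 30.0
depths = np.array([20.0, 19.6, 19.2, 18.8])   # metres to lead vehicle per frame

closing_speed = -(np.diff(depths) * fps).mean()   # m/s, positive when closing
ttc = depths[-1] / closing_speed if closing_speed > 0 else np.inf
print(f"closing speed {closing_speed:.2f} m/s, TTC {ttc:.1f} s")
```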


2021, Vol. 2(4), pp. 211-219
Author(s): Vinothkanna R

Motion planning is one of the challenging tasks in autonomous driving. During motion planning, the predicted trajectory is computed by Gaussian propagation, and localization uncertainty is estimated within a Gaussian framework. This estimation suffers from real-time constraints on the distribution of Global Positioning System (GPS) error. This research article compares novel motion planning methods and identifies the estimation algorithm best suited to each of two different real-time traffic conditions: realistic unusual traffic, and a complex target. A real-time platform is used to measure the several estimation methods for motion planning. The contribution of this article is the comparison of novel estimation methods in the two real-time environments and the identification of the better estimation method for each. The suggested approach estimates the autonomous vehicle's uncertainty with a modified version of action-based coarse trajectory planning, permitting the planner to avoid complex and unusual traffic (uncertainty conditions) efficiently. The proposed case studies help in choosing an effective framework for complex surrounding environments.
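
The Gaussian propagation referred to here can be illustrated in miniature: under a linear motion model, a pose covariance Σ propagates as Σ' = FΣFᵀ + Q. The sketch below shows position uncertainty growing over a few prediction steps for a constant-velocity model; the model and noise values are assumptions, not the paper's planner.

```python
# Minimal sketch of Gaussian uncertainty propagation through a linear
# constant-velocity motion model: Sigma' = F Sigma F^T + Q.
# Generic illustration; values are assumptions, not the paper's planner.
import numpy as np

dt = 0.5
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
Sigma = np.diag([1.0, 1.0, 0.25, 0.25])   # GPS-like position/velocity variance
Q = np.eye(4) * 0.01                      # process noise (assumed)

for step in range(3):
    Sigma = F @ Sigma @ F.T + Q
    print(f"step {step + 1}: position std "
          f"({np.sqrt(Sigma[0, 0]):.2f}, {np.sqrt(Sigma[1, 1]):.2f}) m")
```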


2004, Vol. 04(04), pp. 627-651
Author(s): Ruigang Yang, Marc Pollefeys, Hua Yang, Greg Welch

We present a new method for using commodity graphics hardware to achieve real-time, on-line, 2D view synthesis or 3D depth estimation from two or more calibrated cameras. Our method combines a 3D plane-sweeping approach with 2D multi-resolution color consistency tests. We project camera imagery onto each plane, compute measures of color consistency throughout the plane at multiple resolutions, and then choose the color or depth (corresponding plane) that is most consistent. The key to achieving real-time performance is our use of the advanced features included with recent commodity computer graphics hardware to implement the computations simultaneously (in parallel) across all reference image pixels on a plane. Our method is relatively simple to implement, and flexible in terms of the number and placement of cameras. With two cameras and an NVIDIA GeForce4 graphics card we can achieve 50–70 M disparity evaluations per second, including image download and read-back overhead. This performance matches the fastest available commercial software-only implementation of correlation-based stereo algorithms, while freeing up the CPU for other uses.
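
For rectified two-view stereo, sweeping fronto-parallel planes reduces to testing a set of horizontal shifts (disparities). The didactic NumPy sketch below runs that color-consistency test per pixel on the CPU; the paper's contribution is performing the same computation in parallel on graphics hardware, with general plane homographies and multi-resolution tests.

```python
# CPU sketch of the plane-sweep idea for rectified two-view stereo:
# each depth plane corresponds to a disparity (horizontal shift), and the
# per-pixel color-consistency score picks the best plane. Didactic NumPy
# version, not the authors' GPU implementation.
import numpy as np

def plane_sweep_depth(left, right, disparities):
    """left, right: (H, W) grayscale images; returns per-pixel best disparity."""
    H, W = left.shape
    best_cost = np.full((H, W), np.inf)
    best_disp = np.zeros((H, W), int)
    for d in disparities:
        shifted = np.roll(right, d, axis=1)   # "project" onto plane d
        cost = (left - shifted) ** 2          # color-consistency test
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_disp[better] = d
    return best_disp

rng = np.random.default_rng(0)
right = rng.random((32, 64))
left = np.roll(right, 5, axis=1)              # true disparity of 5 pixels
disp = plane_sweep_depth(left, right, range(0, 9))
print("median disparity:", int(np.median(disp)))   # ~5
```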

