Depth Estimation From a Single Image Using Deep Learned Phase Coded Mask

2018 ◽  
Vol 4 (3) ◽  
pp. 298-310 ◽  
Author(s):  
Harel Haim ◽  
Shay Elmalem ◽  
Raja Giryes ◽  
Alex M. Bronstein ◽  
Emanuel Marom
Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 15 ◽  
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. In this paper, we investigate both of these issues in depth, showing how they can be addressed by adopting appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature for tackling the extremely varied contexts faced in real applications. To further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
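At its core, the depth-aware blurring experiment mentioned above reduces to blending a sharp and a blurred copy of an image according to a per-pixel depth map. A minimal sketch, assuming a dense depth map is already available; the function name, the box-blur choice, and the hard in-focus depth band are illustrative assumptions, not the paper's smartphone pipeline:

```python
import numpy as np

def depth_aware_blur(image, depth, focus_depth, tolerance, blur_size=5):
    """Blend a sharp and a blurred copy of `image` (H, W, C) so that
    pixels whose depth lies within `tolerance` of `focus_depth` stay
    sharp while all other pixels are blurred."""
    # Naive separable box blur: mean filter applied along each image axis.
    blurred = image.astype(float)
    kernel = np.ones(blur_size) / blur_size
    for axis in (0, 1):
        blurred = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, blurred)
    # Per-pixel selector: True inside the in-focus depth band.
    in_focus = (np.abs(depth - focus_depth) <= tolerance)[..., None]
    return np.where(in_focus, image.astype(float), blurred)
```

Real bokeh-style implementations instead vary the blur radius continuously with the distance from the focal plane; the hard in-focus band here is the simplest possible variant.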


Author(s):  
L. Madhuanand ◽  
F. Nex ◽  
M. Y. Yang

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires capturing multiple views of the same scene, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest alongside recent advancements in computer vision and deep learning techniques. Research in this area has largely focused on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited due to additional complexities arising from the increased camera distance and wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features and points of view. The single image depth estimation is based on image reconstruction techniques that use stereo images for learning to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial Network (GAN), are used to learn depth from aerial UAV images. These models generate pixel-wise disparity images that can be converted into depth information. The disparity maps generated by these models are evaluated for their internal quality using various error metrics. The results show that the CNN model generates smoother images with a higher disparity range, while the GAN model generates sharper images with a lower disparity range. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. The CNN model is found to perform better than the GAN and to produce depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.
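The conversion from the models' pixel-wise disparity output to depth follows the standard stereo relation depth = focal × baseline / disparity. A minimal sketch, assuming known calibration values; `focal_px` and `baseline_m` are placeholder camera parameters, not values from the paper:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a pixel-wise disparity map into depth via the stereo
    relation depth = focal * baseline / disparity.

    focal_px   -- focal length in pixels (assumed known calibration)
    baseline_m -- stereo baseline in meters (assumed known)
    eps        -- floor on disparity to avoid division by zero
    """
    return focal_px * baseline_m / np.maximum(disparity, eps)
```

Near-zero disparities map to very large depths, so practical pipelines typically clip or mask such pixels rather than trust the raw quotient.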


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5765 ◽  
Author(s):  
Seiya Ito ◽  
Naoshi Kaneko ◽  
Kazuhiko Sumi

This paper proposes a novel 3D representation, namely, a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering that the real world is three-dimensional, this 2D arrangement drops one dimension and may limit the capacity of the feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than on a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We show that the latent 3D volume is beneficial for depth estimation and semantic segmentation because both tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume using image features and then generates the latent 3D volume by passing the initial volume through several 3D convolutional layers. We apply depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.
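The core idea — lifting 2D image features into a 3D volume and later projecting the volume back onto a 2D plane — can be illustrated with plain arrays. A minimal NumPy sketch; the tiling-based lift and the average-pooling projection are simplifications standing in for the learned operations described in the paper:

```python
import numpy as np

def lift_features(feat_2d, depth_bins):
    """Tile a 2D feature map (H, W, C) into an initial 3D volume
    (D, H, W, C) with D = depth_bins. In the paper the initial volume
    is built from image features and refined by 3D convolutions."""
    return np.broadcast_to(feat_2d, (depth_bins,) + feat_2d.shape).copy()

def project_volume(latent_volume):
    """Project a latent 3D volume (D, H, W, C) onto a 2D feature map
    (H, W, C) by average pooling along the depth axis — a simplified
    stand-in for the paper's projection step."""
    return latent_volume.mean(axis=0)
```

Per-pixel depth regression and semantic segmentation heads would then operate on the projected (H, W, C) map.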


Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 563 ◽  
Author(s):  
J. Osuna-Coutiño ◽  
Jose Martinez-Carranza

High-Level Structure (HLS) extraction in a set of images consists of recognizing 3D elements that carry useful information for the user or application. There are several approaches to HLS extraction. However, most of them are based on processing two or more images captured from different camera views, or on processing 3D data in the form of point clouds extracted from the camera images. In contrast, motivated by the extensive work on depth estimation from a single image, where parallax constraints are not required, we propose a novel methodology for HLS extraction from a single image with promising results. Our method has four steps. First, we use a CNN to predict the depth of a single image. Second, we propose a region-wise analysis to refine the depth estimates. Third, we introduce a graph analysis to segment the depth map into semantic orientations, aiming to identify potential HLS. Finally, the depth sections are provided to a new CNN architecture that predicts HLS in the shape of cubes and rectangular parallelepipeds.
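The region-wise refinement in the second step can be pictured as replacing every pixel's depth with a robust statistic of its region. A hypothetical sketch, assuming a precomputed integer label map; the function name and the median choice are illustrative assumptions, not the paper's actual analysis:

```python
import numpy as np

def regionwise_refine(depth, regions):
    """Smooth a per-pixel depth map by assigning every pixel the median
    depth of its region (regions given as an integer label map of the
    same H x W shape). Illustrative stand-in for a region-wise
    refinement step."""
    refined = depth.astype(float).copy()
    for label in np.unique(regions):
        mask = regions == label
        # The median suppresses outlier depth predictions inside a region.
        refined[mask] = np.median(depth[mask])
    return refined
```

With a reasonable segmentation, this flattens noisy per-pixel predictions into piecewise-constant depth per region, which also makes the subsequent orientation analysis more stable.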


Author(s):  
Han Yan ◽  
Xin Yu ◽  
Yu Zhang ◽  
Shunli Zhang ◽  
Xiaolin Zhao ◽  
...  
