Fast depth extraction from a single image

2016 ◽  
Vol 13 (6) ◽  
pp. 172988141666337 ◽  
Author(s):  
Lei He ◽  
Qiulei Dong ◽  
Guanghui Wang

Predicting depth from a single image is an important problem for understanding the 3-D geometry of a scene. Recently, nonparametric depth sampling (DepthTransfer) has shown great potential in solving this problem; its two key components are a Scale Invariant Feature Transform (SIFT) flow–based depth warping between the input image and its retrieved similar images, and a pixel-wise depth fusion from all warped depth maps. In addition to the inherently heavy computational load of the SIFT flow computation, even under a coarse-to-fine scheme, the fusion reliability is also low due to the low discriminativeness of the pixel-wise descriptions. This article aims at solving these two problems. First, a novel sparse SIFT flow algorithm is proposed to reduce the complexity from subquadratic to sublinear. Then, a reweighting technique is introduced in which the variance of the SIFT flow descriptor is computed at every pixel and used to reweight the data term in the conditional Markov random field. Our proposed depth transfer method is tested on the Make3D Range Image Data and NYU Depth Dataset V2. It is shown that, with comparable depth estimation accuracy, our method is 2–3 times faster than DepthTransfer.
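To make the descriptor-variance reweighting concrete, the following minimal sketch shows one way such a weighting could enter an MRF data term. The function names, array shapes, and the Gaussian weighting are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def reweighted_data_term(warped_depths, sift_descriptors, sigma=1.0):
    """Illustrative per-pixel reweighting of an MRF data term.

    warped_depths:    (K, H, W) depth maps warped from K retrieved images.
    sift_descriptors: (K, H, W, D) dense SIFT descriptors aligned to the input.
    Returns an (H, W) weight map and the weighted data cost per pixel.
    All names and the exact weighting function are assumptions for illustration.
    """
    # Per-pixel variance of the descriptors across the K candidates,
    # averaged over the D descriptor dimensions.
    var = sift_descriptors.var(axis=0).mean(axis=-1)           # (H, W)

    # High variance -> less reliable pixel -> lower weight on the data term.
    weights = np.exp(-var / (2.0 * sigma ** 2))                # (H, W)

    # A simple data term: per-pixel deviation of each candidate from the median fusion.
    fused = np.median(warped_depths, axis=0)                   # (H, W)
    data_cost = np.abs(warped_depths - fused).mean(axis=0)     # (H, W)

    return weights, weights * data_cost
```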


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1795 ◽  
Author(s):  
Xiao Lin ◽  
Dalila Sánchez-Escobedo ◽  
Josep R. Casas ◽  
Montse Pardàs

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging them into a single framework has been studied, under the assumption that integrating two highly correlated tasks can be mutually beneficial and improve estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed from a single RGB input image within a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve mutual improvement. Our approaches are evaluated under two different scenarios designed to compare our results against single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms the state of the art in single-task approaches, while obtaining competitive results compared with other multi-task methods.
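As a rough illustration of the shared-versus-separate feature question, the toy model below uses one shared encoder and two task-specific decoders. The layer sizes, depths, and class count are assumptions for demonstration, not the architectures analyzed in the paper:

```python
import torch
import torch.nn as nn

class SharedEncoderTwoHeads(nn.Module):
    """Toy joint model: one shared encoder, separate depth and segmentation decoders.
    Layer sizes and structure are illustrative assumptions, not the paper's network."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Depth head: one channel, regressed per pixel.
        self.depth_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Segmentation head: per-pixel class logits.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, rgb):
        features = self.encoder(rgb)          # shared representation
        return self.depth_head(features), self.seg_head(features)

# Usage: a joint objective would simply sum the two task losses.
model = SharedEncoderTwoHeads()
depth, seg = model(torch.randn(1, 3, 128, 128))
```

Moving the split point between shared and task-specific layers earlier or later in such a model is essentially the design choice the two analyzed architectures explore.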


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1708 ◽  
Author(s):  
Daniel Stanley Tan ◽  
Chih-Yuan Yao ◽  
Conrado Ruiz ◽  
Kai-Lung Hua

Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential tasks for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating the per-pixel depths from a single image. Inspired by the recent works on generative neural network models, we formulate the task of depth estimation as a generative task where we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network that has an encoder-decoder type generator with residual transposed convolution blocks trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several depth estimation works.
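A minimal sketch of what a residual transposed-convolution block in such an encoder-decoder generator might look like is given below; the channel counts, normalization, and skip path are assumptions rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class ResidualUpBlock(nn.Module):
    """Illustrative residual transposed-convolution block (a sketch of the general
    idea, not the paper's exact generator design)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.conv = nn.Sequential(
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # Skip path: upsample and match channels so the residual can be added.
        self.skip = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, 1),
        )

    def forward(self, x):
        return self.conv(self.up(x)) + self.skip(x)

# The generator decoder would stack such blocks and end with a 1-channel depth map;
# a discriminator judging the predicted depth provides the adversarial loss.
block = ResidualUpBlock(64, 32)
out = block(torch.randn(1, 64, 16, 16))   # -> (1, 32, 32, 32)
```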


Symmetry ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 690
Author(s):  
Zhimin Zhang ◽  
Jianzhong Qiao ◽  
Shukuan Lin

Supervised monocular depth estimation methods based on learning have shown promising results compared with traditional methods. However, these methods require a large amount of high-quality corresponding ground truth depth data as supervision labels. Due to the limitations of acquisition equipment, it is expensive and impractical to record ground truth depth for different scenes. Compared to supervised methods, self-supervised monocular depth estimation without ground truth depth is a promising research direction, but self-supervised depth estimation from a single image is geometrically ambiguous and suboptimal. In this paper, we propose a novel semi-supervised monocular stereo matching method based on existing approaches to improve the accuracy of depth estimation. The idea is inspired by the experimental observation that, for the same self-supervised network model, depth estimation with a stereo pair as input is more accurate than with a monocular view as input. Therefore, we decompose the monocular depth estimation problem into two sub-problems: a right-view synthesis process followed by a semi-supervised stereo matching process. In order to improve the accuracy of the synthesized right view, we extend the existing view synthesis method Deep3D by adding a left-right consistency constraint and a smoothness constraint. To reduce the error caused by the reconstructed right view, we propose a semi-supervised stereo matching model that uses disparity maps generated by a self-supervised stereo matching model as supervision cues, jointly with self-supervised cues, to optimize the stereo matching network. At test time, the two networks, connected as a pipeline, predict the depth map directly from a single image. Both procedures not only obey geometric principles but also improve estimation accuracy. Test results on the KITTI dataset show that this method is superior to current mainstream monocular self-supervised depth estimation methods under the same conditions.
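The two added constraints can be illustrated with simplified loss sketches. The warping, disparity normalization, and weighting details below are assumptions for demonstration, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def left_right_consistency_loss(disp_left, disp_right):
    """Penalize disagreement between the left disparity map and the right disparity
    map warped into the left view. Simplified sketch; disparities are assumed to be
    expressed as fractions of image width."""
    b, _, h, w = disp_left.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1).clone()
    # Shift x-coordinates by the (normalized) left disparity to sample the right map.
    grid[..., 0] = grid[..., 0] - 2.0 * disp_left.squeeze(1)
    warped_right = F.grid_sample(disp_right, grid, align_corners=True)
    return (disp_left - warped_right).abs().mean()

def smoothness_loss(disp, image):
    """Edge-aware smoothness: disparity gradients are penalized less across image edges."""
    dx = (disp[..., :, 1:] - disp[..., :, :-1]).abs()
    dy = (disp[..., 1:, :] - disp[..., :-1, :]).abs()
    wx = torch.exp(-(image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True))
    wy = torch.exp(-(image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True))
    return (dx * wx).mean() + (dy * wy).mean()
```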


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 15
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) the low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we deeply investigate all these issues, showing how they are both addressable by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.


2021 ◽  
Vol 11 (12) ◽  
pp. 5383
Author(s):  
Huachen Gao ◽  
Xiaoyu Liu ◽  
Meixia Qu ◽  
Shijie Huang

In recent studies, self-supervised learning methods have been explored for monocular depth estimation. They minimize an image reconstruction loss, rather than relying on depth annotations, as the supervision signal. However, existing methods usually assume that corresponding points in different views have the same color, which leads to unreliable unsupervised signals and ultimately corrupts the reconstruction loss during training. Meanwhile, in low-texture regions the disparity of pixels cannot be predicted correctly because only a small number of features can be extracted. To solve the above issues, we propose a network, PDANet, that integrates perceptual consistency and data augmentation consistency, which are more reliable unsupervised signals, into a regular unsupervised depth estimation model. Specifically, we apply a data augmentation mechanism that minimizes the discrepancy between the disparity maps generated from the original image and from the augmented image, which enhances robustness to color fluctuations in the prediction. At the same time, we aggregate features from different layers of a pre-trained VGG16 network to exploit higher-level perceptual differences between the input image and the generated one. Ablation studies demonstrate the effectiveness of each component, and PDANet shows high-quality depth estimation results on the KITTI benchmark, improving on the state-of-the-art method from 0.114 to 0.084 in absolute relative error.
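The perceptual consistency term can be illustrated with a simplified feature-matching loss over a VGG16 backbone. The chosen layers, the L1 distance, and the use of randomly initialized weights (to keep the sketch runnable offline) are assumptions, not PDANet's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """Illustrative perceptual consistency loss: compare multi-layer VGG16 features
    of the input image and the reconstructed image. Layer choices are assumptions."""
    def __init__(self):
        super().__init__()
        # vgg16() is built with random weights here so the sketch runs offline;
        # in practice the ImageNet-pretrained weights would be loaded.
        features = vgg16().features.eval()
        for p in features.parameters():
            p.requires_grad_(False)
        # Slices ending after a few ReLU layers (illustrative cut points).
        self.slices = nn.ModuleList([features[:4], features[4:9], features[9:16]])

    def forward(self, reconstructed, target):
        loss, x, y = 0.0, reconstructed, target
        for slice_ in self.slices:
            x, y = slice_(x), slice_(y)
            loss = loss + (x - y).abs().mean()   # L1 distance in feature space
        return loss

loss_fn = PerceptualLoss()
loss = loss_fn(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
```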


2021 ◽  
Vol 35 (1) ◽  
pp. 85-91
Author(s):  
Naga Raju Hari Manikyam ◽  
Munisamy Shyamala Devi

In the contemporary era, technological innovations like cloud computing and the Internet of Things (IoT) pave the way for diversified applications producing multimedia content. Large volumes of image data are produced, especially in medical and other domains. Cloud infrastructure is widely used to reap benefits such as scalability and availability. However, the security and privacy of imagery are in jeopardy when it is outsourced to the cloud directly. Many compression and encryption techniques have emerged to improve performance and security. Nevertheless, with the emergence of quantum computing on the horizon, there is a need for more secure means involving multiple transformations of the data. Compressive sensing (CS) is used in existing methods to improve security. However, most of these schemes cannot perform compression and encryption simultaneously and end up with large key sizes. In this paper, we propose a framework known as the Cloud Image Security Framework (CISF) for securing outsourced images. The framework has an underlying algorithm known as the Hybrid Image Security Algorithm (HISA). It is based on compressive sensing, simultaneous sensing and encryption, and random pixel exchange to ensure multiple transformations of the input image. The empirical study revealed that the CISF is more effective and secure, with acceptable compression performance, compared with state-of-the-art methods.
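The combination of key-driven pixel permutation and random-projection sensing can be sketched generically as below. This toy example only illustrates the ideas named in the abstract (compressive sensing, simultaneous sensing and encryption, random pixel exchange); it is not the HISA algorithm, and all parameters are assumptions:

```python
import numpy as np

def cs_encrypt(image, ratio=0.5, seed=1234):
    """Toy sketch of compressive-sensing-based sensing/encryption with a random
    pixel exchange. Generic illustration only; not the HISA algorithm."""
    rng = np.random.default_rng(seed)          # the seed plays the role of a secret key
    h, w = image.shape

    # Random pixel exchange: a key-driven permutation of pixel positions.
    perm = rng.permutation(h * w)
    shuffled = image.flatten()[perm].reshape(h, w).astype(float)

    # Compressive sensing: each column is measured with a random Gaussian matrix,
    # so sensing (compression) and encryption happen in a single projection.
    m = int(ratio * h)
    phi = rng.standard_normal((m, h)) / np.sqrt(m)
    measurements = phi @ shuffled              # (m, w) compressed ciphertext

    return measurements, perm, phi

measurements, perm, phi = cs_encrypt(np.random.randint(0, 256, (64, 64)))
```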


Author(s):  
Lixin He ◽  
Jing Yang ◽  
Bin Kong ◽  
Can Wang

Recovering the depth of objects from two-dimensional images is an important and fundamental problem in the field of computer vision. In view of the shortcomings of existing depth estimation methods, a novel approach based on SIFT (the Scale Invariant Feature Transform) is presented in this paper. The approach can estimate the depths of objects from two images captured by an un-calibrated ordinary monocular camera. In this approach, the first image is captured; then, with all camera parameters unchanged, the second image is acquired after moving the camera a distance d along the optical axis. Image segmentation and SIFT feature extraction are then performed on the two images separately, and objects in the images are matched. Lastly, an object's depth can be computed from the lengths of a pair of straight line segments. In order to choose the most appropriate pair of straight line segments and reduce the computation, the theory of convex hulls and the knowledge of triangle similarity are employed. The experimental results show that our approach is effective and practical.
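Although the abstract does not give the exact formula, a plausible reading under a pinhole camera model is: a segment of physical length S at depth Z projects to length l1 = f·S/Z, and after moving the camera back by d along the optical axis it projects to l2 = f·S/(Z + d); dividing the two gives Z = d·l2/(l1 - l2). A small hedged sketch of this relation:

```python
def depth_from_segments(l1, l2, d):
    """Depth of an object from the projected lengths of a matched line segment.
    l1: length in the first image, l2: length after moving the camera back by d
    along the optical axis (so l1 > l2). Derived from the pinhole model as an
    illustrative reading of the abstract, not the paper's exact formulation."""
    if l1 <= l2:
        raise ValueError("expected the segment to shrink after moving the camera back")
    return d * l2 / (l1 - l2)

# Example: a segment of 120 px shrinks to 100 px after moving the camera back 0.5 m,
# giving a depth of 2.5 m from the first camera position.
print(depth_from_segments(120.0, 100.0, 0.5))   # 2.5
```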


Author(s):  
L. Madhuanand ◽  
F. Nex ◽  
M. Y. Yang

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene to be captured, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest with the recent advancements in computer vision and deep learning techniques. This research has largely focused on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited due to additional complexities arising from the increased camera distance and the wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features, and points of view. The single image depth estimation is based on image reconstruction techniques which use stereo images for learning to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial model (GAN), are used to learn depth from UAV aerial images. These models generate pixel-wise disparity images which can be converted into depth information. The generated disparity maps are evaluated for their internal quality using various error metrics. The results show that the CNN model generates smoother images with a higher disparity range, while the GAN model generates sharper images with a smaller disparity range. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. It is found that the CNN model performs better than the GAN and produces depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.
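The disparity-to-depth conversion mentioned above follows the standard stereo relation depth = focal_length × baseline / disparity; a small sketch (the focal length and baseline values are placeholder assumptions) is:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a predicted disparity map to metric depth via the standard stereo
    relation depth = focal_length * baseline / disparity. The focal length and
    baseline here are illustrative assumptions; in practice they come from the
    (virtual) stereo setup used during training."""
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

# Example: a disparity of 20 px, 1000 px focal length, 0.3 m baseline -> 15 m depth
depth = disparity_to_depth(np.full((4, 4), 20.0), focal_length_px=1000.0, baseline_m=0.3)
```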

