CNN Based Monocular Depth Estimation

In several applications, such as scene interpretation and reconstruction, precise depth estimation from images remains a significant challenge. Current depth estimation techniques frequently produce blurry, low-resolution estimates. Using transfer learning, this work implements a convolutional neural network that generates a high-resolution depth map from a single RGB image. We use a standard encoder-decoder architecture, initialize the encoder with features extracted from high-performing pre-trained networks, and apply augmentation and training procedures that lead to more accurate results. We demonstrate that, even with a very basic decoder, our approach can produce complete high-resolution depth maps. Many deep learning approaches have recently been proposed, and they have shown significant promise in dealing with this classical ill-posed problem. The experiments are carried out on KITTI and NYU Depth v2, two widely used public datasets. We also examine the errors made by various models in order to expose the shortcomings of present approaches, while achieving viable performance on both KITTI and NYU Depth v2.
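
The abstract above does not say which error measures are used when the models are examined. As a hedged illustration only, the sketch below computes three measures that are commonly reported on KITTI and NYU Depth v2: absolute relative error, root mean squared error, and the fraction of pixels whose ratio to ground truth is below 1.25. The function name `depth_metrics` and the choice of measures are assumptions, not taken from the paper.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Return AbsRel, RMSE and threshold accuracy for paired depth values.

    pred, gt: flat sequences of depths in the same unit (e.g. metres).
    Pairs where either value is not positive are treated as invalid
    and skipped, which mimics masking out pixels without ground truth.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    # Mean of |pred - gt| / gt over valid pixels.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root of the mean squared difference.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Share of pixels with max(pred/gt, gt/pred) below the threshold.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Lower `abs_rel` and `rmse` are better, while `delta1` is an accuracy and should be as close to 1 as possible.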

EfficientNet-B0 Based Monocular Dense-Depth Map Estimation

Traitement du signal ◽

10.18280/ts.380524 ◽

2021 ◽

Vol 38 (5) ◽

pp. 1485-1493

Author(s):

Yasasvy Tadepalli ◽

Meenakshi Kollati ◽

Swaraja Kuraparthi ◽

Padmavathi Kora

Keyword(s):

Depth Map ◽

Depth Estimation ◽

Model Parameters ◽

Map Estimation ◽

Bilinear Method ◽

Regression Problem ◽

Actual Error ◽

Ill Posed ◽

Monocular Image ◽

Monocular Depth

Monocular depth estimation is an active research topic in autonomous driving. The proposed work uses deep convolutional neural networks (DCNNs) with an encoder-decoder structure and transfer learning to estimate depth maps from two-dimensional monocular images. CNN features extracted in the initial stages are later upsampled by a sequence of bilinear upsampling and convolution layers to reconstruct the depth map. The encoder forms the feature extraction part, and the decoder forms the image reconstruction part. EfficientNet-B0, a recent architecture, is used with pretrained weights as the encoder. It has fewer model parameters than state-of-the-art pretrained networks yet achieves higher efficiency. EfficientNet-B0 is compared with two other pretrained networks, DenseNet-121 and ResNet-50. Each of the three models is used in the encoding stage for feature extraction, followed by bilinear upsampling in the decoder. Estimating depth from a monocular image is an ill-posed problem and is therefore treated as a regression problem. The metrics used in the proposed work include F1-score, Jaccard score, and Mean Actual Error (MAE), computed between the original and the reconstructed image. The results indicate that EfficientNet-B0 outperforms the DenseNet-121 and ResNet-50 models in validation loss, F1-score, and Jaccard score.
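
To make the bilinear upsampling step in the decoder concrete, here is a minimal pure-Python sketch that enlarges a small 2D feature map and then scores a reconstruction. It makes two assumptions that the abstract does not confirm: sampling is corner-aligned (deep learning frameworks often use a half-pixel convention instead), and "MAE" is read as the usual mean absolute error. The function names are hypothetical.

```python
def bilinear_upsample(grid, scale=2):
    """Upsample a 2D list of floats by an integer factor (corner-aligned)."""
    h, w = len(grid), len(grid[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional source row.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def mean_absolute_error(a, b):
    """Mean of |a - b| over two 2D lists of equal shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("inputs must have the same number of elements")
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)
```

In a real decoder each upsampling step would be followed by learned convolution layers; the sketch only covers the fixed interpolation part.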

Monocular Depth Estimation using Transfer learning-An Overview

E3S Web of Conferences ◽

10.1051/e3sconf/202130901069 ◽

2021 ◽

Vol 309 ◽

pp. 01069

Author(s):

K. Swaraja ◽

V. Akshitha ◽

K. Pranav ◽

B. Vyshnavi ◽

V. Sai Akhil ◽

...

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Deep Neural Networks ◽

Depth Estimation ◽

Depth Information ◽

Learning Approaches ◽

Learning Network ◽

Depth Maps ◽

Ill Posed ◽

Monocular Depth

Depth estimation is a computer vision technique that is critical for autonomous systems to sense their surroundings and predict their own state. Traditional estimation approaches, such as structure from motion and stereo vision matching, rely on feature correspondences across several views to provide depth information, and the depth maps they predict are sparse. Inferring depth information from a single image (monocular depth estimation) is an ill-posed problem, and a substantial body of deep learning approaches has recently been proposed to address it. Monocular depth estimation with deep learning has attracted considerable interest in recent years, thanks to the rapid development of deep neural networks, and numerous strategies have been developed to solve this problem. The purpose of this study is to give a comprehensive assessment of recent advances in deep learning-based monocular depth estimation and of the methodologies most often used. To begin, we review the various depth estimation techniques and the datasets available for monocular depth estimation. We then give an overview of deep learning methods that use transfer learning, with network designs that include several combinations of encoders and decoders. In addition, deep learning-based monocular depth estimation approaches and models are classified. Finally, the use of transfer learning for monocular depth estimation is illustrated.
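
For contrast with the learned monocular methods, the sketch below shows the standard rectified-stereo relation that traditional multi-view approaches rely on: depth equals focal length times baseline divided by disparity. The camera values in the usage note are hypothetical, and pixels without a matched correspondence are represented as `None`, which is one simple way to show why such depth maps end up sparse.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair.

    Returns None when there is no valid match (missing or non-positive
    disparity), since depth cannot be recovered for that pixel.
    """
    if disparity_px is None or disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def sparse_depth_row(disparities, focal_px, baseline_m):
    """Convert a row of disparities; unmatched pixels stay None."""
    return [depth_from_disparity(d, focal_px, baseline_m) for d in disparities]
```

With an assumed focal length of 700 pixels and a baseline of 0.5 m, a disparity of 35 pixels corresponds to a depth of 10 m, and any `None` entries in the row remain holes in the depth map.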

Monocular Depth Estimation with Joint Attention Feature Distillation and Wavelet-Based Loss Function

Sensors ◽

10.3390/s21010054 ◽

2020 ◽

Vol 21 (1) ◽

pp. 54

Author(s):

Peng Liu ◽

Zonghua Zhang ◽

Zhaozong Meng ◽

Nan Gao

Keyword(s):

Joint Attention ◽

Loss Function ◽

Depth Estimation ◽

Depth Information ◽

3D Vision ◽

Network Training ◽

Crucial Component ◽

Benchmark Datasets ◽

Ill Posed ◽

Monocular Depth

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to flexible use and extremely low system requirements, but inherently ill-posed and ambiguous characteristics still cause unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved, compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates features using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves loss function effectiveness by obtaining more structural details. The experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method’s superiority compared with current state-of-the-art methods.
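
The abstract does not give the exact form of the wavelet-based loss, so the following is only a sketch of the general idea under stated assumptions: both depth maps are decomposed with one level of a 2D Haar transform, and the mean absolute difference is summed over the sub-bands, with an optional extra weight on the detail bands that carry structural edges. The band names, the single decomposition level and the `detail_weight` parameter are all hypothetical choices.

```python
def haar_level(grid):
    """One-level 2D Haar transform of a 2D list with even height and width."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = {"approx": [], "dx": [], "dy": [], "diag": []}
    for i in range(0, h, 2):
        rows = {name: [] for name in bands}
        for j in range(0, w, 2):
            a, b = grid[i][j], grid[i][j + 1]
            c, d = grid[i + 1][j], grid[i + 1][j + 1]
            rows["approx"].append((a + b + c + d) / 2)  # local average
            rows["dx"].append((a - b + c - d) / 2)      # change along x
            rows["dy"].append((a + b - c - d) / 2)      # change along y
            rows["diag"].append((a - b - c + d) / 2)    # diagonal change
        for name in bands:
            bands[name].append(rows[name])
    return bands


def wavelet_l1_loss(pred, target, detail_weight=1.0):
    """Sum over sub-bands of the mean absolute coefficient difference."""
    bands_pred, bands_target = haar_level(pred), haar_level(target)
    loss = 0.0
    for name in bands_pred:
        diffs = [
            abs(p - t)
            for row_p, row_t in zip(bands_pred[name], bands_target[name])
            for p, t in zip(row_p, row_t)
        ]
        weight = 1.0 if name == "approx" else detail_weight
        loss += weight * sum(diffs) / len(diffs)
    return loss
```

Raising `detail_weight` above 1 penalises errors at depth edges more strongly than errors in smooth regions, which is one plausible way a wavelet loss can emphasise structural detail.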

Monocular Depth Estimation Based on Multi-Scale Depth Map Fusion

IEEE Access ◽

10.1109/access.2021.3076346 ◽

2021 ◽

pp. 1-1

Author(s):

Xin Yang ◽

Qingling Chang ◽

Xinglin Liu ◽

Siyuan He ◽

Yan Cui

Keyword(s):

Depth Map ◽

Depth Estimation ◽

Multi Scale ◽

Monocular Depth

Deep Learning-Based Monocular Depth Estimation Methods—A State-of-the-Art Review

Sensors ◽

10.3390/s20082272 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2272 ◽

Cited By ~ 5

Author(s):

Faisal Khan ◽

Saqib Salahuddin ◽

Hossein Javidnia

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Research Work ◽

Depth Estimation ◽

Autonomous Driving ◽

Estimation Methods ◽

Future Research ◽

Comprehensive Overview ◽

Ill Posed ◽

Monocular Depth

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied ill-posed problem in computer vision which has been investigated intensively over the past decade using Deep Learning (DL) approaches. The recent approaches for monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications including scene reconstruction, 3D object-detection, robotics and autonomous driving. This survey provides a comprehensive overview of this research topic including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated and discussed. We conclude this paper with a perspective towards future research work requiring further investigation in monocular depth estimation challenges.

BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation

10.1145/3474085.3475373 ◽

2021 ◽

Author(s):

Qi Tang ◽

Runmin Cong ◽

Ronghui Sheng ◽

Lingzhi He ◽

Dan Zhang ◽

...

Keyword(s):

Super Resolution ◽

Depth Map ◽

Depth Estimation ◽

Joint Learning ◽

Learning Network ◽

Monocular Depth

Integrating Sensor Models in Deep Learning Boosts Performance: Application to Monocular Depth Estimation in Warehouse Automation

Sensors ◽

10.3390/s21041437 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1437

Author(s):

Ryota Yoneyama ◽

Angel J. Duran ◽

Angel P. del Pobil

Keyword(s):

Deep Learning ◽

Robot Vision ◽

Active Vision ◽

Depth Estimation ◽

Learning Performance ◽

Learning Approaches ◽

Standard Data ◽

Consumption Cost ◽

Sensor Models ◽

Monocular Depth

Deep learning is the mainstream paradigm in computer vision and machine learning, but its performance is usually not as good as expected when it is applied to robot vision. The problem is that robot sensing is inherently active, and relevant data is often scarce in many application domains. This calls for novel deep learning approaches that offer good performance at a lower data consumption cost. Here we address monocular depth estimation in warehouse automation with new methods and three different deep architectures. Our results suggest that incorporating sensor models and prior knowledge about robotic active vision can consistently improve results and learning performance with fewer training samples than usual, compared to standard data-driven deep learning.

Unsupervised Monocular Depth Estimation Based on Residual Neural Network of Coarse–Refined Feature Extractions for Drone

Electronics ◽

10.3390/electronics8101179 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1179 ◽

Cited By ~ 1

Author(s):

Tao Huang ◽

Shuanfeng Zhao ◽

Longlong Geng ◽

Qian Xu

Keyword(s):

Neural Network ◽

Image Reconstruction ◽

Depth Map ◽

Ground Truth ◽

Depth Estimation ◽

Input Image ◽

Superior Performance ◽

Estimation Methods ◽

Depth Information ◽

Monocular Depth

Most existing monocular depth estimation methods based on supervised learning require vast quantities of ground truth depth data for training. To take full advantage of the information in images captured by drones without that requirement, we propose an unsupervised monocular depth estimation model based on a residual neural network with coarse–refined feature extraction. Inspired by the principle of binocular depth estimation, a virtual camera is introduced through a deep residual convolutional neural network based on coarse–refined feature extraction, which turns unsupervised monocular depth estimation into an image reconstruction problem. To improve the performance of our model, the following innovations are proposed. First, pyramid processing of the input image is proposed to build the topological relationship between the resolution and the depth of the input image, which improves sensitivity to depth information from a single image and reduces the impact of input image resolution on depth estimation. Second, the residual neural network of coarse–refined feature extraction for corresponding image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict highly detailed output depth maps, long skip connections are designed between corresponding layers of the coarse feature extraction network and the deconvolution network for refined feature extraction. Third, the corresponding image reconstruction loss based on the structural similarity index (SSIM), the approximate disparity smoothness loss, and the depth map loss are combined into a novel training loss to better train our model. The experimental results show that, compared with state-of-the-art monocular depth estimation methods, our model has superior performance on the KITTI dataset, composed of corresponding left and right views, and on the Make3D dataset, composed of images and corresponding ground truth depth maps, and that it basically meets the requirements for depth information of images captured by drones when trained on KITTI.
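
As a rough illustration of how an SSIM-based reconstruction term and a disparity smoothness term can be combined into one training loss, the sketch below works on small flat patches in pure Python. Several things here are assumptions rather than details from the abstract: SSIM is computed over the whole patch instead of a sliding window, the SSIM term is blended with an L1 term using a weight `alpha` of 0.85 (a common choice in self-supervised depth work), the smoothness term is a plain mean absolute gradient, and the third term (the depth map loss) is left out because the abstract does not define it.

```python
def ssim(x, y, data_range=1.0):
    """Global SSIM between two equally long sequences of intensities."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance ratio
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast ratio
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def smoothness(disparity_row):
    """Mean absolute difference between neighbouring disparities."""
    diffs = [abs(b - a) for a, b in zip(disparity_row, disparity_row[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def training_loss(target, reconstructed, disparity_row,
                  alpha=0.85, smooth_weight=0.1):
    """Photometric reconstruction loss plus weighted smoothness penalty."""
    l1 = sum(abs(t - r) for t, r in zip(target, reconstructed)) / len(target)
    photometric = alpha * (1 - ssim(target, reconstructed)) / 2 + (1 - alpha) * l1
    return photometric + smooth_weight * smoothness(disparity_row)
```

A perfect reconstruction with a constant disparity gives a loss close to zero; both a poor reconstruction and a jagged disparity map push the value up.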

SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches

Sensors ◽

10.3390/s21165476 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5476

Author(s):

Rui Wang ◽

Jialing Zou ◽

James Zhiqing Wen

Keyword(s):

Network Architecture ◽

Semantic Segmentation ◽

Depth Estimation ◽

Semantic Feature ◽

Semantic Features ◽

Feature Maps ◽

Task Learning ◽

Vision Sensors ◽

Monocular Depth ◽

Public Datasets

Monocular depth estimation based on unsupervised learning has attracted great attention due to the rising demand for lightweight monocular vision sensors. Inspired by multi-task learning, semantic information has been used to improve the monocular depth estimation models. However, multi-task learning is still limited by multi-type annotations. As far as we know, there are scarcely any large public datasets that provide all the necessary information. Therefore, we propose a novel network architecture Semantic-Feature-Aided Monocular Depth Estimation Network (SFA-MDEN) to extract multi-resolution depth features and semantic features, which are merged and fed into the decoder, with the goal of predicting depth with the support of semantics. Instead of using loss functions to relate the semantics and depth, the fusion of feature maps for semantics and depth is employed to predict the monocular depth. Therefore, two accessible datasets with similar topics for depth estimation and semantic segmentation can meet the requirements of SFA-MDEN for training sets. We explored the performance of the proposed SFA-MDEN with experiments on different datasets, including KITTI, Make3D, and our own dataset BHDE-v1. The experimental results demonstrate that SFA-MDEN achieves competitive accuracy and generalization capacity compared to state-of-the-art methods.

Unconformity surface architecture of the northeast Thelon Basin, Nunavut, derived from integration of magnetic source depth estimates

Interpretation ◽

10.1190/int-2014-0001.1 ◽

2014 ◽

Vol 2 (4) ◽

pp. SJ117-SJ132 ◽

Cited By ~ 6

Author(s):

Victoria Tschirhart ◽

William A. Morris ◽

Charles W. Jefferson

Keyword(s):

High Resolution ◽

Seismic Refraction ◽

Depth Estimation ◽

Uranium Deposits ◽

Aeromagnetic Data ◽

Source Depth ◽

Depth Estimate ◽

Thelon Basin ◽

Individual Source ◽

Unconformity Surface

Exploration for unconformity-associated uranium deposits requires detailed 3D knowledge of the depth and morphology of the unconformity surface. Modifications of the unconformity surface by reactivated intersecting faults and favorable basement lithology are key parameters when attempting to vector toward potential deposits. In the absence of seismic reflection and closely spaced drill data, high-resolution aeromagnetic data can provide surprisingly detailed 3D constraints through the use of source depth routines. Such routines are applied to the northeastern part of the Thelon Basin, termed the Aberdeen Subbasin, in Nunavut. This region is considered prospective for unconformity-associated uranium deposits. Deposits have so far been discovered adjacent to the subbasin where they are hosted by structurally complex Neoarchean and early Paleoproterozoic supracrustal rocks. We determined the morphology of two unconformity surfaces by combining the outputs from multiple analyses of high-resolution aeromagnetic data: three semiautomated depth estimation routines (Werner deconvolution, Euler deconvolution, and source parameter imaging) and two potential field inversion procedures. Confidence in depth estimates was increased by combining the output of individual source depth algorithms. Results were integrated with previously mapped fault displacements, seismic refraction profiles, boreholes, and outcrop geology around the subbasin perimeter. An integrated pseudo-3D source depth estimate of the unconformity surface is presented as twenty-three north–south profiles. The revised model of the upper unconformity surface, the base of the Thelon Formation, shows a complex set of stepped blocks bounded by four major intersecting fault arrays with approximate offsets ranging from tens to hundreds of meters.
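
Of the source depth routines named above, Euler deconvolution is the easiest to sketch. It uses Euler's homogeneity equation, (x - x0) ∂T/∂x + (z - z0) ∂T/∂z = -N (T - B), and solves for the source position by least squares over a window of observations. The version below is a simplified 2D profile form and is not the authors' workflow: it assumes observations at z = 0 with z positive downward, a known structural index N, a zero background field B, and derivatives that are already available. The function name is hypothetical.

```python
def euler_deconvolution_2d(xs, field, dfdx, dfdz, structural_index):
    """Least-squares source position (x0, z0) for one profile window.

    With observations at z = 0 and zero background, Euler's equation
    rearranges to: x0 * fx + z0 * fz = x * fx + N * f.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for x, f, fx, fz in zip(xs, field, dfdx, dfdz):
        rhs = x * fx + structural_index * f
        # Accumulate the 2x2 normal equations A^T A and A^T b.
        s11 += fx * fx
        s12 += fx * fz
        s22 += fz * fz
        r1 += fx * rhs
        r2 += fz * rhs
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-30:
        raise ValueError("window is degenerate; cannot solve for the source")
    x0 = (r1 * s22 - r2 * s12) / det
    z0 = (s11 * r2 - s12 * r1) / det
    return x0, z0
```

On real aeromagnetic data the derivatives are computed numerically, the background is usually solved for as an extra unknown, and solutions from many overlapping windows are clustered, which is presumably part of why combining several routines raises confidence in the depth estimates.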
