Deep Learning-Based Monocular Depth Estimation Methods—A State-of-the-Art Review

Faisal Khan; Saqib Salahuddin; Hossein Javidnia

doi:10.3390/s20082272

Deep Learning-Based Monocular Depth Estimation Methods—A State-of-the-Art Review

Sensors ◽

10.3390/s20082272 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2272 ◽

Cited By ~ 5

Author(s):

Faisal Khan ◽

Saqib Salahuddin ◽

Hossein Javidnia

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Research Work ◽

Depth Estimation ◽

Autonomous Driving ◽

Estimation Methods ◽

Future Research ◽

Comprehensive Overview ◽

Ill Posed ◽

Monocular Depth

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied ill-posed problem in computer vision which has been investigated intensively over the past decade using Deep Learning (DL) approaches. The recent approaches for monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications including scene reconstruction, 3D object-detection, robotics and autonomous driving. This survey provides a comprehensive overview of this research topic including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated and discussed. We conclude this paper with a perspective towards future research work requiring further investigation in monocular depth estimation challenges.

Collision Avoidance Using Deep Learning-Based Monocular Vision

SN Computer Science ◽

10.1007/s42979-021-00759-6 ◽

2021 ◽

Vol 2 (5) ◽

Author(s):

Róbert-Adrian Rill ◽

Kinga Bettina Faragó

Keyword(s):

Deep Learning ◽

Collision Avoidance ◽

Depth Estimation ◽

Autonomous Driving ◽

Monocular Vision ◽

Estimation Methods ◽

Simple Method ◽

Crash Avoidance ◽

Monocular Depth ◽

Similar Accuracy

AbstractAutonomous driving technologies, including monocular vision-based approaches, are in the forefront of industrial and research communities, since they are expected to have a significant impact on economy and society. However, they have limitations in terms of crash avoidance because of the rarity of labeled data for collisions in everyday traffic, as well as due to the complexity of driving situations. In this work, we propose a simple method based solely on monocular vision to overcome the data scarcity problem and to promote forward collision avoidance systems. We exploit state-of-the-art deep learning-based optical flow and monocular depth estimation methods, as well as object detection to estimate the speed of the ego-vehicle and to identify the lead vehicle, respectively. The proposed method utilizes car stop situations as collision surrogates to obtain data for time to collision estimation. We evaluate this approach on our own driving videos, collected using a spherical camera and smart glasses. Our results indicate that similar accuracy can be achieved on both video sources: the external road view from the car’s, and the ego-centric view from the driver’s perspective. Additionally, we set forth the possibility of using spherical cameras as opposed to traditional cameras for vision-based automotive sensing.

A new Evaluation Approach for Deep Learning-based Monocular Depth Estimation Methods

2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) ◽

10.1109/itsc45102.2020.9294620 ◽

2020 ◽

Author(s):

Antoine Mauri ◽

Redouane Khemmar ◽

Remi Boutteau ◽

Benoit Decoux ◽

Jean-Yves Ertaud ◽

...

Keyword(s):

Deep Learning ◽

Depth Estimation ◽

Estimation Methods ◽

Evaluation Approach ◽

Monocular Depth

Real-Time Semantic Image Segmentation with Deep Learning for Autonomous Driving: A Survey

Applied Sciences ◽

10.3390/app11198802 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8802

Author(s):

Ilias Papadeas ◽

Lazaros Tsochatzidis ◽

Angelos Amanatiadis ◽

Ioannis Pratikakis

Keyword(s):

Image Segmentation ◽

Deep Learning ◽

Real Time ◽

Current Trend ◽

Autonomous Driving ◽

Evaluation Framework ◽

Future Research ◽

Comprehensive Overview ◽

Semantic Image Segmentation ◽

Recent Developments

Semantic image segmentation for autonomous driving is a challenging task due to its requirement for both effectiveness and efficiency. Recent developments in deep learning have demonstrated important performance boosting in terms of accuracy. In this paper, we present a comprehensive overview of the state-of-the-art semantic image segmentation methods using deep-learning techniques aiming to operate in real time so that can efficiently support an autonomous driving scenario. To this end, the presented overview puts a particular emphasis on the presentation of all those approaches which permit inference time reduction, while an analysis of the existing methods is addressed by taking into account their end-to-end functionality, as well as a comparative study that relies upon a consistent evaluation framework. Finally, a fruitful discussion is presented that provides key insights for the current trend and future research directions in real-time semantic image segmentation with deep learning for autonomous driving.

Monocular Depth Estimation using Transfer learning-An Overview

E3S Web of Conferences ◽

10.1051/e3sconf/202130901069 ◽

2021 ◽

Vol 309 ◽

pp. 01069

Author(s):

K. Swaraja ◽

V. Akshitha ◽

K. Pranav ◽

B. Vyshnavi ◽

V. Sai Akhil ◽

...

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Deep Neural Networks ◽

Depth Estimation ◽

Depth Information ◽

Learning Approaches ◽

Learning Network ◽

Depth Maps ◽

Ill Posed ◽

Monocular Depth

Depth estimation is a computer vision technique that is critical for autonomous schemes for sensing their surroundings and predict their own condition. Traditional estimating approaches, such as structure from motion besides stereo vision similarity, rely on feature communications from several views to provide depth information. In the meantime, the depth maps anticipated are scarce. Gathering depth information via monocular depth estimation is an ill-posed issue, according to a substantial corpus of deep learning approaches recently suggested. Estimation of Monocular depth with deep learning has gotten a lot of interest in current years, thanks to the fast expansion of deep neural networks, and numerous strategies have been developed to solve this issue. In this study, we want to give a comprehensive assessment of the methodologies often used in the estimation of monocular depth. The purpose of this study is to look at recent advances in deep learning-based estimation of monocular depth. To begin, we'll go through the various depth estimation techniques and datasets for monocular depth estimation. A complete overview of multiple deep learning methods that use transfer learning Network designs, including several combinations of encoders and decoders, is offered. In addition, multiple deep learning-based monocular depth estimation approaches and models are classified. Finally, the use of transfer learning approaches to monocular depth estimation is illustrated.

Efficient Monocular Depth Estimation with Transfer Feature Enhancement

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2021.15.127 ◽

2021 ◽

Vol 15 ◽

pp. 1165-1173

Author(s):

Ming Yin

Keyword(s):

State Of The Art ◽

Depth Estimation ◽

Kernel Size ◽

Training Time ◽

Lower Error Rate ◽

Proposed Model ◽

Ill Posed ◽

Monocular Image ◽

Monocular Depth ◽

End Depth

Estimating the depth of the scene from a monocular image is an essential step for image semantic understanding. Practically, some existing methods for this highly ill-posed issue are still in lack of robustness and efficiency. This paper proposes a novel end-to-end depth esti- mation model with skip connections from a pre- trained Xception model for dense feature extrac- tion, and three new modules are designed to im- prove the upsampling process. In addition, ELU activation and convolutions with smaller kernel size are added to improve the pixel-wise regres- sion process. The experimental results show that our model has fewer network parameters, a lower error rate than the most advanced networks and requires only half the training time. The evalu- ation is based on the NYU v2 dataset, and our proposed model can achieve clearer boundary de- tails with state-of-the-art effects and robustness.

Unsupervised Monocular Depth Estimation for Autonomous Driving

Proceedings of the International Display Workshops ◽

10.36463/idw.2019.3dsap2_3dp2-2 ◽

2019 ◽

pp. 128

Author(s):

Chih-Shuan Huang ◽

Wan-Nung Tsung ◽

Wei-Jong Yang ◽

Chin-Hsing Chen

Keyword(s):

Depth Estimation ◽

Autonomous Driving ◽

Monocular Depth

Monocular Depth Estimation with Joint Attention Feature Distillation and Wavelet-Based Loss Function

Sensors ◽

10.3390/s21010054 ◽

2020 ◽

Vol 21 (1) ◽

pp. 54

Author(s):

Peng Liu ◽

Zonghua Zhang ◽

Zhaozong Meng ◽

Nan Gao

Keyword(s):

Joint Attention ◽

Loss Function ◽

Depth Estimation ◽

Depth Information ◽

3D Vision ◽

Network Training ◽

Crucial Component ◽

Benchmark Datasets ◽

Ill Posed ◽

Monocular Depth

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to flexible use and extremely low system requirements, but inherently ill-posed and ambiguous characteristics still cause unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved, compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates features using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves loss function effectiveness by obtaining more structural details. The experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method’s superiority compared with current state-of-the-art methods.

Real-Time Single Image Depth Perception in the Wild with Handheld Devices

Sensors ◽

10.3390/s21010015 ◽

2020 ◽

Vol 21 (1) ◽

pp. 15

Author(s):

Filippo Aleotti ◽

Giulio Zaccaroni ◽

Luca Bartolomei ◽

Matteo Poggi ◽

Fabio Tosi ◽

...

Keyword(s):

Real Time ◽

Depth Perception ◽

Depth Estimation ◽

Autonomous Driving ◽

Estimation Methods ◽

Handheld Devices ◽

Single Image ◽

Handheld Device ◽

Time Performance ◽

In The Wild

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) the low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we deeply investigate all these issues, showing how they are both addressable by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.

PDANet: Self-Supervised Monocular Depth Estimation Using Perceptual and Data Augmentation Consistency

Applied Sciences ◽

10.3390/app11125383 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5383

Author(s):

Huachen Gao ◽

Xiaoyu Liu ◽

Meixia Qu ◽

Shijie Huang

Keyword(s):

Data Augmentation ◽

State Of The Art ◽

Depth Estimation ◽

Input Image ◽

Depth Information ◽

Disparity Map ◽

Estimation Model ◽

Absolute Relative Error ◽

Texture Region ◽

Monocular Depth

In recent studies, self-supervised learning methods have been explored for monocular depth estimation. They minimize the reconstruction loss of images instead of depth information as a supervised signal. However, existing methods usually assume that the corresponding points in different views should have the same color, which leads to unreliable unsupervised signals and ultimately damages the reconstruction loss during the training. Meanwhile, in the low texture region, it is unable to predict the disparity value of pixels correctly because of the small number of extracted features. To solve the above issues, we propose a network—PDANet—that integrates perceptual consistency and data augmentation consistency, which are more reliable unsupervised signals, into a regular unsupervised depth estimation model. Specifically, we apply a reliable data augmentation mechanism to minimize the loss of the disparity map generated by the original image and the augmented image, respectively, which will enhance the robustness of the image in the prediction of color fluctuation. At the same time, we aggregate the features of different layers extracted by a pre-trained VGG16 network to explore the higher-level perceptual differences between the input image and the generated one. Ablation studies demonstrate the effectiveness of each components, and PDANet shows high-quality depth estimation results on the KITTI benchmark, which optimizes the state-of-the-art method from 0.114 to 0.084, measured by absolute relative error for depth estimation.

Flying Free: A Research Overview of Deep Learning in Drone Navigation Autonomy

Drones ◽

10.3390/drones5020052 ◽

2021 ◽

Vol 5 (2) ◽

pp. 52

Author(s):

Thomas Lee ◽

Susan Mckeever ◽

Jane Courtney

Keyword(s):

Deep Learning ◽

Research Work ◽

Learning Approaches ◽

Clear Definition ◽

Top Down ◽

Research Activity ◽

Comprehensive Overview ◽

The Past ◽

Computer Vision Applications ◽

Definition Of

With the rise of Deep Learning approaches in computer vision applications, significant strides have been made towards vehicular autonomy. Research activity in autonomous drone navigation has increased rapidly in the past five years, and drones are moving fast towards the ultimate goal of near-complete autonomy. However, while much work in the area focuses on specific tasks in drone navigation, the contribution to the overall goal of autonomy is often not assessed, and a comprehensive overview is needed. In this work, a taxonomy of drone navigation autonomy is established by mapping the definitions of vehicular autonomy levels, as defined by the Society of Automotive Engineers, to specific drone tasks in order to create a clear definition of autonomy when applied to drones. A top–down examination of research work in the area is conducted, focusing on drone navigation tasks, in order to understand the extent of research activity in each area. Autonomy levels are cross-checked against the drone navigation tasks addressed in each work to provide a framework for understanding the trajectory of current research. This work serves as a guide to research in drone autonomy with a particular focus on Deep Learning-based solutions, indicating key works and areas of opportunity for development of this area in the future.