Unsupervised Monocular Depth Estimation for Autonomous Driving

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied, ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNNs). Estimating depth from two-dimensional images plays an important role in applications including scene reconstruction, 3D object detection, robotics, and autonomous driving. This survey provides a comprehensive overview of the topic, including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated, and discussed. We conclude with a perspective on the challenges in monocular depth estimation that require further investigation.

Unsupervised Monocular Depth Estimation for Autonomous Driving

Proceedings of the International Display Workshops ◽

10.36463/idw.2019.0128 ◽

2019 ◽

pp. 128

Author(s):

Chih-Shuan Huang ◽

Wan-Nung Tsung ◽

Wei-Jong Yang ◽

Chin-Hsing Chen

Keyword(s):

Depth Estimation ◽

Autonomous Driving ◽

Monocular Depth

SemanticDepth: Fusing Semantic Segmentation and Monocular Depth Estimation for Enabling Autonomous Driving in Roads without Lane Lines

Sensors ◽

10.3390/s19143224 ◽

2019 ◽

Vol 19 (14) ◽

pp. 3224 ◽

Cited By ~ 4

Author(s):

Pablo R. Palafox ◽

Johannes Betz ◽

Felix Nobis ◽

Konstantin Riedl ◽

Markus Lienkamp

Keyword(s):

Semantic Segmentation ◽

Depth Estimation ◽

Autonomous Driving ◽

Warning Systems ◽

The Road ◽

Lane Departure ◽

Rgb Images ◽

Monocular Depth ◽

On The Road ◽

The City

Typically, lane departure warning systems rely on lane lines being present on the road. However, in many scenarios, e.g., secondary roads or some streets in cities, lane lines are either not present or not sufficiently well signaled. In this work, we present a vision-based method to locate a vehicle within the road when no lane lines are present using only RGB images as input. To this end, we propose to fuse together the outputs of a semantic segmentation and a monocular depth estimation architecture to reconstruct locally a semantic 3D point cloud of the viewed scene. We only retain points belonging to the road and, additionally, to any kind of fences or walls that might be present right at the sides of the road. We then compute the width of the road at a certain point on the planned trajectory and, additionally, what we denote as the fence-to-fence distance. Our system is suited to any kind of motoring scenario and is especially useful when lane lines are not present on the road or do not signal the path correctly. The additional fence-to-fence distance computation is complementary to the road’s width estimation. We quantitatively test our method on a set of images featuring streets of the city of Munich that contain a road-fence structure, so as to compare our two proposed variants, namely the road’s width and the fence-to-fence distance computation. In addition, we also validate our system qualitatively on the Stuttgart sequence of the publicly available Cityscapes dataset, where no fences or walls are present at the sides of the road, thus demonstrating that our system can be deployed in a standard city-like environment. For the benefit of the community, we make our software open source.
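
The fusion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' released software: it assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy), a per-pixel depth map in metres, and a per-pixel label map in which a hypothetical `road_label` value marks road pixels. Road pixels are back-projected to 3D, and the road width is taken as the lateral extent of the road points near a chosen forward distance.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: int, v: int, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Back-project pixel (u, v) with metric depth into camera coordinates
    using the standard pinhole model (x right, y down, z forward)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def road_points(depth_map: Sequence[Sequence[float]],
                label_map: Sequence[Sequence[int]],
                road_label: int,
                fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Keep only pixels labelled as road and lift them to 3D points."""
    points: List[Point3D] = []
    for v, (depth_row, label_row) in enumerate(zip(depth_map, label_map)):
        for u, (depth, label) in enumerate(zip(depth_row, label_row)):
            if label == road_label and depth > 0:
                points.append(backproject(u, v, depth, fx, fy, cx, cy))
    return points


def road_width_at(points: Sequence[Point3D],
                  z_query: float, z_tol: float) -> Optional[float]:
    """Lateral extent (max x minus min x) of the points whose forward
    distance lies within z_tol of z_query; None if no point qualifies."""
    xs = [x for x, _, z in points if abs(z - z_query) <= z_tol]
    if not xs:
        return None
    return max(xs) - min(xs)
```

A real pipeline would also need to handle depth noise and outliers, for example by fitting planes or lines to the retained points rather than taking raw extremes. The same width query could be applied to fence or wall points to obtain a fence-to-fence distance.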

Collision Avoidance Using Deep Learning-Based Monocular Vision

SN Computer Science ◽

10.1007/s42979-021-00759-6 ◽

2021 ◽

Vol 2 (5) ◽

Author(s):

Róbert-Adrian Rill ◽

Kinga Bettina Faragó

Keyword(s):

Deep Learning ◽

Collision Avoidance ◽

Depth Estimation ◽

Autonomous Driving ◽

Monocular Vision ◽

Estimation Methods ◽

Simple Method ◽

Crash Avoidance ◽

Monocular Depth ◽

Similar Accuracy

Autonomous driving technologies, including monocular vision-based approaches, are at the forefront of industrial and research communities, since they are expected to have a significant impact on the economy and society. However, they have limitations in terms of crash avoidance because of the rarity of labeled data for collisions in everyday traffic, as well as the complexity of driving situations. In this work, we propose a simple method based solely on monocular vision to overcome the data scarcity problem and to promote forward collision avoidance systems. We exploit state-of-the-art deep learning-based optical flow and monocular depth estimation methods, as well as object detection, to estimate the speed of the ego-vehicle and to identify the lead vehicle, respectively. The proposed method utilizes car stop situations as collision surrogates to obtain data for time-to-collision estimation. We evaluate this approach on our own driving videos, collected using a spherical camera and smart glasses. Our results indicate that similar accuracy can be achieved on both video sources: the external road view from the car’s perspective and the egocentric view from the driver’s perspective. Additionally, we set forth the possibility of using spherical cameras as opposed to traditional cameras for vision-based automotive sensing.
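
For orientation, time to collision is commonly defined under a constant-velocity assumption as the current gap divided by the closing speed. The sketch below uses that textbook definition and is not necessarily the estimator used in the paper. The function name and the inputs (two consecutive distance estimates to the lead vehicle, in metres, and the time between them, in seconds) are assumptions made for illustration.

```python
import math


def time_to_collision(d_prev: float, d_curr: float, dt: float) -> float:
    """Constant-velocity time to collision in seconds.

    d_prev, d_curr: distance to the lead vehicle at two consecutive
    time steps (metres); dt: time between the two estimates (seconds).
    Returns math.inf when the gap is not shrinking.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    closing_speed = (d_prev - d_curr) / dt  # positive when approaching
    if closing_speed <= 0:
        return math.inf
    return d_curr / closing_speed
```

For example, a gap that shrinks from 20 m to 18 m over 0.5 s gives a closing speed of 4 m/s and a time to collision of 4.5 s.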

Joint Soft–Hard Attention for Self-Supervised Monocular Depth Estimation

Sensors ◽

10.3390/s21216956 ◽

2021 ◽

Vol 21 (21) ◽

pp. 6956

Author(s):

Chao Fan ◽

Zhenyu Yin ◽

Fulong Xu ◽

Anying Chai ◽

Feiqing Zhang

Keyword(s):

Depth Estimation ◽

Autonomous Driving ◽

Estimation Accuracy ◽

Multi Scale ◽

Laser Sensors ◽

Robot Perception ◽

New Ideas ◽

Supervised Methods ◽

Monocular Depth ◽

Direct Use

In recent years, self-supervised monocular depth estimation has gained popularity among researchers because it uses only a single camera, at a much lower cost than the direct use of laser sensors to acquire depth. Although monocular self-supervised methods can obtain dense depths, the estimation accuracy needs to be further improved for use in scenarios such as autonomous driving and robot perception. In this paper, we combine soft attention and hard attention through two new ideas to improve self-supervised monocular depth estimation: (1) a soft attention module and (2) a hard attention strategy. We integrate the soft attention module in the model architecture to enhance feature extraction in both spatial and channel dimensions, adding only a small number of parameters. Unlike traditional fusion approaches, we use the hard attention strategy to enhance the fusion of generated multi-scale depth predictions. Further experiments demonstrate that our method can achieve the best self-supervised performance on both the standard KITTI benchmark and the Make3D dataset.

YOLO MDE: Object Detection with Monocular Depth Estimation

Electronics ◽

10.3390/electronics11010076 ◽

2021 ◽

Vol 11 (1) ◽

pp. 76

Author(s):

Jongsub Yu ◽

Hyukdoo Choi

Keyword(s):

Risk Assessment ◽

Object Detection ◽

Network Architecture ◽

Ground Truth ◽

Depth Estimation ◽

Autonomous Driving ◽

Depth Prediction ◽

Bounding Box ◽

Monocular Depth ◽

Bounding Boxes

This paper presents an object detector with depth estimation using monocular camera images. Previous detection studies have typically focused on detecting objects with 2D or 3D bounding boxes. A 3D bounding box consists of the center point, its size parameters, and heading information. However, predicting such complex output compositions generally lowers a model's performance, and it is not necessary for risk assessment in autonomous driving. We focus on predicting a single depth per object, which is essential for risk assessment in autonomous driving. Our network architecture is based on YOLO v4, which is a fast and accurate one-stage object detector. We added an additional channel to the output layer for depth estimation. To train depth prediction, we extract the closest depth from the 3D bounding box coordinates of ground truth labels in the dataset. Our model is compared with the latest studies on 3D object detection using the KITTI object detection benchmark. As a result, we show that our model achieves higher detection performance and detection speed than existing models with comparable depth accuracy.
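
One way to read "closest depth" is the smallest forward distance among the corners of the ground-truth 3D box. The sketch below follows that reading, which is an assumption rather than the paper's confirmed procedure. It also assumes a KITTI-style box parameterisation in camera coordinates (z forward, rotation about the vertical axis); because the rotation is about the vertical axis, only the four footprint corners matter. The function and parameter names are hypothetical.

```python
import math


def closest_depth(center_z: float, length: float, width: float,
                  rotation_y: float) -> float:
    """Smallest forward distance (camera z) among the box corners.

    center_z: forward distance of the box centre (metres).
    length, width: box extents along its own x and z axes (metres).
    rotation_y: yaw about the camera's vertical axis (radians).
    """
    cos_r, sin_r = math.cos(rotation_y), math.sin(rotation_y)
    half_l, half_w = length / 2.0, width / 2.0
    depths = []
    for dx in (half_l, -half_l):
        for dz in (half_w, -half_w):
            # z component of a rotation about the vertical axis:
            # z' = -sin(r) * x + cos(r) * z, then shifted by the centre.
            depths.append(center_z - sin_r * dx + cos_r * dz)
    return min(depths)
```

With a box centred 10 m ahead, 4 m long and 2 m wide, an unrotated box yields 9 m, while a box rotated by 90 degrees yields 8 m.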

A Novel Method for Estimating Monocular Depth Using Cycle GAN and Segmentation

Sensors ◽

10.3390/s20092567 ◽

2020 ◽

Vol 20 (9) ◽

pp. 2567

Author(s):

Dong-hoon Kwak ◽

Seung-ho Lee

Keyword(s):

Visual Information ◽

Spatial Information ◽

Three Dimensional ◽

Depth Estimation ◽

Autonomous Driving ◽

Depth Information ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Novel Method ◽

Monocular Depth

Modern image processing techniques use three-dimensional (3D) images, which contain spatial information such as depth and scale, in addition to visual information. These images are indispensable in virtual reality, augmented reality (AR), and autonomous driving applications. We propose a novel method to estimate monocular depth by combining a cycle generative adversarial network (GAN) with segmentation. It uses three processes: segmentation and depth estimation, adversarial loss calculations, and cycle consistency loss calculations. The cycle consistency loss calculation process evaluates the similarity of two images when they are restored to their original forms after being estimated separately from two adversarial losses. To evaluate the objective reliability of the proposed method, we compared it with other monocular depth estimation (MDE) methods using the NYU Depth Dataset V2. Our results show that the proposed method achieves better benchmark values than the other methods. Therefore, we demonstrated that our proposed method is more efficient at estimating depth.
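
As a rough illustration of a cycle consistency loss, the sketch below follows the standard CycleGAN formulation: each input is mapped to the other domain and back, and the mean absolute difference to the original is summed over both directions. The paper's exact loss may differ. Images are represented as flat lists of floats, and the two mapping functions `g` and `f` are hypothetical stand-ins for trained generators.

```python
from typing import Callable, List

Image = List[float]
Mapping = Callable[[Image], Image]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Image, y: Image, g: Mapping, f: Mapping) -> float:
    """L1 cycle loss: x -> g -> f should return x, y -> f -> g should return y."""
    forward_cycle = l1_distance(f(g(x)), x)
    backward_cycle = l1_distance(g(f(y)), y)
    return forward_cycle + backward_cycle
```

When `f` is an exact inverse of `g`, the loss is zero; any residual after the round trip shows up as a positive penalty.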

A variational approach for estimation of monocular depth and camera motion in autonomous driving

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/09544070211034332 ◽

2021 ◽

pp. 095440702110343

Author(s):

Huijuan Hu ◽

Chuan Hu ◽

Xuetao Zhang

Keyword(s):

Structure From Motion ◽

Variational Approach ◽

Depth Estimation ◽

Autonomous Driving ◽

Variational Model ◽

Camera Motion ◽

Soft Constraint ◽

Smoothness Constraint ◽

Monocular Depth ◽

Dense Correspondence

In this paper, a new direct computational approach to dense 3D reconstruction in autonomous driving is proposed to simultaneously estimate the depth and the camera motion for the motion stereo problem. A traditional Structure from Motion framework is utilized to establish geometric constraints for our variational model. The architecture is mainly composed of a texture constancy constraint, a first-order motion smoothness constraint, a second-order depth regularization constraint, and a soft constraint. The texture constancy constraint improves robustness against illumination changes. The first-order motion smoothness constraint reduces noise in the estimation of dense correspondence. The depth regularization constraint is used to handle inherent ambiguities and guarantee a smooth or piecewise smooth surface, and the soft constraint provides a dense correspondence as an initial estimate of the camera matrix to further improve robustness. Compared to traditional dense Structure from Motion approaches and popular stereo approaches, our monocular depth estimation results are more accurate and more robust. Even in contrast to popular depth-from-single-image networks, our variational approach still performs well in the estimation of monocular depth and camera motion.
