Unsupervised Monocular Depth Estimation CNN Robust to Training Data Diversity

2021 ◽

Vol XLIV-M-3-2021 ◽

pp. 1-5

Author(s):

M. R. Bayanlou ◽

M. Khoshboresh-Masouleh

Keyword(s):

Scene Understanding ◽

Depth Estimation ◽

Traffic Monitoring ◽

Training Data ◽

Specific Information ◽

Infrastructure Development ◽

Task Learning ◽

Monocular Depth ◽

UAV Images ◽

3D City Modelling

Abstract. Single-task learning in artificial neural networks can already fit an individual model very well, so the benefits of transferring knowledge become limited. Moreover, when the number of tasks increases (e.g., semantic segmentation, panoptic segmentation, monocular depth estimation, and 3D point cloud), duplicate information may exist across tasks, and the improvement becomes less significant. Multi-task learning has emerged as a solution to these knowledge-transfer issues; it is an approach to scene understanding that involves multiple related tasks, each with potentially limited training data. Multi-task learning improves generalization by leveraging the domain-specific information contained in the training data of related tasks. Urban management applications such as infrastructure development, traffic monitoring, smart 3D cities, and change detection require automated multi-task data analysis for scene understanding, based on semantic, instance, and panoptic annotation as well as monocular depth estimation, to generate precise urban models. This study presents a common framework for assessing the performance of multi-task learning methods on fixed-wing UAV images for 2D/3D city modelling.

SELF-SUPERVISED LEARNING FOR MONOCULAR DEPTH ESTIMATION FROM AERIAL IMAGERY

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-357-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 357-364

Author(s):

M. Hermann ◽

B. Ruf ◽

M. Weinmann ◽

S. Hinz

Keyword(s):

Supervised Learning ◽

Image Matching ◽

Ground Truth ◽

Depth Estimation ◽

Training Data ◽

Aerial Imagery ◽

Small Model ◽

Conventional Methods ◽

Monocular Depth ◽

Real Time Application

Abstract. Supervised learning based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve an accuracy δ < 1.25 of up to 93.5 %. In addition, we have paid particular attention to the generalization of a trained model to unknown data and the self-improving capabilities of our approach. We conclude that, even though the results of monocular depth estimation are inferior to those achieved by conventional methods, they are well suited to provide a good initialization for methods that rely on image matching or to provide estimates in regions where image matching fails, e.g., occluded or texture-less regions.
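
The accuracy figure quoted in this abstract refers to the threshold accuracy commonly reported for depth estimation: the fraction of pixels whose predicted-to-true depth ratio (taken in whichever direction is larger) stays below 1.25. The following is a minimal sketch of that metric in plain Python. The function name `threshold_accuracy`, the flat-list input, and the rule of skipping non-positive depths are illustrative assumptions; the abstract does not describe the paper's evaluation protocol (masking, depth caps, scaling).

```python
def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    pred, gt: equal-length sequences of depths (hypothetical flat layout).
    Pixels with a non-positive depth in either input are skipped (assumption).
    """
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not valid:
        raise ValueError("no valid pixels to evaluate")
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return hits / len(valid)


# Toy values: three of the four ratios (1.05, 1.2, 1.5, 1.0) fall below 1.25.
pred = [10.0, 12.0, 30.0, 8.0]
gt = [10.5, 10.0, 20.0, 8.0]
acc = threshold_accuracy(pred, gt)
```

Because the ratio is symmetric, over- and under-estimation by the same factor are penalized equally.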

Unsupervised Monocular Depth Estimation for Autonomous Driving

Proceedings of the International Display Workshops ◽

10.36463/idw.2019.3dsap2_3dp2-2 ◽

2019 ◽

pp. 128

Author(s):

Chih-Shuan Huang ◽

Wan-Nung Tsung ◽

Wei-Jong Yang ◽

Chin-Hsing Chen

Keyword(s):

Depth Estimation ◽

Autonomous Driving ◽

Monocular Depth

On the Uncertainty of Self-Supervised Monocular Depth Estimation

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr42600.2020.00329 ◽

2020 ◽

Cited By ~ 1

Author(s):

Matteo Poggi ◽

Filippo Aleotti ◽

Fabio Tosi ◽

Stefano Mattoccia

Keyword(s):

Depth Estimation ◽

Monocular Depth

Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation

European Conference on Visual Media Production ◽

10.1145/3429341.3429355 ◽

2020 ◽

Author(s):

Hang Zhou ◽

David Greenwood ◽

Sarah Taylor ◽

Han Gong

Keyword(s):

Constant Velocity ◽

Depth Estimation ◽

Monocular Depth ◽

Velocity Constraints

Hierarchical Object Relationship Constrained Monocular Depth Estimation

Pattern Recognition ◽

10.1016/j.patcog.2021.108116 ◽

2021 ◽

pp. 108116

Author(s):

Shuai Li ◽

Jiaying Shi ◽

Wenfeng Song ◽

Aimin Hao ◽

Hong Qin

Keyword(s):

Depth Estimation ◽

Monocular Depth ◽

Object Relationship

Monocular Depth Estimation with Joint Attention Feature Distillation and Wavelet-Based Loss Function

Sensors ◽

10.3390/s21010054 ◽

2020 ◽

Vol 21 (1) ◽

pp. 54

Author(s):

Peng Liu ◽

Zonghua Zhang ◽

Zhaozong Meng ◽

Nan Gao

Keyword(s):

Joint Attention ◽

Loss Function ◽

Depth Estimation ◽

Depth Information ◽

3D Vision ◽

Network Training ◽

Crucial Component ◽

Benchmark Datasets ◽

Ill Posed ◽

Monocular Depth

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest owing to its flexible use and extremely low system requirements, but its inherently ill-posed and ambiguous nature still leads to unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates features using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves loss function effectiveness by capturing more structural details. The experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method’s superiority compared with current state-of-the-art methods.
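
To illustrate what a wavelet-based loss can look like, the sketch below compares predicted and reference depth maps in the coefficient space of a single-level 2D Haar transform, so that errors in edges and fine structure (the detail sub-bands) contribute alongside errors in the coarse average. This is only an assumed formulation: the abstract does not specify the wavelet family, the number of levels, or the sub-band weighting, and the names `haar2d` and `wavelet_l1_loss` are hypothetical.

```python
def haar2d(img):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four sub-bands (LL, LH, HL, HH) as flat lists, using averaging
    normalization (divide by 4) rather than the orthonormal scaling.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll.append((a + b + c + d) / 4)  # local average
            lh.append((a + b - c - d) / 4)  # difference between rows
            hl.append((a - b + c - d) / 4)  # difference between columns
            hh.append((a - b - c + d) / 4)  # diagonal difference
    return ll, lh, hl, hh


def wavelet_l1_loss(pred, gt):
    """Mean absolute difference between the Haar coefficients of pred and gt."""
    total, count = 0.0, 0
    for band_p, band_g in zip(haar2d(pred), haar2d(gt)):
        for p, g in zip(band_p, band_g):
            total += abs(p - g)
            count += 1
    return total / count


# Toy 2x2 maps: a sloped prediction against a flat reference.
loss = wavelet_l1_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
```

In a training setting the same idea would be written with differentiable tensor operations and possibly several decomposition levels; the pure-Python version here only shows the arithmetic.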

Time- and Resource-Efficient Time-to-Collision Forecasting for Indoor Pedestrian Obstacles Avoidance

Journal of Imaging ◽

10.3390/jimaging7040061 ◽

2021 ◽

Vol 7 (4) ◽

pp. 61

Author(s):

David Urban ◽

Alice Caplier

Keyword(s):

Neural Network ◽

Autonomous Vehicles ◽

Depth Estimation ◽

Video Camera ◽

Obstacle Detection ◽

Navigation Systems ◽

Time To Collision ◽

Static Data ◽

Monocular Depth ◽

Fully Connected

As difficult vision-based tasks like object detection and monocular depth estimation make their way into real-time applications, and as more lightweight solutions for autonomous vehicle navigation systems emerge, obstacle detection and collision prediction remain two very challenging tasks for small embedded devices like drones. We propose a novel lightweight and time-efficient vision-based solution to predict Time-to-Collision from a monocular video camera embedded in a smartglasses device, as a module of a navigation system for visually impaired pedestrians. It consists of two modules: a static data extractor, made of a convolutional neural network, that predicts the obstacle position and distance, and a dynamic data extractor that stacks the obstacle data from multiple frames and predicts the Time-to-Collision with a simple fully connected neural network. This paper focuses on the ability of the Time-to-Collision network to adapt, with supervised learning, to new scenes containing different types of obstacles.
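
For orientation, Time-to-Collision can be approximated from stacked per-frame obstacle distances with a simple constant-velocity rule: remaining distance divided by closing speed. The sketch below shows that geometric baseline only. It is not the learned fully connected network described in the abstract, and the function name `time_to_collision`, the distance units, and the fixed frame interval are assumptions made for illustration.

```python
def time_to_collision(distances, frame_interval):
    """Constant-velocity Time-to-Collision estimate.

    distances: obstacle distances (e.g. metres) from consecutive frames,
               oldest first (hypothetical input layout).
    frame_interval: time between consecutive frames, in seconds.
    Returns the estimated seconds until contact, or None if the obstacle
    is not approaching.
    """
    if len(distances) < 2:
        raise ValueError("need at least two frames")
    elapsed = frame_interval * (len(distances) - 1)
    closing_speed = (distances[0] - distances[-1]) / elapsed
    if closing_speed <= 0:
        return None  # obstacle is static or moving away
    return distances[-1] / closing_speed


# Obstacle approaches by 1 m over 1 s, with 4 m remaining.
ttc = time_to_collision([5.0, 4.5, 4.0], frame_interval=0.5)
```

A learned predictor can improve on this baseline when distance estimates are noisy or the relative motion is not uniform, which is presumably why a network is used in place of a closed-form rule.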
