UnDispNet: Unsupervised Learning for Multi-Stage Monocular Depth Prediction

Author(s):  
Vinay Kaushik ◽  
Brejesh Lall

Author(s):  
Vincent Casser ◽  
Soeren Pirk ◽  
Reza Mahjourian ◽  
Anelia Angelova

Learning to predict scene depth from RGB inputs is a challenging task for both indoor and outdoor robot navigation. In this work, we address unsupervised learning of scene depth and robot ego-motion, where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive, and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher quality results, is able to model moving objects, and is shown to transfer across data domains, e.g. from outdoor to indoor scenes. The main idea is to introduce geometric structure into the learning process by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore, an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion, e.g. through learned flow. Our results are comparable in quality to those that use stereo as supervision, and significantly improve depth prediction on scenes and datasets that contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments, by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.
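
For readers who want to see the core self-supervised signal behind this family of methods, below is a minimal PyTorch sketch of a photometric reprojection loss. It is an illustrative assumption rather than the authors' released struct2depth code: the function name photometric_reprojection_loss, the simplified pinhole camera model, and the plain L1 error are all stand-ins. Pixels of the target frame are back-projected with the predicted depth, moved by the predicted ego-motion, re-projected into the source frame, and the warped source image is compared to the target.

import torch
import torch.nn.functional as F

def photometric_reprojection_loss(target, source, depth, T, K):
    """target, source: (B,3,H,W) images; depth: (B,1,H,W) predicted depth;
    T: (B,4,4) predicted relative camera pose; K: (B,3,3) intrinsics."""
    B, _, H, W = target.shape
    # Pixel grid of the target view in homogeneous coordinates, shape (B,3,H*W).
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1).to(target.device)

    # Back-project to 3D camera coordinates using the predicted depth.
    cam = torch.inverse(K) @ pix * depth.reshape(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=cam.device)], dim=1)

    # Apply the predicted ego-motion and project into the source view.
    proj = K @ (T @ cam_h)[:, :3, :]
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Normalize coordinates to [-1, 1] and warp the source frame.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(source, grid, padding_mode="border", align_corners=True)

    # L1 photometric error between the synthesized and the real target frame.
    return (warped - target).abs().mean()

In the full method, this per-pixel error is typically combined with smoothness and structural-similarity terms, and individually segmented objects are warped with their own predicted motions; the sketch omits those parts.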


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 77839-77848 ◽  
Author(s):  
Junning Zhang ◽  
Qunxing Su ◽  
Pengyuan Liu ◽  
Chao Xu ◽  
Yanlong Chen

Author(s):  
Zhaokai Wang ◽  
Limin Xiao ◽  
Rongbin Xu ◽  
Shubin Su ◽  
Shupan Li ◽  
...  

2019 ◽  
Vol 11 (3) ◽  
pp. 615-627 ◽  
Author(s):  
Junning Zhang ◽  
Qunxing Su ◽  
Pengyuan Liu ◽  
Chao Xu ◽  
Yanlong Chen

2020 ◽  
Vol 34 (07) ◽  
pp. 12257-12264 ◽  
Author(s):  
Xinlong Wang ◽  
Wei Yin ◽  
Tao Kong ◽  
Yuning Jiang ◽  
Lei Li ◽  
...  

Monocular depth estimation enables 3D perception from a single 2D image and has therefore attracted much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal: the depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects has rarely been discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, and then propose the foreground-background separated monocular depth estimation (ForeSeE) method, which estimates foreground and background depth using separate optimization objectives and decoders. Our method significantly improves depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve a 7.5 AP gain and set new state-of-the-art results among monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.
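
To make the separation idea concrete, here is a minimal PyTorch sketch of a shared encoder with two depth decoders, where each decoder is supervised only on its own region through a foreground mask and the two predictions are merged with that mask at inference. The toy layers, the L1 objective, and the merging rule are assumptions for illustration, not the released ForeSeE architecture or its exact losses.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ForegroundBackgroundDepth(nn.Module):
    """Toy shared encoder with separate foreground/background depth decoders
    (illustrative stand-in for the real backbone; assumes even H and W)."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.fg_decoder = nn.Conv2d(feat_ch, 1, 3, padding=1)  # foreground depth head
        self.bg_decoder = nn.Conv2d(feat_ch, 1, 3, padding=1)  # background depth head

    def forward(self, image):
        feats = self.encoder(image)
        fg = F.interpolate(self.fg_decoder(feats), scale_factor=2,
                           mode="bilinear", align_corners=False)
        bg = F.interpolate(self.bg_decoder(feats), scale_factor=2,
                           mode="bilinear", align_corners=False)
        return fg, bg

def separated_depth_loss(fg_pred, bg_pred, gt_depth, fg_mask):
    """fg_mask: (B,1,H,W) binary mask of foreground ('thing') pixels. Each
    decoder is optimized only on its own region, so the plentiful background
    pixels never dominate the gradients of the foreground branch."""
    err_fg = F.l1_loss(fg_pred, gt_depth, reduction="none")
    err_bg = F.l1_loss(bg_pred, gt_depth, reduction="none")
    bg_mask = 1.0 - fg_mask
    fg_loss = (err_fg * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    bg_loss = (err_bg * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return fg_loss + bg_loss

def merge_predictions(fg_pred, bg_pred, fg_mask):
    # At inference, use the foreground branch on detected objects and the
    # background branch everywhere else.
    return fg_mask * fg_pred + (1.0 - fg_mask) * bg_pred

Keeping the losses region-masked is the key design point: background pixels vastly outnumber foreground ones, so without the split the foreground branch would be optimized mostly for background statistics.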

