scholarly journals Bootstrapped Self-Supervised Training with Monocular Video for Semantic Segmentation and Depth Estimation

Author(s):  
Yihao Zhang ◽  
John J. Leonard
Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5765 ◽  
Author(s):  
Seiya Ito ◽  
Naoshi Kaneko ◽  
Kazuhiko Sumi

This paper proposes a novel 3D representation, namely, a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering the real world is three-dimensional, this 2D arrangement reduces one dimension and may limit the capacity of feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than in a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We will show that the latent 3D volume is beneficial to the tasks of depth estimation and semantic segmentation because these tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume using image features and then generates latent 3D volume by passing the initial 3D volume through several 3D convolutional layers. We apply depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.


Author(s):  
Vladimir Nekrasov ◽  
Thanuja Dharmasiri ◽  
Andrew Spek ◽  
Tom Drummond ◽  
Chunhua Shen ◽  
...  

2020 ◽  
Vol 12 (21) ◽  
pp. 3603 ◽  
Author(s):  
Jiaxin Wang ◽  
Chris H. Q. Ding ◽  
Sibao Chen ◽  
Chenggang He ◽  
Bin Luo

Image segmentation has made great progress in recent years, but the annotation required for image segmentation is usually expensive, especially for remote sensing images. To solve this problem, we explore semi-supervised learning methods and appropriately utilize a large amount of unlabeled data to improve the performance of remote sensing image segmentation. This paper proposes a method for remote sensing image segmentation based on semi-supervised learning. We first design a Consistency Regularization (CR) training method for semi-supervised training, then employ the new learned model for Average Update of Pseudo-label (AUP), and finally combine pseudo labels and strong labels to train semantic segmentation network. We demonstrate the effectiveness of the proposed method on three remote sensing datasets, achieving better performance without more labeled data. Extensive experiments show that our semi-supervised method can learn the latent information from the unlabeled data to improve the segmentation performance.


2021 ◽  
Author(s):  
Lukas Hoyer ◽  
Dengxin Dai ◽  
Yuhua Chen ◽  
Adrian Koring ◽  
Suman Saha ◽  
...  

2021 ◽  
Vol 8 (3) ◽  
pp. 15-27
Author(s):  
Mohamed N. Sweilam ◽  
Nikolay Tolstokulakov

Depth estimation has made great progress in the last few years due to its applications in robotics science and computer vision. Various methods have been implemented and enhanced to estimate the depth without flickers and missing holes. Despite this progress, it is still one of the main challenges for researchers, especially for the video applications which have more complexity of the neural network which af ects the run time. Moreover to use such input like monocular video for depth estimation is considered an attractive idea, particularly for hand-held devices such as mobile phones, they are very popular for capturing pictures and videos, in addition to having a limited amount of RAM. Here in this work, we focus on enhancing the existing consistent depth estimation for monocular videos approach to be with less usage of RAM and with using less number of parameters without having a significant reduction in the quality of the depth estimation.


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1795 ◽  
Author(s):  
Xiao Lin ◽  
Dalila Sánchez-Escobedo ◽  
Josep R. Casas ◽  
Montse Pardàs

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a sole framework has been studied under the assumption that integrating two highly correlated tasks may benefit each other to improve the estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed using a single RGB input image under a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to review our results versus single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that the performance of our methodology outperforms the state of the art on single-task approaches, while obtaining competitive results compared with other multi-task methods.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Yuren Chen ◽  
Xinyi Xie ◽  
Bo Yu ◽  
Yi Li ◽  
Kunhui Lin

The multitarget vehicle tracking and motion state estimation are crucial for controlling the host vehicle accurately and preventing collisions. However, current multitarget tracking methods are inconvenient to deal with multivehicle issues due to the dynamically complex driving environment. Driving environment perception systems, as an indispensable component of intelligent vehicles, have the potential to solve this problem from the perspective of image processing. Thus, this study proposes a novel driving environment perception system of intelligent vehicles by using deep learning methods to track multitarget vehicles and estimate their motion states. Firstly, a panoramic segmentation neural network that supports end-to-end training is designed and implemented, which is composed of semantic segmentation and instance segmentation. A depth calculation model of the driving environment is established by adding a depth estimation branch to the feature extraction and fusion module of the panoramic segmentation network. These deep neural networks are trained and tested in the Mapillary Vistas Dataset and the Cityscapes Dataset, and the results showed that these methods performed well with high recognition accuracy. Then, Kalman filtering and Hungarian algorithm are used for the multitarget vehicle tracking and motion state estimation. The effectiveness of this method is tested by a simulation experiment, and results showed that the relative relation (i.e., relative speed and distance) between multiple vehicles can be estimated accurately. The findings of this study can contribute to the development of intelligent vehicles to alert drivers to possible danger, assist drivers’ decision-making, and improve traffic safety.


Sign in / Sign up

Export Citation Format

Share Document