END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS

Author(s): C. Pinard, L. Chevalley, A. Manzanera, D. Filliat

We propose a depth map inference system for monocular videos, based on a novel navigation dataset that mimics aerial footage from a gimbal-stabilized monocular camera in rigid scenes. Unlike most navigation datasets, the absence of rotation yields an easier structure-from-motion problem, which can be leveraged for tasks such as depth inference and obstacle avoidance. We also propose an architecture for end-to-end depth inference with a fully convolutional network. Results show that, although tied to the camera's intrinsic parameters, the problem is locally solvable and leads to good-quality depth predictions.
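To illustrate why removing rotation simplifies the geometry, here is a minimal sketch (not the paper's network) of depth recovery from optical flow under pure forward translation: flow radiates from the focus of expansion, and each pixel's depth is the known translation times the pixel's distance to that point, divided by the flow magnitude. The function name and the assumption that the focus of expansion coincides with the principal point (cx, cy) are ours, not from the paper.

```python
import numpy as np

def depth_from_forward_flow(flow, t_z, cx, cy):
    """Hypothetical sketch: per-pixel depth from optical flow when the
    camera undergoes pure forward translation t_z with no rotation
    (the stabilized-camera setting). Flow then radiates from the focus
    of expansion, assumed here to be the principal point (cx, cy), and
    depth obeys Z = t_z * r / |flow|, with r the pixel's distance to
    the focus of expansion.

    flow : (H, W, 2) array of flow vectors, in pixels per frame
    t_z  : camera translation between the two frames (metres)
    """
    h, w = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w) - cx, np.arange(h) - cy)
    r = np.hypot(xs, ys)                      # distance to FOE (pixels)
    mag = np.linalg.norm(flow, axis=2)        # flow magnitude (pixels)
    return t_z * r / np.maximum(mag, 1e-6)    # depth map (metres)
```

Note that the focal length cancels out in this forward-motion case while the principal point does not, which is consistent with the abstract's remark that results remain tied to the camera's intrinsic parameters.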

IEEE Access, 2019, Vol 7, pp. 16395-16405
Author(s): Xiaoyan Jiang, Yongbin Gao, Zhijun Fang, Peng Wang, Bo Huang

2021
Author(s): Jianfeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, ...

2021, Vol 11 (15), pp. 6975
Author(s): Tao Zhang, Lun He, Xudong Li, Guoqing Feng

Lipreading aims to recognize the sentences spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder built from a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner for decoding features; this partly eliminates the vanishing-gradient and performance limitations of RNNs (LSTM, GRU), yielding a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and accuracy improves by 2.4% on the GRID dataset.
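As a rough illustration of the architecture the abstract describes, here is a minimal, hypothetical PyTorch sketch of the temporal decoding path: stacked dilated TCN blocks over per-frame features (as would come from a 3D-conv + ResNet50 front end), followed by a linear layer producing per-frame log-probabilities for CTC training. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """One TCN block: two dilated 1-D convolutions with a residual
    connection, the standard Temporal Convolutional Network unit."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2   # preserve sequence length
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size,
                      padding=pad, dilation=dilation),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size,
                      padding=pad, dilation=dilation),
        )
        self.relu = nn.ReLU()

    def forward(self, x):                         # x: (batch, channels, time)
        return self.relu(x + self.net(x))

class LipreadingHead(nn.Module):
    """Hypothetical decoding path: stacked TCN blocks with growing
    dilation over frame features, then per-frame character logits."""
    def __init__(self, feat_dim=512, vocab_size=28, num_blocks=4):
        super().__init__()
        self.tcn = nn.Sequential(*[
            TemporalBlock(feat_dim, dilation=2 ** i)
            for i in range(num_blocks)
        ])
        self.fc = nn.Linear(feat_dim, vocab_size)  # vocab incl. CTC blank

    def forward(self, feats):                      # feats: (B, T, feat_dim)
        x = self.tcn(feats.transpose(1, 2)).transpose(1, 2)
        # (B, T, vocab) log-probs; transpose to (T, B, vocab) for nn.CTCLoss
        return self.fc(x).log_softmax(-1)
```

Unlike a recurrent decoder, the dilated convolutions process all timesteps in parallel and keep gradient paths short, which plausibly accounts for the faster convergence the abstract reports.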

