Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels

2022 ◽  
pp. 1-1
Author(s):  
Zipeng Ye ◽  
Mengfei Xia ◽  
Ran Yi ◽  
Juyong Zhang ◽  
Yu-Kun Lai ◽  
...  
2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the vanishing-gradient and insufficient-performance defects of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and accuracy improves by 2.4% on the GRID dataset.
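The abstract names a CTC objective as the decoder but gives no decoding details. As a rough illustration only, the sketch below shows greedy (best-path) CTC decoding: pick the highest-scoring label per frame, collapse consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the blank index 0, and the toy scores are assumptions for this example, not taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding over a list of per-frame score lists."""
    # 1. Take the arg-max label for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blank labels.
    return [label for label in collapsed if label != blank]


# Toy input: 7 frames, 3 labels (0 = blank). Best path is 1,1,0,1,2,2,0.
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
decoded = ctc_greedy_decode(toy_scores)
print(decoded)
```

The blank between the two runs of label 1 is what lets CTC emit a repeated label; without it the two runs would collapse into one.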


2015 ◽  
Vol 26 (4) ◽  
pp. 490-498 ◽  
Author(s):  
Ferran Pons ◽  
Laura Bosch ◽  
David J. Lewkowicz

2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Hongbo Zhao

BACKGROUND: Convolutional neural networks often outperform similar algorithms in image classification. The convolution and sub-sampling layers extract sample features, and weight sharing greatly reduces the number of trainable parameters in the network. OBJECTIVE: This paper describes an improved convolutional neural network structure, comprising convolution layers, sub-sampling layers and fully connected layers. It also introduces the “yan.mat” data set, which contains images of normal eyes and of five kinds of diseases reflected by the blood filaments of the eyeball, in a form convenient for calculation with MATLAB. METHODS: We improve the structure of the classical LeNet-5 convolutional neural network, design a network structure with different convolution kernels, different sub-sampling methods and different classifiers, and use this structure to solve the problem of ocular bloodstream disease recognition. RESULTS: The experimental results show that the improved convolutional neural network structure is well suited to recognition on the eye blood silk data set, which shows that the convolutional neural network has strong classification ability and strong robustness. The improved structure can classify the diseases reflected by eyeball bloodstain well.


2015 ◽  
Vol 41 (1) ◽  
pp. 165-173 ◽  
Author(s):  
Fabio Massimo Zanzotto ◽  
Lorenzo Ferrone ◽  
Marco Baroni

Distributional semantics has been extended to phrases and sentences by means of composition operations. We look at how these operations affect similarity measurements, showing that similarity equations of an important class of composition methods can be decomposed into operations performed on the subparts of the input phrases. This establishes a strong link between these models and convolution kernels.
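The abstract does not say which composition methods it covers. As a minimal sketch, assume plain additive composition (a phrase vector is the sum of its word vectors) and an unnormalized dot product as the similarity. Under those assumptions the phrase-level similarity equals the sum of the word-pair similarities, which has the same shape as a convolution kernel summing over pairs of subparts. All function names and vectors below are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def compose_additive(word_vectors):
    """Additive composition: element-wise sum of the word vectors."""
    return [sum(components) for components in zip(*word_vectors)]


def decomposed_similarity(phrase_a, phrase_b):
    """Sum of dot products over all pairs of subparts (words)."""
    return sum(dot(p, q) for p in phrase_a for q in phrase_b)


# Toy two-word phrases with made-up 2-dimensional word vectors.
phrase_a = [[1, 2], [3, 0]]
phrase_b = [[0, 1], [2, 2]]

direct = dot(compose_additive(phrase_a), compose_additive(phrase_b))
via_parts = decomposed_similarity(phrase_a, phrase_b)
print(direct, via_parts)
```

The equality follows from bilinearity of the dot product; other composition methods and normalized similarities such as cosine would need their own treatment.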


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3035
Author(s):  
Feiyue Deng ◽  
Yan Bi ◽  
Yongqiang Liu ◽  
Shaopu Yang

Remaining useful life (RUL) prediction of key components is an important factor in making accurate maintenance decisions for mechanical systems. With the rapid development of deep learning (DL) techniques, research on RUL prediction based on data-driven models is increasingly widespread. Compared with conventional convolutional neural networks (CNNs), multi-scale CNNs can extract feature information at different scales and therefore perform better in RUL prediction. However, existing multi-scale CNNs employ multiple convolution kernels of different sizes to construct the network framework. This approach has two main shortcomings: (1) the convolution operation based on kernels of multiple sizes requires enormous computation and has low operational efficiency, which severely restricts its application in practical engineering; (2) a convolutional layer with a large kernel needs a large number of weight parameters, leading to a dramatic increase in network training time and making it prone to overfitting on small datasets. To address these issues, a multi-scale dilated convolution network (MsDCN) is proposed for RUL prediction in this article. The MsDCN adopts a new multi-scale dilation convolution fusion unit (MsDCFU), in which the multi-scale network framework is composed of convolution operations with different dilation factors. This effectively expands the receptive field (RF) of the convolution kernel without an additional computational burden. Moreover, the MsDCFU employs depthwise separable convolution (DSC) to further improve the operational efficiency of the prognostics model. Finally, the proposed method was validated with accelerated degradation test data of rolling element bearings (REBs). The experimental results demonstrate that the proposed MsDCN has higher RUL prediction accuracy than some typical CNNs and better operational efficiency than existing multi-scale CNNs based on different convolution kernel sizes.
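To make the efficiency argument concrete, the sketch below compares weight counts for a 1D convolution under the standard formulas: a kernel of size k with dilation d spans d·(k−1)+1 input positions, a regular convolution has k·C_in·C_out weights, and a depthwise separable convolution has k·C_in + C_in·C_out (biases ignored). The channel counts and kernel sizes are illustrative assumptions, not values from the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def conv_params(kernel_size, c_in, c_out):
    """Weights in a standard 1D convolution (bias ignored)."""
    return kernel_size * c_in * c_out


def dsc_params(kernel_size, c_in, c_out):
    """Weights in a depthwise separable 1D convolution (bias ignored):
    one k-tap filter per input channel plus a 1x1 pointwise mixing layer."""
    return kernel_size * c_in + c_in * c_out


C_IN, C_OUT = 16, 32  # hypothetical channel counts

# A size-3 kernel with dilation 4 spans as many inputs as a size-9 kernel.
span = effective_kernel_size(3, dilation=4)
large_kernel = conv_params(9, C_IN, C_OUT)   # plain size-9 kernel
dilated = conv_params(3, C_IN, C_OUT)        # size-3 kernel, any dilation
dilated_dsc = dsc_params(3, C_IN, C_OUT)     # size-3 kernel, depthwise separable
print(span, large_kernel, dilated, dilated_dsc)
```

Dilation changes the span but not the weight count, which is why the same coverage can be reached with far fewer parameters than a large dense kernel.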


Author(s):  
Sefik Emre Eskimez ◽  
Ross K. Maddox ◽  
Chenliang Xu ◽  
Zhiyao Duan