A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction

Author(s): Prachi Govalkar, Johannes Fischer, Frank Zalkow, Christian Dittmar
2019, Vol 29 (06), pp. 1950075
Author(s): Yumei Zhang, Xiangying Guo, Xia Wu, Suzhen Shi, Xiaojun Wu

In this paper, we propose a nonlinear prediction model of speech signal series with an explicit structure. When the kernel coefficients of the Volterra model are updated with least mean square (LMS), the poor performance of LMS typically yields improper parameters, which in turn cause trapping in local minima and slow convergence. To overcome these shortcomings, we propose a uniform searching particle swarm optimization (UPSO) algorithm to optimize the kernel coefficients of the Volterra model. A second-order Volterra filter (SOVF) speech prediction model based on UPSO is established using English phonemes, words, and phrases. To reduce the complexity of the model, we extract a reduced-parameter SOVF (RPSOVF) under a user-specified error tolerance, which accelerates computation. The experimental results show that, for both single-frame and multiframe speech signals, UPSO-SOVF and UPSO-RPSOVF outperform LMS-SOVF and PSO-SOVF in terms of root mean square error (RMSE) and mean absolute deviation (MAD). UPSO-SOVF and UPSO-RPSOVF better reflect the trends and regularity of speech signals and meet the requirements of speech signal prediction. The proposed model provides a nonlinear analysis and a useful model structure for speech signal series, and can be further employed in speech signal reconstruction or compression coding.
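
As a rough illustration of the quantities named in the abstract, the sketch below computes a one-step second-order Volterra prediction and the two error measures. It is not the authors' implementation: the function names, the constant term `h0`, the upper-triangular form of the quadratic kernel, and the reading of MAD as the mean absolute prediction error are all assumptions made here, and the kernel values are hand-picked rather than optimized by LMS, PSO, or UPSO.

```python
import math


def sovf_predict(history, h0, h1, h2):
    """One-step second-order Volterra prediction (illustrative sketch).

    history: the last N samples, most recent first.
    h0: constant term (assumed; may be zero).
    h1: N linear kernel coefficients.
    h2: N x N quadratic kernel; only entries with j >= i are used,
        i.e. a symmetric kernel stored in upper-triangular form (assumed).
    """
    n = len(history)
    linear = sum(h1[i] * history[i] for i in range(n))
    quadratic = sum(
        h2[i][j] * history[i] * history[j]
        for i in range(n)
        for j in range(i, n)
    )
    return h0 + linear + quadratic


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mad(actual, predicted):
    """Mean absolute deviation, taken here as the mean absolute error."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical memory length N = 2 with hand-picked kernels.
    h0, h1 = 0.0, [1.6, -0.8]
    h2 = [[0.05, -0.02], [0.0, 0.01]]
    signal = [math.sin(0.3 * t) for t in range(50)]

    predictions = [
        sovf_predict([signal[t - 1], signal[t - 2]], h0, h1, h2)
        for t in range(2, len(signal))
    ]
    targets = signal[2:]
    print("RMSE:", rmse(targets, predictions))
    print("MAD:", mad(targets, predictions))
```

In this framing, an optimizer such as LMS or a particle swarm would search over `h0`, `h1`, and `h2` to minimize an error measure like `rmse` on training frames; that search is omitted here.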

