Tibetan speech synthesis based on an improved neural network
Keyword(s):
Neural-network-based synthesis has become the mainstream approach to Tibetan speech synthesis. Within this approach, the Griffin-Lim vocoder is widely used because it is relatively simple to apply. To address the low fidelity of the Griffin-Lim vocoder, this paper replaces it with a WaveNet vocoder for Tibetan speech synthesis. The model first uses convolution operations and an attention mechanism to extract sequence features. It then uses a linear projection and a feature amplification module to predict the mel spectrogram. Finally, a WaveNet vocoder synthesizes the speech waveform. Experimental results show that the proposed model achieves better performance in Tibetan speech synthesis.
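To make the "relatively simple" character of the Griffin-Lim baseline concrete, the following is a minimal NumPy sketch of the generic Griffin-Lim iteration: it starts from a random phase and alternates between an inverse and a forward short-time Fourier transform while holding the magnitude fixed. It is not the paper's implementation. The function names (`stft`, `istft`, `griffin_lim`, `spectral_error`), the window length `n_fft=256`, the hop size `hop=64`, the iteration count, and the two-tone test signal are all assumptions chosen for illustration, and the sketch works on a linear magnitude spectrogram rather than on a mel spectrogram.

```python
import numpy as np


def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a periodic Hann window (one frame per row)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=256, hop=64):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        x[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    return x / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from a random phase; only the magnitude is known.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        x = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(x, n_fft, hop)
        # Keep the phase of the re-analysed signal, discard its magnitude.
        phase = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(magnitude * phase, n_fft, hop)


def spectral_error(x, magnitude, n_fft=256, hop=64):
    """Relative distance between the STFT magnitude of `x` and the target magnitude."""
    diff = np.abs(stft(x, n_fft, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)


# Hypothetical test signal: two sine tones, one second at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
target = np.abs(stft(signal))

wave_init = griffin_lim(target, n_iter=0)   # random phase only
wave = griffin_lim(target, n_iter=50)       # after 50 refinement steps
print("error with random phase:", spectral_error(wave_init, target))
print("error after 50 iterations:", spectral_error(wave, target))
```

The loop contains no learned parameters, which is why the method is easy to apply. The phase it recovers is only an estimate that is consistent with the given magnitude, and this is one common explanation for the fidelity gap that motivates replacing it with a neural vocoder such as WaveNet.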