A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2

Author(s): Junmin Liu, Zhuangzhuang Xie, Chunxia Zhang, Guang Shi
2021, Vol 2021, pp. 1-9
Author(s): Hongjuan Ma

As speech synthesis technology matures, it is used ever more widely in people's lives and brings them ever more convenience, so the requirements placed on speech synthesis systems keep rising, and advanced techniques are needed to improve and update the stress (accent) recognition system. This paper mainly introduces word-stress annotation technology combined with neural-network speech synthesis. In Chinese speech synthesis, prosodic structure prediction has a strong influence on naturalness; accurately predicting the prosodic structure, which is the aim of this paper, has therefore become an important problem to be solved in speech synthesis. The experimental data show that the average error over the samples during network training is |e|/85, and the minimum training error after 500 steps is 0.0013127, so the final error over the samples is |e| = 85 × 0.0013127 ≈ 0.112 < 0.5. A deep neural network (DNN) is then trained with different parameters to obtain conversion models, and these conversion models are combined in synthesis, finally improving the quality of the synthesized speech.
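
A minimal sketch of the error check reported in the abstract above (not the paper's code): it assumes |e| is the total error accumulated over the 85 training samples and that 0.0013127 is the per-sample training error reached after 500 steps.

```python
# Illustrative check of the reported error figures (not the authors' code).
# Assumptions: |e| is the total error summed over the 85 training samples,
# and 0.0013127 is the per-sample training error after 500 steps.

NUM_SAMPLES = 85               # number of training samples reported in the abstract
PER_SAMPLE_ERROR = 0.0013127   # minimum training error after 500 steps
TOTAL_ERROR_THRESHOLD = 0.5    # acceptance threshold used in the abstract

total_error = NUM_SAMPLES * PER_SAMPLE_ERROR   # |e| = 85 * 0.0013127
print(f"|e| = {total_error:.3f}")              # -> |e| = 0.112
print("within threshold:", total_error < TOTAL_ERROR_THRESHOLD)  # -> True
```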


2020, Vol 34 (05), pp. 9106-9113
Author(s): Amir Veyseh, Franck Dernoncourt, My Thai, Dejing Dou, Thien Nguyen

Relation Extraction (RE) is one of the fundamental tasks in Information Extraction. The goal of this task is to find the semantic relations between entity mentions in text. It has been shown in many previous works that the structure of the sentences (i.e., dependency trees) can provide important information/features for RE models. However, a common limitation of previous work on RE is the reliance on external parsers to obtain the syntactic trees for the sentence structures. On the one hand, independent external parsers are not guaranteed to offer the optimal sentence structures for RE, and structures customized for RE might further improve performance. On the other hand, the quality of the external parsers might suffer when applied to different domains, also affecting the performance of RE models on such domains. To overcome this issue, we introduce a novel method for RE that simultaneously induces the structures and predicts the relations for the input sentences, thus avoiding external parsers and potentially leading to better sentence structures for RE. Our general strategy for learning the RE-specific structures is to apply two different methods to infer the structures for the input sentences (i.e., two views). We then introduce several mechanisms to encourage structural and semantic consistency between these two views so that effective structure and semantic representations for RE can emerge. We perform extensive experiments on the ACE 2005 and SemEval 2010 datasets to demonstrate the advantages of the proposed method, which achieves state-of-the-art performance on these datasets.
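
The core idea of the abstract above (inducing two structural views and encouraging them to agree) can be illustrated with a hedged sketch. The code below is not the authors' implementation; it simply assumes each view scores a soft dependency (head-selection) distribution from token representations and penalizes a symmetric KL divergence between the two views, standing in for the paper's consistency mechanisms.

```python
# Illustrative sketch only (not the authors' code): two "views" each induce a
# soft dependency structure over the tokens of a sentence, and a consistency
# loss encourages the two induced structures to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureView(nn.Module):
    """One view: scores head-dependent pairs from token representations."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.head_proj = nn.Linear(hidden_dim, hidden_dim)
        self.dep_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, tokens):                  # tokens: (seq_len, hidden_dim)
        heads = self.head_proj(tokens)          # (seq_len, hidden_dim)
        deps = self.dep_proj(tokens)            # (seq_len, hidden_dim)
        scores = deps @ heads.t()               # (seq_len, seq_len) pair scores
        return F.log_softmax(scores, dim=-1)    # head distribution per token

def structure_consistency_loss(log_p_view1, log_p_view2):
    """Symmetric KL between the two views' head distributions (an assumed
    stand-in for the consistency mechanisms described in the abstract)."""
    kl_12 = F.kl_div(log_p_view2, log_p_view1.exp(), reduction="batchmean")
    kl_21 = F.kl_div(log_p_view1, log_p_view2.exp(), reduction="batchmean")
    return 0.5 * (kl_12 + kl_21)

# Toy usage: random token encodings for a 6-token sentence.
tokens = torch.randn(6, 128)
view1, view2 = StructureView(128), StructureView(128)
loss = structure_consistency_loss(view1(tokens), view2(tokens))
print(loss.item())   # scalar consistency penalty, added to the RE training loss
```

In such a setup, the consistency penalty would be added to the relation classification loss during training, so that the induced structures are shaped by the RE objective rather than by an external parser.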

