Methods for evaluating of the synthesized Speech quality

2010 ◽  
Vol 44-47 ◽  
pp. 3672-3676
Author(s):  
Jian Lei Li ◽  
Zhen Ma ◽  
Ming Zhao Wu

Building on the all-pole model, this paper proposes an order-variable all-pole model that accounts for the variability in vocal tract complexity, and applies it to multi-pulse linear prediction speech coding. The method is simulated in MATLAB and the quality of the synthesized speech is evaluated. The order-variable model is found to maintain better speech quality while decreasing the coding rate.
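The abstract does not say how the model order is chosen, so the following is only a minimal sketch of one way an order-variable all-pole fit could work. It runs the standard Levinson-Durbin recursion and stops raising the order once an extra pole no longer reduces the prediction error by much. The function names, the `min_gain` threshold, and the stopping rule itself are illustrative assumptions, not the authors' method, and the sketch is in Python rather than the MATLAB used in the paper.

```python
def autocorr(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_variable_order(frame, max_order=12, min_gain=0.01):
    """Fit an all-pole predictor x[n] ~ sum_k a[k-1] * x[n-k].

    The order is raised one step at a time (Levinson-Durbin) and the
    recursion stops as soon as the next order would reduce the
    prediction error by less than `min_gain` of the frame energy.
    This stopping rule is a hypothetical criterion for illustration.
    Returns (coefficients, order, residual_energy).
    """
    r = autocorr(frame, max_order)
    if r[0] == 0.0:
        return [], 0, 0.0  # silent frame: nothing to predict

    a = []        # predictor coefficients a_1 .. a_m
    err = r[0]    # prediction error energy at the current order
    for m in range(1, max_order + 1):
        # Reflection coefficient for order m.
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err
        new_err = err * (1.0 - k * k)
        # Assumed rule: skip this pole if it barely helps.
        if (err - new_err) / r[0] < min_gain:
            break
        # Standard Levinson-Durbin coefficient update.
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err = new_err
    return a, len(a), err


if __name__ == "__main__":
    # A first-order decaying signal should need only one pole.
    simple = [0.9 ** n for n in range(200)]
    coeffs, order, _ = levinson_variable_order(simple)
    print(order, coeffs)
```

Under this rule, frames with simple spectra get fewer coefficients, which is one way a variable order could lower the coding rate. Whether the paper uses a comparable criterion is not stated in the abstract.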


2011 ◽  
Vol 97 (5) ◽  
pp. 852-868 ◽  
Author(s):  
Peter Počta ◽  
Jan Holub

This paper investigates the impact of independent and dependent losses and coding on speech quality predictions provided by PESQ (also known as ITU-T P.862) and P.563 models, when both naturally-produced and synthesized speech are used. Two synthesized speech samples generated with two different Text-to-Speech systems and one naturally-produced sample are investigated. In addition, we assess the variability of PESQ's and P.563's predictions with respect to the type of speech used (naturally-produced or synthesized) and loss conditions as well as their accuracy, by comparing the predictions with subjective assessments. The results show that there is no difference between the impact of packet loss on naturally-produced speech and synthesized speech. On the other hand, the impact of coding is different for the two types of stimuli. In addition, synthesized speech seems to be insensitive to degradations provided by most of the codecs investigated here. The reasons for those findings are particularly discussed. Finally, it is concluded that both models are capable of predicting the quality of transmitted synthesized speech under the investigated conditions to a certain degree. As expected, PESQ achieves the best performance over almost all of the investigated conditions.
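To make the distinction between independent and dependent losses concrete, here is a minimal sketch of two loss patterns with the same average loss rate. Independent loss is drawn per packet (Bernoulli), and dependent loss comes from a two-state Gilbert-style Markov chain that produces bursts. The abstract does not name the loss models used, so the Gilbert chain, the parameter values, and all function names are assumptions for illustration only.

```python
import random


def bernoulli_loss(n_packets, loss_rate, rng):
    """Independent loss: each packet is dropped with the same probability."""
    return [rng.random() < loss_rate for _ in range(n_packets)]


def gilbert_loss(n_packets, p_good_to_bad, p_bad_to_good, rng):
    """Dependent loss: two-state Markov chain, packets are lost in the bad state."""
    lost = []
    bad = False
    for _ in range(n_packets):
        if bad:
            # Stay in the bad state unless the chain recovers.
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        lost.append(bad)
    return lost


def mean_burst_length(lost):
    """Average length of runs of consecutive lost packets."""
    bursts, run = [], 0
    for flag in lost:
        if flag:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    # Both patterns target roughly 5% loss; the Gilbert chain's long-run
    # loss rate is p_good_to_bad / (p_good_to_bad + p_bad_to_good).
    independent = bernoulli_loss(n, 0.05, rng)
    dependent = gilbert_loss(n, 0.5 * 0.05 / 0.95, 0.5, rng)
    print(sum(independent) / n, mean_burst_length(independent))
    print(sum(dependent) / n, mean_burst_length(dependent))
```

With similar overall loss rates, the dependent pattern concentrates losses in longer bursts, which is the property that distinguishes the two kinds of loss.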


1983 ◽  
Vol 26 (4) ◽  
pp. 516-524 ◽  
Author(s):  
Donald J. Sharf ◽  
Ralph N. Ohde

Adult and Child manifolds were generated by synthesizing 5 × 5 matrices of /Cej/-type utterances in which F2 and F3 frequencies were systematically varied. Manifold stimuli were presented to 11 graduate-level speech-language pathology students in two conditions: (a) a rating condition in which stimuli were rated on a 4-point scale between good /r/ and good /w/; and (b) a labeling condition in which stimuli were labeled as "R," "W," "distorted R," or "N" (for none of the previous choices). It was found that (a) stimuli with low F2 and high F3 frequencies were rated 1.0–1.4, those with high F2 and low F3 frequencies were rated 3.6–4.0, and those with intermediate values were rated 1.5–3.5; (b) stimuli rated 1.0–1.4 were labeled as "W" and stimuli rated 3.6–4.0 were labeled as "R"; (c) none of the Child manifold stimuli were labeled as distorted "R" and one of the Adult manifold stimuli approached a level of identification that approached the percentage of identification for "R" and "W"; and (d) rating and labeling tasks were performed with a high degree of reliability.


2010 ◽  
Author(s):  
Marcel Wältermann ◽  
Alexander Raake ◽  
Sebastian Möller
