GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

2020, Vol 11 (1), pp. 2
Author(s): Jiří Přibil, Anna Přibilová, Jindřich Matoušek

The paper describes a system for the automatic evaluation of synthetic speech quality based on a Gaussian mixture model (GMM) classifier. Speech material from a real speaker is compared with synthesized material to determine the similarities and differences between them. The final evaluation order is determined by the distances in the Pleasure-Arousal (P-A) space between the original speech and the synthetic speech produced by different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMMs for continuous 2D detection of P-A classes are trained on sound/speech material from databases unrelated to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show that the P-A classification process and the final evaluation result are substantially influenced by the number of mixtures, the number and type of speech features used, the size of the processed speech material, and the type of database used to create the GMMs. The main evaluation experiments confirm that the developed system works as intended. The objective evaluation results broadly correlate with the subjective ratings of human evaluators; however, some differences were found, so a more detailed investigation is needed.
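The abstract does not spell out how GMM class scores become a position in the P-A plane or how the evaluation order follows from it. The following is a minimal Python sketch under stated assumptions, not the authors' published method: each P-A class is a hypothetical point in the plane with its own diagonal-covariance GMM over speech feature vectors; a recording's continuous P-A position is taken as the posterior-weighted average of the class coordinates; and synthesis methods are ordered by Euclidean distance from the original speaker's position. The class layout, the posterior weighting, and all function names are illustrative.

```python
import math

# Hypothetical sketch: GMM scoring -> continuous P-A position -> ranking by distance.
# A GMM is a list of (weight, mean_vector, variance_vector) tuples (diagonal covariance).


def log_gauss_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one feature vector, using log-sum-exp over mixtures."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def pa_position(frames, class_models):
    """Map feature frames to a continuous (pleasure, arousal) point.

    Assumption: class_models maps (p, a) class coordinates to a GMM; the
    position is the posterior-weighted mean of those coordinates, scoring
    each class by its average per-frame log-likelihood.
    """
    scores = {
        coords: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for coords, gmm in class_models.items()
    }
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}  # softmax numerators
    total = sum(weights.values())
    pleasure = sum(c[0] * w for c, w in weights.items()) / total
    arousal = sum(c[1] * w for c, w in weights.items()) / total
    return pleasure, arousal


def rank_by_pa_distance(original_frames, synthetic_sets, class_models):
    """Order synthesis methods by Euclidean P-A distance to the original speech."""
    reference = pa_position(original_frames, class_models)
    distances = {
        name: math.dist(reference, pa_position(frames, class_models))
        for name, frames in synthetic_sets.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 2-D features; four hypothetical P-A classes at the corners of the plane,
    # each modelled by a two-component GMM.
    corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    models = {
        (p, a): [(0.5, [2 * p - 0.3, 2 * a], [1.0, 1.0]),
                 (0.5, [2 * p + 0.3, 2 * a], [1.0, 1.0])]
        for p, a in corners
    }
    original = [[2.0, 2.0], [2.2, 1.9]]
    synthetic = {
        "method_A": [[1.8, 2.1], [2.1, 1.8]],
        "method_B": [[-2.0, 0.0], [-1.9, 0.1]],
    }
    for name, dist in rank_by_pa_distance(original, synthetic, models):
        print(f"{name}: {dist:.3f}")
```

With these toy models, method_A lands close to the original's P-A point and is listed first. In the real system, the class layout, features, and mixture counts would come from the training databases the abstract describes, which is exactly where the paper reports the strongest influences on the result.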

2016
Author(s): Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, ...

1991, Vol 19 (1), pp. 139-146
Author(s): Louis C.W. Pols, Renée van Bezooijen

Author(s): Mahbubur R. Syed, Shuvro Chakrobartty, Robert J. Bignall

Speech synthesis is the production by a machine of natural-sounding, highly intelligible synthetic speech that sounds as if it were produced by a human vocal system. A text-to-speech (TTS) synthesis system is a computer-based system whose input is text and whose output is a simulated vocalization of that text. Before the 1970s, most speech synthesis was achieved with hardware, but this approach was costly, and it proved impossible to simulate natural speech production properly. Since the 1970s, computers have made the practical application of speech synthesis more feasible.


1992, Vol 36 (2), pp. 190-192
Author(s): Janan Al-Awar Smither

This experiment investigated the demands that synthetic speech places on short-term memory by comparing the performance of older and younger adults on an ordinary short-term memory task. The items were presented either by a human speaker or by a text-to-speech computer synthesizer. The results were consistent with the idea that comprehending synthetic speech imposes increased resource demands on the short-term memory system. Older subjects performed significantly worse than younger subjects, and both groups performed worse with synthetic speech than with human speech. The findings suggest that the short-term memory demands imposed by processing synthetic speech should be investigated further, particularly with regard to the implementation of voice response systems in devices for the elderly.


2005, Vol 48 (3), pp. 702-714
Author(s): Peninah S. Rosengard, Karen L. Payton, Louis D. Braida

The purpose of this study was twofold: (a) to determine the extent to which 4-channel, slow-acting wide dynamic range amplitude compression (WDRC) can counteract the perceptual effects of reduced auditory dynamic range, and (b) to examine the relation between objective measures of speech intelligibility and categorical ratings of speech quality for sentences processed with slow-acting WDRC. Multiband expansion was used to simulate the effects of elevated thresholds and loudness recruitment in normal-hearing listeners. While some previous studies have shown that WDRC can improve both speech intelligibility and quality, others have found no benefit. The current experiment shows that moderate amounts of compression can provide a small but significant improvement in speech intelligibility, relative to linear amplification, for simulated-loss listeners with small dynamic ranges (i.e., flat, moderate hearing loss). This benefit was found for speech at conversational levels, both in quiet and in a background of babble. Simulated-loss listeners with large dynamic ranges (i.e., sloping, mild-to-moderate hearing loss) did not show any improvement. A comparison of speech intelligibility scores with subjective ratings of intelligibility showed that listeners with simulated hearing loss could accurately judge the overall intelligibility of speech. However, in all listeners, ratings of pleasantness decreased as the compression ratio increased. These findings suggest that subjective measures of speech quality should be used in conjunction with either objective or subjective measures of speech intelligibility to ensure that participant-selected hearing aid parameters optimize both comfort and intelligibility.
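To make "compression ratio" concrete: a compressor applies linear gain below its threshold and, above the threshold, raises the output by only 1/ratio dB per dB of input, so a higher ratio squeezes a wide input range into a narrower output range. Below is a minimal Python sketch of this textbook static rule for one channel, plus a one-pole level smoother in which coefficients close to 1 give the slow attack/release behaviour associated with slow-acting compression. The threshold, ratio, and smoothing coefficients are illustrative assumptions, since the abstract does not give the study's parameters; a 4-channel device would apply a separate rule in each frequency band.

```python
# Hypothetical single-channel WDRC sketch; parameter values are illustrative only.


def wdrc_output_level(input_db, threshold_db, ratio):
    """Static compressor curve in dB: linear below threshold, 1/ratio slope above."""
    if ratio < 1:
        raise ValueError("compression ratio must be >= 1")
    if input_db <= threshold_db:
        return input_db  # assumption: unity gain in the linear region
    return threshold_db + (input_db - threshold_db) / ratio


def smooth_levels(levels_db, attack_coeff, release_coeff):
    """One-pole level detector; coefficients near 1 give slow-acting behaviour."""
    smoothed = []
    state = levels_db[0]
    for level in levels_db:
        # Rising levels use the attack coefficient, falling levels the release one.
        coeff = attack_coeff if level > state else release_coeff
        state = coeff * state + (1.0 - coeff) * level
        smoothed.append(state)
    return smoothed


if __name__ == "__main__":
    # A 40-90 dB input range shrinks to 40-70 dB with threshold 50 dB, ratio 2:1.
    for level in (40, 50, 70, 90):
        print(level, "->", wdrc_output_level(level, threshold_db=50, ratio=2))
```

Raising the ratio from 2 to 4 in this sketch maps a 90 dB input to 60 dB instead of 70 dB, the kind of increase in compression ratio that the study associates with lower pleasantness ratings.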


2019, Vol 11 (11-SPECIAL ISSUE), pp. 967-971
Author(s): Quennie Joy B. Mesa, Kyung-Tae Kim, Jing-Jon Kim
