Experiment with Evaluation of Quality of the Synthetic Speech by the GMM Classifier

A 4 kbps vocoder based on MELP is presented in this paper. It uses parameter encoding and mixed excitation to ensure speech quality. By adopting scalar quantization of the Line Spectrum Frequency (LSF) parameters, the algorithm reduces storage and computational complexity. In addition, the 4 kbps vocoder adds a new frame type, the transition frame. The classifier can reduce U/V decision errors and avoid excessive switching between voiced and unvoiced frames. A modified bit allocation table is introduced, and the PESQ-MOS and coding time tests show that the synthetic speech quality has been improved and has reached communication quality.
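As a rough illustration of why scalar quantization keeps storage low, the sketch below quantizes each LSF value independently with a uniform step, so no multi-dimensional codebook has to be stored or searched. The function names, the 4-bit width, and the 0 to π range are assumptions made for this example; the abstract does not give the actual bit allocation or quantizer design.

```python
import math

def quantize_lsf(lsf, bits=4, lo=0.0, hi=math.pi):
    """Uniformly quantize each LSF value (radians) to a `bits`-bit index.

    Hypothetical helper: bit width and range are illustrative assumptions.
    """
    levels = (1 << bits) - 1          # highest index, e.g. 15 for 4 bits
    step = (hi - lo) / levels
    indices = []
    for value in lsf:
        clipped = min(max(value, lo), hi)   # keep the value inside [lo, hi]
        indices.append(round((clipped - lo) / step))
    return indices

def dequantize_lsf(indices, bits=4, lo=0.0, hi=math.pi):
    """Map quantizer indices back to reconstructed LSF values."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + i * step for i in indices]

# Each coefficient costs `bits` bits; the reconstruction error per
# coefficient is at most half a quantization step.
indices = quantize_lsf([0.3, 0.7, 1.2, 2.5])
reconstructed = dequantize_lsf(indices)
```

In a real coder the ranges and bit widths would normally differ per coefficient, which is what a bit allocation table specifies.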


GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

Applied Sciences ◽

10.3390/app11010002 ◽

2020 ◽

Vol 11 (1) ◽

pp. 2

Author(s):

Jiří Přibil ◽

Anna Přibilová ◽

Jindřich Matoušek

Keyword(s):

Objective Evaluation ◽

Gaussian Mixture ◽

Speech Quality ◽

Final Evaluation ◽

Synthetic Speech ◽

Subjective Ratings ◽

Text To Speech ◽

Speech Features ◽

GMM Classifier ◽

Original Speech

The paper describes a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. Speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and the synthetic speech produced by different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained on sound/speech material from databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show that the number of mixtures, the number and type of the speech features used, the size of the processed speech material, and the type of database used to create the GMMs all have a substantial influence on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed.
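The ranking step described above can be sketched as follows, assuming each utterance has already been reduced to a single (pleasure, arousal) point and that Euclidean distance is the metric. Both are assumptions for illustration; the abstract does not state the distance measure, and the coordinates and method names below are hypothetical placeholders, not results from the paper.

```python
import math

def rank_by_pa_distance(original, candidates):
    """Sort candidate names by distance to `original` in the P-A plane.

    `original` is a (pleasure, arousal) pair; `candidates` maps a
    synthesis-method name to its (pleasure, arousal) pair.
    Euclidean distance is an assumed metric.
    """
    def distance(point):
        return math.hypot(point[0] - original[0], point[1] - original[1])

    return sorted(candidates, key=lambda name: distance(candidates[name]))

# Hypothetical coordinates, for illustration only.
original_speech = (0.2, 0.1)
synthetic = {
    "method_a": (0.25, 0.1),
    "method_b": (0.6, -0.3),
    "method_c": (0.2, 0.3),
}
order = rank_by_pa_distance(original_speech, synthetic)
```

The method closest to the original speech comes first in the returned order, which corresponds to the best-rated synthesis under this distance-based criterion.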


Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

10.20944/preprints201905.0228.v1 ◽

2019 ◽

Author(s):

Marvin Coto-Jiménez

Keyword(s):

Neural Networks ◽

Speech Synthesis ◽

Short Term Memory ◽

Synthetic Speech ◽

Efficient Manner ◽

Training Approach ◽

Random Initialization ◽

Speech Spectrum ◽

LSTM Network

Several researchers have proposed deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the mel-frequency cepstral parameters of synthetic speech. Results show that, in most cases, the initialization enhances the statistical parametric speech spectrum better than the common random initialization of the networks.


Evaluating the Quality of Synthetic Speech

Human Factors and Voice Interactive Systems ◽

10.1007/978-1-4757-2980-1_3 ◽

1999 ◽

pp. 63-97 ◽

Cited By ~ 6

Author(s):

Alexander L. Francis ◽

Howard C. Nusbaum

Keyword(s):

Synthetic Speech


Memory, comprehension, and judged quality of synthetic speech under several prosodic conditions

The Journal of the Acoustical Society of America ◽

10.1121/1.2015843 ◽

1977 ◽

Vol 61 (S1) ◽

pp. S68-S68

Author(s):

A. McHugh ◽

L. S. Larkey

Keyword(s):

Synthetic Speech


Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Biomimetics ◽

10.3390/biomimetics4020039 ◽

2019 ◽

Vol 4 (2) ◽

pp. 39 ◽

Cited By ~ 4

Author(s):

Marvin Coto-Jiménez

Keyword(s):

Neural Networks ◽

Speech Synthesis ◽

Short Term Memory ◽

Synthetic Speech ◽

Efficient Manner ◽

Training Approach ◽

Random Initialization ◽

Speech Spectrum ◽

LSTM Network

Several researchers have proposed deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the mel-frequency cepstral parameters of synthetic speech. Results show that, in most cases, the initialization enhances the statistical parametric speech spectrum better than the common random initialization of the networks.


A Low-Complexity 3.6kbps Speech Coding Algorithm Based on Mixed Excitation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.401-403.1282 ◽

2013 ◽

Vol 401-403 ◽

pp. 1282-1286

Author(s):

Qiang Li ◽

Li Zhen Wang ◽

Xu Jiu Xia

Keyword(s):

Computational Complexity ◽

High Frequency ◽

Speech Coding ◽

Low Complexity ◽

Speech Quality ◽

Bit Allocation ◽

Synthetic Speech ◽

Scalar Quantization ◽

Decision Error

A low-complexity 3.6 kb/s speech coding algorithm based on mixed excitation is presented in this paper. It uses parameter encoding and mixed excitation to ensure speech quality. By adopting scalar quantization of the Line Spectrum Frequency (LSF) parameters, the algorithm reduces storage and computational complexity. In addition, an improved frame type with dynamic Unvoiced/Voiced (U/V) thresholds reduces the traditional U/V decision errors and the abrupt switching between U/V frames. A modified bit allocation table is introduced, and the PESQ-MOS test shows that the synthetic speech quality has been improved and has reached communication quality, especially for high-frequency female speakers with the new frame type.
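One common way to make a U/V threshold dynamic is hysteresis: the threshold for the current frame depends on the previous frame's label, so a frame near the boundary tends to keep the previous decision. The sketch below shows that idea only; it is an assumed reading, not the paper's actual rule, and the function name, the voicing-strength input, and the `base` and `margin` values are hypothetical.

```python
def classify_frames(voicing, base=0.5, margin=0.1):
    """Label each frame 'V' (voiced) or 'U' (unvoiced).

    `voicing` is a sequence of per-frame voicing strengths in [0, 1].
    The threshold is lowered after a voiced frame and raised after an
    unvoiced one (hysteresis), which suppresses rapid U/V switching.
    """
    labels = []
    previous = "U"  # assume the stream starts unvoiced
    for strength in voicing:
        threshold = base - margin if previous == "V" else base + margin
        previous = "V" if strength >= threshold else "U"
        labels.append(previous)
    return labels

strengths = [0.2, 0.55, 0.65, 0.45, 0.55, 0.3]
with_hysteresis = classify_frames(strengths)          # U U V V V U
fixed_threshold = classify_frames(strengths, margin=0.0)  # U V V U V U
```

On this toy input the fixed threshold flips the label four times, while the hysteresis variant flips it twice.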


Quality of Synthetic Speech

10.1007/978-981-10-3734-4 ◽

2017 ◽

Cited By ~ 1

Author(s):

Florian Hinterleitner

Keyword(s):

Synthetic Speech


