Comparative Evaluation of Listeners' Perceptions on Synthetic Speech Quality

This paper presents a 4 kbps vocoder based on MELP (mixed excitation linear prediction). It uses parametric coding and mixed excitation to preserve speech quality. Scalar quantization of the line spectrum frequencies (LSFs) reduces storage requirements and computational complexity. The 4 kbps vocoder also adds a new frame type, the transition frame, so that the classifier makes fewer voiced/unvoiced (U/V) decision errors and avoids excessive switching between voiced and unvoiced frames. A modified bit allocation table is introduced, and PESQ-MOS and coding-time tests show that the synthetic speech quality improves and reaches communication quality.
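The abstract gives neither the bit allocation table nor the quantizer design, so the following is only a minimal Python sketch of per-coefficient scalar quantization of LSFs, assuming simple uniform quantizers. The function names, the frequency ranges, and the bit counts are hypothetical and are not taken from the paper; practical coders typically use trained, non-uniform tables.

```python
def quantize_uniform(value, lo, hi, bits):
    """Return the index of the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 1 << bits
    step = (hi - lo) / (levels - 1)
    clamped = min(max(value, lo), hi)  # keep out-of-range inputs inside the table
    return round((clamped - lo) / step)


def dequantize_uniform(index, lo, hi, bits):
    """Map a quantizer index back to its reconstruction level."""
    step = (hi - lo) / ((1 << bits) - 1)
    return lo + index * step


def quantize_lsfs(lsfs_hz, ranges_hz, bits_per_lsf):
    """Scalar quantization: every LSF is coded independently of the others."""
    return [quantize_uniform(f, lo, hi, b)
            for f, (lo, hi), b in zip(lsfs_hz, ranges_hz, bits_per_lsf)]


def dequantize_lsfs(indices, ranges_hz, bits_per_lsf):
    """Rebuild the LSF vector from the transmitted indices."""
    return [dequantize_uniform(i, lo, hi, b)
            for i, (lo, hi), b in zip(indices, ranges_hz, bits_per_lsf)]


# Hypothetical 4-coefficient example (assumed values, for illustration only).
lsfs_hz = [310.0, 720.0, 1450.0, 2600.0]
ranges_hz = [(100.0, 600.0), (400.0, 1200.0), (1000.0, 2200.0), (2000.0, 3400.0)]
bits_per_lsf = [5, 4, 4, 3]

indices = quantize_lsfs(lsfs_hz, ranges_hz, bits_per_lsf)
decoded = dequantize_lsfs(indices, ranges_hz, bits_per_lsf)
total_bits = sum(bits_per_lsf)  # bits this LSF vector would take in a frame
```

Because each coefficient needs only its own small table lookup, storage and search cost grow with the sum of the table sizes rather than with the size of a joint codebook, which is consistent with the complexity reduction the abstract mentions.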

GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

Applied Sciences ◽ 10.3390/app11010002 ◽ 2020 ◽ Vol 11 (1) ◽ pp. 2

Author(s): Jiří Přibil ◽ Anna Přibilová ◽ Jindřich Matoušek

Keyword(s): Objective Evaluation ◽ Gaussian Mixture ◽ Speech Quality ◽ Final Evaluation ◽ Synthetic Speech ◽ Subjective Ratings ◽ Text To Speech ◽ Speech Features ◽ GMM Classifier ◽ Original Speech

The paper describes a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and synthetic speech using different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained using the sound/speech material from the databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show that the number of mixtures, the number and type of the speech features used, the size of the processed speech material, and the type of database used to create the GMMs all have a substantial influence on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed.
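The ranking step described above, ordering synthesis methods by their distance from the original speech in the P-A plane, can be sketched as follows. This is an illustration only: it assumes that each speech set has already been reduced to a single (pleasure, arousal) point and that plain Euclidean distance is used, neither of which the abstract states. The coordinates and method names are hypothetical.

```python
import math


def pa_distance(a, b):
    """Euclidean distance between two (pleasure, arousal) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def rank_by_distance(original, synthetic):
    """Order synthesis methods from closest to farthest from the original speech."""
    return sorted(synthetic, key=lambda name: pa_distance(original, synthetic[name]))


# Hypothetical P-A locations (assumed values, not results from the paper).
original = (0.2, 0.1)
synthetic = {
    "method_A": (0.25, 0.15),
    "method_B": (0.6, -0.3),
    "method_C": (0.1, 0.4),
}

ranking = rank_by_distance(original, synthetic)
distances = {name: pa_distance(original, point) for name, point in synthetic.items()}
```

Under these assumed coordinates the closest method ranks first, which mirrors the idea that a smaller distance from the original speech in P-A space indicates better synthetic speech quality.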

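Evaluation of Synthetic Speech Quality: A Comparative Study of Several Computer-Based Speech Synthesizers (Sintetinės šnekos kokybės vertinimas: kelių kompiuterinių sintezatorių lyginamasis tyrimas)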

Psichologija ◽ 10.15388/psichol.2002..4402 ◽ 2002 ◽ Vol 25 ◽ pp. 72-96 ◽ Cited By ~ 1

Author(s): Albinas Bagdonas ◽ Feliksas Laugalys

Keyword(s): Comparative Study ◽ Speech Intelligibility ◽ Speech Quality ◽ Natural Speech ◽ Synthetic Speech ◽ Human Speech ◽ Previous Version ◽ Computer Based ◽ Improve Correlation

This paper presents intelligibility data for several versions of Lithuanian and Russian synthetic speech, and acceptability data for Lithuanian, Russian, Hungarian, and Italian synthetic speech. The speech of Lithuanian and Russian human speakers is more intelligible than the corresponding synthetic speech. The previous version of the Russian synthesis is worse than both the Lithuanian synthesis and the improved Russian synthesis (IRS). The characteristics of the synthesized sounds reveal two opposing tendencies in the IRS: judged by the overall reduction in recognition errors it moves toward natural speech, but judged by the homogeneity of errors it moves away from it. Because the first tendency dominates, the overall result shows that the IRS has improved. The correlation between the intelligibility and acceptability of the IRS also indicates that it is closer to natural speech. Subjects find the IRS more acceptable than the previous version of the Russian synthesis: they regard the older version as closer to a robot's speech, and the IRS as a poor variant of speech that is nonetheless already human. According to the acceptability data, Hungarian listeners rate natural speech most favorably, and Italian listeners are the most critical of it. All versions of synthetic speech studied were judged less acceptable than natural speech, but this judgment softened once the synthesis was improved.

Evaluation of Synthetic Speech Quality by Statistical Analysis of Voiced and Unvoiced Part Durations

2018 41st International Conference on Telecommunications and Signal Processing (TSP) ◽ 10.1109/tsp.2018.8441352 ◽ 2018

Author(s): Jiri Pribil ◽ Anna Pribilova ◽ Jindrich Matousek

Keyword(s): Statistical Analysis ◽ Speech Quality ◽ Synthetic Speech

Improvement of synthetic speech quality through syntactic information

The Journal of the Acoustical Society of America ◽ 10.1121/1.2029411 ◽ 1991 ◽ Vol 89 (4B) ◽ pp. 1893-1893

Author(s): Tohru Shimizu ◽ Seiichi Yamamoto ◽ Norio Higuchi ◽ Hisashi Kawai

Keyword(s): Speech Quality ◽ Synthetic Speech ◽ Syntactic Information

Voice Quality Modelling for Expressive Speech Synthesis

The Scientific World Journal ◽ 10.1155/2014/627189 ◽ 2014 ◽ Vol 2014 ◽ pp. 1-12 ◽ Cited By ~ 1

Author(s): Carlos Monzo ◽ Ignasi Iriondo ◽ Joan Claudi Socoró

Keyword(s): Speech Synthesis ◽ Voice Quality ◽ Speech Quality ◽ Noise Model ◽ Synthetic Speech ◽ Test Results ◽ Speech Corpus ◽ Expressive Speech ◽ Speech Styles

This paper presents perceptual experiments carried out to validate a methodology for transforming a neutral speech style into a number of expressive ones by modelling voice quality (VoQ) parameters together with the well-known prosodic parameters (F0, duration, and energy). The main goal was to validate the usefulness of VoQ in enhancing expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify the VoQ and prosodic parameters extracted from an expressive speech corpus. Perception test results indicated that the expressive speech styles obtained improved when VoQ modelling was used along with prosodic characteristics.
