Comparative Evaluation of Listeners Perceptions on Synthetic Speech Quality

2019 ◽  
Vol 11 (11-SPECIAL ISSUE) ◽  
pp. 967-971
Author(s):  
Quennie Joy B. Mesa ◽  
Kyung-Tae Kim ◽  
Jing-Jon Kim

1991 ◽  
Vol 19 (1) ◽  
pp. 139-146 ◽  
Author(s):  
Louis C.W. Pols ◽  
Renée van Bezooijen

2014 ◽  
Vol 1030-1032 ◽  
pp. 1638-1641
Author(s):  
Yan Ming ◽  
Li Zhen Wang ◽  
Xu Jiu Xia

A 4 kbps vocoder based on MELP is presented in this paper. It uses parameter encoding and mixed excitation to preserve speech quality. By adopting scalar quantization of the Line Spectrum Frequency (LSF) parameters, the algorithm reduces storage and computational complexity. The 4 kbps vocoder also adds a new frame type, the transition frame. The classifier can reduce U/V decision errors and avoid excessive switching between voiced and unvoiced frames. A modified bit allocation table is introduced, and PESQ-MOS and coding-time tests show that the synthetic speech quality is improved and reaches communication quality.
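As a rough illustration of what scalar quantization of LSF parameters involves, the sketch below quantizes each LSF value independently with a uniform quantizer. It is not the quantizer from the paper: the function names, the 0 to 4000 Hz range, the sample LSF values, and the per-coefficient bit allocation are all hypothetical, chosen only to show the idea of one index per parameter.

```python
def quantize_uniform(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer index using 2**bits uniform cells."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    clipped = min(max(value, lo), hi)
    index = int((clipped - lo) / step)
    return min(index, levels - 1)  # keep value == hi inside the last cell


def dequantize_uniform(index, lo, hi, bits):
    """Reconstruct the midpoint of the quantization cell for a given index."""
    step = (hi - lo) / (2 ** bits)
    return lo + (index + 0.5) * step


# Hypothetical LSF values (Hz) and a hypothetical bit allocation per coefficient.
lsf_hz = [310.0, 620.0, 1105.0, 1890.0]
bits_per_lsf = [4, 4, 3, 3]

indices = [quantize_uniform(f, 0.0, 4000.0, b) for f, b in zip(lsf_hz, bits_per_lsf)]
decoded = [dequantize_uniform(i, 0.0, 4000.0, b) for i, b in zip(indices, bits_per_lsf)]
```

Each coefficient is coded on its own, so the encoder only needs a comparison and a division per parameter and no codebook search. That is the usual reason scalar quantization is cheaper in storage and computation than vector quantization, at the cost of more bits for the same distortion.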


2020 ◽  
Vol 11 (1) ◽  
pp. 2
Author(s):  
Jiří Přibil ◽  
Anna Přibilová ◽  
Jindřich Matoušek

The paper describes a system for the automatic evaluation of synthetic speech quality based on a Gaussian mixture model (GMM) classifier. Speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original speech and synthetic speech produced with different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained on sound/speech material from databases that have no relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show that the number of mixtures, the number and type of speech features used, the size of the processed speech material, and the type of database used to create the GMMs all have a substantial influence on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the developed system. The objective evaluation results correlate in principle with the subjective ratings of human evaluators; however, partial differences were found, so a subsequent detailed investigation is needed.
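The ranking step described above can be sketched as follows, assuming each speech sample has already been reduced to a single point in the 2D P-A space. The abstract does not say which distance measure is used, so Euclidean distance is an assumption here, and the coordinates and method names are made-up placeholders rather than results from the paper.

```python
import math


def pa_distance(a, b):
    """Euclidean distance between two (pleasure, arousal) points (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical P-A location of the original (natural) speech.
original = (0.20, 0.10)

# Hypothetical P-A locations for three synthesis methods.
synthetic = {
    "method_A": (0.25, 0.05),
    "method_B": (0.60, 0.40),
    "method_C": (0.10, 0.30),
}

# Smaller distance to the original is treated as "closer to natural speech".
ranking = sorted(synthetic, key=lambda name: pa_distance(original, synthetic[name]))
```

Under these placeholder values the methods are ordered from nearest to farthest from the original speech, which is the kind of evaluation order the abstract refers to.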


Psichologija ◽  
2002 ◽  
Vol 25 ◽  
pp. 72-96 ◽  
Author(s):  
Albinas Bagdonas ◽  
Feliksas Laugalys

Evaluation of Synthetic Speech Quality: A Comparative Study of Several Computer-Based Speech Synthesizers

This paper presents data on the intelligibility of several versions of Lithuanian and Russian synthetic speech and on the acceptability of Lithuanian, Russian, Hungarian and Italian synthetic speech. The speech of Lithuanian and Russian human speakers is more intelligible than the corresponding synthetic speech. The earlier version of the Russian synthesis is worse than both the Lithuanian synthesis and the improved Russian synthesis (IRS). The characteristics of the synthesized sounds reveal two opposite tendencies in the IRS: in terms of the overall reduction in recognition errors it moves closer to natural speech, but in terms of the homogeneity of errors it moves away from it. Because the first tendency dominates, the overall result shows that the IRS is an improvement. The correlation between the intelligibility and acceptability of the IRS also indicates that it is closer to natural speech. Subjects find the IRS more acceptable than the earlier version of the Russian synthesis. In their opinion, the earlier version resembles the speech of a robot, while the IRS resembles a poor variant of speech that is nevertheless already human. According to the acceptability data, natural speech is rated most highly by Hungarian listeners, while Italian listeners are the most critical of it. All versions of synthetic speech studied were judged less acceptable than natural speech, but the difference narrows after the synthesis is improved.


1991 ◽  
Vol 89 (4B) ◽  
pp. 1893-1893
Author(s):  
Tohru Shimizu ◽  
Seiichi Yamamoto ◽  
Norio Higuchi ◽  
Hisashi Kawai

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Carlos Monzo ◽  
Ignasi Iriondo ◽  
Joan Claudi Socoró

This paper presents the perceptual experiments carried out to validate a methodology for transforming speech from a neutral style into a number of expressive ones by modelling voice quality (VoQ) parameters along with the well-known prosodic parameters (F0, duration, and energy). The main goal was to validate the usefulness of VoQ in enhancing expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify the VoQ and prosodic parameters, which were extracted from an expressive speech corpus. Perception test results indicated that VoQ modelling combined with prosodic characteristics improved the resulting expressive speech styles.

