Subjective and objective measurement of synthesized speech intelligibility in modern telephone conditions

2015 ◽  
Vol 71 ◽  
pp. 1-9 ◽  
Author(s):  
Peter Počta ◽  
John G. Beerends
2002 ◽  
Author(s):  
L. Dai ◽  
Y. Ma ◽  
D. J. Caswell

In the current literature, most assessments of speech privacy and speech intelligibility rely on subjective measurements that use test materials in English and other Western languages. The effects of different languages and accents on speech privacy and speech intelligibility are usually overlooked. This study aims to assess the speech privacy of closed offices in multicultural environments. Subjective measurements are conducted for closed offices using English and a tonal language. The differences in speech privacy evaluation between the two languages are evident and significant. The study also finds that the existing single-word tests used in research and industrial practice for subjectively evaluating speech privacy should be modified when closed spaces are considered. The subjective measurement results are also compared with the objective measurement index AI.


2006 ◽  
Vol 22 (4) ◽  
pp. 258-268 ◽  
Author(s):  
Diane Mayasari Alamsaputra ◽  
Kathryn J. Kohnert ◽  
Benjamin Munson ◽  
Joe Reichle

2007 ◽  
Vol 21 (4) ◽  
pp. 641-651 ◽  
Author(s):  
Caroline Jones ◽  
Lynn Berry ◽  
Catherine Stevens

Author(s):  
Vladimir Avdeev ◽  
Viktor Trushin ◽  
Mihail Kungurov

The paper considers the possibility of creating speech-like interference for vibro-acoustic protection of speech information, based on tables of syllables and words of the Russian language. The choice of research directions and experimental conditions is substantiated: synthesis of sound files by random sampling of speech elements from a database, study of the spectra of the synthesized noise, an algorithm for creating interference of the "speech choir" type, and study of the autocorrelation functions of the synthesized speech-like interference and of their probability density. It is shown that the spectral and statistical characteristics of synthesized "speech choir" interference with five voices are close to those of real speech signals. The speech choir was formed by averaging the instantaneous values of the time-domain realizations of the sound files. It is shown that the power spectral density of "speech choir" interference remains practically unchanged as the number of averaged "voices" increases beyond five. As the number of voices in the "speech choir" increases, the probability density of the interference values approaches the normal law (unlike a real speech signal, whose probability density is close to the Laplace distribution). Evaluation of the autocorrelation function gave a correlation interval of several milliseconds. Articulation tests of speech intelligibility using synthesized speech-like interference at different signal-to-noise ratios showed that the integral noise level can be reduced by 12-15 dB compared with noise-like interference. The dependence of word intelligibility on the integral signal-to-noise ratio is modeled with polynomial and piecewise linear approximations. A preliminary assessment of the possible impact of speech-like interference on the psycho-emotional state of a person was performed. Directions for further research on increasing the efficiency of algorithms for generating speech-like interference are discussed.
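
The statistical claim in this abstract, that averaging several voices moves the amplitude distribution from a Laplace-like shape toward the normal law, can be illustrated with a small simulation. This is only a sketch under strong assumptions and not the authors' procedure: each "voice" is modeled as independent Laplace-distributed samples instead of recorded syllables, and the function names (`laplace_sample`, `speech_choir`, `excess_kurtosis`) are hypothetical. Excess kurtosis serves as the indicator of shape: it is 3 for a Laplace distribution and 0 for a normal one.

```python
import random


def laplace_sample(rng):
    # The difference of two i.i.d. Exp(1) variables is Laplace-distributed
    # (assumption: a single "voice" has Laplace-like amplitudes).
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def speech_choir(n_voices, n_samples, rng):
    # Average the instantaneous values of n_voices independent "voices".
    return [
        sum(laplace_sample(rng) for _ in range(n_voices)) / n_voices
        for _ in range(n_samples)
    ]


def excess_kurtosis(x):
    # Fourth standardized moment minus 3 (zero for a normal distribution).
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(0)  # fixed seed for repeatability
kurt = {n: excess_kurtosis(speech_choir(n, 50_000, rng)) for n in (1, 5, 20)}
for n, k in kurt.items():
    print(f"{n:2d} voice(s): excess kurtosis = {k:.2f}")
```

For independent, identically distributed voices the excess kurtosis of the average falls as 3/N, so the printed values should shrink toward zero as voices are added. Real speech samples are correlated in time, which this sketch ignores.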


1987 ◽  
Vol 31 (9) ◽  
pp. 961-965 ◽  
Author(s):  
Monica A. Merva ◽  
Beverly H. Williges

Two studies were conducted to explore the effects of various parameters on rule-based synthetic speech intelligibility. Experiment I examined the effect of situational context clues and speech rate on synthesized speech intelligibility. Subjects who received pragmatic context information prior to each message had transcription error rates 50% lower than those who received no context information. Speech rates of 250 words per minute (wpm) yielded significantly more transcription errors than rates of 180 wpm. In Experiment II, the effects of speech rate, message repetition, and location of information in a message were examined. Transcription accuracy was best for messages spoken at 150 or 180 wpm and for messages repeated either twice or three times. Words at the end of messages were transcribed more accurately than words at the beginning of messages. Subjective ratings indicated that subjects were aware of errors when incorrectly transcribing a message even though no feedback was provided.

