Modelling the human-machine gap in speech reception: microscopic speech intelligibility prediction for normal-hearing subjects with an auditory model

Author(s):  
Tim Jürgens ◽  
Thomas Brand ◽  
Birger Kollmeier
2019 ◽  
Vol 62 (11) ◽  
pp. 4179-4195 ◽  
Author(s):  
Nicola Prodi ◽  
Chiara Visentin

Purpose: This study examines the effects of reverberation and noise fluctuation on the response time (RT) to the auditory stimuli in a speech reception task. Method: The speech reception task was presented to 76 young adults with normal hearing in 3 simulated listening conditions (1 anechoic, 2 reverberant). Speechlike stationary and fluctuating noise were used as maskers, in a wide range of signal-to-noise ratios. The speech-in-noise tests were presented in a closed-set format; data on speech intelligibility and RT (time elapsed from the offset of the auditory stimulus to the response selection) were collected. A slowing down in RTs was interpreted as an increase in listening effort. Results: RTs slowed down at the more challenging signal-to-noise ratios, with increasing reverberation, and for stationary compared to fluctuating noise, consistent with a fluctuating masking release scheme. When speech intelligibility was fixed, the estimated RTs were similar or faster for stationary compared to fluctuating noise, depending on the amount of reverberation. Conclusions: The current findings add to the literature on listening effort for listeners with normal hearing by indicating that the addition of reverberation to fluctuating noise increases RT in a speech reception task. The results support the importance of integrating noise and reverberation to provide accurate predictors of real-world performance in clinical settings.


2015 ◽  
Vol 40 (1) ◽  
pp. 41-50
Author(s):  
Magdalena Krenz ◽  
Andrzej Wicher ◽  
Aleksander Sęk

Abstract To determine speech intelligibility using the test suggested by Ozimek et al. (2009), the subject composed sentences with the words presented on a computer screen. However, the number and the type of these words were chosen arbitrarily. The subject was always presented with 18 similar-sounding words. Therefore, the aim of this study was to determine whether the number and the type of alternative words used by Ozimek et al. (2009) had a significant influence on speech intelligibility. A further aim was to determine an optimal number of alternative words, i.e., the number that neither affected the speech reception threshold (SRT) nor unduly lengthened the test. The study, conducted with a group of 10 subjects with normal hearing, showed that increasing the number of alternative words from 12 to 30 increased the SRT by about 0.3 dB per 6 words. The use of paronyms as alternative words, as opposed to random words, led to an increase in the SRT of about 0.6 dB, which is equivalent to a decrease in intelligibility of 15 percentage points. Enlarging the number of words to choose from, and switching alternative words to paronyms, led to an increase in response time from approximately 11 to 16 s. It seems that using paronyms as alternative words, with 12 or 18 words to choose from, is the best choice when using the Polish Sentence Test (PST).


1994 ◽  
Vol 110 (1) ◽  
pp. 75-83 ◽  
Author(s):  
C. Speaks ◽  
T. Trine ◽  
T. Crain ◽  
N. Niccum

Author(s):  
Seong Hee Lee ◽  
Hyun Joon Shim ◽  
Sang Won Yoon ◽  
Kyoung Won Lee

2010 ◽  
Vol 10 ◽  
pp. 329-339 ◽  
Author(s):  
Torsten Rahne ◽  
Michael Ziese ◽  
Dorothea Rostalski ◽  
Roland Mühler

This paper describes a logatome discrimination test for the assessment of speech perception in cochlear implant users (CI users), based on a multilingual speech database, the Oldenburg Logatome Corpus, which was originally recorded for the comparison of human and automated speech recognition. The logatome discrimination task is based on the presentation of 100 logatome pairs (i.e., nonsense syllables) with balanced representations of alternating “vowel-replacement” and “consonant-replacement” paradigms in order to assess phoneme confusions. Thirteen adult normal hearing listeners and eight adult CI users, including both good and poor performers, were included in the study and completed the test after their speech intelligibility abilities were evaluated with an established sentence test in noise. Furthermore, the discrimination abilities were measured electrophysiologically by recording the mismatch negativity (MMN) as a component of auditory event-related potentials. The results show a clear MMN response only for normal hearing listeners and CI users with good performance, correlating with their logatome discrimination abilities. Higher discrimination scores for vowel-replacement paradigms than for the consonant-replacement paradigms were found. We conclude that the logatome discrimination test is well suited to monitor the speech perception skills of CI users. Due to the large number of available spoken logatome items, the Oldenburg Logatome Corpus appears to provide a useful and powerful basis for further development of speech perception tests for CI users.


1976 ◽  
Vol 19 (2) ◽  
pp. 279-289 ◽  
Author(s):  
Randall B. Monsen

Although it is well known that the speech produced by the deaf is generally of low intelligibility, the sources of this low speech intelligibility have generally been ascribed either to aberrant articulation of phonemes or inappropriate prosody. This study was designed to determine to what extent a nonsegmental aspect of speech, formant transitions, may differ in the speech of the deaf and of the normal hearing. The initial second formant transitions of the vowels /i/ and /u/ after labial and alveolar consonants (/b, d, f/) were compared in the speech of six normal-hearing and six hearing-impaired adolescents. In the speech of the hearing-impaired subjects, the second formant transitions may be reduced both in time and in frequency. At its onset, the second formant may be nearer to its eventual target frequency than in the speech of the normal subjects. Since formant transitions are important acoustic cues for the adjacent consonants, reduced F2 transitions may be an important factor in the low intelligibility of the speech of the deaf.


2021 ◽  
Vol 69 (2) ◽  
pp. 173-179
Author(s):  
Nikolina Samardzic ◽  
Brian C.J. Moore

Traditional methods for predicting the intelligibility of speech in the presence of noise inside a vehicle, such as the Articulation Index (AI), the Speech Intelligibility Index (SII), and the Speech Transmission Index (STI), are not accurate, probably because they do not take binaural listening into account; the signals reaching the two ears can differ markedly depending on the positions of the talker and listener. We propose a new method for predicting the intelligibility of speech in a vehicle, based on the ratio of the binaural loudness of the speech to the binaural loudness of the noise, each calculated using the method specified in ISO 532-2 (2017). The method was found to give accurate predictions of the speech reception threshold (SRT) measured under a variety of conditions and for different positions of the talker and listener in a car. The typical error in the predicted SRT was 1.3 dB, which is markedly smaller than the errors obtained using the SII and STI (2.0 dB and 2.1 dB, respectively).

