Enhancing Speech Intelligibility: Interactions Among Context, Modality, Speech Style, and Masker

2014 · Vol 57 (5) · pp. 1908–1918
Author(s): Kristin J. Van Engen, Jasmine E. B. Phelps, Rajka Smiljanic, Bharath Chandrasekaran

Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions.

Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous sentences, conversational and clear speaking styles, auditory-only (AO) and audiovisual (AV) presentation modalities, and 4 different maskers (2-talker babble, 4-talker babble, 8-talker babble, and speech-shaped noise).

Results: Semantic context, clear speech, and visual input all improved intelligibility but also interacted with one another and with masking condition. Semantic context was beneficial across all maskers in AV conditions but only in speech-shaped noise in AO conditions. Clear speech provided the most benefit for AV speech with semantically anomalous targets. Finally, listeners were better able to take advantage of visual information for meaningful versus anomalous sentences and for clear versus conversational speech.

Conclusion: Because intelligibility-enhancing cues influence each other and depend on masking condition, multiple maskers and enhancement cues should be used to accurately assess individuals' speech-in-noise perception.
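The four maskers in this design differ mainly in how much intelligible speech they contain. As a rough sketch of how such maskers are commonly constructed (not the authors' actual stimulus pipeline), the NumPy fragment below sums single-talker recordings into N-talker babble and randomizes the phase of a speech signal to produce speech-shaped noise; the `talker_pool` variable and function names are illustrative.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

def make_babble(talkers, level=0.05):
    """Sum single-talker recordings into N-talker babble.

    Talkers are equalized to the same RMS first so that no single
    voice dominates; the sum is then rescaled to the target level.
    """
    n = min(len(t) for t in talkers)  # truncate to the shortest recording
    mix = np.sum([t[:n] * (level / rms(t[:n])) for t in talkers], axis=0)
    return mix * (level / rms(mix))

def speech_shaped_noise(speech):
    """Noise with the long-term magnitude spectrum of `speech`.

    Keeps the speech's spectral envelope but randomizes the phase,
    yielding a steady-state masker with no intelligible content.
    """
    mag = np.abs(np.fft.rfft(speech))
    phase = np.exp(2j * np.pi * np.random.rand(mag.size))
    return np.fft.irfft(mag * phase, n=len(speech))

# Hypothetical pool of single-talker recordings (1-D float arrays):
# babble2 = make_babble(talker_pool[:2])
# babble8 = make_babble(talker_pool[:8])
# ssn = speech_shaped_noise(np.concatenate(talker_pool))
```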

1997 · Vol 40 (2) · pp. 432–443
Author(s): Karen S. Helfer

Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the information provided by speaking clearly and by visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than conversational speech) and presentation mode (auditory-visual presentation led to better performance than auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of the two individual effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.


2012 · Vol 132 (3) · pp. 2080–2080
Author(s): Jasmine Beitz, Kristin Van Engen, Rajka Smiljanic, Bharath Chandrasekaran

Author(s): Su Yeon Shin, Hongyeop Oh, In-Ki Jin

Background: Clear speech is an effective communication strategy for improving speech intelligibility. While clear speech in several languages has been shown to significantly benefit intelligibility among listeners with differing hearing sensitivities and across environments with different noise levels, whether these results apply to Korean clear speech is unclear on account of the language's unique acoustic and linguistic characteristics.

Purpose: This study aimed to measure the intelligibility benefits of Korean clear speech relative to those of conversational speech among listeners with normal hearing and hearing loss.

Research Design: We used a mixed-model design that included both within-subject (effects of speaking style and listening condition) and between-subject (hearing status) elements.

Data Collection and Analysis: We compared rationalized arcsine unit scores, transformed from the number of keywords recognized and repeated, between clear and conversational speech in groups with different hearing sensitivities across five listening conditions (quiet and 10, 5, 0, and –5 dB signal-to-noise ratio) using a mixed-model analysis.

Results: The intelligibility scores of Korean clear speech were significantly higher than those of conversational speech under most listening conditions in all groups; the former yielded increases of 6 to 32 rationalized arcsine units in intelligibility.

Conclusion: The present study provides information on the actual benefits of Korean clear speech for listeners with varying hearing sensitivities. Audiologists and hearing professionals may use this information to establish communication strategies for Korean patients with hearing loss.
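The rationalized arcsine unit (RAU) transform used here is Studebaker's (1985) variance-stabilizing transform for proportion-correct scores. A minimal sketch of that computation, assuming scores are counts of correctly repeated keywords (the function name is mine, not the authors'):

```python
import numpy as np

def rau(correct, n_items):
    """Rationalized arcsine unit transform (Studebaker, 1985).

    `correct` is the number of keywords repeated correctly out of
    `n_items` scored keywords. The arcsine step stabilizes variance
    near 0% and 100%; the linear rescaling maps the result back onto
    a roughly 0-100 scale so differences read like percentage points.
    """
    theta = (np.arcsin(np.sqrt(correct / (n_items + 1)))
             + np.arcsin(np.sqrt((correct + 1) / (n_items + 1))))
    return (146.0 / np.pi) * theta - 23.0

# e.g. 45 of 50 keywords correct (90%):
# rau(45, 50)  ->  approximately 92 RAU
```

In the middle of the scale, RAU values track percent correct closely, which is why a gain of 6 to 32 RAU reads roughly like a 6 to 32 percentage-point gain in intelligibility.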


2020 · Vol 63 (12) · pp. 4265–4276
Author(s): Lauren Calandruccio, Heather L. Porter, Lori J. Leibold, Emily Buss

Purpose: Talkers often modify their speech when communicating with individuals who struggle to understand speech, such as listeners with hearing loss. This study evaluated the benefit of clear speech in school-age children and adults with normal hearing for speech-in-noise and speech-in-speech recognition.

Method: Masked sentence recognition thresholds were estimated for school-age children and adults using an adaptive procedure. In Experiment 1, the target and masker were summed and presented over a loudspeaker located directly in front of the listener. The masker was either speech-shaped noise or two-talker speech, and target sentences were produced using a clear or conversational speaking style. In Experiment 2, stimuli were presented over headphones. The two-talker speech masker was diotic (M0). Clear and conversational target sentences were presented either in phase (T0) or out of phase (Tπ) between the two ears. The M0Tπ condition introduces a segregation cue that was expected to improve performance.

Results: For speech presented over a single loudspeaker (Experiment 1), the clear-speech benefit was independent of age for the noise masker, but it increased with age for the two-talker masker. Similar age effects for the two-talker speech masker were seen under headphones with diotic presentation (M0T0), but comparable clear-speech benefit as a function of age was observed with a binaural cue to facilitate segregation (M0Tπ).

Conclusions: Consistent with prior research, children showed a robust clear-speech benefit for speech-in-noise recognition. Immaturity in the ability to segregate target from masker speech may limit young children's ability to benefit from clear-speech modifications for speech-in-speech recognition under some conditions. When provided with a cue that facilitates segregation, children as young as 4–7 years of age derived a clear-speech benefit in a two-talker masker that was similar to the benefit experienced by adults.
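The M0T0/M0Tπ manipulation follows the classic binaural masking level difference paradigm: the masker is identical at the two ears while the target is either in phase or polarity-inverted across ears. A minimal sketch of how such two-channel stimuli can be built (illustrative code under those assumptions, not the authors' implementation):

```python
import numpy as np

def binaural_stimulus(target, masker, antiphasic=False):
    """Build a two-channel stimulus for the M0T0 / M0Tpi conditions.

    The masker is diotic (identical in both ears, M0). The target is
    either in phase across ears (T0) or polarity-inverted in one ear
    (Tpi), which introduces the binaural segregation cue.
    """
    n = min(len(target), len(masker))
    t, m = target[:n], masker[:n]
    left = t + m
    right = (-t if antiphasic else t) + m
    return np.stack([left, right], axis=-1)  # shape (n, 2) for stereo playback

# M0T0: binaural_stimulus(sentence, babble, antiphasic=False)
# M0Tpi: binaural_stimulus(sentence, babble, antiphasic=True)
```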


1989 · Vol 32 (3) · pp. 600–603
Author(s): M. A. Picheny, N. I. Durlach, L. D. Braida

Previous studies (Picheny, Durlach, & Braida, 1985, 1986) have demonstrated that substantial intelligibility differences exist for hearing-impaired listeners between speech spoken clearly and speech spoken conversationally. This paper presents the results of a probe experiment intended to determine the contribution of speaking rate to these intelligibility differences. Clear sentences were processed to have the durational properties of conversational speech, and conversational sentences were processed to have the durational properties of clear speech. Intelligibility testing with hearing-impaired listeners revealed both sets of materials to be degraded after processing. However, the degradation could not be attributed to processing artifacts, because reprocessing the materials to restore their original durations produced intelligibility scores close to those observed for the unprocessed materials. We conclude that simple processing to alter the relative durations of the speech materials was not adequate to assess the contribution of speaking rate to the intelligibility differences; further studies are proposed to address this question.
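The study's processing reassigned durations segment by segment between clear and conversational tokens, which is more involved than anything shown here. As a simplified illustration of changing duration without shifting pitch, a uniform phase-vocoder time stretch with librosa (the file name and durations are hypothetical):

```python
import librosa

def rescale_duration(y, source_dur, target_dur):
    """Uniformly time-scale `y` so its duration matches `target_dur`.

    librosa's phase-vocoder stretch changes timing without shifting
    pitch; rate > 1 shortens the signal. Note this is a single global
    rate, a simplification of the original study's segment-level
    duration manipulation.
    """
    return librosa.effects.time_stretch(y, rate=source_dur / target_dur)

# Give a clear-speech sentence (slower) the overall duration of its
# conversational counterpart:
# y_clear, sr = librosa.load("clear_sentence.wav", sr=None)
# y_fast = rescale_duration(y_clear, source_dur=3.2, target_dur=2.1)
```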


2018 · Vol 61 (6) · pp. 1497–1516
Author(s): Chiara Visentin, Nicola Prodi

Purpose: The primary aim of this study was to develop and examine the potential of a new speech-in-noise test to discriminate the favorable listening conditions targeted in the acoustical design of communication spaces. The test is based on the recognition and recall of disyllabic word sequences. A secondary aim was to compare the test with current speech-in-noise tests, assessing its benefits and limitations.

Method: Young adults (19–40 years old), self-reporting normal hearing, were presented with the newly developed Words Sequence Test (WST; Experiment 1, 16 participants) and with a consonant confusion test and a sentence recognition test (Experiment 2, 36 participants randomly assigned to the 2 tests). Participants performing the WST were presented with word sequences of different lengths (from 2 up to 6 words). Two listening conditions were selected: (a) no noise and no reverberation, and (b) reverberant, steady-state noise (Speech Transmission Index: 0.47). The tests were presented in a closed-set format; data on the number of words correctly recognized (speech intelligibility, IS) and on the response times (RTs; onset RT and single-word RTs) were collected.

Results: A sequence of 4 disyllabic words ensured both a full recognition score in quiet conditions and a significant decrease in IS when noise and reverberation degraded the speech signal. RTs increased with the worsening of the listening conditions and with the number of words in the sequence. The greatest onset RT variation was found when using a sequence of 4 words. In the comparison with current speech-in-noise tests, the WST maximized both the IS difference between the selected listening conditions and the RT increase.

Conclusions: Overall, the results suggest that the new speech-in-noise test has good potential for discriminating conditions with near-ceiling accuracy. Compared with current speech-in-noise tests, the WST with a 4-word sequence allows for a finer mapping of the acoustical design target conditions of public spaces through accuracy and onset RT data.


2021 · Vol 12
Author(s): Hoyoung Yi, Ashly Pingsterhaus, Woonyoung Song

The coronavirus pandemic has resulted in the recommended or required use of face masks in public. Face masks compromise communication, especially in the presence of competing noise. It is crucial to measure the potential effects of wearing face masks on speech intelligibility in noisy environments where excessive background noise creates communication challenges. The effects of wearing transparent face masks and using clear speech to facilitate better verbal communication were evaluated in this study. We evaluated listener word identification scores across four factors: (1) mask type (no mask, transparent mask, and disposable face mask), (2) presentation mode (auditory-only and audiovisual), (3) speaking style (conversational speech and clear speech), and (4) background noise type (speech-shaped noise and four-talker babble at a −5 dB signal-to-noise ratio). Results indicate that in the presence of noise, listeners performed less well when the speaker wore a disposable face mask or a transparent mask compared to no mask. Listeners correctly identified more words in the audiovisual presentation when listening to clear speech. The combination of face masks and background noise negatively impacts speech intelligibility for listeners. Transparent masks facilitate understanding of target sentences by providing visual information. Use of clear speech was shown to alleviate challenging communication situations, including compensating for a lack of visual cues and a reduced acoustic signal.
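A fixed −5 dB signal-to-noise ratio means the masker's RMS level sits 5 dB above the target's. A minimal sketch of that mixing step, assuming target and masker are 1-D NumPy arrays (the function name is illustrative, not from the study):

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Mix target speech with a masker at a specified SNR in dB.

    The masker is scaled so the RMS ratio of target to masker matches
    the requested SNR; at -5 dB the masker is stronger than the speech.
    """
    n = min(len(target), len(masker))
    t, m = target[:n], masker[:n]
    t_rms = np.sqrt(np.mean(t ** 2))
    m_rms = np.sqrt(np.mean(m ** 2))
    gain = (t_rms / m_rms) * 10 ** (-snr_db / 20)  # masker gain for desired SNR
    return t + gain * m

# e.g. four-talker babble at -5 dB SNR (hypothetical arrays):
# mixture = mix_at_snr(sentence, babble4, snr_db=-5)
```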


2020
Author(s): Anja Gieseler, Stephanie Rosemann, Maike Tahden, Kirsten C. Wagener, Christiane Thiel, ...

Especially in challenging listening conditions, listeners can benefit from the audiovisual nature of speech by using visual information. Yet there is great inter-individual variability, not only in understanding speech in noise but also in the benefit obtained from additional visual cues. First empirical evidence suggests that the ability to integrate auditory and visual input, i.e., audiovisual integration, is altered in hearing impairment and is, at the same time, relevant for audiovisual speech intelligibility. The specific role of mild hearing loss in audiovisual integration, and the significance of such changes for speech intelligibility, however, need further scrutiny. We therefore investigated differences in audiovisual integration capacities between elderly normal-hearing and hearing-impaired individuals using two tests of audiovisual integration (the sound-induced flash illusion and the McGurk task). To explore whether potential differences in audiovisual integration are meaningful for natural speech intelligibility, we then linked audiovisual integration capacities to speech-in-noise recognition using an audiovisual speech-reception-threshold test, expecting this to reflect a more realistic listening scenario. Our results indicate that audiovisual integration abilities are already altered in mild hearing impairment, while the magnitude and direction of the effect depend on the specific test used. At the same time, audiovisual integration capacities seem relevant for predicting audiovisual speech intelligibility in noise, especially in individuals with hearing loss. We conclude that audiovisual integration abilities should be considered in future predictions of speech recognition outcomes, which, in turn, should be assessed audiovisually to account for the multisensory nature of speech and communication.


2015 · Vol 27 (7) · pp. 1344–1359
Author(s): Sara Jahfari, Lourens Waldorp, K. Richard Ridderinkhof, H. Steven Scholte

Action selection often requires the transformation of visual information into motor plans. Preventing premature responses may entail the suppression of visual input and/or of prepared muscle activity. This study examined how the quality of visual information affects frontobasal ganglia (BG) routes associated with response selection and inhibition. Human fMRI data were collected from a stop task with visually degraded or intact face stimuli. During go trials, degraded spatial frequency information reduced the speed of information accumulation and response cautiousness. Effective connectivity analysis of the fMRI data showed action selection to emerge through the classic direct and indirect BG pathways, with inputs deriving from both prefrontal and visual regions. When stimuli were degraded, visual and prefrontal regions processing the stimulus information increased connectivity strengths toward BG, whereas regions evaluating visual scene content or response strategies reduced connectivity toward BG. Response inhibition during stop trials recruited the indirect and hyperdirect BG pathways, with input from visual and prefrontal regions. Importantly, when stimuli were nondegraded and processed fast, the optimal stop model contained additional connections from prefrontal to visual cortex. Individual-differences analysis revealed that stronger prefrontal-to-visual connectivity covaried with faster inhibition times. Therefore, prefrontal-to-visual cortex connections appear to suppress the fast flow of visual input for the go task, such that the inhibition process can finish before the selection process. These results indicate that response selection and inhibition within the BG emerge through the interplay of top-down adjustments from prefrontal cortex and bottom-up input from sensory cortex.

