Enhancing Speech Intelligibility: Interactions Among Context, Modality, Speech Style, and Masker

2014 · Vol 57 (5) · pp. 1908–1918
Author(s): Kristin J. Van Engen, Jasmine E. B. Phelps, Rajka Smiljanic, Bharath Chandrasekaran

Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions.

Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous sentences, conversational and clear speaking styles, auditory-only (AO) and audiovisual (AV) presentation modalities, and 4 different maskers (2-talker babble, 4-talker babble, 8-talker babble, and speech-shaped noise).

Results: Semantic context, clear speech, and visual input all improved intelligibility but also interacted with one another and with masking condition. Semantic context was beneficial across all maskers in AV conditions but only in speech-shaped noise in AO conditions. Clear speech provided the most benefit for AV speech with semantically anomalous targets. Finally, listeners were better able to take advantage of visual information for meaningful versus anomalous sentences and for clear versus conversational speech.

Conclusion: Because intelligibility-enhancing cues influence each other and depend on masking condition, multiple maskers and enhancement cues should be used to accurately assess individuals' speech-in-noise perception.
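The four maskers in this design differ mainly in how much intelligible speech they contain. As a rough sketch of how such maskers are commonly constructed (not the authors' actual stimulus pipeline), the NumPy fragment below sums single-talker recordings into N-talker babble and randomizes the phase of a speech signal to produce speech-shaped noise; the `talker_pool` variable and function names are illustrative.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

def make_babble(talkers, level=0.05):
    """Sum single-talker recordings into N-talker babble.

    Talkers are equalized to the same RMS first so that no single
    voice dominates; the sum is then rescaled to the target level.
    """
    n = min(len(t) for t in talkers)  # truncate to the shortest recording
    mix = np.sum([t[:n] * (level / rms(t[:n])) for t in talkers], axis=0)
    return mix * (level / rms(mix))

def speech_shaped_noise(speech):
    """Noise with the long-term magnitude spectrum of `speech`.

    Keeps the speech's spectral envelope but randomizes the phase,
    yielding a steady-state masker with no intelligible content.
    """
    mag = np.abs(np.fft.rfft(speech))
    phase = np.exp(2j * np.pi * np.random.rand(mag.size))
    return np.fft.irfft(mag * phase, n=len(speech))

# Hypothetical pool of single-talker recordings (1-D float arrays):
# babble2 = make_babble(talker_pool[:2])
# babble8 = make_babble(talker_pool[:8])
# ssn = speech_shaped_noise(np.concatenate(talker_pool))
```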

1997 · Vol 40 (2) · pp. 432–443
Author(s): Karen S. Helfer

Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the information provided by speaking clearly and by visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than conversational speech) and presentation mode (auditory-visual presentation led to better performance than auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of the two individual effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.


2012 · Vol 132 (3) · pp. 2080–2080
Author(s): Jasmine Beitz, Kristin Van Engen, Rajka Smiljanic, Bharath Chandrasekaran

Author(s): Su Yeon Shin, Hongyeop Oh, In-Ki Jin

Background: Clear speech is an effective communication strategy for improving speech intelligibility. While clear speech in several languages has been shown to significantly benefit intelligibility among listeners with differing hearing sensitivities and across environments with different noise levels, whether these results apply to Korean clear speech is unclear on account of the language's unique acoustic and linguistic characteristics.

Purpose: This study aimed to measure the intelligibility benefits of Korean clear speech relative to those of conversational speech among listeners with normal hearing and hearing loss.

Research Design: We used a mixed-model design that included both within-subject (effects of speaking style and listening condition) and between-subject (hearing status) elements.

Data Collection and Analysis: We compared rationalized arcsine unit scores, transformed from the number of keywords recognized and repeated, between clear and conversational speech in groups with different hearing sensitivities across five listening conditions (quiet and 10, 5, 0, and –5 dB signal-to-noise ratio) using a mixed-model analysis.

Results: The intelligibility scores of Korean clear speech were significantly higher than those of conversational speech under most listening conditions in all groups; the former yielded increases of 6 to 32 rationalized arcsine units in intelligibility.

Conclusion: The present study provides information on the actual benefits of Korean clear speech for listeners with varying hearing sensitivities. Audiologists and hearing professionals may use this information to establish communication strategies for Korean patients with hearing loss.
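The rationalized arcsine unit (RAU) transform used here is Studebaker's (1985) variance-stabilizing transform for proportion-correct scores. A minimal sketch of that computation, assuming scores are counts of correctly repeated keywords (the function name is mine, not the authors'):

```python
import numpy as np

def rau(correct, n_items):
    """Rationalized arcsine unit transform (Studebaker, 1985).

    `correct` is the number of keywords repeated correctly out of
    `n_items` scored keywords. The arcsine step stabilizes variance
    near 0% and 100%; the linear rescaling maps the result back onto
    a roughly 0-100 scale so differences read like percentage points.
    """
    theta = (np.arcsin(np.sqrt(correct / (n_items + 1)))
             + np.arcsin(np.sqrt((correct + 1) / (n_items + 1))))
    return (146.0 / np.pi) * theta - 23.0

# e.g. 45 of 50 keywords correct (90%):
# rau(45, 50)  ->  approximately 92 RAU
```

In the middle of the scale, RAU values track percent correct closely, which is why a gain of 6 to 32 RAU reads roughly like a 6 to 32 percentage-point gain in intelligibility.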


2020 · Vol 63 (12) · pp. 4265–4276
Author(s): Lauren Calandruccio, Heather L. Porter, Lori J. Leibold, Emily Buss

Purpose: Talkers often modify their speech when communicating with individuals who struggle to understand speech, such as listeners with hearing loss. This study evaluated the benefit of clear speech in school-age children and adults with normal hearing for speech-in-noise and speech-in-speech recognition.

Method: Masked sentence recognition thresholds were estimated for school-age children and adults using an adaptive procedure. In Experiment 1, the target and masker were summed and presented over a loudspeaker located directly in front of the listener. The masker was either speech-shaped noise or two-talker speech, and target sentences were produced using a clear or conversational speaking style. In Experiment 2, stimuli were presented over headphones. The two-talker speech masker was diotic (M0). Clear and conversational target sentences were presented either in phase (T0) or out of phase (Tπ) between the two ears. The M0Tπ condition introduces a segregation cue that was expected to improve performance.

Results: For speech presented over a single loudspeaker (Experiment 1), the clear-speech benefit was independent of age for the noise masker, but it increased with age for the two-talker masker. Similar age effects for the two-talker speech masker were seen under headphones with diotic presentation (M0T0), but comparable clear-speech benefit as a function of age was observed with a binaural cue to facilitate segregation (M0Tπ).

Conclusions: Consistent with prior research, children showed a robust clear-speech benefit for speech-in-noise recognition. Immaturity in the ability to segregate target from masker speech may limit young children's ability to benefit from clear-speech modifications for speech-in-speech recognition under some conditions. When provided with a cue that facilitates segregation, children as young as 4–7 years of age derived a clear-speech benefit in a two-talker masker that was similar to the benefit experienced by adults.
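The M0T0/M0Tπ manipulation follows the classic binaural masking level difference paradigm: the masker is identical at the two ears while the target is either in phase or polarity-inverted across ears. A minimal sketch of how such two-channel stimuli can be built (illustrative code under those assumptions, not the authors' implementation):

```python
import numpy as np

def binaural_stimulus(target, masker, antiphasic=False):
    """Build a two-channel stimulus for the M0T0 / M0Tpi conditions.

    The masker is diotic (identical in both ears, M0). The target is
    either in phase across ears (T0) or polarity-inverted in one ear
    (Tpi), which introduces the binaural segregation cue.
    """
    n = min(len(target), len(masker))
    t, m = target[:n], masker[:n]
    left = t + m
    right = (-t if antiphasic else t) + m
    return np.stack([left, right], axis=-1)  # shape (n, 2) for stereo playback

# M0T0: binaural_stimulus(sentence, babble, antiphasic=False)
# M0Tpi: binaural_stimulus(sentence, babble, antiphasic=True)
```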


1989 · Vol 32 (3) · pp. 600–603
Author(s): M. A. Picheny, N. I. Durlach, L. D. Braida

Previous studies (Picheny, Durlach, & Braida, 1985, 1986) have demonstrated that substantial intelligibility differences exist for hearing-impaired listeners between speech spoken clearly and speech spoken conversationally. This paper presents the results of a probe experiment intended to determine the contribution of speaking rate to these intelligibility differences. Clear sentences were processed to have the durational properties of conversational speech, and conversational sentences were processed to have the durational properties of clear speech. Intelligibility testing with hearing-impaired listeners revealed both sets of materials to be degraded after processing. However, the degradation could not be attributed to processing artifacts, because reprocessing the materials to restore their original durations produced intelligibility scores close to those observed for the unprocessed materials. We conclude that simple processing to alter the relative durations of the speech materials was not adequate to assess the contribution of speaking rate to the intelligibility differences; further studies are proposed to address this question.
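The study's processing reassigned durations segment by segment between clear and conversational tokens, which is more involved than anything shown here. As a simplified illustration of changing duration without shifting pitch, a uniform phase-vocoder time stretch with librosa (the file name and durations are hypothetical):

```python
import librosa

def rescale_duration(y, source_dur, target_dur):
    """Uniformly time-scale `y` so its duration matches `target_dur`.

    librosa's phase-vocoder stretch changes timing without shifting
    pitch; rate > 1 shortens the signal. Note this is a single global
    rate, a simplification of the original study's segment-level
    duration manipulation.
    """
    return librosa.effects.time_stretch(y, rate=source_dur / target_dur)

# Give a clear-speech sentence (slower) the overall duration of its
# conversational counterpart:
# y_clear, sr = librosa.load("clear_sentence.wav", sr=None)
# y_fast = rescale_duration(y_clear, source_dur=3.2, target_dur=2.1)
```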


2018 · Vol 61 (6) · pp. 1497–1516
Author(s): Chiara Visentin, Nicola Prodi

Purpose: The primary aim of this study was to develop and examine the potential of a new speech-in-noise test to discriminate the favorable listening conditions targeted in the acoustical design of communication spaces. The test is based on the recognition and recall of disyllabic word sequences. A secondary aim was to compare the test with current speech-in-noise tests, assessing its benefits and limitations.

Method: Young adults (19–40 years old), self-reporting normal hearing, were presented with the newly developed Words Sequence Test (WST; Experiment 1, 16 participants) and with a consonant confusion test and a sentence recognition test (Experiment 2, 36 participants randomly assigned to the 2 tests). Participants performing the WST were presented with word sequences of different lengths (from 2 up to 6 words). Two listening conditions were selected: (a) no noise and no reverberation, and (b) reverberant, steady-state noise (Speech Transmission Index: 0.47). The tests were presented in a closed-set format; data on the number of words correctly recognized (speech intelligibility, IS) and on the response times (RTs; onset RT and single-word RTs) were collected.

Results: A sequence of 4 disyllabic words ensured both a full recognition score in quiet conditions and a significant decrease in IS when noise and reverberation degraded the speech signal. RTs increased with the worsening of the listening conditions and with the number of words in the sequence. The greatest onset RT variation was found when using a sequence of 4 words. In the comparison with current speech-in-noise tests, the WST maximized both the IS difference between the selected listening conditions and the RT increase.

Conclusions: Overall, the results suggest that the new speech-in-noise test has good potential for discriminating conditions with near-ceiling accuracy. Compared with current speech-in-noise tests, the WST with a 4-word sequence allows for a finer mapping of the acoustical design target conditions of public spaces through accuracy and onset RT data.


2021 · Vol 12
Author(s): Hoyoung Yi, Ashly Pingsterhaus, Woonyoung Song

The coronavirus pandemic has resulted in the recommended or required use of face masks in public. Face masks compromise communication, especially in the presence of competing noise. It is crucial to measure the potential effects of wearing face masks on speech intelligibility in noisy environments where excessive background noise creates communication challenges. The effects of wearing transparent face masks and using clear speech to facilitate better verbal communication were evaluated in this study. We evaluated listener word identification scores across four factors: (1) mask type (no mask, transparent mask, and disposable face mask), (2) presentation mode (auditory-only and audiovisual), (3) speaking style (conversational speech and clear speech), and (4) background noise type (speech-shaped noise and four-talker babble at a −5 dB signal-to-noise ratio). Results indicate that in the presence of noise, listeners performed less well when the speaker wore a disposable face mask or a transparent mask compared to no mask. Listeners correctly identified more words in the audiovisual presentation when listening to clear speech. The combination of face masks and background noise negatively impacts speech intelligibility for listeners. Transparent masks facilitate understanding of target sentences by providing visual information. Use of clear speech was shown to alleviate challenging communication situations, including compensating for a lack of visual cues and a reduced acoustic signal.
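A fixed −5 dB signal-to-noise ratio means the masker's RMS level sits 5 dB above the target's. A minimal sketch of that mixing step, assuming target and masker are 1-D NumPy arrays (the function name is illustrative, not from the study):

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Mix target speech with a masker at a specified SNR in dB.

    The masker is scaled so the RMS ratio of target to masker matches
    the requested SNR; at -5 dB the masker is stronger than the speech.
    """
    n = min(len(target), len(masker))
    t, m = target[:n], masker[:n]
    t_rms = np.sqrt(np.mean(t ** 2))
    m_rms = np.sqrt(np.mean(m ** 2))
    gain = (t_rms / m_rms) * 10 ** (-snr_db / 20)  # masker gain for desired SNR
    return t + gain * m

# e.g. four-talker babble at -5 dB SNR (hypothetical arrays):
# mixture = mix_at_snr(sentence, babble4, snr_db=-5)
```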


2020
Author(s): Anja Gieseler, Stephanie Rosemann, Maike Tahden, Kirsten C. Wagener, Christiane Thiel, ...

Especially in challenging listening conditions, listeners can benefit from the audiovisual nature of speech by using visual information. Yet there is great inter-individual variability, not only in understanding speech in noise but also in the benefit obtained from additional visual cues. First empirical evidence suggests that the ability to integrate auditory and visual input, i.e., audiovisual integration, is altered in hearing impairment and is, at the same time, relevant for audiovisual speech intelligibility. The specific role of mild hearing loss in audiovisual integration, and the significance of such changes for speech intelligibility, however, need further scrutiny. We therefore investigated differences in audiovisual integration capacities between elderly normal-hearing and hearing-impaired individuals using two tests of audiovisual integration (the sound-induced flash illusion and the McGurk task). To explore whether potential differences in audiovisual integration are meaningful for natural speech intelligibility, we then linked audiovisual integration capacities to speech-in-noise recognition using an audiovisual speech-reception-threshold test, expecting this to reflect a more realistic listening scenario. Our results indicate that audiovisual integration abilities are already altered in mild hearing impairment, while the magnitude and direction of the effect depend on the specific test used. At the same time, audiovisual integration capacities seem relevant for predicting audiovisual speech intelligibility in noise, especially in individuals with hearing loss. We conclude that audiovisual integration abilities should be considered in future predictions of speech recognition outcomes, which, in turn, should be assessed audiovisually to account for the multisensory nature of speech and communication.


2015 · Vol 27 (7) · pp. 1344–1359
Author(s): Sara Jahfari, Lourens Waldorp, K. Richard Ridderinkhof, H. Steven Scholte

Action selection often requires the transformation of visual information into motor plans. Preventing premature responses may entail the suppression of visual input and/or of prepared muscle activity. This study examined how the quality of visual information affects frontobasal ganglia (BG) routes associated with response selection and inhibition. Human fMRI data were collected from a stop task with visually degraded or intact face stimuli. During go trials, degraded spatial frequency information reduced the speed of information accumulation and response cautiousness. Effective connectivity analysis of the fMRI data showed action selection to emerge through the classic direct and indirect BG pathways, with inputs deriving from both prefrontal and visual regions. When stimuli were degraded, visual and prefrontal regions processing the stimulus information increased connectivity strengths toward BG, whereas regions evaluating visual scene content or response strategies reduced connectivity toward BG. Response inhibition during stop trials recruited the indirect and hyperdirect BG pathways, with input from visual and prefrontal regions. Importantly, when stimuli were nondegraded and processed fast, the optimal stop model contained additional connections from prefrontal to visual cortex. Individual-differences analysis revealed that stronger prefrontal-to-visual connectivity covaried with faster inhibition times. Therefore, prefrontal-to-visual cortex connections appear to suppress the fast flow of visual input for the go task, such that the inhibition process can finish before the selection process. These results indicate that response selection and inhibition within the BG emerge through the interplay of top-down adjustments from prefrontal cortex and bottom-up input from sensory cortex.

