About face: Seeing the talker improves spoken word recognition but increases listening effort

2019 ◽  
Author(s):  
Violet Aurora Brown ◽  
Julia Feld Strand

It is widely accepted that seeing a talker improves a listener’s ability to understand what a talker is saying in background noise (e.g., Erber, 1969; Sumby & Pollack, 1954). The literature is mixed, however, regarding the influence of the visual modality on the listening effort required to recognize speech (e.g., Fraser, Gagné, Alepins, & Dubois, 2010; Sommers & Phelps, 2016). Here, we present data showing that even when the visual modality robustly benefits recognition, processing audiovisual speech can still result in greater cognitive load than processing speech in the auditory modality alone. We show using a dual-task paradigm that the costs associated with audiovisual speech processing are more pronounced in easy listening conditions, in which speech can be recognized at high rates in the auditory modality alone—indeed, effort did not differ between audiovisual and audio-only conditions when the background noise was presented at a more difficult level. Further, we show that though these effects replicate with different stimuli and participants, they do not emerge when effort is assessed with a recall paradigm rather than a dual-task paradigm. Together, these results suggest that the widely cited audiovisual recognition benefit may come at a cost under more favorable listening conditions, and add to the growing body of research suggesting that various measures of effort may not be tapping into the same underlying construct (Strand et al., 2018).
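To make the paradigm concrete: in a dual-task design, listening effort is indexed by how much responses to a concurrent secondary task slow down when listeners must simultaneously recognize speech. The sketch below illustrates this scoring logic; the data, variable names, and function are hypothetical illustrations, not the authors' analysis code.

```python
# Illustrative sketch of dual-task effort scoring (hypothetical
# data and names; not the authors' code). Slower secondary-task
# responses under concurrent listening imply greater effort.
import numpy as np

def dual_task_cost(rts_concurrent, rts_alone):
    """Mean secondary-task RT while also doing the speech task,
    minus mean RT when the secondary task is performed alone."""
    return np.nanmean(rts_concurrent) - np.nanmean(rts_alone)

rt_alone = np.array([412.0, 430.0, 398.0, 445.0])  # ms, secondary task alone
rt_av    = np.array([530.0, 555.0, 541.0, 560.0])  # ms, during audiovisual speech
rt_ao    = np.array([489.0, 472.0, 500.0, 481.0])  # ms, during audio-only speech

print(dual_task_cost(rt_av, rt_alone))  # larger cost -> more listening effort
print(dual_task_cost(rt_ao, rt_alone))
```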

2020 ◽  
Author(s):  
Sarah Elizabeth Margaret Colby ◽  
Bob McMurray

Purpose: Listening effort is quickly becoming an important metric for assessing speech perception in less-than-ideal situations. However, the relationship between the construct of listening effort and the measures used to assess it remains unclear. We compared two measures of listening effort: a cognitive dual task and a physiological pupillometry task. We sought to investigate the relationship between these measures of effort and whether engaging effort impacts speech accuracy. Method: In Experiment 1, 30 participants completed a dual task and a pupillometry task that were carefully matched in stimuli and design. The dual task consisted of a spoken word recognition task and a visual match-to-sample task. In the pupillometry task, pupil size was monitored while participants completed a spoken word recognition task. Both tasks presented words at three levels of listening difficulty (unmodified, 8-channel vocoding, and 4-channel vocoding) and provided response feedback on every trial. We refined the pupillometry task in Experiment 2 (n = 31); crucially, participants no longer received response feedback. Finally, we ran a new group of subjects on both tasks in Experiment 3 (n = 30). Results: In Experiment 1, accuracy in the visual task decreased with increased listening difficulty in the dual task, but pupil size was sensitive to accuracy and not listening difficulty. After removing feedback in Experiment 2, changes in pupil size were predicted by listening difficulty, suggesting the task was now sensitive to engaged effort. Both tasks were sensitive to listening difficulty in Experiment 3, but there was no relationship between the tasks and neither task predicted speech accuracy. Conclusions: Consistent with previous work, we found little evidence for a relationship between different measures of listening effort. We also found no evidence that effort predicts speech accuracy, suggesting that engaging more effort does not lead to improved speech recognition. Cognitive and physiological measures of listening effort are likely sensitive to different aspects of the construct of listening effort.
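The vocoding manipulation referenced above degrades speech by discarding spectral detail while preserving each frequency band's temporal envelope. Below is a minimal sketch of noise vocoding in that spirit, assuming log-spaced channels between 100 Hz and 7 kHz and Butterworth filters; the channel edges, filter orders, and envelope extraction method are illustrative assumptions, not the parameters used in the study.

```python
# Minimal noise-vocoder sketch (assumed parameters, not the
# study's): band-pass the signal into n channels, keep each
# channel's temporal envelope, and use it to modulate noise
# limited to the same band. Requires fs > 2 * f_hi.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def vocode(signal, fs, n_channels, f_lo=100.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced channel edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo, hi], btype="band", fs=fs)
        band = filtfilt(b, a, signal)
        env = np.abs(hilbert(band))               # temporal envelope of this band
        noise = rng.standard_normal(len(signal))
        carrier = filtfilt(b, a, noise)           # noise limited to the same band
        out += env * carrier
    # Match the overall RMS of the input.
    return out * (np.sqrt(np.mean(signal**2)) / np.sqrt(np.mean(out**2)))
```

Fewer channels discard more spectral detail, which is why the 4-channel condition is harder than the 8-channel condition.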


2019 ◽  
Author(s):  
Violet Aurora Brown ◽  
Julia Feld Strand ◽  
Kristin J. Van Engen

Objectives. Perceiving spoken language in noise can be a cognitively demanding task, particularly for older adults and those with hearing impairment. The current research assessed whether an abstract visual stimulus—a circle that modulates with the acoustic amplitude envelope of the speech—can affect speech processing in older adults. We hypothesized that, in line with recent research on younger adults, the circle would reduce listening effort during a word identification task. Given that older adults have slower processing speeds and poorer auditory temporal sensitivity than young adults, we expected that the abstract visual stimulus may have additional benefits for older adults, as it provides another source of information to compensate for limitations in auditory processing. Thus, we further hypothesized that, in contrast to the results from research on young adults, the circle would also improve word identification in noise for older adults. Design. Sixty-five older adults ages 65 to 83 (M = 71.11; SD = 4.01) with age-appropriate hearing completed four blocks of trials: two blocks (one with the modulating circle, one without) with a word identification task in two-talker babble, followed by two more word identification blocks that also included a simultaneous dual-task paradigm to assess listening effort. Results. Relative to an audio-only condition, the presence of the modulating circle substantially reduced listening effort (as indicated by faster responses to the secondary task in the dual-task paradigm) and also moderately improved spoken word intelligibility. Conclusions. Seeing the face of the talker substantially improves spoken word identification, but this is the first demonstration that another form of visual input—an abstract modulating circle—can also provide modest intelligibility benefits and substantial reductions in listening effort. These findings could have clinical or practical applications, as the modulating circle can be generated in real time to accompany speech in noisy situations, thereby improving speech intelligibility and reducing effort or fatigue for individuals who may have particular difficulty recognizing speech in background noise.
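Because the modulating circle is generated directly from the speech signal, the rendering pipeline is conceptually simple: extract the amplitude envelope, smooth it, and map it onto the circle's radius frame by frame. The sketch below illustrates one such mapping under assumed parameters (60 Hz display, roughly 10 Hz envelope smoothing, a pixel radius range); none of these values come from the study itself.

```python
# Illustrative envelope-to-radius mapping for a modulating-circle
# stimulus (assumed frame rate, smoothing cutoff, and radius range).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def circle_radii(speech, fs, frame_rate=60, r_min=20, r_max=100):
    """Return one circle radius (pixels) per video frame, tracking
    the low-pass-filtered amplitude envelope of the speech."""
    env = np.abs(hilbert(speech))               # raw amplitude envelope
    b, a = butter(2, 10, btype="low", fs=fs)    # smooth to ~10 Hz
    env = filtfilt(b, a, env)
    # Resample the envelope at the display frame rate.
    n_frames = int(len(speech) / fs * frame_rate)
    idx = np.linspace(0, len(env) - 1, n_frames).astype(int)
    env = env[idx]
    # Normalize to [0, 1] and map onto the radius range.
    env = (env - env.min()) / (env.max() - env.min() + 1e-12)
    return r_min + env * (r_max - r_min)
```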


2021 ◽  
Vol Publish Ahead of Print ◽  
Author(s):  
Sofie Degeest ◽  
Katrien Kestens ◽  
Hannah Keppler

1984 ◽  
Vol 59 (3) ◽  
pp. 959-965 ◽  
Author(s):  
Robert M. Godley ◽  
Robert E. Estes ◽  
Glenn P. Fournet

Researchers have continued to echo McGeoch and Irion's (1952) statement concerning the superiority of the auditory modality for young children and the visual modality for older children in paired-associate learning, despite conflicting results. In the present study, in which the performance of second and fifth grade children on a paired-associate task under 6 different modes of presentation was compared, the effect of mode of presentation did not vary as a function of age. The picture/sound combined condition was superior to the sound and printed/spoken-word conditions but provided no advantage over the picture condition alone. No significant differences were found among the printed-word, spoken-word, and combined printed/spoken-word conditions. Difficulties in making comparisons among studies whose methods differed, as well as implications for further research, are discussed.


2017 ◽  
Vol 21 ◽  
Article 233121651668728 ◽ 
Author(s):  
Jean-Pierre Gagné ◽  
Jana Besser ◽  
Ulrike Lemke

B-ENT ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. 135-144
Author(s):  
Sofie Degeest ◽  
Paul Corthals ◽  
Hannah Keppler ◽  
...  

2020 ◽  
Vol 73 (9) ◽  
pp. 1431-1443 ◽  
Author(s):  
Violet A Brown ◽  
Drew J McLaughlin ◽  
Julia F Strand ◽  
Kristin J Van Engen

In noisy settings or when listening to an unfamiliar talker or accent, it can be difficult to understand spoken language. This difficulty typically results in reductions in speech intelligibility, but may also increase the effort necessary to process the speech even when intelligibility is unaffected. In this study, we used a dual-task paradigm and pupillometry to assess the cognitive costs associated with processing fully intelligible accented speech, predicting that rapid perceptual adaptation to an accent would result in decreased listening effort over time. The behavioural and physiological paradigms provided converging evidence that listeners expend greater effort when processing nonnative- relative to native-accented speech, and both experiments also revealed an overall reduction in listening effort over the course of the experiment. Only the pupillometry experiment, however, revealed greater adaptation to nonnative- relative to native-accented speech. An exploratory analysis of the dual-task data that attempted to minimise practice effects revealed weak evidence for greater adaptation to the nonnative accent. These results suggest that even when speech is fully intelligible, resolving deviations between the acoustic input and stored lexical representations incurs a processing cost, and adaptation may attenuate this cost.
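In pupillometry studies of effort, the standard dependent measure is task-evoked pupil dilation relative to a short pre-stimulus baseline. A minimal sketch of that preprocessing step follows; the sampling rate and window boundaries are illustrative assumptions rather than the parameters used here.

```python
# Illustrative per-trial pupillometry preprocessing (assumed
# sampling rate and windows; not the authors' parameters).
import numpy as np

def baseline_correct(trace, fs=500, baseline_s=0.5):
    """Subtract mean pupil size over the pre-stimulus baseline.
    `trace` is one trial's pupil samples, starting `baseline_s`
    seconds before stimulus onset; NaNs mark blinks."""
    n_base = int(baseline_s * fs)
    return trace - np.nanmean(trace[:n_base])

def mean_dilation(trace, fs=500, baseline_s=0.5, window=(0.5, 2.5)):
    """Average baseline-corrected dilation in a post-onset window
    (seconds relative to stimulus onset); a common effort index."""
    corrected = baseline_correct(trace, fs, baseline_s)
    i0 = int((baseline_s + window[0]) * fs)
    i1 = int((baseline_s + window[1]) * fs)
    return np.nanmean(corrected[i0:i1])
```

Greater mean dilation in one condition than another is taken to indicate greater engaged effort, which is how listening to a nonnative accent can register a cost even when intelligibility is at ceiling.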


2019 ◽  
Author(s):  
Violet Aurora Brown ◽  
Julia Feld Strand

The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept (McGurk & MacDonald, 1976). McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration (e.g., Alsius et al., 2005), but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing—susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal (Van Engen et al., 2017), and distinct cortical regions are recruited when processing congruent versus incongruent speech (Erickson et al., 2014). In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent and incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed (Erickson et al., 2014; Van Engen, Xie, & Chandrasekaran, 2017), they do not appear to differ in terms of processing time or effort. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.

