visual speech information
Recently Published Documents


TOTAL DOCUMENTS

37
(FIVE YEARS 6)

H-INDEX

10
(FIVE YEARS 1)

Author(s):
Antony S. Trotter, Briony Banks, Patti Adank

Purpose This study first aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared with viewing the whole face, affected adaptation to distorted noise-vocoded sentences. Second, it aimed to replicate results on the processing of distorted speech from lab-based experiments in an online setup. Method We monitored recognition accuracy online while participants listened to noise-vocoded sentences. We first established whether participants were able to perceive and adapt to audiovisual four-band noise-vocoded sentences when the entire moving face was visible (AV Full). Four further groups were then tested: one in which participants viewed only the moving lower part of the speaker's face (AV Mouth), one in which they viewed only the moving upper part of the face (AV Eyes), one in which they could see neither the moving lower nor the moving upper face (AV Blocked), and one in which they saw an image of a still face (AV Still). Results Participants repeated around 40% of the key words correctly and adapted during the experiment, but only when the moving mouth was visible. In contrast, performance was at floor level, and no adaptation took place, in conditions in which the moving mouth was occluded. Conclusions The results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not the eyes/upper face region, when listening and adapting to distorted sentences online. They also demonstrate that it is feasible to run speech perception and adaptation studies online, but that not all findings reported for lab studies replicate. Supplemental Material https://doi.org/10.23641/asha.14810523
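
For readers unfamiliar with the stimulus manipulation, the four-band noise vocoding used in this line of work can be illustrated with a short signal-processing sketch. The band edges, filter orders, and envelope cutoff below are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, f_lo=100.0, f_hi=8000.0, env_cutoff=30.0):
    """Illustrative n-band noise vocoder: split the speech signal into
    logarithmically spaced frequency bands, extract each band's amplitude
    envelope, and use it to modulate band-limited noise.
    Assumes fs is well above 2 * f_hi (e.g. 22050 Hz or 44100 Hz)."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    env_filt = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_filt = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_filt, speech)
        envelope = sosfiltfilt(env_filt, np.abs(hilbert(band)))        # smoothed amplitude envelope
        carrier = sosfiltfilt(band_filt, np.random.randn(len(speech)))  # band-limited noise
        out += np.clip(envelope, 0.0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-12)  # normalise to avoid clipping
```

Applied to a sentence recording, this preserves the slow amplitude fluctuations in each band while discarding fine spectral detail, which is why recognition starts low and improves as listeners adapt.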


2020, pp. 1-24
Author(s):
Núria Esteve-Gibert, Carmen Muñoz

Abstract Previous studies have shown that visual information is a crucial input in early language learning. In the present study we examine what type of visual input helps preschoolers acquire nonnative phonological contrasts. Catalan/Spanish-speaking children (4–5 years, N = 47) participated in a task to assess their phonological discrimination abilities before and after training. Three training conditions were presented: one with clear oral/visual speech information, one with an ostensive object–sound mapping, and one with a rich social interaction. Children's looking patterns were tracked to examine their focus of interest during training. Results revealed that preschoolers' discrimination abilities increased in all training conditions, but the condition in which the speaker created an ostensive object–sound mapping led to higher long-term gains (especially for younger children). Eye-tracking results further showed that children looked at the object of reference while being exposed to the novel phonological input, which may explain the higher learning gains in this condition. Our results indicate that preschoolers' learning of nonnative phonological contrasts is particularly boosted when the speech input is accompanied by an object of reference that is signaled ostensively and contingently in the visual space, compared with when the visual space contains only clear oral/visual speech information or social interactivity cues.
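
The eye-tracking measure described here (children's focus of interest during training) is commonly summarised as the proportion of fixation time falling inside an area of interest, such as the object of reference or the speaker's face. The data layout and coordinates in the sketch below are hypothetical.

```python
from typing import List, Tuple

# One fixation: (x, y) gaze position in pixels and its duration in ms (hypothetical format).
Fixation = Tuple[float, float, float]

def prop_looking(fixations: List[Fixation],
                 aoi: Tuple[float, float, float, float]) -> float:
    """Proportion of total fixation time spent inside a rectangular area of
    interest given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = aoi
    total = sum(dur for _, _, dur in fixations)
    inside = sum(dur for x, y, dur in fixations
                 if x_min <= x <= x_max and y_min <= y <= y_max)
    return inside / total if total else 0.0

# Example: compare looking to the object of reference vs. the speaker's face.
fixations = [(120.0, 300.0, 250.0), (620.0, 310.0, 400.0), (630.0, 305.0, 150.0)]
print(prop_looking(fixations, aoi=(500.0, 200.0, 800.0, 450.0)))  # 0.6875
```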


2020
Author(s):
Anthony Trotter, Briony Banks, Patti Adank

The ability to quickly adapt to distorted speech signals, such as noise-vocoded speech, is one of the mechanisms listeners employ to understand one another in challenging listening conditions. In addition, listeners can exploit information offered by visual aspects of speech, and being able to see the speaker's face while perceiving distorted speech improves perception of and adaptation to these distorted speech signals. However, it is unclear how important viewing specific parts of the speaker's face is for the successful use of visual speech information: does looking at the speaker's mouth specifically improve recognition of noise-vocoded speech, or is it equally effective to view the speaker's entire face? This study aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared with viewing the whole face, affected perception of and adaptation to distorted sentences. As a secondary aim, we wanted to establish whether it was possible to replicate results on the processing of noise-vocoded speech from lab-based experiments in an online setting. We monitored speech recognition accuracy online while participants listened to noise-vocoded sentences in a between-subjects design with five groups. We first established whether participants were able to reliably perceive and adapt to audiovisual noise-vocoded sentences when the speaker's whole face was visible (AV Full). Four further groups were tested: a group in which participants could only view the moving lower part of the speaker's face, i.e., the mouth (AV Mouth); a group in which they could only see the moving upper part of the face (AV Eyes); a group in which they could not see the speaker's moving lower or upper face (AV Blocked); and a group in which they were presented with an image of a still face (AV Still). Participants repeated around 40% of key words correctly for the noise-vocoded sentences and adapted over the course of the experiment, but only when the moving mouth was visible (AV Full and AV Mouth). In contrast, performance was at floor level and no adaptation took place in conditions in which the moving mouth was not visible (AV Blocked, AV Eyes, and AV Still). Our results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not the eyes/upper face region, when listening and adapting to speech under challenging conditions online. Second, our results also demonstrate that it is feasible to run speech perception and adaptation studies online, but that not all findings reported for lab studies necessarily replicate.
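
Adaptation in this kind of design is typically quantified as the change in keyword recognition accuracy across blocks of trials. The sketch below assumes a hypothetical trial table with columns 'participant', 'condition', 'trial', and 'prop_correct'; it is not the authors' analysis code.

```python
import numpy as np
import pandas as pd

def accuracy_by_block(trials: pd.DataFrame, block_size: int = 10) -> pd.DataFrame:
    """Mean proportion of key words correct per block of `block_size` sentences,
    per participant and condition (AV Full, AV Mouth, AV Eyes, AV Blocked, AV Still)."""
    trials = trials.copy()
    trials["block"] = (trials["trial"] - 1) // block_size + 1
    return (trials
            .groupby(["condition", "participant", "block"], as_index=False)["prop_correct"]
            .mean())

def adaptation_slope(blocked: pd.DataFrame) -> pd.Series:
    """Per-condition linear slope of accuracy over blocks; a positive slope
    indicates adaptation to the noise-vocoded sentences."""
    return blocked.groupby("condition").apply(
        lambda g: np.polyfit(g["block"], g["prop_correct"], 1)[0]
    )
```

Under the reported pattern, the slope would be reliably positive for AV Full and AV Mouth, and flat with near-floor accuracy for AV Eyes, AV Blocked, and AV Still.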


2020
Author(s):
Johannes Rennig, Michael S Beauchamp

Abstract Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We hypothesized that these multisensory responses in pSTG/S underlie the observation that comprehension of noisy auditory speech is improved when it is accompanied by visual speech. To test this idea, we presented audiovisual sentences that contained either a clear auditory component or a noisy auditory component while measuring brain activity using BOLD fMRI. Participants reported the intelligibility of the speech on each trial with a button press. Perceptually, adding visual speech to noisy auditory sentences rendered them much more intelligible. Post hoc trial sorting was used to examine brain activations during noisy sentences that were more or less intelligible, focusing on multisensory speech regions in the pSTG/S identified with an independent visual speech localizer. Univariate analysis showed that less intelligible noisy audiovisual sentences evoked a weaker BOLD response, while more intelligible sentences evoked a stronger BOLD response that was indistinguishable from that to clear sentences. To better understand these differences, we conducted a multivariate representational similarity analysis. The pattern of response for intelligible noisy audiovisual sentences was more similar to the pattern for clear sentences, while the response pattern for unintelligible noisy sentences was less similar. These results show that for both univariate and multivariate analyses, successful integration of visual and noisy auditory speech normalizes responses in pSTG/S, providing evidence that multisensory subregions of pSTG/S are responsible for the perceptual benefit of visual speech.
Significance Statement Enabling social interactions, including the production and perception of speech, is a key function of the human brain. Speech perception is a complex computational problem that the brain solves using both visual information from the talker's facial movements and auditory information from the talker's voice. Visual speech information is particularly important under noisy listening conditions, when auditory speech is difficult or impossible to understand alone. Regions of the human cortex in the posterior superior temporal lobe respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We show that the pattern of activity in this cortex reflects the successful multisensory integration of auditory and visual speech information in the service of perception.
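
The representational similarity logic, comparing multivoxel response patterns in pSTG/S for intelligible and unintelligible noisy audiovisual sentences against the pattern for clear sentences, reduces to pattern correlations. The sketch below uses placeholder data and hypothetical variable names; it is not the authors' pipeline.

```python
import numpy as np

def pattern_similarity(pattern_a: np.ndarray, pattern_b: np.ndarray) -> float:
    """Pearson correlation between two multivoxel response patterns
    (1-D arrays of per-voxel response estimates within the pSTG/S ROI)."""
    return float(np.corrcoef(pattern_a, pattern_b)[0, 1])

# Placeholder patterns standing in for condition-wise voxel responses.
rng = np.random.default_rng(0)
n_voxels = 200
clear = rng.normal(size=n_voxels)
intelligible_noisy = clear + rng.normal(scale=0.5, size=n_voxels)   # similar to clear
unintelligible_noisy = rng.normal(size=n_voxels)                    # unrelated pattern

print(pattern_similarity(intelligible_noisy, clear))    # high correlation expected
print(pattern_similarity(unintelligible_noisy, clear))  # near-zero correlation expected
```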


Author(s):
Florian Destoky, Julie Bertels, Maxime Niesen, Vincent Wens, Marc Vander Ghinst, ...

Abstract Humans' propensity to acquire literacy relates to several factors, among which is the ability to understand speech in noise (SiN). Still, the nature of the relation between reading and SiN perception abilities remains poorly understood. Here, we dissect the interplay between (i) reading abilities, (ii) classical behavioral predictors of reading (phonological awareness, phonological memory, and lexical access), and (iii) electrophysiological markers of SiN perception in 99 elementary school children (26 with dyslexia). We demonstrate that the cortical representation of the phrasal content of SiN relates to the development of the lexical (but not sublexical) reading strategy. In contrast, classical behavioral predictors of reading abilities and the ability to benefit from visual speech to represent the syllabic content of SiN account for global reading performance (i.e., speed and accuracy of lexical and sublexical reading). Finally, we found that individuals with dyslexia properly integrate visual speech information to optimize the processing of syntactic information, but not to sustain acoustic/phonemic processing. These results clarify the nature of the relation between SiN perception and reading abilities in typical and dyslexic child readers, and identify novel electrophysiological markers of emergent literacy.
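
The "cortical representation of the phrasal/syllabic content of SiN" is commonly operationalised as coherence between the speech temporal envelope and the neural signal within phrase-rate and syllable-rate frequency bands. The band limits and function below are illustrative assumptions, not the authors' exact analysis.

```python
import numpy as np
from typing import Tuple
from scipy.signal import coherence

def band_coherence(envelope: np.ndarray, neural: np.ndarray, fs: float,
                   band: Tuple[float, float], nperseg: int = 2048) -> float:
    """Mean speech-brain coherence within a frequency band.

    envelope: speech temporal envelope (e.g. rectified, low-pass-filtered audio)
    neural:   one sensor/source time course sampled at the same rate fs
    band:     (low_hz, high_hz), e.g. (0.2, 1.5) for phrase rates or (4.0, 8.0)
              for syllable rates (illustrative choices)
    nperseg:  must be long enough that fs / nperseg resolves the lower band edge."""
    freqs, coh = coherence(envelope, neural, fs=fs, nperseg=nperseg)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(coh[mask].mean())
```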


Author(s):  
Doğu Erdener

Speech perception has long been taken for granted as an auditory-only process. However, it is now firmly established that speech perception is an auditory-visual process in which visual speech information, in the form of lip and mouth movements, is taken into account. Traditionally, foreign language (L2) instructional methods and materials are auditory-based. This chapter presents a general framework of evidence that visual speech information will facilitate L2 instruction. The author claims that this knowledge will help bridge the gap between psycholinguistics and L2 instruction as an applied field. The chapter also describes how orthography can be used in L2 instruction. While learners from a transparent L1 orthographic background can decipher the phonology of orthographically transparent L2s, overriding the visual speech information, that is not the case for those from orthographically opaque L1s.


Languages, 2018, Vol 3 (4), pp. 38
Author(s):
Arzu Yordamlı, Doğu Erdener

This study aimed to investigate how individuals with bipolar disorder integrate auditory and visual speech information compared to healthy individuals. Furthermore, we wanted to see whether there were any differences between manic- and depressive-episode bipolar disorder patients with respect to auditory and visual speech integration. It was hypothesized that the bipolar group's auditory–visual speech integration would be weaker than that of the control group. Further, it was predicted that those in the manic phase of bipolar disorder would integrate visual speech information more robustly than their depressive-phase counterparts. To examine these predictions, a McGurk effect paradigm with an identification task was used with typical auditory–visual (AV) speech stimuli. Additionally, auditory-only (AO) and visual-only (VO, lip-reading) speech perception were also tested. The dependent variable for the AV stimuli was the amount of visual speech influence. The dependent variables for AO and VO stimuli were accurate modality-based responses. Results showed that the disordered and control groups did not differ in AV speech integration or AO speech perception. However, there was a striking difference in favour of the healthy group with respect to the VO stimuli. The results suggest the need for further research in which both behavioural and physiological data are collected simultaneously. This will help us understand the full dynamics of how auditory and visual speech information are integrated in people with bipolar disorder.
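
The "amount of visual speech influence" on incongruent McGurk trials is typically scored as the proportion of responses that depart from the auditory token (fused or visually captured percepts). The sketch below uses a hypothetical trial structure, not the authors' scoring script.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class McGurkTrial:
    """One incongruent audiovisual trial, e.g. auditory /ba/ dubbed onto visual /ga/."""
    auditory: str   # auditory token presented, e.g. "ba"
    visual: str     # visual token presented, e.g. "ga"
    response: str   # participant's identification response, e.g. "da"

def visual_influence(trials: List[McGurkTrial]) -> float:
    """Proportion of incongruent trials on which the response is not the auditory
    token; higher values indicate stronger integration of visual speech information."""
    if not trials:
        return 0.0
    influenced = sum(1 for t in trials if t.response != t.auditory)
    return influenced / len(trials)

# Example: a fused "da" response counts as visually influenced, "ba" does not.
demo = [McGurkTrial("ba", "ga", "da"), McGurkTrial("ba", "ga", "ba")]
print(visual_influence(demo))  # 0.5
```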


2018
Author(s):
Jasmine Virhia, Sonja A. Kotz, Patti Adank

This experiment provides evidence that the emotional valence of our own speech production affects the extent to which we are able to disregard conflicting, distracting visual speech information. The emotional valence of the distracting information itself does not affect the extent to which we can ignore that information. Our results imply that our own emotional mood affects how much we automatically imitate our conversation partner, but that the emotional state of our interlocutor is less important. The results support theoretical accounts suggesting that imitation in everyday life is governed by general cognitive mechanisms. However, these accounts need to be extended to include predictions regarding the emotional valence of both interaction partners.


Author(s):
Arzu Yordamlı, Doğu Erdener

The focus of this study was to investigate how individuals with bipolar disorder integrate auditory and visual speech information compared to non-disordered individuals, and whether there were any differences in auditory and visual speech integration between the manic and depressive episodes of bipolar disorder. It was hypothesized that the bipolar group's auditory–visual speech integration would be less robust than that of the control group. Further, it was predicted that those in the manic phase of bipolar disorder would integrate visual speech information more than their depressive-phase counterparts. To examine these predictions, the McGurk effect paradigm was used with typical auditory–visual (AV) speech stimuli as well as auditory-only (AO) and visual-only (VO) stimuli. Results showed that the disordered and non-disordered groups did not differ on auditory–visual (AV) integration or auditory-only (AO) speech perception, but did differ on visual-only (VO) stimuli. The results pave the way for further research in which both behavioural and physiological data are collected simultaneously. This will allow us to understand the full dynamics of how auditory and visual speech information (the latter relatively impoverished in bipolar disorder) are integrated in people with bipolar disorder.

