Using Visual Speech Information in Masking Methods for Audio Speaker Separation

Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the nature of information provided by speaking clearly and by using visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than was conversational speech) and presentation mode (auditoryvisual presentation led to better performance than did auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of each of these effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.

Download Full-text

The Effect of a Concurrent Working Memory Task and Temporal Offsets on the Integration of Auditory and Visual Speech Information

Seeing and Perceiving ◽

10.1163/187847611x620937 ◽

2012 ◽

Vol 25 (1) ◽

pp. 87-106 ◽

Cited By ~ 13

Author(s):

Kevin G. Munhall ◽

Julie N. Buchan

Keyword(s):

Working Memory ◽

Memory Task ◽

Visual Speech ◽

Visual Speech Information ◽

Speech Information

Download Full-text

Second Language Instruction

Advances in Educational Technologies and Instructional Design - Handbook of Research on Bilingual and Intercultural Education ◽

10.4018/978-1-7998-2588-3.ch005 ◽

2020 ◽

pp. 105-123

Author(s):

Doğu Erdener

Keyword(s):

Speech Perception ◽

Applied Field ◽

General Framework ◽

Language Instruction ◽

Visual Speech ◽

Second Language Instruction ◽

Perception Process ◽

L2 Instruction ◽

Visual Speech Information ◽

Speech Information

Speech perception has long been taken for granted as an auditory-only process. However, it is now firmly established that speech perception is an auditory-visual process in which visual speech information in the form of lip and mouth movements are taken into account in the speech perception process. Traditionally, foreign language (L2) instructional methods and materials are auditory-based. This chapter presents a general framework of evidence that visual speech information will facilitate L2 instruction. The author claims that this knowledge will form a bridge to cover the gap between psycholinguistics and L2 instruction as an applied field. The chapter also describes how orthography can be used in L2 instruction. While learners from a transparent L1 orthographic background can decipher phonology of orthographically transparent L2s –overriding the visual speech information – that is not the case for those from orthographically opaque L1s.

Download Full-text

Hemispheric differences in perceiving and integrating dynamic visual speech information

The Journal of the Acoustical Society of America ◽

10.1121/1.417400 ◽

1996 ◽

Vol 100 (4) ◽

pp. 2570-2570 ◽

Cited By ~ 1

Author(s):

Jennifer A. Johnson ◽

Lawrence D. Rosenblum

Keyword(s):

Visual Speech ◽

Hemispheric Differences ◽

Visual Speech Information ◽

Speech Information

Download Full-text

Visual speech information improves discrimination of non‐native phonemes in late infancy.

The Journal of the Acoustical Society of America ◽

10.1121/1.4784782 ◽

2009 ◽

Vol 125 (4) ◽

pp. 2778-2778

Author(s):

Robin K. Panneton

Keyword(s):

Visual Speech ◽

Visual Speech Information ◽

Speech Information

Download Full-text

Auditory–Visual Speech Integration in Bipolar Disorder: A Preliminary Study

Languages ◽

10.3390/languages3040038 ◽

2018 ◽

Vol 3 (4) ◽

pp. 38 ◽

Cited By ~ 1

Author(s):

Arzu Yordamlı ◽

Doğu Erdener

Keyword(s):

Bipolar Disorder ◽

Mcgurk Effect ◽

Visual Speech ◽

Physiological Data ◽

Striking Difference ◽

Control Group ◽

Speech Stimuli ◽

Visual Speech Information ◽

Speech Information ◽

And Control

This study aimed to investigate how individuals with bipolar disorder integrate auditory and visual speech information compared to healthy individuals. Furthermore, we wanted to see whether there were any differences between manic and depressive episode bipolar disorder patients with respect to auditory and visual speech integration. It was hypothesized that the bipolar group’s auditory–visual speech integration would be weaker than that of the control group. Further, it was predicted that those in the manic phase of bipolar disorder would integrate visual speech information more robustly than their depressive phase counterparts. To examine these predictions, a McGurk effect paradigm with an identification task was used with typical auditory–visual (AV) speech stimuli. Additionally, auditory-only (AO) and visual-only (VO, lip-reading) speech perceptions were also tested. The dependent variable for the AV stimuli was the amount of visual speech influence. The dependent variables for AO and VO stimuli were accurate modality-based responses. Results showed that the disordered and control groups did not differ in AV speech integration and AO speech perception. However, there was a striking difference in favour of the healthy group with respect to the VO stimuli. The results suggest the need for further research whereby both behavioural and physiological data are collected simultaneously. This will help us understand the full dynamics of how auditory and visual speech information are integrated in people with bipolar disorder.

Download Full-text

A Novel Method to Extract Lip-Reading Features by Using LGEI and DWT

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1079-1080.820 ◽

2014 ◽

Vol 1079-1080 ◽

pp. 820-823

Author(s):

Li Guo Zheng ◽

Mei Li Zhu ◽

Qing Qing Wang

Keyword(s):

Visual Speech ◽

Second Step ◽

Discrete Wavelet ◽

Noise Resistance ◽

Lip Reading ◽

Novel Method ◽

Reading System ◽

Visual Speech Information ◽

Speech Information ◽

Precision Rate

This paper proposes a novel algorithm used in extraction of lip feature extraction for to improved efficiency and robustness of lip-reading system. First, Lip Gray Energy Image (LGEI) is used to smooth noise, and improve noise resistance of the system. Second, Discrete Wavelet Analysis (DWT) is used to extract salient visual speech information from lip by decorrelating spectral information. Last, lip features are obtained by downsampling data from second step, the resample can effectively reduce the amount of computation. Experimental results show the performance of this method is exceedingly discriminative, accurate and computation efficient, the precision rate can rate 96%.

Download Full-text

Linking Activity in Human Superior Temporal Cortex to Perception of Noisy Audiovisual Speech

10.1101/2020.04.02.021774 ◽

2020 ◽

Author(s):

Johannes Rennig ◽

Michael S Beauchamp

Keyword(s):

Univariate Analysis ◽

Temporal Cortex ◽

Visual Speech ◽

Button Press ◽

Auditory Information ◽

Bold Response ◽

Auditory Speech ◽

Auditory Component ◽

Visual Speech Information ◽

Speech Information

AbstractRegions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We hypothesized that these multisensory responses in pSTG/S underlie the observation that comprehension of noisy auditory speech is improved when it is accompanied by visual speech. To test this idea, we presented audiovisual sentences that contained either a clear auditory component or a noisy auditory component while measuring brain activity using BOLD fMRI. Participants reported the intelligibility of the speech on each trial with a button press. Perceptually, adding visual speech to noisy auditory sentences rendered them much more intelligible. Post-hoc trial sorting was used to examine brain activations during noisy sentences that were more or less intelligible, focusing on multisensory speech regions in the pSTG/S identified with an independent visual speech localizer. Univariate analysis showed that less intelligible noisy audiovisual sentences evoked a weaker BOLD response, while more intelligible sentences evoked a stronger BOLD response that was indistinguishable from clear sentences. To better understand these differences, we conducted a multivariate representational similarity analysis. The pattern of response for intelligible noisy audiovisual sentences was more similar to the pattern for clear sentences, while the response pattern for unintelligible noisy sentences was less similar. These results show that for both univariate and multivariate analyses, successful integration of visual and noisy auditory speech normalizes responses in pSTG/S, providing evidence that multisensory subregions of pSTG/S are responsible for the perceptual benefit of visual speech.Significance StatementEnabling social interactions, including the production and perception of speech, is a key function of the human brain. Speech perception is a complex computational problem that the brain solves using both visual information from the talker’s facial movements and auditory information from the talker’s voice. Visual speech information is particularly important under noisy listening conditions when auditory speech is difficult or impossible to understand alone Regions of the human cortex in posterior superior temporal lobe respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech. We show that the pattern of activity in cortex reflects the successful multisensory integration of auditory and visual speech information in the service of perception.

Download Full-text