Speech signals separation: a new approach exploiting the coherence of audio and visual speech

Author(s):  
L. Girin ◽  
A. Allard ◽  
J.-L. Schwartz
2021 ◽  
Author(s):  
Mate Aller ◽  
Heidi Solberg Økland ◽  
Lucy J MacGregor ◽  
Helen Blank ◽  
Matthew H. Davis

Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explored phase locking to auditory and visual signals in MEG recordings from 14 human participants (6 female) who reported words from single spoken sentences. We manipulated the acoustic clarity and the presence of visual speech so that critical speech information was available in the auditory modality, the visual modality, or both. Coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip-aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed above-zero phase locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase locking to auditory signals in visual cortex was enhanced for AV speech compared to audio-only (AO) speech matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus (STG) did not show above-chance partial coherence with visual speech signals during AV conditions, but did show partial coherence in visual-only (VO) conditions. Hence, visual speech enabled stronger phase locking to auditory signals in visual areas, whereas phase locking to visual speech in auditory regions occurred only during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception.
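
Below is a minimal sketch of how the coherence and partial coherence measures described in this abstract could be computed between a neural time series and the speech envelopes, assuming Welch cross-spectra from scipy. The signal names (meg, audio_env, lip_aperture), the sampling rate fs, and the window length are illustrative placeholders, not the authors' actual analysis pipeline.

```python
# Minimal sketch: coherence and partial coherence between a speech
# envelope and a neural (e.g. MEG sensor/source) time series.
# Signal names (meg, audio_env, lip_aperture) and fs are illustrative.
import numpy as np
from scipy.signal import csd

def coherence(x, y, fs, nperseg=1024):
    """Magnitude-squared coherence via Welch cross-spectra."""
    f, sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, sxx = csd(x, x, fs=fs, nperseg=nperseg)
    _, syy = csd(y, y, fs=fs, nperseg=nperseg)
    return f, np.abs(sxy) ** 2 / (sxx.real * syy.real)

def partial_coherence(x, y, z, fs, nperseg=1024):
    """Coherence between x and y after removing the linear contribution
    of a third signal z (e.g. the correlated visual envelope when
    testing phase locking to the auditory envelope)."""
    f, sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, sxx = csd(x, x, fs=fs, nperseg=nperseg)
    _, syy = csd(y, y, fs=fs, nperseg=nperseg)
    _, sxz = csd(x, z, fs=fs, nperseg=nperseg)
    _, syz = csd(y, z, fs=fs, nperseg=nperseg)
    _, szz = csd(z, z, fs=fs, nperseg=nperseg)
    sxy_p = sxy - sxz * np.conj(syz) / szz.real        # partial cross-spectrum
    sxx_p = sxx.real - np.abs(sxz) ** 2 / szz.real     # partial auto-spectra
    syy_p = syy.real - np.abs(syz) ** 2 / szz.real
    return f, np.abs(sxy_p) ** 2 / (sxx_p * syy_p)

# Example: phase locking of a visual-cortex signal to the auditory
# envelope, controlling for the correlated lip-aperture signal,
# averaged over the 2-6 Hz syllable-rate band (placeholder data).
fs = 250.0
t = np.arange(0, 60, 1 / fs)
audio_env = np.random.randn(t.size)
lip_aperture = np.random.randn(t.size)
meg = np.random.randn(t.size)
f, pcoh = partial_coherence(meg, audio_env, lip_aperture, fs)
band = (f >= 2) & (f <= 6)
print("2-6 Hz partial coherence:", pcoh[band].mean())
```

The partial coherence here simply removes the part of each spectrum that is linearly predictable from the third signal before forming the usual coherence ratio, which is what allows correlated auditory and visual envelopes to be disentangled.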


2009 ◽  
pp. 439-461
Author(s):  
Lynne E. Bernstein ◽  
Jintao Jiang

The information in optical speech signals is phonetically impoverished compared to the information in acoustic speech signals presented under good listening conditions. Yet high lipreading scores among prelingually deaf adults show that optical speech signals are in fact rich in phonetic information. Hearing lipreaders are not as accurate as deaf lipreaders, but they too demonstrate perception of detailed optical phonetic information. This chapter briefly sketches the historical context of, and impediments to, knowledge about optical phonetics and visual speech perception (lipreading). The authors review findings on deaf and hearing lipreaders, then review recent results on the relationships between optical speech signals and visual speech perception. They extend the discussion of these relationships to the development of visual speech synthesis, and advocate for a close relationship between visual speech perception research and the development of synthetic visible speech.


2015 ◽  
Vol 43 ◽  
pp. 51-61
Author(s):  
Mirza A.F.M. Rashidul Hasan ◽  
Rubaiyat Yasmin ◽  
Dipankar Das ◽  
M. M. Hoque ◽  
M. I. Pramanik ◽  
...  

In this paper, we propose a correlation-based method for accurate fundamental frequency extraction in which the autocorrelation function is weighted by the reciprocal of the YIN function. Both the autocorrelation function and YIN are popular time-domain measures for estimating fundamental frequency. In the proposed method, the autocorrelation function is computed from the center-clipped signal rather than the original signal, and this function is then weighted by the reciprocal of the YIN function for fundamental frequency detection. Comparative results on female and male voices in white and exhibition noise show that the proposed method detects fundamental frequency with better accuracy, in terms of gross pitch errors, than other related methods.
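
Below is a minimal sketch of the weighted-autocorrelation idea described in this abstract, assuming a simple cumulative-mean-normalised YIN difference function. The clipping ratio, search range, and helper names (center_clip, yin_cmnd, estimate_f0) are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: autocorrelation of a center-clipped signal weighted
# by the reciprocal of the YIN (cumulative mean normalised difference)
# function, then peak-picked to estimate F0. Parameters are illustrative.
import numpy as np

def center_clip(x, ratio=0.3):
    """Zero out samples below a threshold set from the peak level."""
    c = ratio * np.max(np.abs(x))
    y = np.zeros_like(x)
    y[x > c] = x[x > c] - c
    y[x < -c] = x[x < -c] + c
    return y

def yin_cmnd(x, max_lag):
    """Cumulative mean normalised difference function (YIN)."""
    d = np.array([np.sum((x[:-lag] - x[lag:]) ** 2) if lag else 0.0
                  for lag in range(max_lag)])
    cmnd = np.ones(max_lag)
    cmnd[1:] = d[1:] * np.arange(1, max_lag) / np.maximum(np.cumsum(d[1:]), 1e-12)
    return cmnd

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Weighted-autocorrelation F0 estimate for one analysis frame."""
    lag_min, lag_max = int(fs / f0_max), int(fs / f0_min)
    clipped = center_clip(frame)
    acf = np.correlate(clipped, clipped, mode='full')[len(clipped) - 1:]
    cmnd = yin_cmnd(frame, lag_max + 1)
    weighted = acf[:lag_max + 1] / (cmnd + 1e-12)   # ACF weighted by 1/YIN
    lag = lag_min + np.argmax(weighted[lag_min:lag_max + 1])
    return fs / lag

# Example on a synthetic 150 Hz voiced frame with additive noise.
fs = 16000
t = np.arange(0, 0.04, 1 / fs)
frame = np.sin(2 * np.pi * 150 * t) + 0.2 * np.random.randn(t.size)
print("Estimated F0: %.1f Hz" % estimate_f0(frame, fs))
```

The intuition is that the center-clipped autocorrelation peaks at the pitch period while the YIN difference function dips there, so dividing the former by the latter sharpens the true peak relative to spurious maxima.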


1989 ◽  
Vol 27 (11) ◽  
pp. 65-71 ◽  
Author(s):  
B.P. Yuhas ◽  
M.H. Goldstein ◽  
T.J. Sejnowski
