Audiovisual Speech
Recently Published Documents


TOTAL DOCUMENTS: 447 (FIVE YEARS: 94)

H-INDEX: 45 (FIVE YEARS: 3)

2022, Vol. 15. Author(s): Enrico Varano, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic, ...

Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker's face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person's face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN) and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield an even higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.
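The abstract describes the synthesis pipeline only at a high level. As a rough illustration, the minimal PyTorch sketch below shows the end-to-end interface such a model exposes: a generator that maps a sequence of speech features and a single still image to a sequence of video frames. This is not the published GAN; every module, dimension, and name here is a simplified assumption, and the adversarial training (discriminators judging frame realism and lip-sync) that makes the real output photorealistic is omitted.

```python
import torch
import torch.nn as nn

class TalkingFaceGenerator(nn.Module):
    """Toy stand-in for a speech-driven facial animation generator."""

    def __init__(self, audio_dim=80, latent_dim=128, frame_size=64):
        super().__init__()
        self.frame_size = frame_size
        # Per-frame speech features (e.g. a mel spectrogram) -> temporal code.
        self.audio_rnn = nn.GRU(audio_dim, latent_dim, batch_first=True)
        # Still image -> identity code (deliberately tiny encoder).
        self.identity_enc = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )
        # Fused (speech, identity) code -> one low-res RGB frame per step.
        self.frame_dec = nn.Linear(2 * latent_dim, 3 * frame_size * frame_size)

    def forward(self, audio, still):
        # audio: (batch, n_frames, audio_dim); still: (batch, 3, H, W)
        speech_codes, _ = self.audio_rnn(audio)             # (B, T, D)
        identity = self.identity_enc(still)                 # (B, D)
        identity = identity.unsqueeze(1).expand(-1, speech_codes.size(1), -1)
        frames = torch.tanh(self.frame_dec(
            torch.cat([speech_codes, identity], dim=-1)))   # (B, T, 3*S*S)
        b, t = audio.size(0), audio.size(1)
        return frames.view(b, t, 3, self.frame_size, self.frame_size)

# 25 audio feature frames + one still image -> 25 synthesized video frames.
gen = TalkingFaceGenerator()
video = gen(torch.randn(1, 25, 80), torch.randn(1, 3, 64, 64))
print(video.shape)  # torch.Size([1, 25, 3, 64, 64])
```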


2021. Author(s): Enrico Varano, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic, ...



2021, pp. JN-RM-0114-21. Author(s): Jonathan E. Peelle, Brent Spehar, Michael S. Jones, Sarah McConkey, Joel Myerson, ...

2021, pp. 147739. Author(s): Artturi Ylinen, Patrik Wikman, Miika Leminen, Kimmo Alho

2021. Author(s): Ahmed Hussen Abdelaziz, Anushree Prasanna Kumar, Chloe Seivwright, Gabriele Fanelli, Justin Binder, ...

2021. Author(s): Daniel Senkowski, James K. Moran

Abstract

Objectives: People with schizophrenia (SZ) show deficits in auditory and audiovisual speech recognition. It is possible that these deficits are related to aberrant early sensory processing, combined with an impaired ability to utilize visual cues to improve speech recognition. In this electroencephalography study we tested this by having SZ and healthy controls (HC) identify different unisensory auditory and bisensory audiovisual syllables at different auditory noise levels.

Methods: SZ (N = 24) and HC (N = 21) identified one of three different syllables (/da/, /ga/, /ta/) at three different noise levels (no, low, high). Half of the trials were unisensory auditory; the other half provided additional visual input of moving lips. Task-evoked mediofrontal N1 and P2 brain potentials, time-locked to the onset of the auditory syllables, were derived and related to behavioral performance.

Results: In comparison to HC, SZ showed speech recognition deficits for unisensory and bisensory stimuli. These deficits were primarily found in the no-noise condition. Paralleling these observations, reduced N1 amplitudes to unisensory and bisensory stimuli were found in SZ in the no-noise condition. In HC, the N1 amplitudes were positively related to speech recognition performance, whereas no such relationship was found in SZ. Moreover, no group differences were observed in multisensory speech recognition benefits or in N1 suppression effects for bisensory stimuli.

Conclusion: Our study shows that reduced N1 amplitudes relate to auditory and audiovisual speech processing deficits in SZ. The findings that the amplitude effects were confined to salient speech stimuli and that the relationship with behavioral performance was attenuated compared to HC indicate a diminished decoding of auditory speech signals in SZ. Our study also revealed intact multisensory benefits in SZ, which indicates that the observed auditory and audiovisual speech recognition deficits were primarily related to aberrant auditory speech processing.

Highlights:
- Speech processing deficits in schizophrenia are related to reduced N1 amplitudes.
- The audiovisual suppression effect in N1 is preserved in schizophrenia.
- Schizophrenia showed weakened P2 components specifically in audiovisual processing.
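As a pointer for readers who want to compute similar event-related potential measures, the sketch below uses MNE-Python (a widely used open-source EEG toolbox; the abstract does not say which software the authors used) to extract mean N1 and P2 amplitudes at a mediofrontal channel. The file name, condition label, channel, and time windows are illustrative assumptions, not values from the study.

```python
# A minimal sketch with MNE-Python; not the authors' analysis pipeline.
import mne

# Hypothetical epoched EEG file and condition label.
epochs = mne.read_epochs("sub01-epo.fif")
evoked = epochs["audiovisual/no_noise"].average()

def mean_amplitude(evoked, channel, tmin, tmax):
    """Mean amplitude (volts) of one channel within a time window."""
    return evoked.copy().pick(channel).crop(tmin, tmax).data.mean()

# Assumed mediofrontal site and component windows (seconds after syllable onset).
n1 = mean_amplitude(evoked, "FCz", 0.080, 0.130)
p2 = mean_amplitude(evoked, "FCz", 0.150, 0.250)
print(f"N1: {n1 * 1e6:.2f} uV, P2: {p2 * 1e6:.2f} uV")
```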


2021, Vol. 150(4), pp. A275-A275. Author(s): Laura Koenig, Melissa Randazzo, Paul J. Smith, Ryan Priefer
