Visual Enhancement of Relevant Speech in a ‘Cocktail Party’

2020 ◽  
Vol 33 (3) ◽  
pp. 277-294 ◽  
Author(s):  
Niti Jaha ◽  
Stanley Shen ◽  
Jess R. Kerlin ◽  
Antoine J. Shahin

Abstract: Lip-reading improves intelligibility in noisy acoustical environments. We hypothesized that watching mouth movements benefits speech comprehension in a ‘cocktail party’ by strengthening the encoding of the neural representations of the visually paired speech stream. In an audiovisual (AV) task, EEG was recorded while participants watched and listened to videos of a speaker uttering a sentence while also hearing a concurrent sentence spoken by a speaker of the opposite gender. A key manipulation was that each audio sentence had a 200-ms segment replaced by white noise. To assess comprehension, subjects transcribed the AV-attended sentence on randomly selected trials. In the auditory-only (A-only) trials, subjects listened to the same sentences and completed the same task while watching a static picture of a speaker of either gender. Subjects directed their listening to the voice matching the gender of the speaker in the video. We found that the N1 auditory-evoked potential (AEP) time-locked to white-noise onsets was significantly more inhibited for AV-attended sentences than for auditorily attended (A-attended) and AV-unattended sentences. N1 inhibition to noise onsets has been shown to index restoration of the phonemic representations of degraded speech. These results underscore that attention and congruency in the AV setting help streamline a complex auditory scene, partly by reinforcing the neural representations of the visually attended stream and thereby heightening the perception of continuity and comprehension.
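For readers who want to see the time-locked averaging behind such an N1 analysis concretely, here is a minimal Python sketch of epoching continuous EEG around noise onsets and extracting an evoked response. The sampling rate, array shapes, channel index, and onset times are illustrative assumptions, not values from the study.

```python
import numpy as np

fs = 500                                    # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((64, 300 * fs))   # stand-in for (channels, samples)
onsets = np.arange(5, 295, 3) * fs          # assumed white-noise onset samples

pre, post = int(0.2 * fs), int(0.5 * fs)    # -200 ms to +500 ms epoch window
epochs = np.stack([eeg[:, s - pre:s + post] for s in onsets])

# Baseline-correct each epoch on the prestimulus interval, then average
# across trials to obtain the evoked response.
epochs -= epochs[:, :, :pre].mean(axis=2, keepdims=True)
erp = epochs.mean(axis=0)                   # (channels, time)

# N1: most negative deflection roughly 80-150 ms after onset; channel 31
# stands in for a fronto-central site such as Cz (hypothetical layout).
win = slice(pre + int(0.08 * fs), pre + int(0.15 * fs))
n1_amplitude = erp[31, win].min()
print(f"N1 amplitude (simulated data): {n1_amplitude:.2f}")
```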

2007 ◽  
Vol 97 (1-3) ◽  
pp. 173-183 ◽  
Author(s):  
Lars A. Ross ◽  
Dave Saint-Amour ◽  
Victoria M. Leavitt ◽  
Sophie Molholm ◽  
Daniel C. Javitt ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Taishi Hosaka ◽  
Marino Kimura ◽  
Yuko Yotsumoto

Abstract: We have a keen sensitivity when it comes to the perception of our own voices; we can detect not only the differences between ourselves and others, but also slight modifications of our own voices. Here, we examined the neural correlates underlying this sensitive perception of one’s own voice. In the experiments, we modified the subjects’ own voices using five types of filters, and the subjects rated the similarity of the presented voices to their own. We compared BOLD (blood-oxygen-level-dependent) signals between the voices subjects rated as least similar to their own and those they rated as most similar. The contrast revealed that the bilateral superior temporal gyrus showed greater activation while subjects listened to the voices least similar to their own and weaker activation while they listened to the voices most similar. Our results suggest that the superior temporal gyrus is involved in neural sharpening for one’s own voice. The weaker activation evoked by voices similar to one’s own indicates that these areas respond not only to the differences between self and others, but also to the finer details of one’s own voice.
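A hedged sketch of the kind of condition contrast described above, reduced to a paired t-test per voxel on synthetic beta estimates. Actual fMRI pipelines fit a full GLM (e.g., in SPM or FSL); all dimensions and values below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_voxels = 16, 5000                            # assumed dimensions
beta_least = rng.normal(0.2, 1.0, (n_subjects, n_voxels))  # stand-in betas
beta_most = rng.normal(0.0, 1.0, (n_subjects, n_voxels))

# Paired t-test per voxel: least-similar vs. most-similar condition.
t_vals, p_vals = stats.ttest_rel(beta_least, beta_most, axis=0)
positive = (t_vals > 0) & (p_vals < 0.001)                 # illustrative threshold
print(f"{positive.sum()} voxels with least > most (uncorrected p < .001)")
```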


1979 ◽  
Vol 44 (3) ◽  
pp. 354-362 ◽  
Author(s):  
Jeffrey L. Danhauer ◽  
Jonathan G. Leppler

Thirty-five normal-hearing listeners' speech discrimination scores were obtained for the California Consonant Test (CCT) in four noise competitors: (1) a four-talker complex (FT), (2) a nine-talker complex developed at Bowling Green State University (BGMTN), (3) cocktail party noise (CPN), and (4) white noise (WN). Five listeners received the CCT stimuli mixed ipsilaterally with each competing noise at one of seven signal-to-noise ratios (S/Ns). Articulation functions were plotted for each noise competitor. Statistical analysis revealed that the noise types produced few differences in CCT scores over most of the S/Ns tested, but that competitors resembling peripheral maskers (CPN and WN) affected scores less at the more severe S/Ns than competitors resembling perceptual maskers (FT and BGMTN). The results suggest that the CCT is sufficiently difficult for normal-hearing listeners, even without a noise competitor, in many audiologic testing situations. Levels that should approximate CCT maximum-discrimination (D-Max) scores for normal listeners are suggested for use when clinic time does not permit establishing articulation functions. The clinician should determine the S/N of the CCT tape itself before setting listening levels.
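The mixing step implied by the design, presenting CCT items against a competitor at a fixed S/N, can be sketched as below. The waveforms and the seven S/N values are placeholders, since the abstract does not list the actual levels.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(2)
speech = rng.standard_normal(16000)        # stand-in for a CCT item waveform
white_noise = rng.standard_normal(16000)   # white-noise competitor
for snr in (-12, -8, -4, 0, 4, 8, 12):     # seven illustrative S/Ns (dB)
    mixed = mix_at_snr(speech, white_noise, snr)
```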


2021 ◽  
Vol 12 ◽  
Author(s):  
Kendra Gimhani Kandana Arachchige ◽  
Wivine Blekic ◽  
Isabelle Simoes Loureiro ◽  
Laurent Lefebvre

Numerous studies have explored the benefit of iconic gestures for speech comprehension. However, only a few have investigated how visual attention is allocated to these gestures under clear versus degraded speech, and how information is extracted from them to enhance comprehension. This study explored the effect of iconic gestures on comprehension and whether fixating a gesture is required for information extraction. Four types of gestures (semantically incongruent iconic gestures, syntactically incongruent iconic gestures, meaningless configurations, and congruent iconic gestures) were presented in a sentence context under three listening conditions (clear, partly degraded, or fully degraded speech). Participants' gaze was recorded with an eye tracker while they watched the video clips, after which they answered simple comprehension questions. Results showed, first, that the different gesture types attracted attention differently and that the more the speech was degraded, the less attention participants paid to the gestures. Furthermore, semantically incongruent gestures impaired comprehension even though they were not fixated, whereas congruent gestures improved comprehension despite likewise not being fixated. These results suggest that covert attention is sufficient to convey gesture information to the listener.
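A minimal sketch of the area-of-interest (AOI) analysis such an eye-tracking design implies: the fraction of gaze samples landing inside a gesture region, used to decide whether a gesture was fixated. The screen coordinates and AOI rectangle are invented for illustration.

```python
import numpy as np

def dwell_fraction(gaze_xy: np.ndarray, aoi: tuple) -> float:
    """Fraction of (x, y) gaze samples inside a rectangular AOI (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = aoi
    inside = (
        (gaze_xy[:, 0] >= x0) & (gaze_xy[:, 0] <= x1)
        & (gaze_xy[:, 1] >= y0) & (gaze_xy[:, 1] <= y1)
    )
    return float(inside.mean())

rng = np.random.default_rng(3)
gaze = rng.uniform(0, 1, size=(3000, 2))   # stand-in normalized gaze samples
gesture_aoi = (0.30, 0.55, 0.70, 0.95)     # assumed gesture region on screen
print(f"Dwell on gesture AOI: {dwell_fraction(gaze, gesture_aoi):.1%}")
```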


2010 ◽  
pp. 61-79 ◽  
Author(s):  
Tariqullah Jan ◽  
Wenwu Wang

The cocktail party problem is a classic scientific problem that has been studied for decades. Humans have a remarkable ability to segregate target speech from the complex auditory mixture encountered in a cocktail party environment; computational modeling of this mechanism, however, is extremely challenging. This chapter presents an overview of several recent techniques for the source-separation issues associated with the problem, including independent component analysis/blind source separation, computational auditory scene analysis, model-based approaches, non-negative matrix factorization, and sparse coding. As an example, a multistage approach to source separation is included. Application areas of cocktail party processing are explored, and potential future research directions are discussed.
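As a concrete illustration of one technique surveyed in the chapter, here is a minimal non-negative matrix factorization (NMF) sketch that separates a two-speaker mixture by masking its spectrogram. The input file name, component count, and the crude component-to-source assignment are assumptions for illustration, not the chapter's multistage method.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft
from sklearn.decomposition import NMF

rate, mix = wavfile.read("mixture.wav")            # hypothetical mono mixture
f, t, Z = stft(mix.astype(float), fs=rate, nperseg=1024)
mag, phase = np.abs(Z), np.angle(Z)

# Factorize |spectrogram| ~= W @ H: columns of W are spectral templates,
# rows of H their activations over time.
model = NMF(n_components=8, init="nndsvd", max_iter=500)
W = model.fit_transform(mag)
H = model.components_

# Crudely split the components between two sources; practical systems
# cluster components or use per-speaker models instead.
for i, cols in enumerate((slice(0, 4), slice(4, 8))):
    est = W[:, cols] @ H[cols, :]
    mask = est / np.maximum(W @ H, 1e-10)          # soft Wiener-style mask
    _, sig = istft(mask * mag * np.exp(1j * phase), fs=rate, nperseg=1024)
    wavfile.write(f"source_{i}.wav", rate, sig.astype(np.int16))
```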


Author(s):  
Douglas B. Quine ◽  
David Regan ◽  
Thomas J. Murray

SUMMARY: Delays of auditory perception at three frequencies were measured in 30 multiple sclerosis patients using a psychophysical technique. Nineteen patients had abnormal delays at one or more tone frequencies, though 15 of them had normal audiograms at those frequencies. In addition, auditory acuity for left-right asynchrony was abnormally poor in 13 patients, 9 of whom had normal audiograms. Such delays of auditory perception within a restricted frequency band may partially explain the degraded speech comprehension of some multiple sclerosis patients.


2005 ◽  
Vol 17 (9) ◽  
pp. 1875-1902 ◽  
Author(s):  
Simon Haykin ◽  
Zhe Chen

This review presents an overview of a challenging problem in auditory perception, the cocktail party phenomenon, the delineation of which goes back to a classic paper by Cherry in 1953. We address the following issues: (1) human auditory scene analysis, the general process carried out by the auditory system of a human listener; (2) insight into auditory perception derived from Marr's vision theory; (3) computational auditory scene analysis, which focuses on specific approaches to solving the machine cocktail party problem; (4) active audition, proposed by analogy with active vision; and (5) a discussion of brain theory and independent component analysis on the one hand, and correlative neural firing on the other.
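To make the reference to independent component analysis in issue (5) concrete, here is a minimal FastICA sketch on a synthetic two-mixture example. The sources and mixing matrix are invented, and real cocktail-party audio involves convolutive mixing that this instantaneous model ignores.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
t = np.linspace(0, 8, 4000)
s1 = np.sin(2 * np.pi * 3 * t)              # stand-in source: sinusoid
s2 = np.sign(np.sin(2 * np.pi * 5 * t))     # stand-in source: square wave
S = np.c_[s1, s2]
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # assumed instantaneous mixing matrix
X = S @ A.T                                 # two observed "microphone" mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                # sources recovered up to scale/order
```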


2015 ◽  
Vol 58 (5) ◽  
pp. 1570-1591 ◽  
Author(s):  
Meital Avivi-Reich ◽  
Agnes Jakubczyk ◽  
Meredyth Daneman ◽  
Bruce A. Schneider

Purpose: We investigated how age and linguistic status affect listeners' ability to follow and comprehend three-talker conversations, and the extent to which individual differences in language proficiency predict speech comprehension under difficult listening conditions. Method: Younger and older native-English listeners (EL1s), as well as young nonnative listeners (EL2s), listened to three-talker conversations, with or without spatial separation between talkers, either in quiet or against moderate- or high-level 12-talker babble, and were asked to answer questions about the conversations' contents. Results: After compensating for individual differences in speech recognition, no significant differences in conversation comprehension were found among the groups. As expected, comprehension decreased as the babble level increased. Individual differences in reading comprehension skill contributed positively to performance in younger EL1s and, to a lesser degree, in young EL2s, but not in older EL1s. Vocabulary knowledge was significantly and positively related to performance only at the intermediate babble level. Conclusion: The results indicate that the manner in which spoken-language comprehension is achieved is modulated by the listener's age and linguistic status.
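A hedged sketch of the individual-differences analysis described in the Results: regressing comprehension scores on reading-comprehension skill within a group. All scores below are synthetic stand-ins, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 24                                                     # assumed group size
reading_skill = rng.normal(50, 10, n)                      # stand-in reading scores
comprehension = 0.4 * reading_skill + rng.normal(0, 5, n)  # synthetic outcome

result = stats.linregress(reading_skill, comprehension)
print(f"slope={result.slope:.2f}, r={result.rvalue:.2f}, p={result.pvalue:.3g}")
```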

