Highly accurate and robust identity perception from personally familiar voices

2020 ◽  
Author(s):  
Elise Kanber ◽  
Nadine Lavan ◽  
Carolyn McGettigan

Previous research suggests that familiarity with a voice can afford benefits for voice and speech perception. However, even familiar voice perception has been reported to be error-prone, especially in the face of challenges such as reduced verbal cues and acoustic distortions. It has been hypothesised that such findings may arise because listeners are not “familiar enough” with the voices used in laboratory studies, and are thus inexperienced with their full vocal repertoire. By extension, voice perception based on highly familiar voices – acquired via substantial, naturalistic experience – should be more robust than voice perception from less familiar voices. We investigated this proposal by contrasting perception of personally familiar voices (participants’ romantic partners) with lab-trained voices in challenging experimental tasks. Specifically, we tested how differences in familiarity affect voice identity perception from non-verbal vocalisations and acoustically modulated speech. Large benefits for the personally familiar voice over the less familiar, lab-trained voice were found for identity recognition, with listeners displaying highly accurate yet more conservative recognition of personally familiar voices. However, no familiar-voice benefit was found for speech comprehension against background noise. Our findings suggest that listeners have fine-tuned representations of highly familiar voices that support more robust and accurate voice recognition despite challenging listening contexts, yet these advantages may not always extend to speech perception. Our study therefore highlights that familiarity is indeed a continuum, with identity perception for personally familiar voices being highly accurate.
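The “highly accurate yet more conservative” pattern is naturally expressed in signal detection terms, where accuracy and response bias are separated into sensitivity (d′) and criterion (c). A minimal sketch of that standard computation follows; the hit and false-alarm rates are invented for illustration and are not values from the study.

```python
# Minimal sketch (not from the paper): separating sensitivity (d') from
# response criterion (c), the usual way to quantify "accurate yet
# conservative" recognition. The rates below are hypothetical.
from scipy.stats import norm

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return sensitivity d' and criterion c from hit and false-alarm rates."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)  # positive c = conservative

print(dprime_and_criterion(0.92, 0.05))  # e.g. a personally familiar voice
print(dprime_and_criterion(0.70, 0.25))  # e.g. a lab-trained voice
```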

2021 ◽  
Vol 12 (1) ◽  
pp. 33
Author(s):  
Andres Camarena ◽  
Grace Manchala ◽  
Julianne Papadopoulos ◽  
Samantha R. O’Connell ◽  
Raymond L. Goldsworthy

Cochlear implants have been used to restore hearing to more than half a million people around the world. The restored hearing allows most recipients to understand spoken speech without relying on visual cues. While speech comprehension in quiet is generally high for recipients, many complain about the sound of music. The present study examines consonance and dissonance perception in nine cochlear implant users and eight people with no known hearing loss. Participants completed web-based assessments to characterize low-level psychophysical sensitivities to modulation and pitch, as well as higher-level measures of musical pleasantness and speech comprehension in background noise. The underlying hypothesis is that sensitivity to modulation and pitch, together with higher levels of musical sophistication, relates to higher-level measures of music and speech perception. This hypothesis was supported: strong correlations were observed between measures of modulation and pitch sensitivity and measures of consonance ratings and speech recognition. Additionally, the cochlear implant users who were the most sensitive to modulation and pitch, and who had higher musical sophistication scores, gave pleasantness ratings similar to those of listeners with no known hearing loss. The implication is that better coding and focused rehabilitation for modulation and pitch sensitivity will broadly improve perception of music and speech for cochlear implant users.
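The analysis described is essentially correlational: low-level psychophysical thresholds are related to higher-level music and speech measures. A minimal sketch of that kind of analysis is shown below, using invented placeholder arrays rather than data from the study.

```python
# Sketch of the kind of correlational analysis described in the abstract.
# The arrays are invented placeholders, not data from the study.
import numpy as np
from scipy.stats import pearsonr

pitch_threshold = np.array([0.5, 1.0, 1.8, 2.5, 3.9, 5.2, 6.6, 8.1, 9.7])    # discrimination threshold (arbitrary units)
consonance_rating = np.array([6.8, 6.3, 5.7, 5.1, 4.6, 4.0, 3.5, 3.1, 2.8])  # mean pleasantness (arbitrary scale)

r, p = pearsonr(pitch_threshold, consonance_rating)
print(f"pitch sensitivity vs. consonance rating: r = {r:.2f}, p = {p:.4f}")
```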


2018 ◽  
Vol 29 (09) ◽  
pp. 802-813 ◽  
Author(s):  
Allison Biever ◽  
Jan Gilden ◽  
Teresa Zwolan ◽  
Megan Mears ◽  
Anne Beiter

The Nucleus® 6 sound processor is now compatible with the Nucleus® 22 (CI22M), Cochlear’s first-generation cochlear implant. The Nucleus 6 offers three new signal processing algorithms that purportedly facilitate improved hearing in background noise. These studies were designed to evaluate listening performance and user satisfaction with the Nucleus 6 sound processor. The research design was a prospective, single-participant, repeated measures design. A group of 80 participants implanted with various Nucleus internal implant devices (CI22M, CI24M, Freedom® CI24RE, CI422, and CI512) was recruited from a total of six North American sites. Participants had their external sound processor upgraded to the Nucleus 6 sound processor. Final speech perception testing in noise and subjective questionnaires were completed after four or 12 weeks of take-home use with the Nucleus 6. Speech perception testing in noise showed significant improvement, and participants reported increased satisfaction with the Nucleus 6. These studies demonstrated the benefit of the new algorithms in the Nucleus 6 over previous generations of sound processors.


2020 ◽  
Vol 32 (6) ◽  
pp. 1092-1103 ◽  
Author(s):  
Dan Kennedy-Higgins ◽  
Joseph T. Devlin ◽  
Helen E. Nuttall ◽  
Patti Adank

Successful perception of speech in everyday listening conditions requires effective listening strategies to overcome common acoustic distortions, such as background noise. Convergent evidence from neuroimaging and clinical studies identifies activation within the temporal lobes as key to successful speech perception. However, current neurobiological models disagree on whether the left temporal lobe is sufficient for successful speech perception or whether bilateral processing is required. We addressed this issue using transcranial magnetic stimulation (TMS) to selectively disrupt processing in either the left or right superior temporal gyrus (STG) of healthy participants, testing whether the left temporal lobe is sufficient or whether both left and right STG are essential. Participants repeated keywords from sentences presented in background noise in a speech reception threshold task while receiving online repetitive TMS separately to the left STG, right STG, or vertex, or while receiving no TMS. Results show an equal drop in performance following application of TMS to either left or right STG during the task. A separate group of participants performed a visual discrimination threshold task to control for the confounding side effects of TMS. Results show no effect of TMS on the control task, supporting the notion that the results of Experiment 1 can be attributed to modulation of cortical functioning in STG rather than to side effects associated with online TMS. These results indicate that successful speech perception in everyday listening conditions requires both left and right STG and thus have ramifications for our understanding of the neural organization of spoken language processing.
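Speech reception threshold (SRT) tasks of this kind are typically run adaptively, with the signal-to-noise ratio adjusted trial by trial to converge on a fixed performance level. A hedged sketch of such a procedure follows; the step size, scoring rule, and simulated listener are assumptions, not the authors' exact protocol.

```python
# Hedged sketch of a simple 1-up/1-down adaptive SRT procedure converging
# near 50% correct. Step size, scoring, and the toy listener are assumptions.
import random

def simulate_srt(true_srt_db=-6.0, start_snr_db=0.0, step_db=2.0, n_trials=40):
    snr, track = start_snr_db, []
    for _ in range(n_trials):
        # Toy psychometric function: correct responses become more likely as SNR rises.
        p_correct = 1.0 / (1.0 + 10 ** (-(snr - true_srt_db) / 4.0))
        correct = random.random() < p_correct
        track.append(snr)
        snr += -step_db if correct else step_db   # harder after success, easier after failure
    return sum(track[-10:]) / 10                  # rough SRT estimate from the final trials

print(f"Estimated SRT: {simulate_srt():.1f} dB SNR")
```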


2020 ◽  
Vol 14 ◽  
Author(s):  
Stephanie Haro ◽  
Christopher J. Smalt ◽  
Gregory A. Ciccarelli ◽  
Thomas F. Quatieri

Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at −20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at −22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
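Structurally, the model pairs a peripheral front end (the cochlear-nerve spike generator) with a DNN classifier that maps the resulting representation to digit labels. The sketch below shows only that pipeline shape, with a log-spectrogram stand-in for the biophysical periphery; it is an assumption-laden illustration, not the authors' implementation.

```python
# Structural sketch of the two-stage pipeline the abstract describes:
# peripheral front end -> DNN digit classifier. The front end here is a
# log-spectrogram stand-in; the paper's model generates auditory-nerve
# spike trains from a biophysical cochlear model.
import torch
import torch.nn as nn

class PeripheralFrontEnd(nn.Module):
    """Placeholder for the cochlear-nerve spike generator (stand-in only)."""
    def forward(self, waveform):                       # (batch, samples)
        spec = torch.stft(waveform, n_fft=512, hop_length=128,
                          window=torch.hann_window(512),
                          return_complex=True).abs()   # (batch, 257, frames)
        return torch.log1p(spec)

class DigitClassifier(nn.Module):
    """Small DNN back end mapping the 'neurogram' to ten digit classes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
            nn.Linear(16 * 8 * 8, 10),
        )
    def forward(self, neurogram):                      # (batch, freq, time)
        return self.net(neurogram.unsqueeze(1))        # logits over digits 0-9

front_end, classifier = PeripheralFrontEnd(), DigitClassifier()
logits = classifier(front_end(torch.randn(2, 16000)))  # two mock 1-s signals
print(logits.shape)                                    # torch.Size([2, 10])
```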


2021 ◽  
Vol 2 (24 A) ◽  
pp. 137-149
Author(s):  
Wioletta A. Piegzik

This paper presents the phenomenon of anticipation, one of the manifestations of linguistic maturity and language-user rationality. Anticipation, which takes place essentially in implicit structures and draws on evolutionarily old intuition, improves speech comprehension and increases the efficiency of cognitive processes. The phenomenon is presented using the example of foreign-language communication, because it is there that the mechanisms governing the formulation of accurate hypotheses about form and content are particularly evident. The first part of the article discusses speech perception, and with it the categorization and selection of an appropriate cognitive schema that conditions accurate anticipation. The second part presents factors that facilitate or hinder accurate hypotheses. Finally, conclusions and directions for future research on anticipation are formulated.


Author(s):  
Eric D. Young ◽  
Donata Oertel

Neuronal circuits in the brainstem convert the output of the ear, which carries the acoustic properties of ongoing sound, to a representation of the acoustic environment that can be used by the thalamocortical system. Most important, brainstem circuits reflect the way the brain uses acoustic cues to determine where sounds arise and what they mean. The circuits merge the separate representations of sound in the two ears and stabilize them in the face of disturbances such as loudness fluctuation or background noise. Embedded in these systems are some specialized analyses that are driven by the need to resolve tiny differences in the time and intensity of sounds at the two ears and to resolve rapid temporal fluctuations in sounds like the sequence of notes in music or the sequence of syllables in speech.


2012 ◽  
Vol 25 (0) ◽  
pp. 148
Author(s):  
Marcia Grabowecky ◽  
Emmanuel Guzman-Martinez ◽  
Laura Ortega ◽  
Satoru Suzuki

Watching moving lips facilitates auditory speech perception when the mouth is attended. However, recent evidence suggests that visual attention and awareness are mediated by separate mechanisms. We investigated whether lip movements suppressed from visual awareness can facilitate speech perception. We used a word categorization task in which participants listened to spoken words and determined as quickly and accurately as possible whether or not each word named a tool. While participants listened to the words, they watched a visual display presenting a video clip of the speaker synchronously speaking the auditorily presented words, or of the same speaker articulating different words. Critically, the speaker’s face was either visible (the aware trials) or suppressed from awareness using continuous flash suppression. Aware and suppressed trials were randomly intermixed. A secondary probe-detection task ensured that participants attended to the mouth region regardless of whether the face was visible or suppressed. On the aware trials, responses to the tool targets were no faster with synchronous than with asynchronous lip movements, perhaps because the visual information was inconsistent with the auditory information on 50% of the trials. However, on the suppressed trials, responses to the tool targets were significantly faster with synchronous than with asynchronous lip movements. These results demonstrate that even when a random dynamic mask renders a face invisible, lip movements are processed by the visual system with sufficiently high temporal resolution to facilitate speech perception.


2000 ◽  
Vol 31 (4) ◽  
pp. 376-384 ◽  
Author(s):  
Gary W. Siebein ◽  
Martin A. Gold ◽  
Glenn W. Siebein ◽  
Michael G. Ermann

The purpose of this article is to describe the use of impulse response measures and observations in Florida classrooms. As a result of measures and observations in "healthy" and poor acoustical environments, 10 practical recommendations are proposed for improving the acoustical environment in schools. The primary research for these recommendations consisted of recording acoustical measurements of reverberation time and background noise, as well as newer acoustical measurements based on impulse response techniques, in 56 actual classrooms. Observations of classroom situations occurred in a subset of these schools. Computer and physical models of eight classrooms were constructed and tested with varying room finish materials and background noise levels to study the combined effects of these architectural items on speech perception in the model rooms. The primary recommendations all relate to school design and planning. These include air-conditioning system selection and noise control techniques to minimize interference with listening, interior classroom acoustical design principles for maximizing speech perception, and the documentation of teaching methods and classroom arrangements that result in improved speech intelligibility and other factors affecting speech perception.
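One of the impulse-response-based measurements mentioned, reverberation time, is commonly estimated from a measured room impulse response via Schroeder backward integration. The sketch below illustrates that general technique on synthetic data; it is not the authors' measurement procedure.

```python
# Hedged sketch: estimating RT60 from a room impulse response using Schroeder
# backward integration (fit the decay between -5 and -35 dB, extrapolate to a
# 60 dB decay). Illustrative only; not the article's measurement protocol.
import numpy as np

def rt60_from_impulse_response(ir, fs):
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # Schroeder energy decay curve
    edc_db = 10 * np.log10(edc / edc.max())
    t = np.arange(len(ir)) / fs
    mask = (edc_db <= -5) & (edc_db >= -35)        # usable part of the decay
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                           # seconds for a 60 dB decay

# Self-check on a synthetic exponential decay with a ~0.6 s reverberation time:
fs = 16000
t = np.arange(int(1.5 * fs)) / fs
ir = np.random.randn(len(t)) * 10 ** (-3 * t / 0.6)
print(f"Estimated RT60: {rt60_from_impulse_response(ir, fs):.2f} s")
```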

