Speech intelligibility and talker gender classification with noise-vocoded and tone-vocoded speech

Cochlear implant (CI) listeners encounter difficulties in communicating with other persons in noisy listening environments. However, most CI research has been carried out using the English language. In this study, single-channel speech enhancement (SE) strategies as a pre-processing approach for the CI system were investigated in terms of Thai speech intelligibility improvement. Two SE algorithms, namely multi-band spectral subtraction (MBSS) and Weiner filter (WF) algorithms, were evaluated. Speech signals consisting of monosyllabic and bisyllabic Thai words were degraded by speech-shaped noise and babble noise at SNR levels of 0, 5, and 10 dB. Then the noisy words were enhanced using SE algorithms. The enhanced words were fed into the CI system to synthesize vocoded speech. The vocoded speech was presented to twenty normal-hearing listeners. The results indicated that speech intelligibility was marginally improved by the MBSS algorithm and significantly improved by the WF algorithm in some conditions. The enhanced bisyllabic words showed a noticeably higher intelligibility improvement than the enhanced monosyllabic words in all conditions, particularly in speech-shaped noise. Such outcomes may be beneficial to Thai-speaking CI listeners.

Download Full-text

Role of binaural hearing in speech intelligibility and spatial release from masking using vocoded speech

The Journal of the Acoustical Society of America ◽

10.1121/1.3238242 ◽

2009 ◽

Vol 126 (5) ◽

pp. 2522-2535 ◽

Cited By ~ 17

Author(s):

Soha N. Garadat ◽

Ruth Y. Litovsky ◽

Gongqiang Yu ◽

Fan-Gang Zeng

Keyword(s):

Speech Intelligibility ◽

Binaural Hearing ◽

Spatial Release ◽

Release From Masking ◽

Vocoded Speech

Download Full-text

Effect of Dual-Carrier Processing on the Intelligibility of Concurrent Vocoded Sentences

Journal of Speech Language and Hearing Research ◽

10.1044/2018_jslhr-h-17-0234 ◽

2018 ◽

Vol 61 (11) ◽

pp. 2804-2813

Author(s):

Frédéric Apoux ◽

Brittney L. Carter ◽

Eric W. Healy

Keyword(s):

Sound Source ◽

Speech Intelligibility ◽

Target Sentence ◽

Sound Sources ◽

Single Carrier ◽

Sound Source Segregation ◽

Onset Asynchrony ◽

Source Segregation ◽

Vocoded Speech

Purpose The goal of this study was to examine the role of carrier cues in sound source segregation and the possibility to enhance the intelligibility of 2 sentences presented simultaneously. Dual-carrier (DC) processing (Apoux, Youngdahl, Yoho, & Healy, 2015) was used to introduce synthetic carrier cues in vocoded speech. Method Listeners with normal hearing heard sentences processed either with a DC or with a traditional single-carrier (SC) vocoder. One group was asked to repeat both sentences in a sentence pair (Experiment 1). The other group was asked to repeat only 1 sentence of the pair and was provided additional segregation cues involving onset asynchrony (Experiment 2). Results Both experiments showed that not only is the “target” sentence more intelligible in DC compared with SC, but the “background” sentence intelligibility is equally enhanced. The participants did not benefit from the additional segregation cues. Conclusions The data showed a clear benefit of using a distinct carrier to convey each sentence (i.e., DC processing). Accordingly, the poor speech intelligibility in noise typically observed with SC-vocoded speech may be partly attributed to the envelope of independent sound sources sharing the same carrier. Moreover, this work suggests that noise reduction may not be the only viable option to improve speech intelligibility in noise for users of cochlear implants. Alternative approaches aimed at enhancing sound source segregation such as DC processing may help to improve speech intelligibility while preserving and enhancing the background.

Download Full-text

Two Stages of Speech Envelope Tracking in Human Auditory Cortex Modulated by Speech Intelligibility

10.1101/2021.12.11.472249 ◽

2021 ◽

Author(s):

Na Xu ◽

Baotian Zhao ◽

Lu Luo ◽

Kai Zhang ◽

Xiaoqiu Shao ◽

...

Keyword(s):

Auditory Cortex ◽

Speech Intelligibility ◽

Primary Auditory Cortex ◽

Acoustic Features ◽

Envelope Tracking ◽

Power Stage ◽

Human Auditory Cortex ◽

Speech Envelope ◽

Two Stages ◽

Vocoded Speech

The envelope is essential for speech perception. Recent studies have shown that cortical activity can track the acoustic envelope. However, whether the tracking strength reflects the extent of speech intelligibility processing remains controversial. Here, using stereo-electroencephalogram (sEEG) technology, we directly recorded the activity in human auditory cortex while subjects listened to either natural or noise-vocoded speech. These two stimuli have approximately identical envelopes, but the noise-vocoded speech does not have speech intelligibility. We found two stages of envelope tracking in auditory cortex: an early high-γ (60-140 Hz) power stage (delay ≈ 49 ms) that preferred the noise-vocoded speech, and a late θ (4-8 Hz) phase stage (delay ≈ 178 ms) that preferred the natural speech. Furthermore, the decoding performance of high-γ power was better in primary auditory cortex than in non-primary auditory cortex, consistent with its short tracking delay. We also found distinct lateralization effects: high-γ power envelope tracking dominated left auditory cortex, while θ phase showed better decoding performance in right auditory cortex. In sum, we suggested a functional dissociation between high-γ power and θ phase: the former reflects fast and automatic processing of brief acoustic features, while the latter correlates to slow build-up processing facilitated by speech intelligibility.

Download Full-text

Visual Speech Benefit in Clear and Degraded Speech Depends on the Auditory Intelligibility of the Talker and the Number of Background Talkers

Trends in Hearing ◽

10.1177/2331216519837866 ◽

2019 ◽

Vol 23 ◽

pp. 233121651983786 ◽

Cited By ~ 2

Author(s):

Catherine L. Blackburn ◽

Pádraig T. Kitterick ◽

Gary Jones ◽

Christian J. Sumner ◽

Paula C. Stacey

Keyword(s):

Speech Intelligibility ◽

Noise Signal ◽

Sine Wave ◽

Theory Model ◽

Visual Speech ◽

Clear Speech ◽

Visual Integration ◽

Degraded Speech ◽

The Face ◽

Vocoded Speech

Perceiving speech in background noise presents a significant challenge to listeners. Intelligibility can be improved by seeing the face of a talker. This is of particular value to hearing impaired people and users of cochlear implants. It is well known that auditory-only speech understanding depends on factors beyond audibility. How these factors impact on the audio-visual integration of speech is poorly understood. We investigated audio-visual integration when either the interfering background speech (Experiment 1) or intelligibility of the target talkers (Experiment 2) was manipulated. Clear speech was also contrasted with sine-wave vocoded speech to mimic the loss of temporal fine structure with a cochlear implant. Experiment 1 showed that for clear speech, the visual speech benefit was unaffected by the number of background talkers. For vocoded speech, a larger benefit was found when there was only one background talker. Experiment 2 showed that visual speech benefit depended upon the audio intelligibility of the talker and increased as intelligibility decreased. Degrading the speech by vocoding resulted in even greater benefit from visual speech information. A single “independent noise” signal detection theory model predicted the overall visual speech benefit in some conditions but could not predict the different levels of benefit across variations in the background or target talkers. This suggests that, similar to audio-only speech intelligibility, the integration of audio-visual speech cues may be functionally dependent on factors other than audibility and task difficulty, and that clinicians and researchers should carefully consider the characteristics of their stimuli when assessing audio-visual integration.

Download Full-text

Lateralized Cerebral Processing of Abstract Linguistic Structure in Clear and Degraded Speech

Cerebral Cortex ◽

10.1093/cercor/bhaa245 ◽

2020 ◽

Vol 31 (1) ◽

pp. 591-602

Author(s):

Qingqing Meng ◽

Yiwen Li Hegner ◽

Iain Giblin ◽

Catherine McMahon ◽

Blake W Johnson

Keyword(s):

Cochlear Implant ◽

Speech Intelligibility ◽

Single Experiment ◽

Linguistic Information ◽

Degraded Speech ◽

Potential Clinical Utility ◽

Cortical Entrainment ◽

Cerebral Processing ◽

Linguistic Units ◽

Vocoded Speech

Abstract Human cortical activity measured with magnetoencephalography (MEG) has been shown to track the temporal regularity of linguistic information in connected speech. In the current study, we investigate the underlying neural sources of these responses and test the hypothesis that they can be directly modulated by changes in speech intelligibility. MEG responses were measured to natural and spectrally degraded (noise-vocoded) speech in 19 normal hearing participants. Results showed that cortical coherence to “abstract” linguistic units with no accompanying acoustic cues (phrases and sentences) were lateralized to the left hemisphere and changed parametrically with intelligibility of speech. In contrast, responses coherent to words/syllables accompanied by acoustic onsets were bilateral and insensitive to intelligibility changes. This dissociation suggests that cerebral responses to linguistic information are directly affected by intelligibility but also powerfully shaped by physical cues in speech. This explains why previous studies have reported widely inconsistent effects of speech intelligibility on cortical entrainment and, within a single experiment, provided clear support for conclusions about language lateralization derived from a large number of separately conducted neuroimaging studies. Since noise-vocoded speech resembles the signals provided by a cochlear implant device, the current methodology has potential clinical utility for assessment of cochlear implant performance.

Download Full-text

Investigating Cortical Responses to Noise-Vocoded Speech in Children with Normal Hearing Using Functional Near-Infrared Spectroscopy (fNIRS)

Journal of the Association for Research in Otolaryngology ◽

10.1007/s10162-021-00817-z ◽

2021 ◽

Author(s):

Faizah Mushtaq ◽

Ian M. Wiggins ◽

Pádraig T. Kitterick ◽

Carly A. Anderson ◽

Douglas E. H. Hartley

Keyword(s):

Infrared Spectroscopy ◽

Speech Perception ◽

Near Infrared Spectroscopy ◽

Speech Intelligibility ◽

Near Infrared ◽

Negative Relationship ◽

Functional Near Infrared Spectroscopy ◽

Speech Stimuli ◽

Degraded Speech ◽

Vocoded Speech

AbstractWhilst functional neuroimaging has been used to investigate cortical processing of degraded speech in adults, much less is known about how these signals are processed in children. An enhanced understanding of cortical correlates of poor speech perception in children would be highly valuable to oral communication applications, including hearing devices. We utilised vocoded speech stimuli to investigate brain responses to degraded speech in 29 normally hearing children aged 6–12 years. Intelligibility of the speech stimuli was altered in two ways by (i) reducing the number of spectral channels and (ii) reducing the amplitude modulation depth of the signal. A total of five different noise-vocoded conditions (with zero, partial or high intelligibility) were presented in an event-related format whilst participants underwent functional near-infrared spectroscopy (fNIRS) neuroimaging. Participants completed a word recognition task during imaging, as well as a separate behavioural speech perception assessment. fNIRS recordings revealed statistically significant sensitivity to stimulus intelligibility across several brain regions. More intelligible stimuli elicited stronger responses in temporal regions, predominantly within the left hemisphere, while right inferior parietal regions showed an opposite, negative relationship. Although there was some evidence that partially intelligible stimuli elicited the strongest responses in the left inferior frontal cortex, a region previous studies have suggested is associated with effortful listening in adults, this effect did not reach statistical significance. These results further our understanding of cortical mechanisms underlying successful speech perception in children. Furthermore, fNIRS holds promise as a clinical technique to help assess speech intelligibility in paediatric populations.

Download Full-text

A Family With Autosomal-Dominant Progressive Sensorineural Hearing Loss

American Journal of Audiology ◽

10.1044/1059-0889.0501.23 ◽

1996 ◽

Vol 5 (1) ◽

pp. 23-32 ◽

Cited By ~ 3

Author(s):

Chris Halpin ◽

Barbara Herrmann ◽

Margaret Whearty

Keyword(s):

Speech Production ◽

Hearing Aids ◽

Role Models ◽

Speech Intelligibility ◽

Large Scale ◽

Speech Language Pathology ◽

The Family ◽

Patient Will ◽

Language Pathology

The family described in this article provides an unusual opportunity to relate findings from genetic, histological, electrophysiological, psychophysical, and rehabilitative investigation. Although the total number evaluated is large (49), the known, living affected population is smaller (14), and these are spread from age 20 to age 59. As a result, the findings described above are those of a large-scale case study. Clearly, more data will be available through longitudinal study of the individuals documented in the course of this investigation but, given the slow nature of the progression in this disease, such studies will be undertaken after an interval of several years. The general picture presented to the audiologist who must rehabilitate these cases is that of a progressive cochlear degeneration that affects only thresholds at first, and then rapidly diminishes speech intelligibility. The expected result is that, after normal language development, the patient may accept hearing aids well, encouraged by the support of the family. Performance and satisfaction with the hearing aids is good, until the onset of the speech intelligibility loss, at which time the patient will encounter serious difficulties and may reject hearing aids as unhelpful. As the histological and electrophysiological results indicate, however, the eighth nerve remains viable, especially in the younger affected members, and success with cochlear implantation may be expected. Audiologic counseling efforts are aided by the presence of role models and support from the other affected members of the family. Speech-language pathology services were not considered important by the members of this family since their speech production developed normally and has remained very good. Self-correction of speech was supported by hearing aids and cochlear implants (Case 5’s speech production was documented in Perkell, Lane, Svirsky, & Webster, 1992). These patients received genetic counseling and, due to the high penetrance of the disease, exhibited serious concerns regarding future generations and the hope of a cure.

Download Full-text

Perceptual Learning of Vocoded Speech With and Without Contralateral Hearing: Implications for Cochlear Implant Rehabilitation

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-20-00385 ◽

2020 ◽

pp. 1-10

Author(s):

Martin Chavant ◽

Alexis Hervais-Adelman ◽

Olivier Macherey

Keyword(s):

Cochlear Implant ◽

Perceptual Learning ◽

Speech Signal ◽

Training Phase ◽

Monosyllabic Words ◽

Low Pass ◽

Contralateral Ear ◽

Number Of Individuals ◽

Insight Into ◽

Vocoded Speech

Purpose An increasing number of individuals with residual or even normal contralateral hearing are being considered for cochlear implantation. It remains unknown whether the presence of contralateral hearing is beneficial or detrimental to their perceptual learning of cochlear implant (CI)–processed speech. The aim of this experiment was to provide a first insight into this question using acoustic simulations of CI processing. Method Sixty normal-hearing listeners took part in an auditory perceptual learning experiment. Each subject was randomly assigned to one of three groups of 20 referred to as NORMAL, LOWPASS, and NOTHING. The experiment consisted of two test phases separated by a training phase. In the test phases, all subjects were tested on recognition of monosyllabic words passed through a six-channel “PSHC” vocoder presented to a single ear. In the training phase, which consisted of listening to a 25-min audio book, all subjects were also presented with the same vocoded speech in one ear but the signal they received in their other ear differed across groups. The NORMAL group was presented with the unprocessed speech signal, the LOWPASS group with a low-pass filtered version of the speech signal, and the NOTHING group with no sound at all. Results The improvement in speech scores following training was significantly smaller for the NORMAL than for the LOWPASS and NOTHING groups. Conclusions This study suggests that the presentation of normal speech in the contralateral ear reduces or slows down perceptual learning of vocoded speech but that an unintelligible low-pass filtered contralateral signal does not have this effect. Potential implications for the rehabilitation of CI patients with partial or full contralateral hearing are discussed.

Download Full-text