Consonant Confusions in Amplitude-Expanded Speech

1996 ◽  
Vol 39 (6) ◽  
pp. 1124-1137 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne

The perceptual consequences of expanding the amplitude variations in speech were studied under conditions in which spectral information was obscured by signal-correlated noise that had an envelope correlated with the speech envelope but a flat amplitude spectrum. The noise samples, created individually from 22 vowel-consonant-vowel nonsense words, were used as maskers of those words, with signal-to-noise ratios ranging from –15 to 0 dB. Amplitude variations were expanded by a factor of 3.0 on the decibel scale. In the first experiment, the presentation level for speech peaks was 80 dB SPL. Consonant recognition performance for expanded speech by 50 listeners with normal hearing was as much as 30 percentage points poorer than for unexpanded speech, and the types of errors were dramatically different, especially in the midrange of signal-to-noise ratios. In a second experiment, presentation level was varied to determine whether reductions in consonant levels produced by expansion were responsible for the differences between conditions. Recognition performance for unexpanded speech at 40 dB SPL was nearly equivalent to that for expanded speech at 80 dB SPL. The error patterns obtained in these two conditions were different, suggesting that the differences between conditions in Experiment 1 were due largely to the expanded amplitude envelopes rather than to differences in audibility.
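For readers who want a concrete picture of the two signal manipulations described above, the following sketch shows one plausible Python implementation of decibel-scale envelope expansion by a factor of 3.0 and of signal-correlated noise built by polarity randomization, which keeps the speech envelope while flattening the amplitude spectrum. The Hilbert-envelope extraction, the 20 ms smoothing window, and the peak reference are assumptions; the paper does not report these details.

```python
import numpy as np
from scipy.signal import hilbert

def expand_amplitude(x, fs, factor=3.0, win_ms=20.0):
    """Expand the amplitude envelope of x by `factor` on a dB scale,
    referenced to the envelope peak (envelope extraction method assumed)."""
    env = np.abs(hilbert(x))
    n = max(1, int(fs * win_ms / 1000.0))
    env = np.convolve(env, np.ones(n) / n, mode="same")   # smooth the envelope
    env = np.maximum(env, 1e-8)                           # avoid log(0)
    level_db = 20.0 * np.log10(env / env.max())           # 0 dB at the peak
    gain_db = (factor - 1.0) * level_db                   # expanded minus original level
    return x * 10.0 ** (gain_db / 20.0)

def signal_correlated_noise(x, seed=0):
    """Signal-correlated noise via polarity randomization: keeps the speech
    envelope but flattens the amplitude spectrum (one common construction)."""
    rng = np.random.default_rng(seed)
    return x * rng.choice(np.array([-1.0, 1.0]), size=x.shape)
```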

1991 ◽  
Vol 34 (2) ◽  
pp. 415-426 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne ◽  
Heather A. Cote

This investigation examined the degree to which modification of the consonant-vowel (C-V) intensity ratio affected consonant recognition under conditions in which listeners were forced to rely more heavily on waveform envelope cues than on spectral cues. The stimuli were 22 vowel-consonant-vowel utterances, which had been mixed at six different signal-to-noise ratios with white noise that had been modulated by the speech waveform envelope. The resulting waveforms preserved the gross speech envelope shape, but spectral cues were limited by the white-noise masking. In a second stimulus set, the consonant portion of each utterance was amplified by 10 dB. Sixteen subjects with normal hearing listened to the unmodified stimuli, and 16 listened to the amplified-consonant stimuli. Recognition performance was reduced in the amplified-consonant condition for some consonants, presumably because waveform envelope cues had been distorted. However, for other consonants, especially the voiced stops, consonant amplification improved recognition. Patterns of errors were altered for several consonant groups, including some that showed only small changes in recognition scores. The results indicate that when spectral cues are compromised, nonlinear amplification can alter waveform envelope cues for consonant recognition.
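A minimal sketch of the two stimulus manipulations used here, raising the consonant portion of each utterance by 10 dB and mixing it with the envelope-modulated noise at a fixed signal-to-noise ratio, might look like the following. The consonant boundary indices and the RMS-based definition of signal-to-noise ratio are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def amplify_segment(x, start, end, gain_db=10.0):
    """Raise the consonant portion of a VCV utterance by gain_db.
    `start` and `end` are sample indices of the (hand-marked) consonant."""
    y = x.copy()
    y[start:end] *= 10.0 ** (gain_db / 20.0)
    return y

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the overall speech-to-noise RMS ratio equals snr_db, then mix."""
    rms_s = np.sqrt(np.mean(speech ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    return speech + noise * (rms_s / (rms_n * 10.0 ** (snr_db / 20.0)))
```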


1992 ◽  
Vol 35 (4) ◽  
pp. 942-949 ◽  
Author(s):  
Christopher W. Turner ◽  
David A. Fabry ◽  
Stephanie Barrett ◽  
Amy R. Horwitz

This study examined the possibility that hearing-impaired listeners, in addition to displaying poorer-than-normal recognition of speech presented in background noise, require a larger signal-to-noise ratio for the detection of the speech sounds. Psychometric functions for the detection and recognition of stop consonants were obtained from both normal-hearing and hearing-impaired listeners. When the speech levels were expressed in terms of their short-term spectra, detection of the consonants occurred at the same signal-to-noise ratio for both subject groups. In contrast, the hearing-impaired listeners displayed poorer recognition performance than the normal-hearing listeners. These results imply that the higher signal-to-noise ratios required for a given level of recognition by some subjects with hearing loss are not due, even in part, to a deficit in detection of the signals in the masking noise, but rather are due exclusively to a deficit in recognition.
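The detection and recognition thresholds being compared come from psychometric functions relating proportion correct to signal-to-noise ratio. The sketch below shows one conventional way to estimate such a threshold by fitting a logistic function; the parameterization and the curve-fitting routine are illustrative choices, not the authors' procedure. Fitting one function to detection scores and another to recognition scores puts the two thresholds in the same dB units for comparison.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, midpoint, slope, floor=0.0, ceiling=1.0):
    """Logistic psychometric function in proportion correct."""
    return floor + (ceiling - floor) / (1.0 + np.exp(-slope * (snr - midpoint)))

def fit_threshold(snrs_db, prop_correct):
    """Fit the logistic and return the SNR at its midpoint (50% of the range)."""
    p0 = [np.median(snrs_db), 1.0]
    params, _ = curve_fit(lambda s, m, k: logistic(s, m, k),
                          snrs_db, prop_correct, p0=p0)
    return params[0]   # detection or recognition threshold, in dB SNR
```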


2014 ◽  
Vol 25 (06) ◽  
pp. 529-540 ◽  
Author(s):  
Erin C. Schafer ◽  
Danielle Bryant ◽  
Katie Sanders ◽  
Nicole Baldus ◽  
Katherine Algier ◽  
...  

Background: Several recent investigations support the use of frequency modulation (FM) systems in children with normal hearing and auditory processing or listening disorders such as those diagnosed with auditory processing disorders, autism spectrum disorders, attention-deficit hyperactivity disorder, Friedreich ataxia, and dyslexia. The American Academy of Audiology (AAA) published suggested procedures, but these guidelines do not cite research evidence to support the validity of the recommended procedures for fitting and verifying nonoccluding open-ear FM systems on children with normal hearing. Documenting the validity of these fitting procedures is critical to maximize the potential FM-system benefit in the abovementioned populations of children with normal hearing and those with auditory-listening problems. Purpose: The primary goal of this investigation was to determine the validity of the AAA real-ear approach to fitting FM systems on children with normal hearing. The secondary goal of this study was to examine speech-recognition performance in noise and loudness ratings without and with FM systems in children with normal hearing sensitivity. Research Design: A two-group, cross-sectional design was used in the present study. Study Sample: Twenty-six typically functioning children, ages 5–12 yr, with normal hearing sensitivity participated in the study. Intervention: Participants used a nonoccluding open-ear FM receiver during laboratory-based testing. Data Collection and Analysis: Participants completed three laboratory tests: (1) real-ear measures, (2) speech recognition performance in noise, and (3) loudness ratings. Four real-ear measures were conducted to (1) verify that measured output met prescribed-gain targets across the 1000–4000 Hz frequency range for speech stimuli, (2) confirm that the FM-receiver volume did not exceed predicted uncomfortable loudness levels, and (3 and 4) measure changes to the real-ear unaided response when placing the FM receiver in the child's ear. After completion of the fitting, speech recognition in noise at a –5 dB signal-to-noise ratio and loudness ratings at a +5 dB signal-to-noise ratio were measured in four conditions: (1) no FM system, (2) FM receiver on the right ear, (3) FM receiver on the left ear, and (4) bilateral FM system. Results: The results of this study suggested that the slightly modified AAA real-ear measurement procedures resulted in a valid fitting of one FM system on children with normal hearing. On average, prescriptive targets were met within 3 dB at 1000, 2000, 3000, and 4000 Hz, and the maximum output of the FM system never exceeded, and was significantly lower than, the predicted uncomfortable loudness levels for the children. There was minimal change in the real-ear unaided response when the open-ear FM receiver was placed into the ear. Use of the FM system on one or both ears resulted in significantly better speech recognition in noise relative to the no-FM condition, and the unilateral and bilateral FM receivers produced a comfortably loud signal when listening in background noise. Conclusions: Real-ear measures are critical for obtaining an appropriate fit of an FM system on children with normal hearing.
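The central verification step, checking that measured real-ear output falls within tolerance of prescriptive targets from 1000 to 4000 Hz, reduces to a simple comparison. The sketch below uses hypothetical target and measured values and the ±3 dB criterion reported above; the numbers themselves are invented for illustration.

```python
# Hypothetical prescriptive targets and measured real-ear output, in dB SPL.
targets  = {1000: 62.0, 2000: 60.0, 3000: 58.0, 4000: 55.0}
measured = {1000: 63.5, 2000: 58.8, 3000: 60.1, 4000: 54.2}

def meets_targets(targets, measured, tol_db=3.0):
    """Return True if measured output is within ±tol_db of target at every frequency."""
    return all(abs(measured[f] - t) <= tol_db for f, t in targets.items())

print(meets_targets(targets, measured))   # True for these example values
```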


1998 ◽  
Vol 41 (2) ◽  
pp. 315-326 ◽  
Author(s):  
Pamela E. Souza ◽  
Christopher W. Turner

Although multichannel compression systems are quickly becoming integral components of programmable hearing aids, research results have not consistently demonstrated their benefit over conventional amplification. The present study examined two confounding factors that may have contributed to this inconsistency in results: alteration of temporal information and audibility of speech cues. Recognition of linearly amplified and multichannel-compressed speech was measured for listeners with mild-to-severe sensorineural hearing loss and for a control group of listeners with normal hearing. In addition to the standard speech signal, which provided both temporal and spectral information, the listener's ability to use temporal information in a multichannel-compressed signal was directly tested using a signal-correlated noise (SCN) stimulus. This stimulus consisted of a time-varying speech envelope modulating a two-channel noise carrier. It preserved temporal cues but provided minimal spectral information. For each stimulus condition, short-term level measurements were used to determine the range of audible speech. Multichannel compression improved speech recognition under conditions where the two-channel compression system provided superior audibility over linear amplification. When audibility of both linearly amplified and multichannel-compressed speech was maximized, multichannel compression had no significant effect on speech recognition scores for speech containing both temporal and spectral cues. However, results for the SCN stimuli showed that more extreme amounts of multichannel compression can reduce the use of temporal information.
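As a rough illustration of two-channel compression of the kind applied to these stimuli, the sketch below splits the signal at an assumed crossover frequency and applies static compression above an assumed kneepoint in each band. The crossover frequency, kneepoint, compression ratio, and envelope smoothing are placeholders, not the parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def compress_band(x, fs, ratio=3.0, knee_db=-30.0, win_ms=10.0):
    """Static compression: above the kneepoint, output level grows at 1/ratio."""
    env = np.abs(hilbert(x))
    n = max(1, int(fs * win_ms / 1000.0))
    env = np.maximum(np.convolve(env, np.ones(n) / n, mode="same"), 1e-8)
    level_db = 20.0 * np.log10(env / env.max())
    over = np.maximum(level_db - knee_db, 0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)          # reduce level above the knee
    return x * 10.0 ** (gain_db / 20.0)

def two_channel_compress(x, fs, crossover_hz=1500.0, **kwargs):
    """Split into low/high bands, compress each band, and recombine."""
    sos_lo = butter(4, crossover_hz, btype="low", fs=fs, output="sos")
    sos_hi = butter(4, crossover_hz, btype="high", fs=fs, output="sos")
    low, high = sosfilt(sos_lo, x), sosfilt(sos_hi, x)
    return compress_band(low, fs, **kwargs) + compress_band(high, fs, **kwargs)
```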


2003 ◽  
Vol 14 (09) ◽  
pp. 453-470 ◽  
Author(s):  
Richard H. Wilson

A simple word-recognition task in multitalker babble for clinic use was developed in the course of four experiments involving listeners with normal hearing and listeners with hearing loss. In Experiments 1 and 2, psychometric functions for the individual NU No. 6 words from Lists 2, 3, and 4 were obtained with each word in a unique segment of multitalker babble. The test paradigm that emerged involved ten words at each of seven signal-to-babble ratios (S/B) from 0 to 24 dB. Experiment 3 examined the effect that babble presentation level (70, 80, and 90 dB SPL) had on recognition performance in babble, whereas Experiment 4 studied the effect that monaural and binaural listening had on recognition performance. For listeners with normal hearing, the 90th percentile was 6 dB S/B. In comparison to the listeners with normal hearing, the 50% correct points on the functions for listeners with hearing loss were at 5 to 15 dB higher signal-to-babble ratios.
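With performance sampled at seven signal-to-babble ratios from 0 to 24 dB, a listener's 50% correct point can be estimated by interpolating along the measured function. The sketch below shows one such interpolation with made-up example proportions; it is not the scoring rule specified for the clinical test.

```python
import numpy as np

# Hypothetical proportions correct at the seven S/B ratios, 0-24 dB in 4-dB steps.
sb_db = np.array([0, 4, 8, 12, 16, 20, 24], dtype=float)
p_correct = np.array([0.10, 0.25, 0.45, 0.70, 0.85, 0.95, 1.00])

# Interpolate the S/B at which the listener reaches 50% correct.
sb_at_50 = np.interp(0.5, p_correct, sb_db)
print(round(float(sb_at_50), 1))   # about 8.8 dB S/B for these example values
```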


1995 ◽  
Vol 38 (5) ◽  
pp. 1150-1156 ◽  
Author(s):  
Sandra Gordon-Salant ◽  
Peter J. Fitzgibbons

This study investigated the hypothesis that age effects exert an increased influence on speech recognition performance as the number of acoustic degradations of the speech signal increases. Four groups participated: young listeners with normal hearing, elderly listeners with normal hearing, young listeners with hearing loss, and elderly listeners with hearing loss. Recognition was assessed for sentence materials degraded by noise, reverberation, or time compression, either in isolation or in binary combinations. Performance scores were converted to an equivalent signal-to-noise ratio index to facilitate direct comparison of the effects of different forms of stimulus degradation. Age effects were observed primarily in multiple degradation conditions featuring time compression of the stimuli. These results are discussed in terms of a postulated change in functional signal-to-noise ratio with increasing age.
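Converting percent-correct scores to an equivalent signal-to-noise ratio index amounts to mapping each score back through a reference performance-versus-SNR function. The sketch below illustrates that idea with an invented reference function; it is not the specific function used by the authors.

```python
import numpy as np

# Hypothetical reference function: proportion correct vs. SNR for undistorted
# sentences (illustrative values only).
ref_snr_db = np.array([-6.0, -3.0, 0.0, 3.0, 6.0, 9.0])
ref_prop   = np.array([0.10, 0.30, 0.55, 0.80, 0.92, 0.98])

def equivalent_snr(prop_correct):
    """Map a score from any degraded condition to the SNR that would yield
    the same score on the reference function (linear interpolation)."""
    return float(np.interp(prop_correct, ref_prop, ref_snr_db))

print(equivalent_snr(0.70))   # about 1.8 dB equivalent SNR for this example score
```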


Author(s):  
Khamis A. Al-Karawi

Background & Objective: Speaker Recognition (SR) techniques have been developed into a relatively mature status over the past few decades through development work. Existing methods typically use robust features extracted from clean speech signals, and therefore in idealized conditions can achieve very high recognition accuracy. For critical applications, such as security and forensics, robustness and reliability of the system are crucial. Methods: The background noise and reverberation as often occur in many real-world applications are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust technique is proposed to extract features for speech processing, capable of operating in the clean and noisy condition. Mel Frequency Cepstrum Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCC) are the mature techniques and the most common features, which are used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale. While GFCC has been acquired from a bank of Gammatone filters, which was originally suggested to model human cochlear filtering. This paper investigates the performance of GFCC and the conventional MFCC feature in clean and noisy conditions. The effects of the Signal-to-Noise Ratio (SNR) and language mismatch on the system performance have been taken into account in this work. Conclusion: Experimental results have shown significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise, various Signal-to-Noise Ratios (SNRs) was quantified via simulation. Results of the study are also presented and discussed.


2008 ◽  
Vol 19 (06) ◽  
pp. 496-506 ◽  
Author(s):  
Richard H. Wilson ◽  
Rachel McArdle ◽  
Heidi Roberts

Background: So that portions of the classic Miller, Heise, and Lichten (1951) study could be replicated, new recorded versions of the words and digits were made because none of the three common monosyllabic word lists (PAL PB-50, CID W-22, and NU–6) contained the 9 monosyllabic digits (1–10, excluding 7) that were used by Miller et al. It is well established that different psychometric characteristics have been observed for different lists and even for the same materials spoken by different speakers. The decision was made to record four lists of each of the three monosyllabic word sets, the monosyllabic digits not included in the three sets of word lists, and the CID W-1 spondaic words. A professional female speaker with a General American dialect recorded the materials during four recording sessions within a 2-week interval. The recording order of the 582 words was random. Purpose: To determine, on listeners with normal hearing, the psychometric properties of the five speech materials presented in speech-spectrum noise. Research Design: A quasi-experimental, repeated-measures design was used. Study Sample: Twenty-four young adult listeners (M = 23 years) with normal pure-tone thresholds (≤20-dB HL at 250 to 8000 Hz) participated. The participants were university students who were unfamiliar with the test materials. Data Collection and Analysis: The 582 words were presented at four signal-to-noise ratios (SNRs; −7, −2, 3, and 8 dB) in speech-spectrum noise fixed at 72-dB SPL. Although the main metric of interest was the 50% point on the function for each word established with the Spearman-Kärber equation (Finney, 1952), the percentage correct on each word at each SNR was evaluated. The psychometric characteristics of the PB-50, CID W-22, and NU–6 monosyllabic word lists were compared with one another, with the CID W-1 spondaic words, and with the 9 monosyllabic digits. Results: Recognition performance on the four lists within each of the three monosyllabic word materials was equivalent, ±0.4 dB. Likewise, word-recognition performance on the PB-50, W-22, and NU–6 word lists was equivalent, ±0.2 dB. The mean recognition performance at the 50% point with the 36 W-1 spondaic words was approximately 6.2 dB lower than the 50% point with the monosyllabic words. Recognition performance on the monosyllabic digits was 1–2 dB better than mean performance on the monosyllabic words. Conclusions: Word-recognition performances on the three sets of materials (PB-50, CID W-22, and NU–6) were equivalent, as were the performances on the four lists that make up each of the three materials. Phonetic/phonemic balance does not appear to be an important consideration in the compilation of word-recognition lists used to evaluate the ability of listeners to understand speech. A companion paper examines the acoustic, phonetic/phonological, and lexical variables that may predict the relative ease or difficulty with which these monosyllabic words were recognized in noise (McArdle and Wilson, this issue).
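The 50% points here were estimated with the Spearman-Kärber equation. Under the usual assumptions of equally spaced presentation levels, performance near 0% below the lowest level, and 100% at the highest level, the estimate reduces to a one-line formula; the sketch below gives that textbook form with invented example numbers.

```python
import numpy as np

def spearman_karber_50(levels_db, prop_correct):
    """Spearman-Kärber estimate of the 50% point for an ascending function
    sampled at equally spaced levels, assuming p = 0 below the lowest level
    and p = 1 at the highest level."""
    levels_db = np.asarray(levels_db, dtype=float)
    p = np.asarray(prop_correct, dtype=float)
    step = levels_db[1] - levels_db[0]
    return levels_db[-1] + step / 2.0 - step * p.sum()

# Example: proportions correct at -7, -2, 3, and 8 dB SNR for a hypothetical word.
print(spearman_karber_50([-7, -2, 3, 8], [0.05, 0.40, 0.85, 1.00]))  # -> -1.0 dB
```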


PLoS ONE ◽  
2018 ◽  
Vol 13 (7) ◽  
pp. e0200890 ◽  
Author(s):  
Tianquan Feng ◽  
Qingrong Chen ◽  
Ming Yi ◽  
Zhongdang Xiao
