Effect of Consonant-Vowel Ratio Modification on Amplitude Envelope Cues for Consonant Recognition

1991 ◽  
Vol 34 (2) ◽  
pp. 415-426 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne ◽  
Heather A. Cote

This investigation examined the degree to which modification of the consonant-vowel (C-V) intensity ratio affected consonant recognition under conditions in which listeners were forced to rely more heavily on waveform envelope cues than on spectral cues. The stimuli were 22 vowel-consonant-vowel utterances, which had been mixed at six different signal-to-noise ratios with white noise that had been modulated by the speech waveform envelope. The resulting waveforms preserved the gross speech envelope shape, but spectral cues were limited by the white-noise masking. In a second stimulus set, the consonant portion of each utterance was amplified by 10 dB. Sixteen subjects with normal hearing listened to the unmodified stimuli, and 16 listened to the amplified-consonant stimuli. Recognition performance was reduced in the amplified-consonant condition for some consonants, presumably because waveform envelope cues had been distorted. However, for other consonants, especially the voiced stops, consonant amplification improved recognition. Patterns of errors were altered for several consonant groups, including some that showed only small changes in recognition scores. The results indicate that when spectral cues are compromised, nonlinear amplification can alter waveform envelope cues for consonant recognition.
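
The stimulus construction described above can be illustrated with a short sketch, assuming NumPy is available; the function names, the frame-based RMS envelope estimate, and the segment boundaries are illustrative assumptions, not the authors' procedure. The sketch shapes white noise with the gross speech envelope, mixes speech and noise at a chosen signal-to-noise ratio, and applies a 10 dB boost to a designated consonant segment:

```python
import numpy as np

def envelope_modulated_noise(speech, frame=256, rng=None):
    """White noise shaped by the gross amplitude envelope of `speech`.
    The envelope is a crude frame-by-frame RMS, held constant per frame."""
    rng = np.random.default_rng(rng)
    noise = rng.standard_normal(len(speech))
    env = np.repeat(
        [np.sqrt(np.mean(speech[i:i + frame] ** 2))
         for i in range(0, len(speech), frame)],
        frame)[:len(speech)]
    return noise * env

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech/noise power ratio equals `snr_db`, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

def amplify_segment(signal, start, stop, gain_db=10.0):
    """Boost one segment (e.g. the consonant portion) by `gain_db`."""
    out = signal.copy()
    out[start:stop] *= 10 ** (gain_db / 20)
    return out
```

A 10 dB gain corresponds to an amplitude factor of 10^(10/20) ≈ 3.16, applied only within the consonant interval.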

1996 ◽  
Vol 39 (6) ◽  
pp. 1124-1137 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne

The perceptual consequences of expanding the amplitude variations in speech were studied under conditions in which spectral information was obscured by signal correlated noise that had an envelope correlated with the speech envelope, but had a flat amplitude spectrum. The noise samples, created individually from 22 vowel-consonant-vowel nonsense words, were used as maskers of those words, with signal-to-noise ratios ranging from –15 to 0 dB. Amplitude expansion was by a factor of 3.0 in terms of decibels. In the first experiment, presentation level for speech peaks was 80 dB SPL. Consonant recognition performance for expanded speech by 50 listeners with normal hearing was as much as 30 percentage points poorer than for unexpanded speech and the types of errors were dramatically different, especially in the midrange of S-N ratios. In a second experiment presentation level was varied to determine whether reductions in consonant levels produced by expansion were responsible for the differences between conditions. Recognition performance for unexpanded speech at 40 dB SPL was nearly equivalent to that for expanded speech at 80 dB SPL. The error patterns obtained in these two conditions were different, suggesting that the differences between conditions in Experiment 1 were due largely to expanded amplitude envelopes rather than differences in audibility.
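
Expansion of amplitude variations by a factor of 3.0 in decibel terms can be sketched as follows; the frame-based level estimate and the choice of the peak frame as the reference level are assumptions made for illustration, not the authors' exact procedure. Each frame's level deviation from the reference is tripled, so a frame 10 dB below the peak ends up 30 dB below it:

```python
import numpy as np

def expand_envelope(signal, factor=3.0, frame=256, eps=1e-12):
    """Expand frame-level amplitude variations by `factor` in dB terms,
    relative to the peak frame's level (illustrative reference choice)."""
    out = np.zeros_like(signal, dtype=float)
    n = len(signal)
    # frame RMS levels in dB
    levels = [10 * np.log10(np.mean(signal[i:i + frame] ** 2) + eps)
              for i in range(0, n, frame)]
    ref = max(levels)
    for k, i in enumerate(range(0, n, frame)):
        # extra gain that turns a deviation d into factor*d (negative: attenuates)
        gain_db = (factor - 1.0) * (levels[k] - ref)
        out[i:i + frame] = signal[i:i + frame] * 10 ** (gain_db / 20)
    return out
```

For example, two segments 20 dB apart before expansion end up 60 dB apart afterward, which is one way the consonant levels described in the abstract could be pushed toward inaudibility.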


Author(s):  
Khamis A. Al-Karawi

Background & Objective: Speaker Recognition (SR) techniques have matured considerably over the past few decades. Existing methods typically use robust features extracted from clean speech signals and can therefore achieve very high recognition accuracy under idealized conditions. For critical applications, such as security and forensics, robustness and reliability of the system are crucial. Methods: Background noise and reverberation, which arise in many real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust feature-extraction technique is proposed, capable of operating in both clean and noisy conditions. Mel Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the most common features used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, whereas GFCCs are derived from a bank of Gammatone filters originally proposed to model human cochlear filtering. This paper investigates the performance of GFCC and conventional MFCC features in clean and noisy conditions. The effects of Signal-to-Noise Ratio (SNR) and language mismatch on system performance are also taken into account. Conclusion: Experimental results show significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise and various SNRs was quantified via simulation.
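
A minimal sketch of the MFCC computation described above (log energies in mel-spaced triangular bands followed by a discrete cosine transform), using NumPy and SciPy; the filter count, cepstral count, and single-frame handling are illustrative choices, not the paper's configuration:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(frame, sr=16000, n_filters=26, n_ceps=13, eps=1e-10):
    """MFCCs of one windowed frame: DCT of log mel-band energies."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    fb = mel_filterbank(n_filters, len(frame), sr)
    log_energies = np.log(fb @ spectrum + eps)
    return dct(log_energies, type=2, norm='ortho')[:n_ceps]
```

GFCCs follow the same log-compress-then-DCT pattern but replace the mel filterbank with a bank of Gammatone filters.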


1995 ◽  
Vol 24 (3) ◽  
pp. 165-173 ◽  
Author(s):  
Sirkku K. Salo ◽  
A. Heikki Lang ◽  
Altti J. Salmivalli

1979 ◽  
Vol 44 (3) ◽  
pp. 354-362 ◽  
Author(s):  
Jeffrey L. Danhauer ◽  
Jonathan G. Leppler

Thirty-five normal-hearing listeners' speech discrimination scores were obtained for the California Consonant Test (CCT) in four noise competitors: (1) a four-talker complex (FT), (2) a nine-talker complex developed at Bowling Green State University (BGMTN), (3) cocktail party noise (CPN), and (4) white noise (WN). Five listeners received the CCT stimuli mixed ipsilaterally with each of the competing noises at one of seven different signal-to-noise ratios (S/Ns). Articulation functions were plotted for each noise competitor. Statistical analysis revealed that the noise types produced few differences on the CCT scores over most of the S/Ns tested, but that noise competitors similar to peripheral maskers (CPN and WN) had less effect on the scores at more severe levels than competitors more similar to perceptual maskers (FT and BGMTN). Results suggest that the CCT should be sufficiently difficult even without the presence of a noise competitor for normal-hearing listeners in many audiologic testing situations. Levels that should approximate CCT maximum discrimination (D-Max) scores for normal listeners are suggested for use when clinic time does not permit the establishment of articulation functions. The clinician should determine the S/N of the CCT tape itself before establishing listening levels.
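
Determining the S/N of a recording, as recommended above, might be sketched as follows, assuming a speech-plus-noise segment and a noise-only segment (e.g. a silent gap on the test tape) are both available; the power-subtraction estimate is an illustrative choice, not a clinical standard:

```python
import numpy as np

def segment_snr_db(speech_plus_noise, noise_only):
    """Estimate S/N in dB from a speech-plus-noise segment and a
    noise-only segment, by subtracting the noise power estimate."""
    p_total = np.mean(np.asarray(speech_plus_noise, float) ** 2)
    p_noise = np.mean(np.asarray(noise_only, float) ** 2)
    p_speech = max(p_total - p_noise, 1e-12)  # guard against negatives
    return 10.0 * np.log10(p_speech / p_noise)
```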


Behaviour ◽  
2019 ◽  
Vol 157 (1) ◽  
pp. 59-76 ◽  
Author(s):  
Erin E. Grabarczyk ◽  
Sharon A. Gill

Abstract During the breeding season, avian pairs coordinate interactions with songs and calls. For cavity nesting birds, females inside nest boxes may rely on male vocalizations for information. Anthropogenic noise masks male songs, which could affect information gained by females. We explored song transmission from a female house wren (Troglodytes aedon) perspective, testing the hypothesis that noise masking alters songs that reach females inside nest boxes. We broadcast songs at three distances up to 25 m from nest boxes and re-recorded songs using two microphones, positioned inside and outside nest boxes. We measured signal-to-noise ratios and cross-correlation factors to estimate the effects of masking on transmission. In noise, songs received inside nest boxes had lower signal-to-noise ratios and cross-correlation factors than songs recorded outside of boxes, and these effects decreased with distance. For females, noise may reduce information conveyed through male songs and in response pairs may need to adjust their interactions.
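
The two transmission measures used above, signal-to-noise ratio and cross-correlation between the broadcast song and the re-recorded version, might be computed along these lines; the normalization and peak-picking details here are assumptions for illustration, not the authors' exact analysis:

```python
import numpy as np

def snr_db(song_segment, noise_segment):
    """Signal-to-noise ratio in dB from a song segment and a
    noise-only segment of the same recording."""
    p_song = np.mean(np.square(np.asarray(song_segment, float)))
    p_noise = np.mean(np.square(np.asarray(noise_segment, float)))
    return 10.0 * np.log10(p_song / p_noise)

def peak_cross_correlation(clean, received):
    """Peak of the normalized cross-correlation between the broadcast
    song and the re-recorded version, allowing for transmission delay."""
    clean = (clean - clean.mean()) / (np.std(clean) * len(clean))
    received = (received - received.mean()) / np.std(received)
    return np.max(np.correlate(received, clean, mode='full'))
```

A value near 1 means the received song is an almost undistorted (if attenuated and delayed) copy; masking noise and box attenuation pull the peak toward 0.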


1993 ◽  
Vol 71 (5) ◽  
pp. 926-932 ◽  
Author(s):  
S. D. Turnbull ◽  
J. M. Terhune

Pure-tone hearing thresholds of a harbour seal (Phoca vitulina) were measured in air and underwater using behavioural psychophysical techniques. A 50-ms sinusoidal pulse was presented in both white-noise masked and unmasked situations at pulse repetition rates of 1, 2, 4, and 10/s. Test frequencies were 0.5, 1.0, 2.0, 4.0, and 8.0 kHz in air and 2.0, 4.0, 8.0, and 16.0 kHz underwater. Relative to 1 pulse/s, mean threshold shifts were −1, −3, and −5 dB at 2, 4, and 10 pulses/s, respectively. The threshold shifts from 1 to 10 pulses/s were significant (F = 12.457, df = 2,36, p < 0.001) and there was no difference in the threshold shifts between the masked and unmasked situations (F = 2.585, df = 1,50, p > 0.10). Broadband masking caused by meteorological or industrial sources will closely resemble the white-noise situation. At high calling rates, the numerous overlapping calls of some species (e.g., harp seal, Phoca groenlandica) present virtually continuous "background noise" which also resembles the broadband white-noise masking situation. An implication of lower detection thresholds is that if a seal regularly repeats short vocalizations, the communication range of that call could be increased significantly (80% at 10 pulses/s). This could have important implications during the breeding season should storms or shipping noises occur or when some pinniped species become increasingly vocal and the background noise of conspecifics increases.
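
The 80% figure can be checked under a spherical-spreading assumption (20 dB of transmission loss per decade of distance, an assumption not stated in the abstract): a 5 dB threshold improvement corresponds to a range factor of 10^(5/20) ≈ 1.78, i.e. roughly an 80% increase. A one-line sketch:

```python
def range_increase(threshold_shift_db, loss_db_per_decade=20.0):
    """Relative increase in communication range from a lower detection
    threshold, assuming spherical spreading (20 dB per distance decade)."""
    return 10 ** (threshold_shift_db / loss_db_per_decade) - 1.0
```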


1974 ◽  
Vol 17 (2) ◽  
pp. 270-278 ◽  
Author(s):  
Brian E. Walden ◽  
Robert A. Prosek ◽  
Don W. Worthington

The redundancy between the auditory and visual recognition of consonants was studied in 100 hearing-impaired subjects who demonstrated a wide range of speech-discrimination abilities. Twenty English consonants, recorded in CV combination with the vowel /a/, were presented to the subjects for auditory, visual, and audiovisual identification. There was relatively little variation among subjects in the visual recognition of consonants. A measure of the expected degree of redundancy between an observer’s auditory and visual confusions among consonants was used in an effort to predict audiovisual consonant recognition ability. This redundancy measure was based on an information analysis of an observer’s auditory confusions among consonants and expressed the degree to which his auditory confusions fell within categories of visually homophenous consonants. The measure was found to have moderate predictive value in estimating an observer’s audiovisual consonant recognition score. These results suggest that the degree of redundancy between an observer’s auditory and visual confusions of speech elements is a determinant in the benefit that visual cues offer to that observer.
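
One plausible operationalization of the redundancy measure (the abstract does not give the exact formula, so this sketch only captures its spirit): the proportion of off-diagonal auditory confusions that fall within the stimulus consonant's visually homophenous (viseme) class. High values mean vision can repair few of the listener's auditory errors:

```python
def within_viseme_confusion(confusions, viseme_of):
    """Proportion of auditory confusions (off-diagonal responses) falling
    within the stimulus consonant's viseme class.

    confusions[s][r]: count of times stimulus s was heard as response r.
    viseme_of[i]: viseme-class label of consonant i.
    """
    total = within = 0
    n = len(confusions)
    for s in range(n):
        for r in range(n):
            if s == r:
                continue  # correct responses are not confusions
            total += confusions[s][r]
            if viseme_of[s] == viseme_of[r]:
                within += confusions[s][r]
    return within / total if total else 0.0
```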


1994 ◽  
Vol 95 (5) ◽  
pp. 2991-2991 ◽  
Author(s):  
Richard L. Freyman ◽  
G. Patrick Nerbonne ◽  
Diane Tharp ◽  
Emily Stanford

1981 ◽  
Vol 24 (2) ◽  
pp. 207-216 ◽  
Author(s):  
Brian E. Walden ◽  
Sue A. Erdman ◽  
Allen A. Montgomery ◽  
Daniel M. Schwartz ◽  
Robert A. Prosek

The purpose of this research was to determine some of the effects of consonant recognition training on the speech recognition performance of hearing-impaired adults. Two groups of ten subjects each received seven hours of either auditory or visual consonant recognition training, in addition to a standard two-week, group-oriented, inpatient aural rehabilitation program. A third group of fifteen subjects received the standard two-week program, but no supplementary individual consonant recognition training. An audiovisual sentence recognition test and tests of auditory and visual consonant recognition were administered both before and following training. Subjects in all three groups significantly increased in their audiovisual sentence recognition performance, but subjects receiving the individual consonant recognition training improved significantly more than subjects receiving only the standard two-week program. A significant increase in consonant recognition performance was observed in the two groups receiving the auditory or visual consonant recognition training. The data are discussed from varying statistical and clinical perspectives.

