Formant-Based Recognition of Words and Other Naturalistic Sounds in Rhesus Monkeys

In social animals, identifying sounds is critical for communication. In humans, the acoustic parameters involved in speech recognition, such as the formant frequencies derived from the resonance of the supralaryngeal vocal tract, have been well documented. However, how formants contribute to recognizing learned sounds in non-human primates remains unclear. To determine this, we trained two rhesus monkeys to discriminate target and non-target sounds presented in sequences of 1–3 sounds. After training, we performed three experiments that tested (1) the monkeys’ accuracy and reaction times during the discrimination of various acoustic categories; (2) their ability to discriminate morphing sounds; and (3) their ability to identify sounds consisting of formant 1 (F1), formant 2 (F2), or F1 and F2 (F1F2) pass filters. Our results indicate that macaques can learn diverse sounds and can discriminate them from morphs and from the formants F1 and F2, suggesting that information from a few acoustic parameters suffices for recognizing complex sounds. We anticipate that future neurophysiological experiments in this paradigm may help elucidate how formants contribute to the recognition of sounds.

Red deer stags use formants as assessment cues during intrasexual agonistic interactions

Proceedings of The Royal Society B Biological Sciences, 2005, Vol 272 (1566), pp. 941-947. DOI: 10.1098/rspb.2004.2954. Cited By ~ 198

Author(s): David Reby, Karen McComb, Bruno Cargnelutti, Chris Darwin, W. Tecumseh Fitch, ...

Keyword(s): Red Deer, Vocal Tract, Animal Communication, Agonistic Interactions, Playback Experiments, Acoustic Parameters, Human Speech, Formant Frequencies, Body Sizes

While vocal tract resonances or formants are key acoustic parameters that define differences between phonemes in human speech, little is known about their function in animal communication. Here, we used playback experiments to present red deer stags with re-synthesized vocalizations in which formant frequencies were systematically altered to simulate callers of different body sizes. In response to stimuli where lower formants indicated callers with longer vocal tracts, stags were more attentive, replied with more roars and extended their vocal tracts further in these replies. Our results indicate that mammals other than humans use formants in vital vocal exchanges and can adjust their own formant frequencies in relation to those that they hear.
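The link between lower formants and longer vocal tracts can be illustrated with the textbook uniform-tube approximation, in which a tube closed at one end resonates at F_n = (2n - 1) c / (4L). The sketch below is only an illustration under that assumption, not the re-synthesis procedure used in the study; the function names and the speed-of-sound constant are hypothetical choices made for this example.

```python
# Assumed speed of sound in warm, humid air inside a vocal tract (m/s).
SPEED_OF_SOUND_M_S = 350.0


def tube_formants(vocal_tract_length_m, n_formants=3, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a uniform tube closed at one end.

    Uses F_n = (2n - 1) * c / (4 * L), a simplification of a real vocal tract.
    """
    if vocal_tract_length_m <= 0:
        raise ValueError("vocal tract length must be positive")
    return [
        (2 * n - 1) * c / (4.0 * vocal_tract_length_m)
        for n in range(1, n_formants + 1)
    ]


def apparent_vocal_tract_length(formants_hz, c=SPEED_OF_SOUND_M_S):
    """Estimate tube length (m) from the mean spacing between adjacent formants.

    In the uniform-tube model adjacent formants are c / (2 * L) apart,
    so L = c / (2 * mean spacing).
    """
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants to compute a spacing")
    spacings = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return c / (2.0 * mean_spacing)


if __name__ == "__main__":
    short_tract = tube_formants(0.175)  # roughly 500, 1500, 2500 Hz
    long_tract = tube_formants(0.35)    # every formant is halved
    print(short_tract, long_tract)
    print(apparent_vocal_tract_length(long_tract))
```

Under this model, doubling the tube length halves every formant, which is why a listener could in principle read caller size from formant spacing. Real vocal tracts are not uniform tubes, so the numbers are indicative only.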

Influences of Fundamental Frequency, Formant Frequencies, Aperiodicity, and Spectrum Level on the Perception of Voice Gender

Journal of Speech Language and Hearing Research, 2014, Vol 57 (1), pp. 285-296. DOI: 10.1044/1092-4388(2013/12-0314). Cited By ~ 44

Author(s): Verena G. Skuk, Stefan R. Schweinberger

Keyword(s): Fundamental Frequency, Vocal Tract, Spectral Envelope, Relative Importance, Acoustic Parameters, Spectrum Level, Formant Frequencies, Gender Perception, Scale Factors, Baseline Experiment

Purpose: To determine the relative importance of acoustic parameters (fundamental frequency [F0], formant frequencies [FFs], aperiodicity, and spectrum level [SL]) on voice gender perception, the authors used a novel parameter-morphing approach that, unlike spectral envelope shifting, allows the application of nonuniform scale factors to transform formants and more direct comparison of parameter impact.

Method: In each of 2 experiments, 16 listeners with normal hearing (8 female, 8 male) classified voice gender for morphs between female and male speakers, using syllable tokens from 2 male–female speaker pairs. Morphs varied single acoustic parameters (Experiment 1) or selected combinations (Experiment 2), keeping residual parameters androgynous, as determined in a baseline experiment.

Results: The strongest cue related to gender perception was F0, followed by FF and SL. Aperiodicity did not systematically influence gender perception. Morphing F0 and FF in conjunction produced convincing changes in perceived gender, changes that were equivalent to those for Full morphs interpolating all parameters. Despite the importance of F0, morphing FF and SL in combination produced effective changes in voice gender perception.

Conclusions: The most important single parameters for gender perception are, in order, F0, FF, and SL. At the same time, F0 and vocal tract resonances have a comparable impact on voice gender perception.
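The idea of varying one acoustic parameter while holding the others at an intermediate level can be sketched as per-parameter interpolation between two voices. This is a toy illustration, not the authors' morphing procedure: the log-scale interpolation, the parameter names, the example values, and the default weight of 0.5 as a stand-in for "androgynous" are all assumptions (in the study, the androgynous level was determined in a baseline experiment).

```python
import math


def morph_parameter(value_a, value_b, weight):
    """Interpolate on a log scale: weight 0 gives value_a, weight 1 gives value_b."""
    log_mix = (1.0 - weight) * math.log(value_a) + weight * math.log(value_b)
    return math.exp(log_mix)


def morph_voice(voice_a, voice_b, weights, default_weight=0.5):
    """Morph each named parameter independently.

    Parameters missing from `weights` fall back to `default_weight`,
    i.e. they stay at an intermediate level between the two voices.
    """
    return {
        name: morph_parameter(
            voice_a[name], voice_b[name], weights.get(name, default_weight)
        )
        for name in voice_a
    }


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    female = {"f0_hz": 200.0, "f1_hz": 800.0}
    male = {"f0_hz": 100.0, "f1_hz": 700.0}
    # Move only F0 fully toward the male voice; F1 stays intermediate.
    print(morph_voice(female, male, {"f0_hz": 1.0}))
```

Interpolating on a log scale treats frequency ratios symmetrically, which is one common design choice for pitch-like quantities; a linear scale would be an equally valid assumption for a sketch like this.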

Enhancing vocal tract length normalization with elastic registration for automatic speech recognition

2012. DOI: 10.21437/interspeech.2012-393

Author(s): Florian Müller, Alfred Mertins

Keyword(s): Speech Recognition, Automatic Speech Recognition, Vocal Tract, Elastic Registration, Tract Length, Vocal Tract Length Normalization

Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers

2012. DOI: 10.21437/interspeech.2012-143

Author(s): Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, ...

Keyword(s): Vocal Tract, Body Height, Tract Length, Formant Frequencies, Pitch Frequency

Formant Frequencies of Stuttered and Fluent Vowels

Journal of Speech Language and Hearing Research, 1987, Vol 30 (3), pp. 301-305. DOI: 10.1044/jshr.3003.301. Cited By ~ 12

Author(s): Robert A. Prosek, Allen A. Montgomery, Brian E. Walden, David B. Hawkins

Keyword(s): Vocal Tract, Formant Frequencies, Vowel Space

The formant frequencies of 15 adult stutterers' fluent and disfluent vowels and the formant frequencies of stutterers' and nonstutterers' fluent vowels were compared in an F1-F2 vowel space and in a normalized F1-F2 vowel space. The results indicated that differences in formant frequencies observed between the stutterers' and nonstutterers' vowels can be accounted for by differences among the vocal tract dimensions of the talkers. In addition, no differences were found between the formant frequencies of the fluent and disfluent vowels produced by the stutterers. The overall pattern of these results indicates that, contrary to recent reports (Klich & May, 1982), stutterers do not exhibit significantly greater vowel centralization than nonstutterers.
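To see why a normalized vowel space can remove differences that stem from vocal tract size, consider per-speaker z-scoring of each formant (often called Lobanov normalization). The abstract does not say which normalization the authors used, so this is an assumed stand-in, and the formant values below are invented for illustration.

```python
from statistics import mean, stdev


def zscore_normalize(formant_values_hz):
    """Z-score one speaker's values for a single formant across vowels.

    Subtracting the speaker's mean and dividing by the speaker's standard
    deviation removes uniform scaling caused by vocal tract size.
    """
    if len(formant_values_hz) < 2:
        raise ValueError("need at least two vowel tokens per speaker")
    mu = mean(formant_values_hz)
    sigma = stdev(formant_values_hz)
    if sigma == 0:
        raise ValueError("formant values must not all be identical")
    return [(value - mu) / sigma for value in formant_values_hz]


if __name__ == "__main__":
    # Hypothetical F1 values (Hz) for three vowels from two talkers whose
    # formants differ only by a uniform scale factor of 1.2.
    talker_a_f1 = [300.0, 500.0, 700.0]
    talker_b_f1 = [360.0, 600.0, 840.0]
    print(zscore_normalize(talker_a_f1))
    print(zscore_normalize(talker_b_f1))
```

After normalization the two talkers occupy the same positions, so any remaining group difference (for example, greater vowel centralization) could not be attributed to uniform vocal tract scaling alone.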

Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition

2014 IEEE Spoken Language Technology Workshop (SLT), 2014. DOI: 10.1109/slt.2014.7078563. Cited By ~ 22

Author(s): Romain Serizel, Diego Giuliani

Keyword(s): Speech Recognition, Vocal Tract, Tract Length

Field Propagation Experiments of Male African Savanna Elephant Rumbles: A Focus on the Transmission of Formant Frequencies

Animals, 2018, Vol 8 (10), pp. 167. DOI: 10.3390/ani8100167. Cited By ~ 2

Author(s): Anton Baotic, Maxime Garcia, Markus Boeckle, Angela Stoeger

Keyword(s): Fundamental Frequency, Vocal Communication, Vocal Tract, Natural Habitat, Ecological Factors, Transmission Efficiency, Long Distance, African Savanna, Formant Frequencies, Resonance Frequencies

African savanna elephants live in dynamic fission–fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call-type is the ‘rumble’, with a fundamental frequency (which refers to the lowest vocal fold vibration rate when producing a vocalization) near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), which have the most informative power. As sound propagates, however, its spectral and temporal structures degrade progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) to quantify whether this phenotypic information is transmitted efficiently over long distances. To examine transmission efficiency and the potential influences of ecological factors, we broadcast and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral–temporal degradation over distance. Interestingly and unlike previous findings, the transmission of formants was better than that of the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and the transmission of information content in a savanna elephant’s natural habitat.

Vocal tract area functions and formant frequencies in opera tenors’ modal and falsetto registers

The Journal of the Acoustical Society of America, 2011, Vol 129 (6), pp. 3955-3963. DOI: 10.1121/1.3589249. Cited By ~ 24

Author(s): Matthias Echternach, Johan Sundberg, Tobias Baumann, Michael Markl, Bernhard Richter

Keyword(s): Vocal Tract, Formant Frequencies

Changes in the Human Vocal Tract Due to Aging and the Acoustic Correlates of Speech Production

Journal of Speech Language and Hearing Research, 2003, Vol 46 (3), pp. 689-701. DOI: 10.1044/1092-4388(2003/054). Cited By ~ 79

Author(s): Steve An Xue, Grace Jianping Hao

Keyword(s): Speech Production, Vocal Tract, Cost Effective, The Elderly, Cross Sectional, Men And Women, Acoustic Reflection, Dimensional Changes, Formant Frequencies, Age Related

This investigation used a derivation of acoustic reflection (AR) technology to make cross-sectional measurements of changes due to aging in the oral and pharyngeal lumina of male and female speakers. The purpose of the study was to establish preliminary normative data for such changes and to obtain acoustic measurements of changes due to aging in the formant frequencies of selected spoken vowels and in their long-term average spectra (LTAS). Thirty-eight young men and women and 38 elderly men and women were involved in the study. The oral and pharyngeal lumina of the participants were measured with AR technology, and their formant frequencies were analyzed using the Kay Elemetrics Computerized Speech Lab. The findings have delineated specific and similar patterns of aging changes in human vocal tract configurations in speakers of both genders. Namely, the oral cavity length and volume of elderly speakers increased significantly compared to their young cohorts. The total vocal tract volume of elderly speakers also showed a significant increment, whereas the total vocal tract length of elderly speakers did not differ significantly from their young cohorts. Elderly speakers of both genders also showed similar patterns of acoustic changes of speech production, that is, consistent lowering of formant frequencies (especially F1) across selected vowel productions. Although new research models are still needed to succinctly account for the speech acoustic changes of the elderly, especially for their specific patterns of human vocal tract dimensional changes, this study has innovatively applied the noninvasive and cost-effective AR technology to monitor age-related human oral and pharyngeal lumina changes that have direct consequences for speech production.
