A New Approach to the Formant Measuring Problem

Proceedings ◽  
2019 ◽  
Vol 33 (1) ◽  
pp. 29 ◽  
Author(s):  
Marnix Van Soom ◽  
Bart de Boer

Formants are characteristic frequency components in human speech that are caused by resonances in the vocal tract during speech production. They are of primary concern in acoustic phonetics and speech recognition. Despite this, making accurate measurements of the formants, which we dub “the formant measurement problem” for convenience, is as yet not considered to be fully resolved. One particular shortcoming is the lack of error bars on the formant frequencies’ estimates. As a first step towards remedying this, we propose a new approach for the formant measuring problem in the particular case of steady-state vowels—a case which occurs quite abundantly in natural speech. The approach is to look at the formant measuring problem from the viewpoint of Bayesian spectrum analysis. We develop a pitch-synchronous linear model for steady-state vowels and apply it to the open-mid front unrounded vowel [ɛ] observed in a real speech utterance.

2003 ◽  
Vol 46 (3) ◽  
pp. 689-701 ◽  
Author(s):  
Steve An Xue ◽  
Grace Jianping Hao

This investigation used a derivation of acoustic reflection (AR) technology to make cross-sectional measurements of changes due to aging in the oral and pharyngeal lumina of male and female speakers. The purpose of the study was to establish preliminary normative data for such changes and to obtain acoustic measurements of changes due to aging in the formant frequencies of selected spoken vowels and their long-term average spectra (LTAS) analysis. Thirty-eight young men and women and 38 elderly men and women were involved in the study. The oral and pharyngeal lumina of the participants were measured with AR technology, and their formant frequencies were analyzed using the Kay Elemetrics Computerized Speech Lab. The findings have delineated specific and similar patterns of aging changes in human vocal tract configurations in speakers of both genders. Namely, the oral cavity length and volume of elderly speakers increased significantly compared to their young cohorts. The total vocal tract volume of elderly speakers also showed a significant increment, whereas the total vocal tract length of elderly speakers did not differ significantly from their young cohorts. Elderly speakers of both genders also showed similar patterns of acoustic changes of speech production, that is, consistent lowering of formant frequencies (especially F1) across selected vowel productions. Although new research models are still needed to succinctly account for the speech acoustic changes of the elderly, especially for their specific patterns of human vocal tract dimensional changes, this study has innovatively applied the noninvasive and cost-effective AR technology to monitor age-related human oral and pharyngeal lumina changes that have direct consequences for speech production.


Author(s):  
Jessica C Delmoral ◽  
Sandra M Rua Ventura ◽  
João Manuel RS Tavares

Quantification of the anatomic and functional aspects of the tongue is pertinent to analyse the mechanisms involved in speech production. Speech requires dynamic and complex articulation of the vocal tract organs, and the tongue is one of the main articulators during speech production. Magnetic resonance imaging has been widely used in speech-related studies. Moreover, the segmentation of such images of speech organs is required to extract reliable statistical data. However, standard solutions to analyse a large set of articulatory images have not yet been established. Therefore, this article presents an approach to segment the tongue in two-dimensional magnetic resonance images and statistically model the segmented tongue shapes. The proposed approach assesses the articulator morphology based on an active shape model, which captures the shape variability of the tongue during speech production. To validate this new approach, a dataset of mid-sagittal magnetic resonance images acquired from four subjects was used, and key aspects of the shape of the tongue during the vocal production of relevant European Portuguese vowels were evaluated.


Author(s):  
Isao Tokuda

In the source-filter theory, the mechanism of speech production is described as a two-stage process: (a) The air flow coming from the lungs induces tissue vibrations of the vocal folds (i.e., two small muscular folds located in the larynx) and generates the “source” sound. Turbulent airflows are also created at the glottis or at the vocal tract to generate noisy sound sources. (b) Spectral structures of these source sounds are shaped by the vocal tract “filter.” Through the filtering process, frequency components corresponding to the vocal tract resonances are amplified, while the other frequency components are diminished. The source sound mainly characterizes the vocal pitch (i.e., fundamental frequency), while the filter forms the timbre. The source-filter theory provides a very accurate description of normal speech production and has been applied successfully to speech analysis, synthesis, and processing. Separate control of the source (phonation) and the filter (articulation) is advantageous for acoustic communications, especially for human language, which requires expression of various phonemes realized by a flexible maneuver of the vocal tract configuration. Based on this idea, the articulatory phonetics focuses on the positions of the vocal organs to describe the produced speech sounds. The source-filter theory elucidates the mechanism of “resonance tuning,” that is, a specialized way of singing. To increase efficiency of the vocalization, soprano singers adjust the vocal tract filter to tune one of the resonances to the vocal pitch. Consequently, the main source sound is strongly amplified to produce a loud voice, which is well perceived in a large concert hall over the orchestra. It should be noted that the source–filter theory is based upon the assumption that the source and the filter are independent from each other. Under certain conditions, the source and the filter interact with each other. 
The source sound is influenced by the vocal tract geometry and by the acoustic feedback from the vocal tract. Such source–filter interaction induces various voice instabilities, for example, sudden pitch jump, subharmonics, resonance, quenching, and chaos.
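
The two-stage process described in this abstract can be illustrated with a minimal sketch. It assumes an impulse train as a crude stand-in for the glottal source and a cascade of two-pole resonators as the vocal tract filter. The function names, the formant frequencies and the bandwidths are illustrative choices, not values taken from the article, and the sketch treats source and filter as fully independent, so it does not model the source–filter interaction mentioned above.

```python
import numpy as np


def impulse_train(f0, fs, duration):
    """Source: one unit impulse per pitch period (assumed stand-in for glottal pulses)."""
    n = int(fs * duration)
    source = np.zeros(n)
    period = int(round(fs / f0))  # pitch period in samples
    source[::period] = 1.0
    return source


def resonator(x, freq, bandwidth, fs):
    """Filter: two-pole resonator that amplifies components near `freq` (Hz)."""
    r = np.exp(-np.pi * bandwidth / fs)   # pole radius set by the bandwidth
    theta = 2.0 * np.pi * freq / fs       # pole angle set by the resonance
    a1 = 2.0 * r * np.cos(theta)
    a2 = -r * r
    y = np.zeros_like(x)
    for i in range(len(x)):
        y[i] = x[i]
        if i >= 1:
            y[i] += a1 * y[i - 1]
        if i >= 2:
            y[i] += a2 * y[i - 2]
    return y


def synthesize_vowel(f0=100.0, formants=((700.0, 80.0), (1200.0, 90.0)),
                     fs=16000, duration=0.5):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0, fs, duration)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, fs)
    return signal


def harmonic_level(signal, freq, fs):
    """Magnitude of the spectrum at the FFT bin closest to `freq` (Hz)."""
    spectrum = np.abs(np.fft.rfft(signal))
    k = int(round(freq * len(signal) / fs))
    return spectrum[k]


if __name__ == "__main__":
    fs = 16000
    vowel = synthesize_vowel(fs=fs)
    near = harmonic_level(vowel, 700.0, fs)   # harmonic at the first resonance
    far = harmonic_level(vowel, 2000.0, fs)   # harmonic away from both resonances
    print("level near resonance / level away from it:", near / far)
```

In the source alone, every harmonic of the 100 Hz pitch has the same level. After filtering, the harmonic that coincides with a resonance is much stronger than one that lies away from it, which is the amplification and attenuation the abstract attributes to the vocal tract "filter."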


2012 ◽  
Author(s):  
Hiroaki Hatano ◽  
Tatsuya Kitamura ◽  
Hironori Takemoto ◽  
Parham Mokhtari ◽  
Kiyoshi Honda ◽  
...  

1987 ◽  
Vol 30 (3) ◽  
pp. 301-305 ◽  
Author(s):  
Robert A. Prosek ◽  
Allen A. Montgomery ◽  
Brian E. Walden ◽  
David B. Hawkins

The formant frequencies of 15 adult stutterers' fluent and disfluent vowels and the formant frequencies of stutterers' and nonstutterers' fluent vowels were compared in an F1-F2 vowel space and in a normalized F1-F2 vowel space. The results indicated that differences in formant frequencies observed between the stutterers' and nonstutterers' vowels can be accounted for by differences among the vocal tract dimensions of the talkers. In addition, no differences were found between the formant frequencies of the fluent and disfluent vowels produced by the stutterers. The overall pattern of these results indicates that, contrary to recent reports (Klich & May, 1982), stutterers do not exhibit significantly greater vowel centralization than nonstutterers.


2006 ◽  
Vol 34 (03) ◽  
pp. 449-460 ◽  
Author(s):  
Yu Hsin Chang ◽  
Chia I Tsai ◽  
Jaung Geng Lin ◽  
Yue Der Lin ◽  
Tsai Chung Li ◽  
...  

Traditional Chinese Medicine (TCM) holds that Blood and Qi are fundamental substances in the human body for sustaining normal vital activity. The theories of Qi, Blood and Zang-Fu constitute the most important theoretical basis of human physiology in TCM. An animal model using conscious rats was employed in this study to further comprehend how organisms survive during acute hemorrhage by maintaining the functionalities of Qi and Blood through dynamically regulating visceral physiological conditions. Pulse waves of arterial blood pressure before and after the hemorrhage were taken in parallel to pulse spectrum analysis. Percentage differences of mean arterial blood pressure and harmonics were recorded in subsequent 5-minute intervals following the hemorrhage. Data were analyzed using a one-way analysis of variance (ANOVA) with Duncan's test for pairwise comparisons. Results showed that, within 30 minutes following the onset of acute hemorrhage, the reduction of mean arterial blood pressure improved from 62% to 20%. Throughout the process, changes to the pulse spectrum appeared to result in a new balance over time. The percentage differences of the second and third harmonics, which were related to kidney and spleen, both increased significantly relative to baseline and moved towards another steady state. Apart from the steady state resulting from the previous stage, the percentage difference of the 4th harmonic decreased significantly to another steady state. The observed change could be attributed to the induction of functional Qi, and is a result of Qi-Blood balancing activity that organisms hold to survive against acute bleeding.


Author(s):  
Filipa M. B. Lã ◽  
Brian P. Gill

Singing performance is highly competitive; thus, finding strategies to accelerate the acquisition of knowledge that results in an efficient and effective vocal technique is of the utmost importance. There are many ways in which a singer may acquire an efficient and effective vocal technique, which can be based on the physiological processes of voice production. This chapter explores these processes within the context of singing performance. The authors examine three major aspects of singing: 1) efficient control of breathing, such that optimal airflow and subglottal pressure are available as needed, for a given frequency and intensity; 2) maximized laryngeal coordination, so that the voice source signal contains all the necessary frequency components for the desired tone; and 3) the modulation of the source signal by subtle shaping of the vocal tract. The advantages and disadvantages of various pedagogical methods are discussed, including breath management, known as appoggio, and different resonant strategies. The authors advocate for a scientifically-grounded teaching method, which allows for physiological differences between individuals, genders, and voice classifications.


Animals ◽  
2018 ◽  
Vol 8 (10) ◽  
pp. 167 ◽  
Author(s):  
Anton Baotic ◽  
Maxime Garcia ◽  
Markus Boeckle ◽  
Angela Stoeger

African savanna elephants live in dynamic fission–fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call-type is the ‘rumble’, with a fundamental frequency (which refers to the lowest vocal fold vibration rate when producing a vocalization) near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), having the most informative power. As sound propagates, however, its spectral and temporal structures degrade progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) to quantify whether this phenotypic information efficiently transmits over long distances. To examine transmission efficiency and the potential influences of ecological factors, we broadcasted and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral–temporal degradation over distance. Interestingly and unlike previous findings, the transmission of formants was better than that of the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and the transmission of information content in a savanna elephant’s natural habitat.

