Vocal tract modeling techniques: from human voice to non-human primate vocalizations

In an aeroacoustic simulation of human voice production, the effect of the sub-grid scale (SGS) model on the acoustic spectrum was investigated. In the first step, incompressible airflow in a 3D model of the larynx, with vocal folds undergoing a prescribed two-degree-of-freedom oscillation, was simulated by laminar and Large-Eddy Simulations (LES), using the One-Equation and Wall-Adaptive Local-Eddy (WALE) SGS models. In the second step, the aeroacoustic sources and the sound propagation in a domain composed of the larynx and vocal tract were computed with the Perturbed Convective Wave Equation (PCWE) for the vowels [u:] and [i:]. The results show that the SGS model has a significant impact not only on the flow field but also on the spectrum of the sound sampled 1 cm downstream of the lips. With the WALE model, which is known to handle near-wall and high-shear regions more precisely, the simulations predict significantly higher peak volumetric airflow rates than with the One-Equation model, only slightly lower than those of the laminar simulation. Using the WALE SGS model also results in higher sound pressure levels at the higher harmonic frequencies.
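
The abstract attributes the WALE model's behavior to its treatment of near-wall and high-shear regions. As a hedged illustration (not the study's solver code), the sketch below evaluates the standard WALE sub-grid eddy viscosity for a single cell from the resolved velocity gradient. The model constant `c_w = 0.325`, the filter width and all function names are assumptions chosen for illustration. One property visible in the formula is that it returns zero eddy viscosity in pure shear, so it adds no artificial dissipation to laminar shear layers.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def double_dot(a, b):
    """Full contraction a_ij * b_ij of two 3x3 tensors."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))


def wale_eddy_viscosity(grad_u, delta, c_w=0.325):
    """WALE sub-grid eddy viscosity (m^2/s) for one cell.

    grad_u[i][j] = du_i/dx_j, the resolved velocity gradient (1/s)
    delta        = filter width (m)
    c_w          = model constant (assumed value, not taken from the study)
    """
    # Resolved strain-rate tensor S_ij = (g_ij + g_ji) / 2
    s = [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]
    # Square of the velocity gradient tensor, g2_ij = g_ik g_kj
    g2 = matmul3(grad_u, grad_u)
    trace_g2 = g2[0][0] + g2[1][1] + g2[2][2]
    # Traceless symmetric part of g2: the WALE operator S^d_ij
    sd = [[0.5 * (g2[i][j] + g2[j][i]) - (trace_g2 / 3.0 if i == j else 0.0)
           for j in range(3)] for i in range(3)]
    sd_sd = double_dot(sd, sd)
    s_s = double_dot(s, s)
    denom = s_s ** 2.5 + sd_sd ** 1.25
    if denom == 0.0:
        return 0.0  # no resolved velocity gradient, hence no SGS viscosity
    return (c_w * delta) ** 2 * sd_sd ** 1.5 / denom


# Pure shear (only du_x/dy is nonzero): WALE yields exactly zero SGS viscosity.
pure_shear = [[0.0, 100.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Plane strain: WALE yields a positive SGS viscosity.
plane_strain = [[0.0, 100.0, 0.0], [100.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nu_shear = wale_eddy_viscosity(pure_shear, delta=1e-4)
nu_strain = wale_eddy_viscosity(plane_strain, delta=1e-4)
```

In a full LES solver this quantity would be evaluated in every cell and added to the molecular viscosity; the sketch only shows the per-cell formula.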

i-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques

Journal of Intelligent Systems ◽

10.1515/jisys-2017-0047 ◽

2018 ◽

Vol 29 (1) ◽

pp. 565-582

Author(s):

T.R. Jayanthi Kumari ◽

H.S. Jayanna

Keyword(s):

Linear Prediction ◽

Vocal Tract ◽

Speaker Verification ◽

Excitation Source ◽

Limited Data ◽

Score Level Fusion ◽

Verification System ◽

Modeling Techniques ◽

Prediction Residual ◽

Level Fusion

Abstract In many biometric applications, limited-data speaker verification plays a significant role in practical systems for verifying a speaker. The performance of speaker verification systems under the limited data condition needs to be improved by applying suitable techniques. Here, limited data means that both the training and the test data last only a few seconds. This article demonstrates the importance of speaker verification under the limited data condition using feature-level and score-level fusion techniques. The baseline speaker verification system uses vocal tract features, namely mel-frequency cepstral coefficients and linear predictive cepstral coefficients, and excitation source features, namely the linear prediction residual and linear prediction residual phase, together with i-vector modeling techniques on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with the excitation source features. As a result, the equal error rate (EER) is on average approximately 4% lower than with the individual features. Further, two different types of score-level fusion are demonstrated. In the first case, the scores of the vocal tract and excitation source features are fused at the score level while the modeling technique is kept the same, which provides an average EER reduction of approximately 2% compared with feature-level fusion. In the second case, the scores of the different modeling techniques are combined, which results in an EER reduction of approximately 4.5% compared with score-level fusion of different features.
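
The abstract reports results as equal error rates (EER) and describes fusing the scores of different systems. The exact fusion rule is not stated here, so the sketch below assumes a common choice: z-normalize each system's scores and take a weighted sum. The EER is approximated by sweeping the decision threshold over the observed scores. All function names, the equal weight of 0.5 and the toy scores are hypothetical.

```python
from statistics import mean, pstdev


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when score >= threshold.
    FAR = fraction of impostor trials accepted; FRR = fraction of genuine trials rejected.
    """
    best_gap, best_eer = float("inf"), None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance so systems are comparable."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0.0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion of two systems scored on the same trials."""
    za, zb = z_normalize(scores_a), z_normalize(scores_b)
    return [weight * a + (1.0 - weight) * b for a, b in zip(za, zb)]


def split_by_label(scores, labels):
    """Separate trial scores into (genuine, impostor) lists; label 1 marks a target trial."""
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    return genuine, impostor


# Toy scores for six trials from two hypothetical systems, chosen only to exercise the code.
labels = [1, 1, 1, 0, 0, 0]
sys_a = [2.0, 0.4, 1.5, 0.5, -1.0, -0.5]
sys_b = [1.0, 1.8, 0.2, -0.8, 0.3, -1.2]
eer_a = equal_error_rate(*split_by_label(sys_a, labels))
eer_b = equal_error_rate(*split_by_label(sys_b, labels))
eer_fused = equal_error_rate(*split_by_label(fuse_scores(sys_a, sys_b), labels))
```

On these toy scores each system alone has an EER of 1/3, while the fused scores separate the two classes completely. This only demonstrates the mechanics of fusion and says nothing about the numbers reported in the paper.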

Exploring human voice prosodic features and the interaction between the excitation signal and vocal tract for Assamese speech

International Journal of Speech Technology ◽

10.1007/s10772-021-09946-5 ◽

2022 ◽

Author(s):

Sippee Bharadwaj ◽

Purnendu Bikash Acharjee

Keyword(s):

Vocal Tract ◽

Prosodic Features ◽

Excitation Signal ◽

Human Voice

Anatomical structures involved in non-human vocalization

ZAS Papers in Linguistics ◽

10.21248/zaspil.40.2005.256 ◽

2005 ◽

Vol 40 ◽

pp. 33-43

Author(s):

Alban Gebler ◽

Roland Frey

Keyword(s):

Sound Production ◽

Vocal Tract ◽

Mammalian Species ◽

Thyroid Cartilage ◽

Vocal Folds ◽

Anatomical Structures ◽

Volume Increase ◽

Human Voice ◽

Acoustic Analyses ◽

Mass Increase

To understand the functional morphology of the human voice-producing system, we need data on the vocal tract anatomy of other mammalian species. The larynges and vocal tracts of four species of Artiodactyla were investigated in combination with acoustic analyses of their respective calls. Different evolutionary specializations of laryngeal characters may lead to similar effects on sound production. In the investigated species, these specializations are the elongation and mass increase of the vocal folds, the volume increase of the laryngeal vestibulum through an enlarged thyroid cartilage, and the formation of laryngeal ventricles. Both the elongation of the vocal folds and the increase of the oscillating masses lower the fundamental frequency. The influence of an increased volume of the laryngeal vestibulum on sound production remains unclear. The anatomical and acoustic results are presented together with considerations about the habitats and mating systems of the respective species.
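
The statement that longer and heavier vocal folds lower the fundamental frequency can be illustrated with two textbook idealizations: the vocal fold as an ideal vibrating string, f0 = (1 / 2L) * sqrt(stress / density), and as a single mass-spring oscillator, f0 = (1 / 2 pi) * sqrt(k / m). The sketch below is only an illustration under those assumptions, with made-up parameter values and an assumed tissue density; it is not the authors' analysis, and real vocal folds change stress and stiffness together with length and mass.

```python
import math


def f0_ideal_string(length_m, stress_pa, density_kg_m3=1040.0):
    """Ideal-string estimate: f0 = sqrt(stress / density) / (2 * L).

    length_m      : vibrating length of the vocal fold (m)
    stress_pa     : longitudinal tissue stress (Pa)
    density_kg_m3 : tissue density (assumed value of about 1040 kg/m^3)
    """
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)


def f0_mass_spring(stiffness_n_m, mass_kg):
    """Single mass-spring oscillator: f0 = sqrt(k / m) / (2 * pi)."""
    return math.sqrt(stiffness_n_m / mass_kg) / (2.0 * math.pi)


# Illustrative values only, not measurements from the study.
short_fold = f0_ideal_string(length_m=0.015, stress_pa=50e3)
long_fold = f0_ideal_string(length_m=0.030, stress_pa=50e3)    # doubled length
light_fold = f0_mass_spring(stiffness_n_m=100.0, mass_kg=1e-4)
heavy_fold = f0_mass_spring(stiffness_n_m=100.0, mass_kg=4e-4)  # quadrupled mass
```

In both models the effect is monotonic: at fixed stress, doubling the length halves f0, and at fixed stiffness, quadrupling the oscillating mass halves f0.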

Acoustic evolution of old Italian violins from Amati to Stradivari

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800666115 ◽

2018 ◽

Vol 115 (23) ◽

pp. 5926-5931

Author(s):

Hwan-Ching Tai ◽

Yen-Ping Shen ◽

Jer-Horng Lin ◽

Dai-Ting Chung

Keyword(s):

Central Region ◽

Vocal Tract ◽

Predictive Coding ◽

Linear Predictive Coding ◽

Response Curves ◽

Male And Female ◽

Human Voice ◽

Construction Methods ◽

Acoustic Output ◽

The Ideal

The shape and design of the modern violin are largely influenced by two makers from Cremona, Italy: The instrument was invented by Andrea Amati and then improved by Antonio Stradivari. Although the construction methods of Amati and Stradivari have been carefully examined, the underlying acoustic qualities which contribute to their popularity are little understood. According to Geminiani, a Baroque violinist, the ideal violin tone should “rival the most perfect human voice.” To investigate whether Amati and Stradivari violins produce voice-like features, we recorded the scales of 15 antique Italian violins as well as male and female singers. The frequency response curves are similar between the Andrea Amati violin and human singers, up to ∼4.2 kHz. By linear predictive coding analyses, the first two formants of the Amati exhibit vowel-like qualities (F1/F2 = 503/1,583 Hz), mapping to the central region on the vowel diagram. Its third and fourth formants (F3/F4 = 2,602/3,731 Hz) resemble those produced by male singers. Using F1 to F4 values to estimate the corresponding vocal tract length, we observed that antique Italian violins generally resemble basses/baritones, but Stradivari violins are closer to tenors/altos. Furthermore, the vowel qualities of Stradivari violins show reduced backness and height. The unique formant properties displayed by Stradivari violins may represent the acoustic correlate of their distinctive brilliance perceived by musicians. Our data demonstrate that the pioneering designs of Cremonese violins exhibit voice-like qualities in their acoustic output.
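
The abstract uses F1 to F4 to estimate a corresponding vocal tract length. The estimation method is not given here, so the sketch below assumes the simplest model: a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) * c / (4 * L). Each formant then yields L = (2n - 1) * c / (4 * F_n), and the per-formant estimates are averaged. The speed of sound and the function name are assumptions.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air in the vocal tract


def vocal_tract_length(formants_hz, c=SPEED_OF_SOUND):
    """Estimate vocal tract length (m) from formants with a uniform quarter-wave tube model.

    For the n-th formant (n = 1, 2, ...), L_n = (2n - 1) * c / (4 * F_n);
    the function returns the mean of the per-formant estimates.
    """
    estimates = [(2 * n - 1) * c / (4.0 * f) for n, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)


# F1-F4 reported for the Andrea Amati violin in the abstract.
amati_formants = [503.0, 1583.0, 2602.0, 3731.0]
amati_vtl_m = vocal_tract_length(amati_formants)
```

Under these assumptions the Amati formants give roughly 0.168 m. The paper's own method and constants may differ, so this figure should not be read as the published estimate.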

Estimation and Statistical Analysis of Human Voice Parameters to Investigate the Influence of Psychological Stress and to Determine the Vocal Tract Transfer Function of an Individual

Journal of Computer Networks and Communications ◽

10.1155/2014/290147 ◽

2014 ◽

Vol 2014 ◽

pp. 1-17

Author(s):

Puneet Kumar Mongia ◽

R. K. Sharma

Keyword(s):

Statistical Analysis ◽

Transfer Function ◽

Psychological Stress ◽

Vocal Tract ◽

Inverse Filtering ◽

Human Voice ◽

Filtering Technique ◽

The Mean ◽

The Relationship ◽

The Voice

In this study, the principal focus is to examine the influence of psychological stress (both positive and negative) on human articulation and to determine the vocal tract transfer function of an individual using an inverse filtering technique. Both analyses are carried out by estimating various voice parameters. The results of the stress analysis indicate that all the voice parameters are affected by stress. About 35 of the 51 parameters follow a unique course of variation from normal to positive and negative stress in 32% of the analyzed signals. The second analysis determines the vocal tract transfer function of an individual for each vowel; it indicates that this function can be computed as the mean of the pole-zero plots of that individual's vocal tract estimated over the whole day. In addition, an analysis of the relationship between the LPC coefficients of the vocal tract and the vocal tract cavities is presented. Its results indicate that all the LPC coefficients of the vocal tract are affected by a change in the position of any cavity.
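
The abstract estimates the vocal tract transfer function by inverse filtering and relates LPC coefficients to the vocal tract. The paper's exact procedure is not given here; as an assumed illustration, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion to estimate the LPC polynomial A(z), then applies A(z) as an inverse filter to recover the prediction residual (an estimate of the excitation). The all-pole model 1/A(z) is the corresponding vocal tract estimate. The synthetic AR(2) signal stands in for a voiced frame, and all names and coefficients are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a[0..order] (a[0] = 1) of A(z) = 1 + sum_j a[j] z^-j.

    Returns the coefficients and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply A(z) to x; the output is the LP residual e[n] = sum_j a[j] * x[n - j]."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1)) for n in range(p, len(x))]


# Synthetic all-pole signal driven by unit-variance white noise:
# x[n] = 1.3 x[n-1] - 0.6 x[n-2] + w[n], i.e. A(z) = 1 - 1.3 z^-1 + 0.6 z^-2.
random.seed(0)
x = [0.0, 0.0]
for _ in range(20000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
x = x[2:]

a_hat, _ = levinson_durbin(autocorrelation(x, 2), order=2)
residual = inverse_filter(x, a_hat)
```

The estimated coefficients should come out close to [1, -1.3, 0.6], and the residual close to the unit-variance driving noise; on real speech the order is usually higher and the signal is analyzed frame by frame.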

Contributions of Voice and Nonverbal Communication to Perceived Masculinity–Femininity for Cisgender and Transgender Communicators

Journal of Speech Language and Hearing Research ◽

10.1044/2019_jslhr-19-00387 ◽

2020 ◽

Vol 63 (4) ◽

pp. 931-947

Author(s):

Teresa L. D. Hardy ◽

Carol A. Boliek ◽

Daniel Aalto ◽

Justin Lewicke ◽

Kristopher Wells ◽

...

Keyword(s):

Fundamental Frequency ◽

Sound Pressure Level ◽

Sound Pressure ◽

Vocal Tract ◽

Presentation Mode ◽

Pressure Level ◽

Presentation Modes ◽

Audiovisual Stimuli ◽

Vocal Tract Resonance ◽

Point Light

Purpose The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity–femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method The voices and gestures of a group of cisgender men and women (n = 10 each) and transgender women (n = 20) were recorded with acoustic and motion capture recording systems while they recounted the story of a cartoon. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity–femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audiovisual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity–femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity–femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio mode than in the audiovisual mode, and unreliably in the visual-only mode. Conclusions Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.

Assessment of Tongue Position and Laryngeal Height in Two Professional Voice Populations

Journal of Speech Language and Hearing Research ◽

10.1044/2019_jslhr-19-00164 ◽

2020 ◽

Vol 63 (1) ◽

pp. 109-124

Author(s):

Carly Jo Hosbach-Cannon ◽

Soren Y. Lowell ◽

Raymond H. Colton ◽

Richard T. Kelley ◽

Xue Bao

Keyword(s):

Vocal Fold ◽

Vocal Tract ◽

Current Knowledge ◽

Acoustic Resonance ◽

Contact Dynamics ◽

Voice Disorder ◽

Music Program ◽

Tongue Position ◽

Singer Groups ◽

Professional Voice

Purpose To advance our current knowledge of singer physiology by using ultrasonography in combination with acoustic measures to compare physiological differences between musical theater (MT) and opera (OP) singers under controlled phonation conditions. Primary objectives addressed in this study were (a) to determine if differences in hyolaryngeal and vocal fold contact dynamics occur between two professional voice populations (MT and OP) during singing tasks and (b) to determine if differences occur between MT and OP singers in oral configuration and associated acoustic resonance during singing tasks. Method Twenty-one singers (10 MT and 11 OP) were included. All participants were currently enrolled in a music program. Experimental procedures consisted of sustained phonation on the vowels /i/ and /ɑ/ during both a low-pitch task and a high-pitch task. Measures of hyolaryngeal elevation, tongue height, and tongue advancement were assessed using ultrasonography. Vocal fold contact dynamics were measured using electroglottography. Simultaneous acoustic recordings were obtained during all ultrasonography procedures for analysis of the first two formant frequencies. Results Significant oral configuration differences, reflected by measures of tongue height and tongue advancement, were seen between groups. Measures of acoustic resonance also showed significant differences between groups during specific tasks. Both singer groups significantly raised their hyoid position when singing high-pitched vowels, but hyoid elevation was not statistically different between groups. Likewise, vocal fold contact dynamics did not significantly differentiate the two singer groups. Conclusions These findings suggest that, under controlled phonation conditions, MT singers alter their oral configuration and achieve differing resultant formants as compared with OP singers. Because singers are at high risk of developing a voice disorder, understanding how these two groups of singers adjust their vocal tract configuration during their specific singing genre may help to identify risky vocal behavior and provide a basis for prevention of voice disorders.

Effects of a 6-Week Straw Phonation in Water Exercise Program on the Aging Voice

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00124 ◽

2020 ◽

Vol 63 (4) ◽

pp. 1018-1032

Author(s):

Chia-Hsin Wu ◽

Roger W. Chan

Keyword(s):

Acoustic Analysis ◽

Vocal Tract ◽

Exercise Program ◽

Analysis Of Covariance ◽

Elderly Subjects ◽

Control Group ◽

Perceptual Evaluation ◽

Positive Effects ◽

Aging Voice ◽

Before And After

Purpose Semi-occluded vocal tract (SOVT) exercises with tubes or straws have been widely used for a variety of voice disorders. Yet, the effects of longer periods of SOVT exercises (lasting for weeks) on the aging voice are not well understood. This study investigated the effects of a 6-week straw phonation in water (SPW) exercise program. Method Thirty-seven elderly subjects with self-perceived voice problems were assigned to two groups: (a) SPW exercises with six weekly sessions and home practice (experimental group) and (b) vocal hygiene education (control group). Before and after the intervention (2 weeks after the completion of the exercise program), acoustic analysis, auditory–perceptual evaluation, and self-assessment of vocal impairment were conducted. Results Analysis of covariance revealed significant differences between the two groups in smoothed cepstral peak prominence measures, harmonics-to-noise ratio, the auditory–perceptual parameter of breathiness, and Voice Handicap Index-10 scores postintervention. No significant differences between the two groups were found for other measures. Conclusions Our results supported the positive effects of SOVT exercises for the aging voice, with a 6-week SPW exercise program being a clinical option. Future studies should involve long-term follow-up and additional outcome measures to better understand the efficacy of SOVT exercises, particularly SPW exercises, for the aging voice.

A digital vocal tract simulator with boundary conditions at lips and glottis

Electrical Engineering in Japan ◽

10.1002/ecja.4400641104 ◽

1981 ◽

Vol 64 (11) ◽

pp. 18-26

Author(s):

Tetsuya Nomura ◽

Nobuhiro Miki ◽

Nobuo Nagai

Keyword(s):

Boundary Conditions ◽

Vocal Tract
