Anatomical structures involved in non-human vocalization

To understand the functional morphology of the human voice-producing system, data on the vocal tract anatomy of other mammalian species are needed. The larynges and vocal tracts of four species of Artiodactyla were investigated in combination with acoustic analyses of their respective calls. Different evolutionary specializations of laryngeal characters may lead to similar effects on sound production. In the investigated species, such specializations are the elongation and mass increase of the vocal folds, the volume increase of the laryngeal vestibulum by an enlarged thyroid cartilage, and the formation of laryngeal ventricles. Both the elongation of the vocal folds and the increase of the oscillating masses lower the fundamental frequency. The influence of an increased volume of the laryngeal vestibulum on sound production remains unclear. The anatomical and acoustic results are presented together with considerations about the habitats and the mating systems of the respective species.
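The direction of both effects on fundamental frequency can be illustrated with two textbook idealizations that are not taken from the study itself: an ideal string, with f0 = sqrt(stress / density) / (2 * length), and a lumped mass-spring oscillator, with f0 = sqrt(stiffness / mass) / (2 * pi). The Python sketch below uses hypothetical parameter values chosen only to show the trends; it is not a model of any of the investigated species.

```python
import math


def string_f0(length_m, stress_pa, density_kg_m3):
    """Fundamental frequency of an ideal string fixed at both ends."""
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)


def mass_spring_f0(stiffness_n_per_m, mass_kg):
    """Natural frequency of a single mass on a linear spring."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Hypothetical values, chosen for illustration only.
short_fold = string_f0(0.015, 20_000.0, 1040.0)
long_fold = string_f0(0.030, 20_000.0, 1040.0)   # twice the length
light_fold = mass_spring_f0(100.0, 0.0001)
heavy_fold = mass_spring_f0(100.0, 0.0004)       # four times the mass

print(f"doubling length:    {short_fold:.1f} Hz -> {long_fold:.1f} Hz")
print(f"quadrupling mass:   {light_fold:.1f} Hz -> {heavy_fold:.1f} Hz")
```

In both idealizations the frequency halves: it scales with 1/length in the string model and with 1/sqrt(mass) in the mass-spring model.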

Impact of the Sub-Grid Scale Turbulence Model in Aeroacoustic Simulation of Human Voice

Applied Sciences ◽

10.3390/app11041970 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1970

Author(s):

Martin Lasota ◽

Petr Šidlof ◽

Manfred Kaltenbacher ◽

Stefan Schoder

Keyword(s):

Sound Propagation ◽

Vocal Tract ◽

Vocal Folds ◽

Equation Model ◽

Voice Production ◽

Human Voice ◽

Large Eddy ◽

Aeroacoustic Simulation ◽

Scale Turbulence ◽

The One

In an aeroacoustic simulation of human voice production, the effect of the sub-grid scale (SGS) model on the acoustic spectrum was investigated. In the first step, incompressible airflow in a 3D model of the larynx with vocal folds undergoing prescribed two-degree-of-freedom oscillation was simulated by laminar and Large-Eddy Simulations (LES), using the One-Equation and Wall-Adaptive Local-Eddy (WALE) SGS models. Second, the aeroacoustic sources and the sound propagation in a domain composed of the larynx and vocal tract were computed by the Perturbed Convective Wave Equation (PCWE) for vowels [u:] and [i:]. The results show that the SGS model has a significant impact not only on the flow field, but also on the spectrum of the sound sampled 1 cm downstream of the lips. With the WALE model, which is known to handle the near-wall and high-shear regions more precisely, the simulations predict significantly higher peak volumetric flow rates of air than with the One-Equation model, and only slightly lower rates than the laminar simulation. Using the WALE SGS model also results in higher sound pressure levels at the higher harmonic frequencies.

Design of Apparatus for Studying Aerodynamics of Voice Production

Fluids Engineering ◽

10.1115/imece2004-61822 ◽

2004 ◽

Author(s):

Michael Barry

Keyword(s):

Flow Visualization ◽

Vocal Fold ◽

Sound Production ◽

Vocal Tract ◽

Vocal Folds ◽

Flow Speed ◽

Working Fluid ◽

Voice Production ◽

Experimental Apparatus ◽

Glottal Flow

The design and testing of an experimental apparatus for in vitro study of phonatory aerodynamics (voice production) in humans is presented. The presentation includes not only the details of apparatus design, but also flow visualization and Digital Particle Image Velocimetry (DPIV) measurements of the developing flow that occurs during the opening of the constriction from complete closure. The main features of the phonation process have long been understood. A proper combination of air flow from the lungs and of vocal fold tension initiates a vibration of the vocal folds, which in turn valves the airflow. The resulting periodic acceleration of the airstream through the glottis excites the acoustic modes of the vocal tract. It is further understood that the pressure gradient driving glottal flow is related to flow separation on the downstream side of the vocal folds. However, the details of this process and how it may contribute to effects such as aperiodicity of the voice and energy losses in voiced sound production are still not fully understood. The experimental apparatus described in this paper is designed to address these issues. The apparatus itself consists of a scaled-up duct in which water flows through a constriction whose width is modulated by motion of the duct wall in a manner mimicking vocal fold vibration. Scaling the duct up 10 times and using water as the working fluid allows temporally and spatially resolved measurements of the dynamically similar flow velocity field using DPIV at video standard framing rates (15 Hz). Dynamic similarity is ensured by matching the Reynolds number (based on glottal flow speed and glottis width) of 8000, and by varying the Strouhal number (based on vocal fold length, glottal flow speed, and a time scale characterizing the motion of the vocal folds) over the range 0.01 to 0.1. The walls of the 28 cm × 28 cm test section and the vocal fold pieces are made of clear cast acrylic to allow optical access.
The vocal fold pieces are 12.7 cm × 14 cm × 28 cm and are rectangular in shape, except for the surfaces which form the glottis, which are 6.35 cm radius half-circles. Dye injection slots are placed on the upstream side of both vocal fold pieces to allow flow visualization. Prescribed motion of the vocal folds is provided by two linear stages. Linear bearings ensure smooth execution of the motion prescribed using a computer interface. Measurements described here use the Laser-Induced Fluorescence (LIF) flow visualization and DPIV techniques and are performed for two Strouhal numbers to assess the effect of opening time on the development of the glottal jet. These measurements are conducted on a plane oriented perpendicular to the glottis, at the duct midplane. LIF measurements use a 5 W Argon ion laser to produce a light sheet, which illuminates the dye injected through a slot in each vocal fold piece. Two dye colors are used, one for each side. Quantitative information about the velocity and vorticity fields is obtained through DPIV measurements at the same location as the LIF measurements.
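The scaling argument in this abstract can be made concrete with a short Python sketch. It assumes the usual definitions Re = U·d/ν and St = L/(U·T), approximate room-temperature kinematic viscosities for air and water, and hypothetical life-scale values (flow speed, glottis width, fold length, opening time) picked so that Re comes out at 8000. None of these numbers are reported in the abstract apart from the factor-of-10 scale and the target Reynolds number.

```python
NU_AIR = 1.5e-5     # m^2/s, approximate kinematic viscosity of air (assumed)
NU_WATER = 1.0e-6   # m^2/s, approximate kinematic viscosity of water (assumed)
SCALE = 10.0        # geometric scale-up factor stated in the abstract


def reynolds(speed, width, nu):
    """Re = U * d / nu."""
    return speed * width / nu


def strouhal(length, speed, time_scale):
    """St = L / (U * T); the exact form used in the paper is an assumption."""
    return length / (speed * time_scale)


def model_speed(life_speed):
    """Water speed in the scaled-up duct that keeps Re unchanged."""
    return life_speed * (NU_WATER / NU_AIR) / SCALE


def model_time_scale(life_time_scale):
    """Time scale in the model that keeps St unchanged once Re is matched."""
    return life_time_scale * SCALE ** 2 * (NU_AIR / NU_WATER)


# Hypothetical life-scale values, for illustration only.
life_u, life_d, life_l, life_t = 30.0, 0.004, 0.01, 0.005

re_life = reynolds(life_u, life_d, NU_AIR)
re_model = reynolds(model_speed(life_u), life_d * SCALE, NU_WATER)
st_life = strouhal(life_l, life_u, life_t)
st_model = strouhal(life_l * SCALE, model_speed(life_u), model_time_scale(life_t))

print(re_life, re_model)
print(st_life, st_model)
print("slow-down factor:", model_time_scale(1.0))
```

Under these assumptions the model runs roughly three orders of magnitude slower than life scale, which is the reason a low camera frame rate can still resolve the flow in time.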

Palatal Sound: a comprehensive model of vocal tract articulation

Organised Sound ◽

10.1017/s1355771899002058 ◽

1999 ◽

Vol 4 (2) ◽

pp. 93-110

Author(s):

Michael Edward Edgerton

Keyword(s):

Sound Production ◽

Acoustic Analysis ◽

Vocal Tract ◽

Scientific Information ◽

Comprehensive Model ◽

Turbulent Structures ◽

Complete Mapping ◽

Current Trends ◽

Production Techniques ◽

Acoustic Analyses

Palatal Sound is a model of vocal tract articulation influenced by physiologic and acoustic analysis of the voice. Specifically, the term articulation refers to all movement within the vocal tract that results in open, filter-like sonorities, as well as in turbulent to absolute airflow modification. This model presents a complete mapping of place within the vocal tract that features flexibility across different vocal tract sizes and proportions. The principles behind this comprehensive mapping of acoustic and physical sound production techniques should not be foreign to those persons who create, combine, design, model or research sound. Therefore, this model might suggest avenues of sound exploration regardless of media or application. This text first presents a brief overview of the current trends of oral modification using vowels, followed by an introduction to and acoustic analyses of the comprehensive vocal tract model as applied to open-like sonorities. This model is then expanded through the presentation of other methods of open-like behaviours. Following the discussion of open sonorities, turbulent-like behaviours are discussed by first identifying the use of language-based fricatives and stops. After this (re-)exposition, the comprehensive model is applied to turbulent structures through examples and acoustic analyses. Finally, these turbulent methods are completed by additional, complementary methods of vocal tract turbulence. The intentions of this paper are: (i) to document this model clearly, (ii) to identify differences between speech and song articulatory behaviour and that of this comprehensive model with the aid of selected acoustic analyses, (iii) to suggest that this model renders valuable scientific information about the limits of vocal tract physiology, and (iv) to propose the practical use of this model by composers and performers.

The impact of voice on speech realization

Journal of Education Culture and Society ◽

10.15503/jecs20142.93.101 ◽

2020 ◽

Vol 5 (2) ◽

pp. 93-101

Author(s):

Jelka Breznik

Keyword(s):

Sound Production ◽

Vocal Folds ◽

Spoken Word ◽

Communication Process ◽

Human Voice ◽

Voice Tone ◽

Identity Card ◽

Set Up ◽

The Impact ◽

The Voice

The study discusses spoken literary language and the impact of voice on speech realization. The voice consists of a sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, and so on. The human voice is specifically the part of human sound production in which the vocal folds (vocal cords) are the primary sound source. Our voice is our instrument and identity card. How does the voice (voice tone) affect others and how do they respond, positively or negatively? How important is voice (voice tone) in the communication process? The study presents how certain individuals perceive voice. It reports the results of research on the relationships between the spoken word, the excellent speaker, voice, and the description, definition and identification of specific voices, carried out with experts in the field of speech and voice as well as non-professionals. The study encompasses two focus groups. One consists of amateurs (non-specialists who have no knowledge of the field of speech or voice) and the other consists of professionals who work with speech, language or voice. The questions progressed from general to specific ones directly related to the topic. The purpose of this method of questioning was to create a relaxed atmosphere, promote discussion, allow participants to interact and complement one another, and to prompt self-listening and additional comments.

Vibrations of Nonlinear Elastic Structure Excited by Compressible Flow

Applied Sciences ◽

10.3390/app11114748 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4748

Author(s):

Monika Balázsová ◽

Miloslav Feistauer ◽

Jaromír Horáček ◽

Adam Kosík

Keyword(s):

Compressible Flow ◽

Nonlinear Elasticity ◽

Vocal Tract ◽

Stokes Equations ◽

Vocal Folds ◽

Nonlinear Material ◽

Navier Stokes ◽

Arbitrary Lagrangian Eulerian ◽

Navier Stokes Equations ◽

Reliable Solution

This study deals with the development of an accurate, efficient and robust method for the numerical solution of the interaction of compressible flow and nonlinear dynamic elasticity. This problem requires the reliable solution of flow in time-dependent domains and the solution of deformations of elastic bodies formed by several materials with complicated geometry depending on time. In this paper, the fluid–structure interaction (FSI) problem is solved numerically by the space-time discontinuous Galerkin method (STDGM). In the case of compressible flow, we use the compressible Navier–Stokes equations formulated by the arbitrary Lagrangian–Eulerian (ALE) method. The elasticity problem uses the non-stationary formulation of the dynamic system using the St. Venant–Kirchhoff and neo-Hookean models. The STDGM for the nonlinear elasticity is tested on the Hron–Turek benchmark. The main novelty of the study is the numerical simulation of the nonlinear vocal fold vibrations excited by the compressible airflow coming from the trachea to the simplified model of the vocal tract. The computations show that a nonlinear elasticity model of the vocal folds is needed to obtain substantially higher accuracy of the computed vocal fold deformation than with the linear elasticity model. Moreover, the numerical simulations show that the differences between the two considered nonlinear material models are very small.

High-Speed Imaging to Study an Auto-Oscillating Vocal Fold Replica for Different Initial Conditions

International Journal of Applied Mechanics ◽

10.1142/s1758825117500648 ◽

2017 ◽

Vol 09 (05) ◽

pp. 1750064 ◽

Cited By ~ 2

Author(s):

A. Van Hirtum ◽

X. Pelorson

Keyword(s):

Vocal Fold ◽

High Speed ◽

Initial Conditions ◽

Vocal Folds ◽

High Speed Imaging ◽

Human Voice ◽

Manual Intervention ◽

Geometrical Features ◽

Upstream Pressure

Experiments on mechanically deformable vocal fold replicas are important in physical studies of human voice production to understand the underlying fluid–structure interaction. To date, most experiments are performed for constant initial conditions with respect to structural as well as geometrical features. Varying those conditions requires manual intervention, which might affect reproducibility and hence the quality of experimental results. In this work, a setup is described which allows setting elastic and geometrical initial conditions in an automated way for a deformable vocal fold replica. High-speed imaging is integrated in the setup in order to decorrelate elastic and geometrical features. This way, reproducible, accurate and systematic measurements can be performed for prescribed initial conditions of glottal area, mean upstream pressure and vocal fold elasticity. Moreover, quantification of geometrical features during auto-oscillation is shown to contribute to the experimental characterization and understanding.

A new method for measuring the vocal tract area function: the case of vowels (Une nouvelle méthode de mesure de la fonction d'aire du conduit vocal : cas des voyelles)

Canadian Journal of Physics ◽

10.1139/p05-026 ◽

2005 ◽

Vol 83 (7) ◽

pp. 721-737

Author(s):

H Teffahi ◽

B Guerin ◽

A Djeradi

Keyword(s):

Measurement Method ◽

Cross Correlation ◽

Sound Production ◽

Linear Prediction ◽

Vocal Tract ◽

Random Sequence ◽

Speech Sound ◽

Acoustic Properties ◽

External Excitation ◽

White Noise Excitation

Knowledge of vocal tract area functions is important for the understanding of phenomena occurring during speech production. We present here a new measurement method based on the external excitation of the vocal tract with a known pseudo-random sequence, where the area function is obtained by a linear prediction analysis applied to the cross-correlation between the sequence and the signal measured at the lips. The advantages of this method over methods based on sweep-tones or white noise excitation are (1) a much shorter measurement time (about 100 ms) and (2) the possibility of speech sound production during the measurement. This method has been checked against classical methods through systematic comparisons on a small corpus of vowels. Moreover, it has been verified that simultaneous speech sound production does not significantly perturb the measurements. This method should thus be a very helpful tool for investigating the acoustic properties of the vocal tract for vowels under various conditions.
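The processing chain named in this abstract (cross-correlation, then linear prediction, then an area function) can be sketched in Python as follows. This is an illustration of the general technique under stated assumptions, not the authors' implementation: it uses the autocorrelation method with the Levinson-Durbin recursion, and it maps reflection coefficients to section areas with the lossless-tube relation A_next = A * (1 - k) / (1 + k), whose sign and direction conventions vary between texts. All function names are hypothetical.

```python
def cross_correlation(x, y, max_lag):
    """c[lag] = sum_n x[n] * y[n + lag] for lag = 0..max_lag."""
    return [
        sum(x[n] * y[n + lag] for n in range(min(len(x), len(y) - lag)))
        for lag in range(max_lag + 1)
    ]


def autocorrelation(h, order):
    """Autocorrelation of h for lags 0..order."""
    return [sum(h[n] * h[n + lag] for n in range(len(h) - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Reflection coefficients from autocorrelation values r[0..order]."""
    a = [1.0] + [0.0] * order      # predictor polynomial 1 + a1 z^-1 + ...
    err = r[0]                     # prediction error energy
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return reflection


def areas_from_reflection(reflection, first_area=1.0):
    """Relative tube-section areas; the sign convention is an assumption."""
    areas = [first_area]
    for k in reflection:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


# Toy check: autocorrelation values of a first-order system.
ks = levinson_durbin([1.0, 0.5, 0.25], 2)
areas = areas_from_reflection(ks)
```

In a measurement of this kind, the cross-correlation between a pseudo-random excitation and the recorded signal serves as an estimate of the impulse response, which would then be passed to `autocorrelation` and `levinson_durbin`.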

Speech Emotional Features Extraction Based on Electroglottograph

Neural Computation ◽

10.1162/neco_a_00523 ◽

2013 ◽

Vol 25 (12) ◽

pp. 3294-3317 ◽

Cited By ~ 7

Author(s):

Lijiang Chen ◽

Xia Mao ◽

Pengfei Wei ◽

Angelo Compare

Keyword(s):

Emotion Recognition ◽

Speech Signal ◽

Vocal Tract ◽

Vocal Folds ◽

Distribution Coefficients ◽

Speech Emotion Recognition ◽

Support Vector ◽

Power Law Distribution ◽

Transform Coefficients ◽

Better Than

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch fall duration are obtained to reflect the information of vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectrum of the EGG and speech signals are calculated to reflect the information of vocal tract modulation. Two experiments are carried out. One compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search. The other is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features are better than those commonly used for speaker-independent and content-independent speech emotion recognition.
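As a rough illustration of the second feature class, the Python sketch below normalizes a magnitude spectrum and takes its discrete cosine transform. The abstract does not state the normalization or the DCT variant, so dividing by the spectrum's sum and using an unscaled DCT-II are both assumptions, and the function names are hypothetical.

```python
import math


def normalize(spectrum):
    """Scale a non-negative spectrum so that its values sum to 1 (assumed)."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def dct2(x):
    """Unscaled DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


# A flat spectrum has all its energy in the zeroth coefficient.
flat_coeffs = dct2(normalize([2.0, 2.0, 2.0, 2.0]))

# A hypothetical sloping spectrum also spreads energy into higher coefficients.
tilted_coeffs = dct2(normalize([4.0, 3.0, 2.0, 1.0]))
```

The low-order coefficients summarize the overall spectral envelope, which is why a truncated set of them is a common compact feature vector.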

Monitoring Vocal Fold Abduction through Vocal Fold Contact Area

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3103.338 ◽

1988 ◽

Vol 31 (3) ◽

pp. 338-351 ◽

Cited By ~ 106

Author(s):

Martin Rothenberg ◽

James J. Mahshie

Keyword(s):

Contact Area ◽

Time Variation ◽

Vocal Fold ◽

Thyroid Cartilage ◽

Electrical Conductance ◽

Vocal Folds ◽

Linear Phase ◽

Voice Production ◽

Voiced Speech ◽

High Pass

A number of commercial devices for measuring the transverse electrical conductance of the thyroid cartilage produce waveforms that can be useful for monitoring movements within the larynx during voice production, especially movements that are closely related to the time-variation of the contact between the vocal folds as they vibrate. This paper compares the various approaches that can be used to apply such a device, usually referred to as an electroglottograph, to the problem of monitoring the time-variation of vocal fold abduction and adduction during voiced speech. One method, in which a measure of relative vocal fold abduction is derived from the duty cycle of the linear-phase high pass filtered electroglottograph waveform, is developed in detail.
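The duty-cycle idea can be sketched in Python as follows. This is a simplified stand-in, not the method developed in the paper: the high-pass step subtracts a centered moving average (a symmetric, hence linear-phase, FIR filter away from the signal edges), and the duty cycle is taken as the fraction of samples above a zero threshold. The window length, the threshold, and the synthetic waveform are all assumptions.

```python
def highpass_centered(signal, half_width):
    """Subtract a centered moving average from each sample.

    The window is symmetric except near the edges, where it is truncated.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        local_mean = sum(signal[lo:hi]) / (hi - lo)
        out.append(signal[i] - local_mean)
    return out


def duty_cycle(signal, threshold=0.0):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for value in signal if value > threshold) / len(signal)


# Synthetic contact waveform: 3 "contact" samples out of every 10,
# riding on a large constant offset that the high-pass step removes.
period = [1.0] * 3 + [0.0] * 7
waveform = [value + 5.0 for value in period * 20]

filtered = highpass_centered(waveform, 10)
contact_fraction = duty_cycle(filtered)
print(contact_fraction)
```

A shorter contact phase (lower duty cycle) would correspond to more abducted vocal folds in this simplified reading; any calibration between the two is outside the scope of the sketch.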

Speech Emotion Analysis of Different Age Groups Using Clustering Techniques

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2018010105 ◽

2018 ◽

Vol 8 (1) ◽

pp. 69-85 ◽

Cited By ~ 4

Author(s):

Hemanta Kumar Palo ◽

Mihir Narayan Mohanty ◽

Mahesh Chandra

Keyword(s):

Vocal Tract ◽

Speech Rate ◽

Age Groups ◽

Recognition Task ◽

Computation Time ◽

Vocal Folds ◽

Recognition System ◽

Clustering Techniques ◽

Fcm Algorithm ◽

Criminal Investigators

The shape, length, and size of the vocal tract and vocal folds vary with a person's age. The variation may be due to age, illness, or other conditions. Arguably, the features extracted from utterances for the recognition task may differ between age groups, and different emotions complicate the task further. The recognition system demands suitable feature extraction and clustering techniques that can separate emotional utterances. Psychologists, criminal investigators, professional counselors, law enforcement agencies and a host of other such entities may find such analysis useful. In this article, emotion has been studied for three different age groups using basic age-dependent features such as pitch, speech rate, and log energy. The feature sets have been clustered for the different age groups using the K-means and Fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The authors' results suggest that the K-means algorithm outperforms the FCM algorithm, giving better clustering and lower computation time.
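The difference between the two clustering approaches can be sketched in Python: K-means assigns each feature vector to exactly one cluster, while FCM gives it a graded membership in every cluster. The code below is a minimal illustration with hypothetical two-dimensional feature values and fixed starting centroids; it does not reproduce the article's features, data, or results, and it shows only the FCM membership formula rather than the full FCM iteration.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, centroids, iterations=20):
    """Plain K-means with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Hard assignment: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fcm_memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    d = [dist(point, c) for c in centroids]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # point coincides with one or more centroids
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


# Hypothetical, already-normalized feature vectors (illustration only).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
midpoint_membership = fcm_memberships((5.0, 5.5), centroids)
```

In practice, features on different scales (pitch in Hz, speech rate, log energy) would normally be normalized before either algorithm is applied, since both rely on Euclidean distance.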
