Measurements of the aero-acoustic properties of the vocal folds and vocal tract by broad and narrow band probes during phonation into controlled acoustic loads

Emotions depend on age, gender, culture, speaker, and situation. Because children have an underdeveloped vocal tract and vocal folds, and older adults have a weakened or aged speech production mechanism, the acoustic properties of the voice differ with a person's age. Consequently, the features that describe age-related and emotionally relevant information in the human voice also differ. This motivates the authors to investigate issues related to database collection, feature extraction, and clustering algorithms for effectively characterizing and identifying a speaker's age from his or her paralinguistic information. Prosodic features such as speech rate, pitch, log energy, and spectral parameters are explored to characterize the chosen emotional utterances, while the K-means and Fuzzy C-means clustering algorithms are used to partition age-related emotional features for a better understanding of the related issues.


Vibrations of Nonlinear Elastic Structure Excited by Compressible Flow

Applied Sciences ◽

10.3390/app11114748 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4748

Author(s):

Monika Balázsová ◽

Miloslav Feistauer ◽

Jaromír Horáček ◽

Adam Kosík

Keyword(s):

Compressible Flow ◽

Nonlinear Elasticity ◽

Vocal Tract ◽

Stokes Equations ◽

Vocal Folds ◽

Nonlinear Material ◽

Navier Stokes ◽

Arbitrary Lagrangian Eulerian ◽

Navier Stokes Equations ◽

Reliable Solution

This study deals with the development of an accurate, efficient and robust method for the numerical solution of the interaction of compressible flow and nonlinear dynamic elasticity. This problem requires the reliable solution of flow in time-dependent domains and the solution of deformations of elastic bodies formed by several materials with complicated geometry depending on time. In this paper, the fluid–structure interaction (FSI) problem is solved numerically by the space-time discontinuous Galerkin method (STDGM). In the case of compressible flow, we use the compressible Navier–Stokes equations formulated by the arbitrary Lagrangian–Eulerian (ALE) method. The elasticity problem uses the non-stationary formulation of the dynamic system using the St. Venant–Kirchhoff and neo-Hookean models. The STDGM for the nonlinear elasticity is tested on the Hron–Turek benchmark. The main novelty of the study is the numerical simulation of the nonlinear vocal fold vibrations excited by the compressible airflow coming from the trachea to the simplified model of the vocal tract. The computations show that the nonlinear elasticity model of the vocal folds is needed in order to obtain substantially higher accuracy of the computed vocal folds deformation than for the linear elasticity model. Moreover, the numerical simulations showed that the differences between the two considered nonlinear material models are very small.
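
For orientation, the two hyperelastic models named in the abstract are commonly written as the strain-energy densities below. This is a textbook sketch rather than the paper's own formulation: the notation (deformation gradient F, Lamé parameters λ and μ) and the particular compressible neo-Hookean variant are assumptions, and the authors may use a different form.

```latex
% Assumed notation: F = deformation gradient, J = det F,
% E = Green-Lagrange strain, \lambda and \mu = Lamé parameters.
\begin{align}
E &= \tfrac{1}{2}\left(F^{\mathsf T}F - I\right), \\
W_{\mathrm{SVK}}(E) &= \tfrac{\lambda}{2}\,(\operatorname{tr}E)^{2} + \mu\,\operatorname{tr}\!\left(E^{2}\right), \\
W_{\mathrm{NH}}(F) &= \tfrac{\mu}{2}\left(\operatorname{tr}\!\left(F^{\mathsf T}F\right) - 3\right) - \mu\ln J + \tfrac{\lambda}{2}\,(\ln J)^{2}.
\end{align}
```

Both energies linearize to the same Hookean law for small strains, so differences between them appear only at larger deformations.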


Impact of the Sub-Grid Scale Turbulence Model in Aeroacoustic Simulation of Human Voice

Applied Sciences ◽

10.3390/app11041970 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1970

Author(s):

Martin Lasota ◽

Petr Šidlof ◽

Manfred Kaltenbacher ◽

Stefan Schoder

Keyword(s):

Sound Propagation ◽

Vocal Tract ◽

Vocal Folds ◽

Equation Model ◽

Voice Production ◽

Human Voice ◽

Large Eddy ◽

Aeroacoustic Simulation ◽

Scale Turbulence ◽

The One

In an aeroacoustic simulation of human voice production, the effect of the sub-grid scale (SGS) model on the acoustic spectrum was investigated. In the first step, incompressible airflow in a 3D model of larynx with vocal folds undergoing prescribed two-degree-of-freedom oscillation was simulated by laminar and Large-Eddy Simulations (LES), using the One-Equation and Wall-Adaptive Local-Eddy (WALE) SGS models. Second, the aeroacoustic sources and the sound propagation in a domain composed of the larynx and vocal tract were computed by the Perturbed Convective Wave Equation (PCWE) for vowels [u:] and [i:]. The results show that the SGS model has a significant impact not only on the flow field, but also on the spectrum of the sound sampled 1 cm downstream of the lips. With the WALE model, which is known to handle the near-wall and high-shear regions more precisely, the simulations predict significantly higher peak volumetric flow rates of air than those of the One-Equation model, only slightly lower than the laminar simulation. The usage of the WALE SGS model also results in higher sound pressure levels of the higher harmonic frequencies.
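
To make the near-wall behaviour of the WALE model concrete, the sketch below evaluates the WALE eddy viscosity, as it is commonly formulated, for a single velocity-gradient tensor. The function name, the constant `c_w = 0.325`, and the sample tensors are illustrative assumptions and are not taken from the paper's solver setup.

```python
import numpy as np

def wale_eddy_viscosity(grad_u, delta, c_w=0.325):
    """WALE sub-grid eddy viscosity for one 3x3 velocity-gradient tensor.

    grad_u[i, j] = d u_i / d x_j and delta is the filter width.
    c_w = 0.325 is a commonly quoted value, assumed here.
    """
    g = np.asarray(grad_u, dtype=float)
    s = 0.5 * (g + g.T)                        # resolved strain-rate tensor S_ij
    g2 = g @ g                                 # g_ik g_kj
    # Traceless symmetric part of the squared gradient tensor, S^d_ij
    sd = 0.5 * (g2 + g2.T) - np.trace(g2) / 3.0 * np.eye(3)
    ss = np.sum(s * s)                         # S_ij S_ij
    sdsd = np.sum(sd * sd)                     # S^d_ij S^d_ij
    denom = ss ** 2.5 + sdsd ** 1.25
    if denom == 0.0:                           # no velocity gradient at all
        return 0.0
    return (c_w * delta) ** 2 * sdsd ** 1.5 / denom

# Pure shear (as in a laminar near-wall layer) and a plane-strain example.
pure_shear = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
plane_strain = np.diag([1.0, -1.0, 0.0])
nu_shear = wale_eddy_viscosity(pure_shear, delta=1e-3)
nu_strain = wale_eddy_viscosity(plane_strain, delta=1e-3)
```

For pure shear the squared gradient tensor vanishes, so the model returns zero eddy viscosity there, which is the property usually cited when WALE is said to treat wall-bounded shear layers well.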


A new method for measuring the vocal tract area function: the case of vowels (Une nouvelle méthode de mesure de la fonction d'aire du conduit vocal : cas des voyelles)

Canadian Journal of Physics ◽

10.1139/p05-026 ◽

2005 ◽

Vol 83 (7) ◽

pp. 721-737

Author(s):

H Teffahi ◽

B Guerin ◽

A Djeradi

Keyword(s):

Measurement Method ◽

Cross Correlation ◽

Sound Production ◽

Linear Prediction ◽

Vocal Tract ◽

Random Sequence ◽

Speech Sound ◽

Acoustic Properties ◽

External Excitation ◽

White Noise Excitation

Knowledge of vocal tract area functions is important for understanding the phenomena that occur during speech production. We present here a new measurement method based on the external excitation of the vocal tract with a known pseudo-random sequence, where the area function is obtained by a linear prediction analysis applied to the cross-correlation between the sequence and the signal measured at the lips. The advantages of this method over methods based on sweep-tones or white noise excitation are (1) a much shorter measurement time (about 100 ms) and (2) the possibility of speech sound production during the measurement. This method has been checked against classical methods through systematic comparisons on a small corpus of vowels. Moreover, it has been verified that simultaneous speech sound production does not significantly perturb the measurements. This method should thus be a very helpful tool for investigating the acoustic properties of the vocal tract in various cases for vowels.
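
The cross-correlation step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' measurement chain: the random ±1 excitation, the 4-tap response `h_true`, and the function name are all assumptions, and the subsequent linear prediction analysis that yields the area function is not shown.

```python
import numpy as np

def impulse_response_by_xcorr(excitation, response):
    """Estimate an impulse response by circular cross-correlation.

    Assumes a +/-1 excitation whose circular autocorrelation is close to a
    delta, and a response that covers one steady-state period.
    """
    x = np.asarray(excitation, dtype=float)
    y = np.asarray(response, dtype=float)
    n = len(x)
    # r[k] = (1/n) * sum_m y[m + k] * x[m], evaluated with FFTs
    return np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x))).real / n

# Synthetic check with a made-up 4-tap response (illustrative only).
rng = np.random.default_rng(0)
n = 16384
x = rng.choice([-1.0, 1.0], size=n)                      # pseudo-random sequence
h_true = np.array([1.0, 0.5, -0.3, 0.1])
y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_true, n)).real  # circular convolution
h_est = impulse_response_by_xcorr(x, y)[: len(h_true)]
```

Because the excitation is uncorrelated with any speech produced at the same time, the correlation averages the speech contribution toward zero, which is one way to read the abstract's claim that simultaneous phonation does not significantly perturb the measurement.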


Speech Emotional Features Extraction Based on Electroglottograph

Neural Computation ◽

10.1162/neco_a_00523 ◽

2013 ◽

Vol 25 (12) ◽

pp. 3294-3317 ◽

Cited By ~ 7

Author(s):

Lijiang Chen ◽

Xia Mao ◽

Pengfei Wei ◽

Angelo Compare

Keyword(s):

Emotion Recognition ◽

Speech Signal ◽

Vocal Tract ◽

Vocal Folds ◽

Distribution Coefficients ◽

Speech Emotion Recognition ◽

Support Vector ◽

Power Law Distribution ◽

Transform Coefficients ◽

Better Than

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch down duration are obtained to reflect information about vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectrum of the EGG and speech signals are calculated to reflect information about vocal tract modulation. Two experiments are carried out. The first compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search. The second is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features perform better than commonly used features in speaker-independent and content-independent speech emotion recognition.
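
As a rough illustration of the second feature class, the sketch below computes discrete cosine transform coefficients of a frame's normalized magnitude spectrum. The abstract does not state the frame length, window, normalization, or number of coefficients, so the Hann window, sum-to-one normalization, orthonormal DCT-II, and `n_coeffs=12` used here are all assumptions, as is the function name.

```python
import numpy as np

def spectrum_dct_features(frame, n_coeffs=12):
    """Orthonormal DCT-II coefficients of a normalized magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    total = mag.sum()
    spec = mag / total if total > 0 else mag   # bins sum to 1 (assumed normalization)
    n_bins = len(spec)
    n = np.arange(n_bins)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))
    coeffs = np.sqrt(2.0 / n_bins) * (basis @ spec)
    coeffs[0] /= np.sqrt(2.0)                  # orthonormal scaling of the k = 0 term
    return coeffs

# Synthetic 200 Hz tone sampled at 16 kHz, standing in for an EGG or speech frame.
t = np.arange(512) / 16000.0
frame = np.sin(2 * np.pi * 200.0 * t)
feats = spectrum_dct_features(frame)
```

The same function could be applied to EGG and speech frames separately, with the resulting coefficient vectors concatenated before feature selection.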


Kinematics of birdsong: functional correlation of cranial movements and acoustic features in sparrows

Journal of Experimental Biology ◽

10.1242/jeb.182.1.147 ◽

1993 ◽

Vol 182 (1) ◽

pp. 147-171 ◽

Cited By ~ 10

Author(s):

M. W. Westneat ◽

J. H. Long ◽

W. Hoese ◽

S. Nowicki

Keyword(s):

Vocal Tract ◽

Active Role ◽

Acoustic Properties ◽

Zonotrichia Albicollis ◽

Low Frequencies ◽

Acoustic Frequency ◽

Mean Frequency ◽

Singing Behavior ◽

Resonance Properties ◽

Melospiza Georgiana

The movements of the head and beak of songbirds may play a functional role in vocal production by influencing the acoustic properties of songs. We investigated this possibility by synchronously measuring the acoustic frequency and amplitude and the kinematics (beak gape and head angle) of singing behavior in the white-throated sparrow (Zonotrichia albicollis) and the swamp sparrow (Melospiza georgiana). These birds are closely related emberizine sparrows, but their songs differ radically in frequency and amplitude structure. We found that the acoustic frequencies of notes in a song have a consistent, positive correlation with beak gape in both species. Beak gape increased significantly with increasing frequency during the first two notes in Z. albicollis song, with a mean frequency for note 1 of 3 kHz corresponding to a gape of 0.4 cm (a 15 degree gape angle) and a mean frequency for note 2 of 4 kHz corresponding to a gape of 0.7 cm (a 30 degree gape angle). The relationship between gape and frequency for the upswept third note in Z. albicollis was also significant. In M. georgiana, low frequencies of 3 kHz corresponded to beak gapes of 0.2–0.3 cm (a 10–15 degree beak angle), whereas frequencies of 7–8 kHz were associated with flaring of the beak to over 1 cm (a beak angle greater than 50 degrees). Beak gape and song amplitude are poorly correlated in both species. We conclude that cranial kinematics, particularly beak movements, influence the resonance properties of the vocal tract by varying its physical dimensions and thus play an active role in the production of birdsong.


Speech Emotion Analysis of Different Age Groups Using Clustering Techniques

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2018010105 ◽

2018 ◽

Vol 8 (1) ◽

pp. 69-85 ◽

Cited By ~ 4

Author(s):

Hemanta Kumar Palo ◽

Mihir Narayan Mohanty ◽

Mahesh Chandra

Keyword(s):

Vocal Tract ◽

Speech Rate ◽

Age Groups ◽

Recognition Task ◽

Computation Time ◽

Vocal Folds ◽

Recognition System ◽

Clustering Techniques ◽

Fcm Algorithm ◽

Criminal Investigators

The shape, length, and size of the vocal tract and vocal folds vary with a person's age. The variation may be due to age, sickness, or other conditions. Arguably, the features extracted from utterances for the recognition task may differ between age groups, and different emotions complicate the task further. The recognition system demands suitable feature extraction and clustering techniques that can separate emotional utterances. Psychologists, criminal investigators, professional counselors, law enforcement agencies, and a host of other such entities may find such analysis useful. In this article, emotion is studied for three different age groups using basic age-dependent features such as pitch, speech rate, and log energy. The feature sets have been clustered for the different age groups using the K-means and Fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The authors' results suggest that the K-means algorithm outperforms the FCM algorithm, giving better clustering and lower computation time.
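
The difference between the two clustering schemes can be sketched as follows: K-means assigns each utterance to exactly one cluster, while FCM gives each utterance a graded membership in every cluster. This is a minimal NumPy sketch on synthetic data; the feature values stand in for (pitch, speech rate, log energy) and are made up, the function names and the fuzzifier `m = 2` are assumptions, and the example does not reproduce the paper's performance comparison.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Hard clustering: each point belongs to its nearest center."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

def fuzzy_cmeans(X, c, m=2.0, iters=200, tol=1e-8, seed=0):
    """Soft clustering: U[i, j] is the membership of point i in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Two synthetic, well-separated groups of 3-D feature vectors (illustrative only).
rng = np.random.default_rng(1)
group_a = rng.normal([0.0, 0.0, 0.0], 0.5, size=(50, 3))
group_b = rng.normal([10.0, 10.0, 10.0], 0.5, size=(50, 3))
X = np.vstack([group_a, group_b])
km_centers, km_labels = kmeans(X, 2)
fcm_centers, fcm_U = fuzzy_cmeans(X, 2)
```

Each FCM iteration updates a full membership matrix in addition to the centers, which is one plausible reason for the higher computation time the authors report for FCM.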


Naturally Nasal-esophageal Fiberscope in COVID-19 Pandemic - Prevent Sneezing Without Anesthesia

10.21203/rs.3.rs-360636/v1 ◽

2021 ◽

Author(s):

Koichi Tsunoda ◽

Ko Hentona ◽

Yamanobe Yoshiharu

Keyword(s):

Narrow Band ◽

Narrow Band Imaging ◽

Pathological Condition ◽

Vocal Folds ◽

Pyriform Sinus ◽

Natural Setting ◽

Sitting Position ◽

Case Presentation ◽

Air Supply ◽

Nasal Bleeding

Background: As laryngologists, we observe natural phonatory and swallowing functions in every clinical examination with a trans-nasal laryngeal fiberscope (TNLF). Before the observation, we use epinephrine to enlarge and smooth the inside of the common nasal meatus (the bottom of the nostril), then insert a wet swab into the nose, as for a nasopharyngeal swab culture. Particularly in the current COVID-19 pandemic, this careful technique prevents complications such as nasal bleeding, pain, and induced sneezing. Here we introduce our routine for observing esophageal movement during swallowing in a natural setting (sitting position) without anesthesia. Case presentation: The patient was a 70-year-old woman who complained of something stuck in the esophagus or a strange sensation below the larynx and pharynx. After enlarging and smoothing the inside of the common nasal meatus, we inserted the TNLF (slim type ⌀29mm fiberscope, VNL8-J10, PENTAX Medical, Tokyo, Japan) in the same way and observed the phonatory and swallowing movements of the vocal folds. To obtain natural movements, we did not use any anesthesia. Because there was no pathological condition in the pyriform sinus, we asked the patient to swallow the fiberscope. At that moment we pushed the TNLF and inserted the tip a little deeper, simultaneously with swallowing, which let the fiberscope enter the esophagus easily, like the insertion of a nasogastric tube. We then asked the patient to swallow a sip of water or saliva, and the lumen of the esophagus cleared and enlarged. This makes it easy to observe the esophagus without any air supply. The esophagus was completely normal except for glycogenic acanthosis seen on a tone enhancement scan. Conclusions: The advantage of this examination is that it can be performed easily in a sitting position without anesthesia, takes only a minute, and is minimally invasive, allowing observation of physiologically natural swallowing. Observation as far as the esophagogastric junction is also possible without anesthesia using a thin-type flexible bronchoscope. In the future, the diameter of gastric fiberscopes, even those with a narrow band imaging (NBI) function, may gradually become thinner. Until then, every physician should know this technique: simply insert the scope along the bottom of the nose.


Classification of speech under stress based on modeling of the vocal folds and vocal tract

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/1687-4722-2013-17 ◽

2013 ◽

Vol 2013 (1) ◽

Cited By ~ 6

Author(s):

Xiao Yao ◽

Takatoshi Jitsuhiro ◽

Chiyomi Miyajima ◽

Norihide Kitaoka ◽

Kazuya Takeda

Keyword(s):

Vocal Tract ◽

Vocal Folds
