Vowel Enhancement in Early Stage Spanish Esophageal Speech Using Natural Glottal Flow Pulse and Vocal Tract Frequency Warping

2015 ◽  
Author(s):  
Rizwan Ishaq ◽  
Dhanananjaya Gowda ◽  
Paavo Alku ◽  
Begonya Garcia Zapirain

2005 ◽  
Vol 57 (4) ◽  
pp. 223-228
Author(s):  
Robert A. Prosek ◽  
Melissa B. Koch

Author(s):  
Radhika Rani L ◽  
S. Chandra lingam ◽  
Anjaneyulu T ◽  
Satyanarayana K

Congenital Heart Defects (CHD) are critical heart disorders present at birth. They fall into two main classes, cyanotic and acyanotic; the present paper concentrates on acyanotic disorders. Acyanotic heart disorders cannot be detected by external examination, whereas bluish skin indicates a cyanotic disorder; acyanotic disorders can only be diagnosed using chest X-ray, ECG, echocardiogram, cardiac catheterization, or MRI of the heart. The present work estimates the fundamental frequency (pitch) and the vocal tract resonant frequencies (formants) from infant cry signals. Pitch and formant frequencies are estimated using a frequency-domain (cepstrum) method and Linear Predictive Coding (LPC). The results show that the fundamental frequency of the cry signal was between 600 Hz and 800 Hz for infants with acyanotic heart disorders, which helps identify these disorders at an early stage.
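The two estimation methods named above, cepstral pitch detection and LPC formant analysis, can be sketched in a few lines of Python. The frame length, sampling rate, search range, LPC order, and the synthetic harmonic test signal below are illustrative assumptions, not values taken from the study.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def cepstral_pitch(frame, fs, fmin=300.0, fmax=1000.0):
    """Estimate F0 of one frame from the peak of the real cepstrum."""
    spec = np.fft.rfft(frame * np.hamming(len(frame)))
    ceps = np.fft.irfft(np.log(np.abs(spec) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # quefrency search range
    peak = qmin + np.argmax(ceps[qmin:qmax])
    return fs / peak

def lpc_formants(frame, fs, order=12):
    """Estimate formant candidates as angles of the LPC polynomial roots."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[freqs > 90.0]                    # drop near-DC roots

# Toy check on an assumed 30 ms frame sampled at 16 kHz: a harmonic-rich
# stand-in for a cry frame with a 700 Hz fundamental plus a little noise.
fs = 16000
t = np.arange(int(0.03 * fs)) / fs
frame = sum(np.sin(2 * np.pi * 700 * k * t) / k for k in range(1, 6))
frame += 0.01 * np.random.default_rng(0).normal(size=t.size)
print(cepstral_pitch(frame, fs))      # close to 700 Hz
print(lpc_formants(frame, fs)[:3])    # lowest resonance candidates
```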


2004 ◽  
Author(s):  
Michael Barry

The design and testing of an experimental apparatus for in vitro study of phonatory aerodynamics (voice production) in humans is presented. The presentation includes not only the details of the apparatus design, but also flow visualization and Digital Particle Image Velocimetry (DPIV) measurements of the developing flow that occurs as the constriction opens from complete closure. The main features of the phonation process have long been understood: a proper combination of airflow from the lungs and vocal fold tension initiates a vibration of the vocal folds, which in turn valves the airflow, and the resulting periodic acceleration of the airstream through the glottis excites the acoustic modes of the vocal tract. It is further understood that the pressure gradient driving glottal flow is related to flow separation on the downstream side of the vocal folds. However, the details of this process, and how it may contribute to effects such as aperiodicity of the voice and energy losses in voiced sound production, are still not fully understood. The experimental apparatus described in this paper is designed to address these issues.

The apparatus consists of a scaled-up duct in which water flows through a constriction whose width is modulated by motion of the duct wall in a manner mimicking vocal fold vibration. Scaling the duct up 10 times and using water as the working fluid allows temporally and spatially resolved measurements of the dynamically similar flow velocity field using DPIV at video-standard framing rates (15 Hz). Dynamic similarity is ensured by matching the Reynolds number (based on glottal flow speed and glottis width) of 8000, and by varying the Strouhal number (based on vocal fold length, glottal flow speed, and a time scale characterizing the motion of the vocal folds) over the range 0.01 to 0.1. The walls of the 28 cm × 28 cm test section and the vocal fold pieces are made of clear cast acrylic to allow optical access. The vocal fold pieces are 12.7 cm × 14 cm × 28 cm and rectangular in shape, except for the surfaces forming the glottis, which are 6.35 cm radius half-circles. Dye injection slots are placed on the upstream side of both vocal fold pieces to allow flow visualization. Prescribed motion of the vocal folds is provided by two linear stages, with linear bearings ensuring smooth execution of the motion prescribed through a computer interface.

The measurements described here use Laser-Induced Fluorescence (LIF) flow visualization and DPIV and are performed for two Strouhal numbers to assess the effect of opening time on the development of the glottal jet. They are conducted on a plane oriented perpendicular to the glottis, at the duct midplane. The LIF measurements use a 5 W argon-ion laser to produce a light sheet, which illuminates the dye injected through a slot in each vocal fold piece; two dye colors are used, one for each side. Quantitative information about the velocity and vorticity fields is obtained through DPIV measurements at the same location as the LIF measurements.
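The Reynolds and Strouhal matching described above can be made concrete with a short calculation. A minimal Python sketch follows; only Re = 8000, the Strouhal range 0.01-0.1, and the 12.7 cm fold length come from the abstract, while the life-size glottal width, the viscosity of water, and the particular form St = L/(U·T) are assumptions for illustration.

```python
# Dynamic-similarity bookkeeping for the 10x scaled water model. Only
# Re = 8000 and St = 0.01-0.1 come from the abstract; the life-size glottal
# width and the fluid viscosity below are assumed textbook values.
NU_WATER = 1.0e-6    # kinematic viscosity of water [m^2/s] (assumed)

def model_speed(re, width, nu=NU_WATER):
    """Flow speed needed in the model to hit a target Reynolds number."""
    return re * nu / width

def fold_motion_timescale(st, length, speed):
    """Fold-motion time scale implied by St = L / (U * T) (assumed form)."""
    return length / (st * speed)

width_model = 10 * 2e-3                       # assumed 2 mm glottis, scaled 10x
u = model_speed(8000, width_model)            # about 0.4 m/s in water
for st in (0.01, 0.1):
    # 12.7 cm vocal fold length from the apparatus description
    print(st, fold_motion_timescale(st, 0.127, u))
```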


Author(s):  
Masoud Geravanchizadeh ◽  
Elnaz Forouhandeh ◽  
Meysam Bashirpour

The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since people naturally speak emotionally in real-world environments, the emotional state of speech must be taken into account in automatic speech recognition. Little work has been done on emotion-affected speech recognition; most research to date has focused on classifying speech emotions. In this paper, the vocal tract length normalization method is employed to enhance the robustness of the emotion-affected speech recognition system. For this purpose, two speech recognition structures are used, based on hybrids of a hidden Markov model with either a Gaussian mixture model or a deep neural network. Frequency warping is applied to the filterbank and/or discrete-cosine-transform domain(s) in the feature extraction process of the automatic speech recognition system. The warping is conducted so as to normalize the emotional feature components and bring them close to their corresponding neutral feature components. The performance of the proposed system is evaluated in neutrally trained/emotionally tested conditions for different speech features and emotional states (Anger, Disgust, Fear, Happy, and Sad), with frequency warping applied to several acoustic features. The emotion-affected speech recognition system is built on the Kaldi automatic speech recognition toolkit, with the Persian emotional speech database and the crowd-sourced emotional multimodal actors dataset as input corpora. The experimental simulations reveal that, in general, the warped emotional features yield better performance of the emotion-affected speech recognition system than their unwarped counterparts. Moreover, the deep neural network-hidden Markov model system outperforms the hybrid with the Gaussian mixture model.
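The frequency warping described above is, in spirit, vocal tract length normalization applied at the filterbank stage of feature extraction. Below is a minimal Python sketch of a piecewise-linear warp of mel filterbank edge frequencies; the warp factor, cutoff ratio, filter count, and sampling rate are illustrative assumptions, and the paper's DCT-domain warping is not shown.

```python
import numpy as np

def piecewise_linear_warp(freqs, alpha, f_cut, f_nyq):
    """Piecewise-linear VTLN-style warp: scale by alpha below a cutoff, then
    interpolate linearly so the Nyquist frequency maps onto itself."""
    freqs = np.asarray(freqs, dtype=float)
    hi = alpha * f_cut + (f_nyq - alpha * f_cut) * (freqs - f_cut) / (f_nyq - f_cut)
    return np.where(freqs <= f_cut, alpha * freqs, hi)

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warped_mel_edges(n_filters, fs, alpha, f_cut_ratio=0.85):
    """Mel filterbank edge frequencies after warping the frequency axis."""
    f_nyq = fs / 2.0
    edges = inv_mel(np.linspace(mel(0.0), mel(f_nyq), n_filters + 2))
    return piecewise_linear_warp(edges, alpha, f_cut_ratio * f_nyq, f_nyq)

# A warp factor below one compresses the filter edges; tuning it per emotion
# and feature (the 0.92 here is invented) is what would pull emotional
# features toward their neutral counterparts.
print(warped_mel_edges(n_filters=23, fs=16000, alpha=0.92)[:5])
```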


2018 ◽  
Vol 144 (3) ◽  
pp. 1767-1767
Author(s):  
Charles Farbos de Luzan ◽  
Sid M. Khosla ◽  
Liran Oren ◽  
Alexandra Maddox ◽  
Ephraim Gutmark

2021 ◽  
Vol 15 (2) ◽  
Author(s):  
Petr Hájek ◽  
Pavel Švancara ◽  
Jaromír Horáček ◽  
Jan G. Švec

Finite-element modeling of self-sustained vocal fold oscillations during voice production has mostly treated the air as incompressible, due to numerical complexity. This study overcomes that limitation and examines the influence of air compressibility on phonatory pressures, flow, and vocal fold vibratory characteristics. A two-dimensional finite-element model is used, which incorporates a layered vocal fold structure, vocal fold collisions, large deformations of the vocal fold tissue, fluid-mesh morphing that follows the vocal fold motion via the arbitrary Lagrangian-Eulerian approach, and a vocal tract model of the Czech vowel [i:] based on magnetic resonance imaging data. Unsteady viscous compressible or incompressible airflow is described by the Navier-Stokes equations. An explicit coupling scheme with separate solvers for the structure and fluid domains is used to model the fluid-structure-acoustic interaction. The simulations show clear differences in the glottal flow and vocal fold vibration waveforms between the incompressible and compressible fluid flow. These results provide evidence of coupling between the vocal tract acoustics and the glottal flow (Level 1 interactions), as well as between the vocal tract acoustics and the vocal fold vibrations (Level 2 interactions).
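The explicit coupling scheme with separate fluid and structure solvers can be illustrated with a deliberately simplified sketch. The toy Python loop below replaces the finite-element solvers with a single-mass fold and a quasi-steady Bernoulli flow, so it reproduces neither self-sustained oscillation nor the compressibility effects studied here; every numerical value and helper is an invented placeholder, and only the staggered exchange of glottal area and driving pressure mirrors the partitioned scheme described above.

```python
import numpy as np

# Invented toy parameters: a single-mass "fold" and a quasi-steady Bernoulli
# "fluid solver" stand in for the finite-element structure and flow domains.
RHO, P_SUB = 1.2, 800.0            # air density [kg/m^3], subglottal pressure [Pa]
M, K, C = 1e-4, 200.0, 2e-2        # fold mass [kg], stiffness [N/m], damping [N s/m]
A_REST, L_G = 5e-5, 1.4e-2         # rest glottal area [m^2], glottal length [m]
SURF = 2e-4                        # fold surface loaded by the pressure [m^2]
DT, STEPS = 1e-5, 20000            # coupling time step [s], number of steps

def fluid_step(area):
    """Return glottal flow and a crude mean driving pressure for a given area."""
    if area <= 0.0:                # closed glottis: no flow, full subglottal load
        return 0.0, P_SUB
    flow = area * np.sqrt(2.0 * P_SUB / RHO)
    return flow, 0.5 * P_SUB

def structure_step(x, v, p):
    """One semi-implicit Euler step of the single-mass fold under pressure p."""
    a = (p * SURF - C * v - K * x) / M
    v += a * DT
    return x + v * DT, v

# Explicit (staggered) coupling loop: fluid and structure are solved in turn,
# exchanging glottal area and driving pressure once per time step.
x, v, flows = 0.0, 0.0, []
for _ in range(STEPS):
    area = max(A_REST + 2.0 * L_G * x, 0.0)   # displacement modulates the area
    q, p = fluid_step(area)
    x, v = structure_step(x, v, p)
    flows.append(q)
print(f"mean glottal flow of the toy model: {np.mean(flows):.2e} m^3/s")
```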


2016 ◽  
Vol 140 (4) ◽  
pp. 3331-3331
Author(s):  
Charles P. Farbos de Luzan ◽  
Liran Oren ◽  
Ephraim Gutmark ◽  
Sid Khosla

2018 ◽  
Vol 143 (3) ◽  
pp. 1966-1966
Author(s):  
Alexandra Maddox ◽  
Liran Oren ◽  
Ephraim Gutmark ◽  
Charles P. Farbos de Luzan ◽  
Sid M. Khosla

2016 ◽  
Vol 44 (1) ◽  
pp. 187-191 ◽  
Author(s):  
Joe Wolfe ◽  
Derek Tze Wei Chu ◽  
Jer-Ming Chen ◽  
John Smith
