Vowel Enhancement in Early Stage Spanish Esophageal Speech Using Natural Glottal Flow Pulse and Vocal Tract Frequency Warping

2015 ◽  
Author(s):  
Rizwan Ishaq ◽  
Dhanananjaya Gowda ◽  
Paavo Alku ◽  
Begonya Garcia Zapirain

2005 ◽  
Vol 57 (4) ◽  
pp. 223-228
Author(s):  
Robert A. Prosek ◽  
Melissa B. Koch

Author(s):  
Radhika Rani L ◽  
S. Chandra lingam ◽  
Anjaneyulu T ◽  
Satyanarayana K

Congenital Heart Defects (CHD) are critical heart disorders present at birth. They fall into two main classes, cyanotic and acyanotic; the present paper concentrates on acyanotic disorders. Acyanotic heart disorders cannot be detected by external examination, whereas bluish skin indicates a cyanotic disorder; acyanotic disorders can only be diagnosed using chest X-ray, ECG, echocardiogram, cardiac catheterization, or MRI of the heart. The present work estimates the fundamental frequency (pitch) and the vocal tract resonant frequencies (formants) from infant cry signals. Pitch and formant frequencies are estimated using a frequency-domain (cepstrum) method and Linear Predictive Coding (LPC). The results show that the fundamental frequency of the cry signal was between 600 Hz and 800 Hz for infants with acyanotic heart disorders, which helps identify these disorders at an early stage.
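The two estimation methods named above, cepstral pitch detection and LPC formant analysis, can be sketched in a few lines of Python. The frame length, sampling rate, search range, LPC order, and the synthetic harmonic test signal below are illustrative assumptions, not values taken from the study.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def cepstral_pitch(frame, fs, fmin=300.0, fmax=1000.0):
    """Estimate F0 of one frame from the peak of the real cepstrum."""
    spec = np.fft.rfft(frame * np.hamming(len(frame)))
    ceps = np.fft.irfft(np.log(np.abs(spec) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # quefrency search range
    peak = qmin + np.argmax(ceps[qmin:qmax])
    return fs / peak

def lpc_formants(frame, fs, order=12):
    """Estimate formant candidates as angles of the LPC polynomial roots."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[freqs > 90.0]                    # drop near-DC roots

# Toy check on an assumed 30 ms frame sampled at 16 kHz: a harmonic-rich
# stand-in for a cry frame with a 700 Hz fundamental plus a little noise.
fs = 16000
t = np.arange(int(0.03 * fs)) / fs
frame = sum(np.sin(2 * np.pi * 700 * k * t) / k for k in range(1, 6))
frame += 0.01 * np.random.default_rng(0).normal(size=t.size)
print(cepstral_pitch(frame, fs))      # close to 700 Hz
print(lpc_formants(frame, fs)[:3])    # lowest resonance candidates
```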


2004 ◽  
Author(s):  
Michael Barry

The design and testing of an experimental apparatus for in vitro study of phonatory aerodynamics (voice production) in humans is presented. The presentation includes not only the details of the apparatus design, but also flow visualization and Digital Particle Image Velocimetry (DPIV) measurements of the developing flow that occurs as the constriction opens from complete closure. The main features of the phonation process have long been understood: a proper combination of airflow from the lungs and vocal fold tension initiates a vibration of the vocal folds, which in turn valves the airflow, and the resulting periodic acceleration of the airstream through the glottis excites the acoustic modes of the vocal tract. It is further understood that the pressure gradient driving glottal flow is related to flow separation on the downstream side of the vocal folds. However, the details of this process, and how it may contribute to effects such as aperiodicity of the voice and energy losses in voiced sound production, are still not fully understood. The experimental apparatus described in this paper is designed to address these issues.

The apparatus consists of a scaled-up duct in which water flows through a constriction whose width is modulated by motion of the duct wall in a manner mimicking vocal fold vibration. Scaling the duct up 10 times and using water as the working fluid allows temporally and spatially resolved measurements of the dynamically similar flow velocity field using DPIV at video-standard framing rates (15 Hz). Dynamic similarity is ensured by matching the Reynolds number (based on glottal flow speed and glottis width) of 8000, and by varying the Strouhal number (based on vocal fold length, glottal flow speed, and a time scale characterizing the motion of the vocal folds) over the range 0.01 to 0.1. The walls of the 28 cm × 28 cm test section and the vocal fold pieces are made of clear cast acrylic to allow optical access. The vocal fold pieces are 12.7 cm × 14 cm × 28 cm and rectangular in shape, except for the surfaces forming the glottis, which are 6.35 cm radius half-circles. Dye injection slots are placed on the upstream side of both vocal fold pieces to allow flow visualization. Prescribed motion of the vocal folds is provided by two linear stages, with linear bearings ensuring smooth execution of the motion prescribed through a computer interface.

The measurements described here use Laser-Induced Fluorescence (LIF) flow visualization and DPIV and are performed for two Strouhal numbers to assess the effect of opening time on the development of the glottal jet. They are conducted on a plane oriented perpendicular to the glottis, at the duct midplane. The LIF measurements use a 5 W argon-ion laser to produce a light sheet, which illuminates the dye injected through a slot in each vocal fold piece; two dye colors are used, one for each side. Quantitative information about the velocity and vorticity fields is obtained through DPIV measurements at the same location as the LIF measurements.
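The Reynolds and Strouhal matching described above can be made concrete with a short calculation. A minimal Python sketch follows; only Re = 8000, the Strouhal range 0.01-0.1, and the 12.7 cm fold length come from the abstract, while the life-size glottal width, the viscosity of water, and the particular form St = L/(U·T) are assumptions for illustration.

```python
# Dynamic-similarity bookkeeping for the 10x scaled water model. Only
# Re = 8000 and St = 0.01-0.1 come from the abstract; the life-size glottal
# width and the fluid viscosity below are assumed textbook values.
NU_WATER = 1.0e-6    # kinematic viscosity of water [m^2/s] (assumed)

def model_speed(re, width, nu=NU_WATER):
    """Flow speed needed in the model to hit a target Reynolds number."""
    return re * nu / width

def fold_motion_timescale(st, length, speed):
    """Fold-motion time scale implied by St = L / (U * T) (assumed form)."""
    return length / (st * speed)

width_model = 10 * 2e-3                       # assumed 2 mm glottis, scaled 10x
u = model_speed(8000, width_model)            # about 0.4 m/s in water
for st in (0.01, 0.1):
    # 12.7 cm vocal fold length from the apparatus description
    print(st, fold_motion_timescale(st, 0.127, u))
```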


Author(s):  
Masoud Geravanchizadeh ◽  
Elnaz Forouhandeh ◽  
Meysam Bashirpour

The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since people naturally speak emotionally in real-world environments, the emotional state of speech must be taken into account in automatic speech recognition. Little work has been done on emotion-affected speech recognition; most research to date has focused on classifying speech emotions. In this paper, the vocal tract length normalization method is employed to enhance the robustness of the emotion-affected speech recognition system. For this purpose, two speech recognition structures are used, based on hybrids of a hidden Markov model with either a Gaussian mixture model or a deep neural network. Frequency warping is applied to the filterbank and/or discrete-cosine-transform domain(s) in the feature extraction process of the automatic speech recognition system. The warping is conducted so as to normalize the emotional feature components and bring them close to their corresponding neutral feature components. The performance of the proposed system is evaluated in neutrally trained/emotionally tested conditions for different speech features and emotional states (Anger, Disgust, Fear, Happy, and Sad), with frequency warping applied to several acoustic features. The emotion-affected speech recognition system is built on the Kaldi automatic speech recognition toolkit, with the Persian emotional speech database and the crowd-sourced emotional multimodal actors dataset as input corpora. The experimental simulations reveal that, in general, the warped emotional features yield better performance of the emotion-affected speech recognition system than their unwarped counterparts. Moreover, the deep neural network-hidden Markov model system outperforms the hybrid with the Gaussian mixture model.
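The frequency warping described above is, in spirit, vocal tract length normalization applied at the filterbank stage of feature extraction. Below is a minimal Python sketch of a piecewise-linear warp of mel filterbank edge frequencies; the warp factor, cutoff ratio, filter count, and sampling rate are illustrative assumptions, and the paper's DCT-domain warping is not shown.

```python
import numpy as np

def piecewise_linear_warp(freqs, alpha, f_cut, f_nyq):
    """Piecewise-linear VTLN-style warp: scale by alpha below a cutoff, then
    interpolate linearly so the Nyquist frequency maps onto itself."""
    freqs = np.asarray(freqs, dtype=float)
    hi = alpha * f_cut + (f_nyq - alpha * f_cut) * (freqs - f_cut) / (f_nyq - f_cut)
    return np.where(freqs <= f_cut, alpha * freqs, hi)

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warped_mel_edges(n_filters, fs, alpha, f_cut_ratio=0.85):
    """Mel filterbank edge frequencies after warping the frequency axis."""
    f_nyq = fs / 2.0
    edges = inv_mel(np.linspace(mel(0.0), mel(f_nyq), n_filters + 2))
    return piecewise_linear_warp(edges, alpha, f_cut_ratio * f_nyq, f_nyq)

# A warp factor below one compresses the filter edges; tuning it per emotion
# and feature (the 0.92 here is invented) is what would pull emotional
# features toward their neutral counterparts.
print(warped_mel_edges(n_filters=23, fs=16000, alpha=0.92)[:5])
```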


2018 ◽  
Vol 144 (3) ◽  
pp. 1767-1767
Author(s):  
Charles Farbos de Luzan ◽  
Sid M. Khosla ◽  
Liran Oren ◽  
Alexandra Maddox ◽  
Ephraim Gutmark

2021 ◽  
Vol 15 (2) ◽  
Author(s):  
Petr Hájek ◽  
Pavel Švancara ◽  
Jaromír Horáček ◽  
Jan G. Švec

Finite-element modeling of self-sustained vocal fold oscillations during voice production has mostly treated the air as incompressible, due to numerical complexity. This study overcomes that limitation and examines the influence of air compressibility on phonatory pressures, flow, and vocal fold vibratory characteristics. A two-dimensional finite-element model is used, which incorporates a layered vocal fold structure, vocal fold collisions, large deformations of the vocal fold tissue, fluid-mesh morphing that follows the vocal fold motion via the arbitrary Lagrangian-Eulerian approach, and a vocal tract model of the Czech vowel [i:] based on magnetic resonance imaging data. Unsteady viscous compressible or incompressible airflow is described by the Navier-Stokes equations. An explicit coupling scheme with separate solvers for the structure and fluid domains is used to model the fluid-structure-acoustic interaction. The simulations show clear differences in the glottal flow and vocal fold vibration waveforms between the incompressible and compressible fluid flow. These results provide evidence of coupling between the vocal tract acoustics and the glottal flow (Level 1 interactions), as well as between the vocal tract acoustics and the vocal fold vibrations (Level 2 interactions).
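The explicit coupling scheme with separate fluid and structure solvers can be illustrated with a deliberately simplified sketch. The toy Python loop below replaces the finite-element solvers with a single-mass fold and a quasi-steady Bernoulli flow, so it reproduces neither self-sustained oscillation nor the compressibility effects studied here; every numerical value and helper is an invented placeholder, and only the staggered exchange of glottal area and driving pressure mirrors the partitioned scheme described above.

```python
import numpy as np

# Invented toy parameters: a single-mass "fold" and a quasi-steady Bernoulli
# "fluid solver" stand in for the finite-element structure and flow domains.
RHO, P_SUB = 1.2, 800.0            # air density [kg/m^3], subglottal pressure [Pa]
M, K, C = 1e-4, 200.0, 2e-2        # fold mass [kg], stiffness [N/m], damping [N s/m]
A_REST, L_G = 5e-5, 1.4e-2         # rest glottal area [m^2], glottal length [m]
SURF = 2e-4                        # fold surface loaded by the pressure [m^2]
DT, STEPS = 1e-5, 20000            # coupling time step [s], number of steps

def fluid_step(area):
    """Return glottal flow and a crude mean driving pressure for a given area."""
    if area <= 0.0:                # closed glottis: no flow, full subglottal load
        return 0.0, P_SUB
    flow = area * np.sqrt(2.0 * P_SUB / RHO)
    return flow, 0.5 * P_SUB

def structure_step(x, v, p):
    """One semi-implicit Euler step of the single-mass fold under pressure p."""
    a = (p * SURF - C * v - K * x) / M
    v += a * DT
    return x + v * DT, v

# Explicit (staggered) coupling loop: fluid and structure are solved in turn,
# exchanging glottal area and driving pressure once per time step.
x, v, flows = 0.0, 0.0, []
for _ in range(STEPS):
    area = max(A_REST + 2.0 * L_G * x, 0.0)   # displacement modulates the area
    q, p = fluid_step(area)
    x, v = structure_step(x, v, p)
    flows.append(q)
print(f"mean glottal flow of the toy model: {np.mean(flows):.2e} m^3/s")
```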


2016 ◽  
Vol 140 (4) ◽  
pp. 3331-3331
Author(s):  
Charles P. Farbos de Luzan ◽  
Liran Oren ◽  
Ephraim Gutmark ◽  
Sid Khosla

2018 ◽  
Vol 143 (3) ◽  
pp. 1966-1966
Author(s):  
Alexandra Maddox ◽  
Liran Oren ◽  
Ephraim Gutmark ◽  
Charles P. Farbos de Luzan ◽  
Sid M. Khosla

2016 ◽  
Vol 44 (1) ◽  
pp. 187-191 ◽  
Author(s):  
Joe Wolfe ◽  
Derek Tze Wei Chu ◽  
Jer-Ming Chen ◽  
John Smith
