Classification of speech under stress based on modeling of the vocal folds and vocal tract

Author(s):  
Xiao Yao ◽  
Takatoshi Jitsuhiro ◽  
Chiyomi Miyajima ◽  
Norihide Kitaoka ◽  
Kazuya Takeda

The classification of pathological voice is an active research topic that has received significant attention. Voice pathology is related to a disorder of the vocal folds, and for this reason the vocal tract area, which is coupled to the vocal folds, shows random patterns in the case of a pathological voice. These random patterns are used to distinguish healthy from pathological voices. By treating the vocal tract as an acoustic transmission line, transmission line theory can be applied to automatic voice pathology detection. The work concentrates on developing a feature extraction method for detecting and classifying vocal fold polyps by investigating different vocal tract parameters. In this paper, the vocal tract length and area are used to compute electrical parameters of the vocal tract. These electrical parameters are then used for the classification of pathological voice. Using the electrical parameters, an accuracy of 97.3% is obtained with an SVM classifier, compared with 88.2% for the acoustic parameters, 85.3% for the physical parameters, and the results of other methods used in the past. The outcomes demonstrate that electrical parameters of the vocal tract can be used more effectively, and with better precision, in voice pathology identification.
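As a rough illustration of how vocal tract length and area can be mapped to electrical parameters, the sketch below uses the textbook lossless acoustic-tube analogy, in which a uniform tube behaves like a transmission line. The abstract does not say which parameters the authors compute or how, so the function name `tube_line_parameters`, the air constants, and the choice of quantities are all assumptions made for illustration.

```python
import math

RHO_AIR = 1.14    # kg/m^3, assumed density of warm, humid air
C_SOUND = 350.0   # m/s, assumed speed of sound in the vocal tract


def tube_line_parameters(area_m2, length_m):
    """Transmission-line analogues for a uniform, lossless acoustic tube.

    This is the generic acoustic-electrical analogy, not necessarily
    the parameter set used in the paper.
    """
    if area_m2 <= 0 or length_m <= 0:
        raise ValueError("area and length must be positive")
    # Acoustic mass per unit length plays the role of series inductance.
    inductance_per_m = RHO_AIR / area_m2
    # Acoustic compliance per unit length plays the role of shunt capacitance.
    capacitance_per_m = area_m2 / (RHO_AIR * C_SOUND ** 2)
    # Characteristic impedance sqrt(L/C), which reduces to rho * c / A.
    z0 = math.sqrt(inductance_per_m / capacitance_per_m)
    # One-way propagation delay length * sqrt(L*C), which reduces to length / c.
    delay_s = length_m * math.sqrt(inductance_per_m * capacitance_per_m)
    return {
        "inductance_per_m": inductance_per_m,
        "capacitance_per_m": capacitance_per_m,
        "z0": z0,
        "delay_s": delay_s,
    }


# Hypothetical tube: 3 cm^2 cross-section, 17 cm long.
params = tube_line_parameters(area_m2=3.0e-4, length_m=0.17)
print(params)
```

A real vocal tract has a varying cross-section, so in practice such parameters would be computed per tube section; the single uniform tube here only shows the mapping from geometry to line parameters.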


2021 ◽  
Vol 11 (11) ◽  
pp. 4748
Author(s):  
Monika Balázsová ◽  
Miloslav Feistauer ◽  
Jaromír Horáček ◽  
Adam Kosík

This study deals with the development of an accurate, efficient and robust method for the numerical solution of the interaction of compressible flow and nonlinear dynamic elasticity. This problem requires the reliable solution of flow in time-dependent domains and the solution of deformations of elastic bodies formed by several materials with complicated geometry depending on time. In this paper, the fluid–structure interaction (FSI) problem is solved numerically by the space-time discontinuous Galerkin method (STDGM). In the case of compressible flow, we use the compressible Navier–Stokes equations formulated by the arbitrary Lagrangian–Eulerian (ALE) method. The elasticity problem uses the non-stationary formulation of the dynamic system using the St. Venant–Kirchhoff and neo-Hookean models. The STDGM for the nonlinear elasticity is tested on the Hron–Turek benchmark. The main novelty of the study is the numerical simulation of the nonlinear vocal fold vibrations excited by the compressible airflow coming from the trachea to the simplified model of the vocal tract. The computations show that the nonlinear elasticity model of the vocal folds is needed in order to obtain substantially higher accuracy of the computed vocal folds deformation than for the linear elasticity model. Moreover, the numerical simulations showed that the differences between the two considered nonlinear material models are very small.


2021 ◽  
Vol 11 (4) ◽  
pp. 1970
Author(s):  
Martin Lasota ◽  
Petr Šidlof ◽  
Manfred Kaltenbacher ◽  
Stefan Schoder

In an aeroacoustic simulation of human voice production, the effect of the sub-grid scale (SGS) model on the acoustic spectrum was investigated. In the first step, incompressible airflow in a 3D model of larynx with vocal folds undergoing prescribed two-degree-of-freedom oscillation was simulated by laminar and Large-Eddy Simulations (LES), using the One-Equation and Wall-Adaptive Local-Eddy (WALE) SGS models. Second, the aeroacoustic sources and the sound propagation in a domain composed of the larynx and vocal tract were computed by the Perturbed Convective Wave Equation (PCWE) for vowels [u:] and [i:]. The results show that the SGS model has a significant impact not only on the flow field, but also on the spectrum of the sound sampled 1 cm downstream of the lips. With the WALE model, which is known to handle the near-wall and high-shear regions more precisely, the simulations predict significantly higher peak volumetric flow rates of air than those of the One-Equation model, only slightly lower than the laminar simulation. The usage of the WALE SGS model also results in higher sound pressure levels of the higher harmonic frequencies.


2013 ◽  
Vol 25 (12) ◽  
pp. 3294-3317 ◽  
Author(s):  
Lijiang Chen ◽  
Xia Mao ◽  
Pengfei Wei ◽  
Angelo Compare

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch down duration are obtained to reflect the information of vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectrum of the EGG and speech signals are calculated to reflect the information of vocal tract modulation. Two experiments are carried out. The first compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search. The second is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features outperform commonly used ones in speaker-independent and content-independent speech emotion recognition.
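The spectral feature described above can be sketched as a cosine transform applied to a normalized magnitude spectrum. The abstract gives neither the normalization nor the transform variant, so this sketch assumes sum-to-one normalization and an unscaled DCT-II; the helper names `normalize` and `dct2` are hypothetical.

```python
import math


def normalize(spectrum):
    """Scale a magnitude spectrum so its bins sum to 1 (assumed normalization)."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have positive total magnitude")
    return [value / total for value in spectrum]


def dct2(x):
    """Unscaled DCT-II: X[k] = sum_i x[i] * cos(pi * (i + 0.5) * k / n)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


# Made-up magnitude spectrum with energy concentrated in the low bins.
toy_spectrum = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125]
coefficients = dct2(normalize(toy_spectrum))
print([round(c, 4) for c in coefficients])
```

The low-order coefficients summarize the overall spectral envelope, which is why a truncated set of them is a common compact feature vector.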


2018 ◽  
Vol 8 (1) ◽  
pp. 69-85 ◽  
Author(s):  
Hemanta Kumar Palo ◽  
Mihir Narayan Mohanty ◽  
Mahesh Chandra

The shape, length, and size of the vocal tract and vocal folds vary with a person's age. The variation may be due to age, sickness, or other conditions. Arguably, the features extracted from utterances for the recognition task may differ across age groups, and the task is further complicated by different emotions. A recognition system therefore demands suitable feature extraction and clustering techniques that can separate emotional utterances. Psychologists, criminal investigators, professional counselors, law enforcement agencies, and a host of other such entities may find such analysis useful. In this article, emotion is studied for three different age groups using basic age-dependent features such as pitch, speech rate, and log energy. The feature sets are clustered for the different age groups using the K-means and Fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The authors' results suggest that the K-means algorithm outperforms the FCM algorithm in terms of clustering quality and computation time.
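For readers unfamiliar with the clustering step, here is a minimal sketch of K-means (Lloyd's algorithm) on feature tuples. It is not the authors' implementation: the deterministic farthest-first initialization is a choice made here for reproducibility, the toy data are invented, and only two groups are used for brevity. FCM is not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Invented, already-scaled (pitch, speech rate) pairs for two groups.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, labels = kmeans(group_a + group_b, k=2)
print(centroids, labels)
```

Because pitch, speech rate, and log energy have very different numeric ranges, features would normally be standardized before clustering; the toy values above are assumed to be scaled already.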


2019 ◽  
Vol 9 (13) ◽  
pp. 2755 ◽  
Author(s):  
Lewis Fulcher ◽  
Alexander Lodermeyer ◽  
George Kähler ◽  
Stefan Becker ◽  
Stefan Kniesburges

In voice research, analytically-based models are efficient tools to investigate the basic physical mechanisms of phonation. Calculations based on lumped element models describe the effects of the air in the vocal tract upon threshold pressure (Pth) by its inertance. The latter depends on the geometrical boundary conditions prescribed by the vocal tract length (directly) and its cross-sectional area (inversely). Using Titze’s surface wave model (SWM) to account for the properties of the vocal folds, the influence of the vocal tract inertia is examined by two sets of calculations in combination with experiments that apply silicone-based vocal folds. In the first set, a vocal tract is constructed whose cross-sectional area is adjustable from 2.7 cm² to 11.7 cm². In the second set, the length of the vocal tract is varied from 4.0 cm to 59.0 cm. For both sets, the pressure and frequency data are collected and compared with calculations based on the SWM. In most cases, the measurements support the calculations; hence, the model is suited to describe and predict basic mechanisms of phonation and the inertial effects caused by a vocal tract.
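The dependence of inertance on length (direct) and area (inverse) can be made concrete with the standard lumped-element expression I = ρL/A for an air column. The sketch below evaluates it over the two ranges quoted above. The air density, and the fixed length and fixed area held constant in each sweep, are assumed values that the abstract does not give.

```python
RHO_AIR = 1.2  # kg/m^3, assumed air density


def inertance(length_m, area_m2, rho=RHO_AIR):
    """Acoustic inertance of a uniform air column: rho * L / A, in kg/m^4."""
    if length_m <= 0 or area_m2 <= 0:
        raise ValueError("length and area must be positive")
    return rho * length_m / area_m2


CM2 = 1.0e-4  # one square centimetre in square metres

# Sweep 1: area from 2.7 cm^2 to 11.7 cm^2 at an assumed fixed length of 17 cm.
for area_cm2 in (2.7, 11.7):
    print("area", area_cm2, "cm^2 ->", inertance(0.17, area_cm2 * CM2))

# Sweep 2: length from 4.0 cm to 59.0 cm at an assumed fixed area of 5 cm^2.
for length_cm in (4.0, 59.0):
    print("length", length_cm, "cm ->", inertance(length_cm / 100.0, 5.0 * CM2))
```

Under this expression the area sweep changes the inertance by a factor of 11.7/2.7 and the length sweep by a factor of 59.0/4.0, independent of the assumed density.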


Author(s):  
Jesús Bernardino Alonso Hernández ◽  
Patricia Henríquez Rodríguez

It is possible to implement diagnostic support systems for evaluating the phonatory system from the speech signal by means of techniques based on expert systems. The application of these techniques allows, for example, the early detection of alterations in the phonatory system or the evaluation over time of patients undergoing a given treatment. The procedure of measuring the voice quality of a speaker from a digital recording consists of quantifying different acoustic characteristics of speech, which makes it possible to compare them with certain reference patterns identified previously by a “clinical expert”. A measurement of acoustic speech quality based on auditory assessment is very hard to use as a comparative reference across different voices and across the different human experts carrying out the assessment. In the current literature, some attempts have been made to obtain objective measures of speech quality by means of multidimensional clinical measurements based on auditory methods. Well-known examples are the GRBAS scale from Japan (Hirano, M., 1981) and its extension developed and applied in Europe (Dejonckere, P. H., Remacle, M., Fresnel-Elbaz, E., Woisard, V., Crevier-Buchman, L. & Millet, B., 1996), a set of perceptual and acoustic characteristics in Sweden (Hammarberg, B. & Gauffin, J., 1995), and a set of phonetic characteristics with added information about the excitation of the vocal tract. The aim of these speech quality measurement procedures is to obtain an objective measurement from a subjective evaluation. Several works propose objective measurements of speech quality obtained from a recording (Alonso J. B., 2006), (Boyanov, B. & Hadjitodorov, S., 1997), (Hansen, J.H.L., Gavidia-Ceballos, L. & Kaiser, J.F., 1998), (Stefan Hadjitodorov & Petar Mitev, 2002), (Michaelis D., Frohlich M. & Strube H. W., 1998), (Boyanov B., Doskov D., Mitev P., Hadjitodorov S. & Teston B., 2000), (Godino-Llorente, J.I., Aguilera-Navarro, S. & Gomez-Vilda, P., 2000). In these works a sustained voiced sound (usually a vowel) is recorded and then used to compute speech quality measurements. A sustained voiced sound is used because, during its production, the speech system uses almost all of its mechanisms (constant glottal airflow, continuous vocal fold vibration, …), which makes it possible to detect any anomaly in these mechanisms. These works suggest different sets of measurements to quantify speech quality objectively. They all reveal one important fact: several different measurements of the speech signal are needed to capture the different aspects of its acoustic characteristics.


Author(s):  
Byron D. Erath ◽  
Matías Zañartu ◽  
Sean D. Peterson ◽  
Michael W. Plesniak

Voiced speech is initiated as air is expelled from the lungs and passes through the vocal tract, inciting self-sustained oscillations of the vocal folds. While various approaches exist for investigating both normal and pathological speech, the relative inaccessibility of the vocal folds makes multi-mass speech models an attractive alternative. Their behavior has been benchmarked with excised larynx experiments, and they have been used as analysis tools for both normal and disordered speech, including investigations of paralysis, vocal tremor, and breathiness. However, during pathological speech, vocal fold motion is often unstructured, resulting in chaotic motion and a wealth of nonlinear phenomena. Unfortunately, current methodologies for multi-mass speech models are unable to replicate the nonlinear vocal fold behavior that often occurs in physiological diseased voice for realistic values of subglottal pressure.


2004 ◽  
Author(s):  
Michael Barry

The design and testing of an experimental apparatus for in vitro study of phonatory aerodynamics (voice production) in humans is presented. The presentation includes not only the details of apparatus design, but also flow visualization and Digital Particle Image Velocimetry (DPIV) measurements of the developing flow that occurs during the opening of the constriction from complete closure. The main features of the phonation process have long been understood. A proper combination of air flow from the lungs and of vocal fold tension initiates a vibration of the vocal folds, which in turn valves the airflow. The resulting periodic acceleration of the airstream through the glottis excites the acoustic modes of the vocal tract. It is further understood that the pressure gradient driving glottal flow is related to flow separation on the downstream side of the vocal folds. However, the details of this process and how it may contribute to effects such as aperiodicity of the voice and energy losses in voiced sound production are still not fully grasped. The experimental apparatus described in this paper is designed to address these issues. The apparatus itself consists of a scaled-up duct in which water flows through a constriction whose width is modulated by motion of the duct wall in a manner mimicking vocal fold vibration. Scaling the duct up 10 times and using water as the working fluid allows temporally and spatially resolved measurements of the dynamically similar flow velocity field using DPIV at video standard framing rates (15 Hz). Dynamic similarity is ensured by matching the Reynolds number (based on glottal flow speed and glottis width) of 8000, and by varying the Strouhal number (based on vocal fold length, glottal flow speed, and a time scale characterizing the motion of the vocal folds) over the range 0.01 to 0.1. The walls of the 28 cm × 28 cm test section and the vocal fold pieces are made of clear cast acrylic to allow optical access.
The vocal fold pieces are 12.7 cm × 14 cm × 28 cm and are rectangular in shape, except for the surfaces which form the glottis, which are 6.35 cm radius half-circles. Dye injection slots are placed on the upstream side of both vocal fold pieces to allow flow visualization. Prescribed motion of the vocal folds is provided by two linear stages. Linear bearings ensure smooth execution of the motion prescribed using a computer interface. Measurements described here use the Laser-Induced Fluorescence (LIF) flow visualization and DPIV techniques and are performed for two Strouhal numbers to assess the effect of opening time on the development of the glottal jet. These measurements are conducted on a plane oriented perpendicular to the glottis, at the duct midplane. LIF measurements use a 5 W Argon ion laser to produce a light sheet, which illuminates the dye injected through a slot in each vocal fold piece. Two dye colors are used, one for each side. Quantitative information about the velocity and vorticity fields is obtained through DPIV measurements at the same location as the LIF measurements.
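The scaling argument can be sketched numerically. The code assumes the usual definitions Re = U·d/ν and St = L/(U·T), consistent with the quantities named in the abstract, and solves for the water speed and time scale that keep both numbers unchanged at 10× geometric scale. The viscosities and the life-size speed, glottal width, fold length, and opening time are illustrative assumptions, not values from the paper.

```python
NU_AIR = 1.5e-5    # m^2/s, assumed kinematic viscosity of air
NU_WATER = 1.0e-6  # m^2/s, assumed kinematic viscosity of water


def reynolds(speed, width, nu):
    """Reynolds number from flow speed, glottis width, kinematic viscosity."""
    return speed * width / nu


def strouhal(length, speed, time_scale):
    """Strouhal number from fold length, flow speed, and motion time scale."""
    return length / (speed * time_scale)


def scaled_model(speed_air, time_air, scale=10.0):
    """Water-model speed and time scale that preserve Re and St."""
    # Re match: U_m * (scale * d) / nu_water == U_a * d / nu_air
    speed_model = speed_air * (NU_WATER / NU_AIR) / scale
    # St match: (scale * L) / (U_m * T_m) == L / (U_a * T_a)
    time_model = time_air * scale * speed_air / speed_model
    return speed_model, time_model


# Assumed life-size values: 40 m/s jet, 3 mm glottal width, 5 ms opening time.
u_model, t_model = scaled_model(speed_air=40.0, time_air=0.005)
print("Re life-size:", reynolds(40.0, 0.003, NU_AIR))
print("Re model    :", reynolds(u_model, 0.03, NU_WATER))
print("model speed (m/s):", u_model, " model time scale (s):", t_model)
```

With these assumed viscosities the model flow is about 150 times slower and its time scale about 1500 times longer than at life size, which is what makes a video-rate camera sufficient to resolve the motion.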


2003 ◽  
Vol 42 (03) ◽  
pp. 271-276 ◽  
Author(s):  
T. Braunschweig ◽  
J. Lohscheller ◽  
U. Eysholdt ◽  
U. Hoppe ◽  
M. Döllinger

Summary Objectives: A central point for quantitative evaluation of pathological and healthy voices is the analysis of vocal fold oscillations. By means of digital High Speed Glottography (HGG), vocal fold oscillations can be recorded in real time. Recently, a numerical inversion procedure was developed that allows the extraction of physiological parameters from digital high speed videos and a classification of voice disorders. The aim of this work was to validate the inversion procedure and to investigate its applicability to normal voices. Methods: High speed recordings were performed during phonation within a group of five female and five male persons with normal voices. By using knowledge-based image processing algorithms, motion curves of the vocal folds were extracted at three different positions (dorsal, medial, ventral). These curves were used to obtain physiological voice parameters, and in particular the degree of symmetry of the vocal folds, based upon a biomechanical model of the vocal folds. Results: The highest degree of symmetry was observed for the medial motion curves. While the dorsally and ventrally extracted motion curves exhibited similar results concerning the degree of symmetry, the performance of the algorithm was less stable. Conclusions: The inversion algorithm provides reasonable results for all subjects when applied to the medial motion curves. However, for dorsal and ventral motion curves, correct performance is reduced to 85%.

