Rethinking glottal midline detection

2020 ◽  
Vol 10 (1) ◽
Author(s):
Andreas M. Kist ◽
Julian Zilker ◽
Pablo Gómez ◽
Anne Schützenberger ◽
Michael Döllinger

A healthy voice is crucial for verbal communication and hence for daily as well as professional life. The sound-producing vocal folds in the larynx are the basis of a healthy voice. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal folds. Clinically, videoendoscopy is used to assess the symmetry of the oscillation, which is then evaluated subjectively. High-speed videoendoscopy, an emerging method that allows the vocal fold oscillation to be quantified, is employed more commonly in research than in the clinic because of the large amount of data and the complex, semi-automatic analysis it entails. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset with manual annotations, used both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these networks to established computer vision algorithms. We found that classical computer vision algorithms perform well at detecting the glottal midline in glottis segmentation data but are outperformed by deep neural networks on this task. We further propose GlottisNet, a multi-task neural architecture that simultaneously predicts both the opening between the vocal folds and the symmetry axis. By fully automating segmentation and midline detection, GlottisNet is a major step toward the clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy.
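For orientation, the sketch below shows one simple classical way to obtain a midline from glottis segmentation data: fit the principal axis of every pixel that is open in at least one frame. It is a minimal sketch under stated assumptions (a stack of binary masks shaped frames × height × width, and a glottis elongated along the anterior-posterior direction), and the function `estimate_midline` is hypothetical; it is not claimed to be one of the algorithms evaluated in the study.

```python
import numpy as np


def estimate_midline(masks: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate a glottal midline as the principal axis of the open glottis.

    Assumption (hypothetical input layout): `masks` is a boolean array
    (frames, height, width) with True marking glottis pixels.
    Returns a point on the line (row, col) and a unit direction (d_row, d_col).
    """
    # Pixels that are open in at least one frame form the glottal region.
    rows, cols = np.nonzero(masks.any(axis=0))
    if rows.size < 2:
        raise ValueError("need at least two glottis pixels to fit a line")
    pts = np.column_stack([rows, cols]).astype(float)

    # The orthogonal least-squares line passes through the centroid ...
    centroid = pts.mean(axis=0)

    # ... and points along the covariance eigenvector with the largest
    # eigenvalue, i.e. the direction of greatest spread (the long axis).
    eigvals, eigvecs = np.linalg.eigh(np.cov(pts - centroid, rowvar=False))
    direction = eigvecs[:, np.argmax(eigvals)]
    return centroid, direction / np.linalg.norm(direction)


# Usage on a synthetic, vertically oriented slit standing in for the glottis.
masks = np.zeros((5, 64, 64), dtype=bool)
masks[:, 10:50, 30:34] = True
centroid, direction = estimate_midline(masks)
# centroid is (29.5, 31.5); direction is (+/-1, 0), i.e. along the image rows
```

Because this sketch fits the axis to the union of all frames, a fold that opens further laterally pulls the estimated midline toward its side.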


2017 ◽  
Vol 09 (05) ◽  
pp. 1750064 ◽  
Author(s):  
A. Van Hirtum ◽  
X. Pelorson

Experiments on mechanically deformable vocal fold replicas are important in physical studies of human voice production for understanding the underlying fluid–structure interaction. To date, most experiments are performed with constant initial conditions with respect to structural as well as geometrical features. Varying these conditions requires manual intervention, which may affect reproducibility and hence the quality of experimental results. In this work, a setup is described that allows elastic and geometrical initial conditions to be set automatically for a deformable vocal fold replica. High-speed imaging is integrated into the setup to decorrelate elastic and geometrical features. In this way, reproducible, accurate, and systematic measurements can be performed for prescribed initial conditions of glottal area, mean upstream pressure, and vocal fold elasticity. Moreover, quantification of geometrical features during auto-oscillation is shown to contribute to experimental characterization and understanding.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Rama K. Vasudevan ◽  
Maxim Ziatdinov ◽  
Lukas Vlcek ◽  
Sergei V. Kalinin

Deep neural networks (‘deep learning’) have emerged as the technology of choice for tackling problems in speech recognition, computer vision, finance, and other fields. However, adopting deep learning in physical domains brings substantial challenges, which stem from the correlative nature of deep learning methods as opposed to the causal, hypothesis-driven nature of modern science. We argue that a path forward for fundamental and applied research lies in the broad adoption of Bayesian methods that incorporate prior knowledge; in the development of solutions with built-in physical constraints, parsimonious structural descriptors, and generative models; and, ultimately, in the adoption of causal models.


Author(s):  
Andreas M. Kist ◽  
Pablo Gómez ◽  
Denis Dubrovskiy ◽  
Patrick Schlegel ◽  
Melda Kunduk ◽  
...  

Purpose: High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data is lacking. HSV allows vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, most of the proposed approaches are limited and not suitable for daily clinical routine. Method: We developed user-friendly software in C# for the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results: Our software, Glottis Analysis Tools (GAT), is freely available. GAT provides a general threshold-based region-growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, particularly for in vivo recordings, we provide three robust neural networks with different speed and quality settings that allow fully automatic glottis segmentation, which is needed for use by untrained personnel. GAT further evaluates video and audio data in parallel and can extract various features from the video data, including the glottal area waveform, that is, the glottal area as it changes over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion: GAT is a unique tool for processing HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material: https://doi.org/10.23641/asha.14575533
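Threshold-based region growing, as named in the GAT description, can be illustrated with a short sketch. It assumes grayscale frames in which the glottis is darker than the surrounding tissue, a single seed pixel inside the glottis, a fixed intensity threshold, and 4-connectivity; the functions `region_grow` and `glottal_area_waveform` and their parameters are hypothetical and do not describe GAT's actual implementation. Counting the segmented pixels in each frame then gives a glottal area waveform in pixels.

```python
from collections import deque

import numpy as np


def region_grow(frame: np.ndarray, seed: tuple[int, int], threshold: float) -> np.ndarray:
    """Boolean mask of pixels 4-connected to `seed` whose intensity is <= threshold."""
    h, w = frame.shape
    mask = np.zeros((h, w), dtype=bool)
    r0, c0 = seed
    if frame[r0, c0] > threshold:
        return mask  # seed is not dark enough: treat the glottis as closed
    mask[r0, c0] = True
    queue = deque([seed])
    while queue:  # breadth-first flood fill from the seed
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] and frame[nr, nc] <= threshold:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


def glottal_area_waveform(frames: np.ndarray, seed, threshold) -> np.ndarray:
    """Glottal area (in pixels) of each frame of a (frames, height, width) video."""
    return np.array([region_grow(f, seed, threshold).sum() for f in frames])


# Usage: a synthetic video whose dark glottal slit changes width over time.
widths = [0, 2, 4, 2]
video = np.full((len(widths), 64, 64), 200.0)  # bright tissue
for t, w in enumerate(widths):
    video[t, 50:55, 5:10] = 20.0  # dark blob that is not connected to the seed
    if w > 0:
        video[t, 10:30, 32 - w // 2:32 - w // 2 + w] = 20.0  # dark glottal slit
gaw = glottal_area_waveform(video, seed=(20, 32), threshold=100.0)
# expected areas: 0, 40, 80 and 40 pixels (the unconnected blob is ignored)
```

Connectivity is what separates region growing from plain thresholding here: the dark blob passes the threshold but is never reached from the seed.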


2022 ◽  
pp. 25-52
Author(s):  
Abhinav Goel ◽  
Caleb Tung ◽  
Xiao Hu ◽  
Haobo Wang ◽  
Yung-Hsiang Lu ◽  
...  

Author(s):  
S. Mantha ◽  
L. Mongeau ◽  
T. Siegmund

An experimental study of the vibratory deformation of the human vocal folds was conducted. Experiments were performed using model vocal folds [1, 2] (Fig. 1) made of silicone rubber and mounted in an air supply system (Fig. 2). The material used to cast the model is isotropic and homogeneous [3], with a tangent modulus E = 5 kPa at ε = 0, i.e., elastic properties similar to those of the human vocal fold cover [4]. Compared with excised larynges, model larynx systems offer easy access for fundamental studies of vocal fold vibration without invasive testing. Acoustic analysis of the voice and electroglottography provide some insight into voice production, but optical techniques for studying vocal fold vibration have drawn considerable attention. Videoendoscopy, stroboscopy, high-speed photography, and kymography have been shown to provide a visual impression of vocal fold dynamics but give limited insight into the fundamental deformation processes of the vocal folds. Quantitative deformation measurements have been made with micro-suture techniques, but these are invasive and allow measurements at only a few image points. Laser triangulation is non-invasive but is limited to a single local measurement point. Here, the digital image correlation (DIC) technique is applied using the software VIC 3D [5]; the experimental set-up is shown in Fig. 2. The analysis consists of (1) stereo correlation to obtain in-plane displacements and (2) a stereo triangulation step to obtain out-of-plane deformation. For the stereo correlation, images of the object at two different stages of deformation are compared: a point in the image of the undeformed object is matched with the corresponding point in the image of the deformed stage. To do so, “subsets” of the digital images are tracked via their gray-value distribution from the undeformed reference image to the deformed image.
Unique matching is made possible by creating a speckle pattern on the object’s surface: a white pigment is mixed into the silicone rubber, and black enamel paint is then sprayed onto the superior surface of the vocal folds. The stereo triangulation requires two images of the object at each stage of deformation. These are obtained in a single CCD frame by placing a beam splitter in the optical axis between camera and object, which provides a “left” and a “right” view of the model larynx from which the deformed shape of the vocal folds can be reconstructed. The method thus allows non-invasive measurement of full-field displacements. Images of the superior surface of the model larynx are recorded with a high-speed digital camera at 3000 frames per second, which yields more than 30 frames per vibration cycle. Phonation frequencies and onset pressures are given in Fig. 3; they show that the behavior of the model larynx is close to physiological data. Figs. 4(a) and 4(b) show superior views of the model larynx at maximum glottal opening and at glottal closure, respectively. As an example of the measured strain fields, Figs. 5(a) and 5(b) show the distribution of the transverse strain component as contour plots on the deformed superior surface. Knowing the distribution of this strain component is relevant for assessing how vocal fold collision may contribute to tissue damage. At maximum opening, the vocal folds are deformed by a combination of bulging and the opening movement. At this instant, the transverse strains at the medial surface are negative, an indication of the Poisson effect.
During the closing stage, the vocal folds collide and, simultaneously, a mode 3 vibration pattern emerges. Glottal closure is incomplete: two open areas remain during the closure stage, located at the anterior and posterior ends of the model larynx (see Fig. 4(b)). This type of incomplete closure agrees with both glottal measurements [6] and the 3D finite element simulations of [7]. Transverse strains during this stage are positive and considerably larger than during the opening stage. Finally, Fig. 6 shows the time evolution of the out-of-plane displacements along the medial surface during the closing phase, and Fig. 7 shows the maximum values of the longitudinal strain (at the coronal section of the medial surface) as a function of the flow rate. These example measurements indicate that the DIC method is promising for studies of vocal fold dynamics.
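The subset matching step of digital image correlation can be sketched with zero-normalized cross-correlation (ZNCC), a widely used DIC matching criterion. This is not VIC 3D's algorithm: the sketch assumes grayscale images, tests only integer-pixel displacements inside a square search window, and omits subpixel interpolation, subset shape functions, and the stereo triangulation step. The function names and the default subset and search sizes are illustrative assumptions.

```python
import numpy as np


def zncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-normalized cross-correlation of two equally sized gray-value subsets."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_subset(ref, cur, center, half=7, search=5):
    """Integer-pixel displacement (dy, dx) of the subset around `center`.

    ref, cur: reference (undeformed) and current (deformed) grayscale images.
    half: subset half-width, so subsets are (2 * half + 1) pixels square.
    search: largest displacement tested in each direction.
    Returns the best displacement and its ZNCC score (1.0 = perfect match).
    """
    r, c = center
    subset = ref[r - half:r + half + 1, c - half:c + half + 1]
    best, best_score = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rr, cc = r + dy, c + dx
            if rr - half < 0 or cc - half < 0:
                continue  # candidate subset would start outside the image
            cand = cur[rr - half:rr + half + 1, cc - half:cc + half + 1]
            if cand.shape != subset.shape:
                continue  # candidate subset would end outside the image
            score = zncc(subset, cand)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score


# Usage: a random image stands in for the sprayed speckle pattern, and the
# "deformed" image is the same pattern shifted 3 rows down and 2 columns left.
rng = np.random.default_rng(0)
ref = rng.random((80, 80))
cur = np.roll(ref, shift=(3, -2), axis=(0, 1))
displacement, score = match_subset(ref, cur, center=(40, 40))
# displacement is (3, -2) with a score of (numerically) 1.0
```

The speckle pattern matters because ZNCC can only single out one candidate when the gray values inside each subset are distinctive; on a uniform surface every candidate would score alike.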


1990 ◽  
Vol 33 (2) ◽  
pp. 245-254 ◽  
Author(s):  
D. G. Childers ◽  
D. M. Hicks ◽  
G. P. Moore ◽  
L. Eskenazi ◽  
A. L. Lalwani

The electroglottogram (EGG) is known to be related to vocal fold motion. A major hypothesis under examination in several research centers is that the EGG is related to the area of contact between the vocal folds. This hypothesis is difficult to substantiate with direct measurements in human subjects; however, other supporting evidence can be offered. For this study, we made measurements from synchronized ultra-high-speed laryngeal films and from EGG waveforms collected from subjects with normal larynges and from patients with vocal disorders. We compare certain features of the EGG waveform to (a) the instant of glottal opening, (b) the instant of glottal closing, and (c) the instant of maximum glottal opening. In addition, we compare the open quotient and the relative average perturbation measured from the glottal area with those estimated from the EGG. All of these comparisons indicate that vocal fold vibratory characteristics are reflected by features of the EGG waveform. This makes the EGG useful for speech analysis and synthesis as well as for modeling laryngeal behavior. The limitations of the EGG are discussed.
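To make the two glottal-area measures concrete, the sketch below computes an open quotient (OQ) and a relative average perturbation (RAP) from a sampled glottal area waveform. It assumes one common set of definitions that the abstract does not spell out: a sample counts as open when the area exceeds a small fraction of the maximum (the 5% default is an assumption), a cycle runs from one glottal opening to the next, OQ is the open duration divided by the cycle duration, and RAP is the three-point perturbation of consecutive cycle periods divided by the mean period. All function names are hypothetical.

```python
import numpy as np


def cycle_starts(area: np.ndarray, frac: float = 0.05) -> np.ndarray:
    """Sample indices where the glottis opens (closed -> open transitions)."""
    is_open = area > frac * area.max()
    return np.flatnonzero(~is_open[:-1] & is_open[1:]) + 1


def open_quotient(area: np.ndarray, frac: float = 0.05) -> float:
    """Mean fraction of each complete cycle during which the glottis is open."""
    is_open = area > frac * area.max()
    starts = cycle_starts(area, frac)
    ratios = [is_open[a:b].mean() for a, b in zip(starts[:-1], starts[1:])]
    return float(np.mean(ratios))


def relative_average_perturbation(periods) -> float:
    """Three-point RAP of consecutive cycle periods (needs at least 3 periods)."""
    p = np.asarray(periods, dtype=float)
    smoothed = (p[:-2] + p[1:-1] + p[2:]) / 3.0  # local 3-period average
    return float(np.mean(np.abs(p[1:-1] - smoothed)) / p.mean())


def synthetic_cycle(period: int, open_len: int) -> np.ndarray:
    """One cycle: a half-sine opening pulse followed by a closed phase (area 0)."""
    cycle = np.zeros(period)
    cycle[:open_len] = np.sin(np.pi * (np.arange(open_len) + 0.5) / open_len)
    return cycle


# Usage: five cycles with alternating periods of 40 and 42 samples, each open
# for 24 samples, preceded by a short closed segment.
area = np.concatenate([np.zeros(10)] + [synthetic_cycle(p, 24) for p in (40, 42, 40, 42, 40)])
oq = open_quotient(area)                      # about 0.586
periods = np.diff(cycle_starts(area))         # in samples; divide by the frame rate for seconds
rap = relative_average_perturbation(periods)  # about 0.0325
```

RAP is a ratio of periods, so it is the same whether periods are expressed in samples or in seconds.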


2017 ◽  
Vol 71 (4) ◽  
pp. 19-25 ◽  
Author(s):  
Bożena Kosztyła-Hojna ◽  
Diana Moskal ◽  
Anna Kuryliszyn-Moskal ◽  
Anna Andrzejewska ◽  
Anna Łobaczuk-Sitnik ◽  
...  

Introduction. The aim of the study is to evaluate the usefulness of High-Speed Digital Imaging (HSDI) in diagnosing organic dysphonia in the form of oedematous-hypertrophic changes of the vocal fold mucosa, morphologically confirmed by Transmission Electron Microscopy (TEM), in patients who use their voice occupationally. Material and methods. The group consisted of 30 occupational voice users with oedematous-hypertrophic changes of the vocal fold mucosa. Vocal fold vibration parameters were evaluated with the HSDI technique using a digital high-speed camera (HRES Endocam, Richard Wolf GmbH). The vocal folds were recorded at 4000 frames per second. Postoperative laryngeal material was prepared routinely and examined with an OPTON 900–PC transmission electron microscope. Results. The HSDI technique allows the actual vibrations of the vocal folds to be assessed and many parameters to be determined. TEM of the postoperative material showed destruction of epithelial cells with severe vacuolar degeneration, enlarged intercellular spaces, and a large number of blood vessels in the stroma, indicating oedematous-hypertrophic changes of the larynx. Discussion. The ultrastructural assessment confirms the particular usefulness of the HSDI method in the diagnosis of organic dysphonia in the form of oedematous-hypertrophic changes. Key words: High-Speed Digital Imaging, oedematous-hypertrophic changes, vocal fold mucosa, larynx


2003 ◽  
Vol 42 (03) ◽  
pp. 271-276 ◽  
Author(s):  
T. Braunschweig ◽  
J. Lohscheller ◽  
U. Eysholdt ◽  
U. Hoppe ◽  
M. Döllinger

Summary. Objectives: The analysis of vocal fold oscillations is central to the quantitative evaluation of pathological and healthy voices. With digital High-Speed Glottography (HGG), vocal fold oscillations can be recorded in real time. Recently, a numerical inversion procedure was developed that allows physiological parameters to be extracted from digital high-speed videos and voice disorders to be classified. The aim of this work was to validate the inversion procedure and to investigate its applicability to normal voices. Methods: High-speed recordings were made during phonation in a group of five women and five men with normal voices. Using knowledge-based image processing algorithms, motion curves of the vocal folds were extracted at three different positions (dorsal, medial, ventral). These curves were used to obtain physiological voice parameters, in particular the degree of vocal fold symmetry, based on a biomechanical model of the vocal folds. Results: The highest degree of symmetry was observed for the medial motion curves. The dorsally and ventrally extracted motion curves yielded similar degrees of symmetry, but the algorithm performed less stably on them. Conclusions: The inversion algorithm provides reasonable results for all subjects when applied to the medial motion curves. For dorsal and ventral motion curves, however, correct performance is reduced to 85%.
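The inversion procedure fits a biomechanical model and is beyond a short example. As a much simpler, swapped-in illustration of what symmetry between left and right motion curves at one position (for example, medial) can mean, the sketch below computes an amplitude ratio and a phase difference at the dominant oscillation frequency. These indices, the function `symmetry_indices`, and the assumption of uniformly sampled curves covering several cycles are illustrative only and are not the model-based symmetry parameter of the study.

```python
import numpy as np


def symmetry_indices(left: np.ndarray, right: np.ndarray) -> tuple[float, float]:
    """Amplitude ratio (1 = equal) and phase difference (radians) of two motion curves.

    left, right: displacement of each vocal fold from the midline, sampled
    uniformly over several cycles; both curves must actually oscillate.
    A positive phase difference means the left fold leads the right one.
    """
    left = left - left.mean()
    right = right - right.mean()
    amp_l = left.max() - left.min()   # peak-to-peak amplitudes
    amp_r = right.max() - right.min()
    amplitude_ratio = min(amp_l, amp_r) / max(amp_l, amp_r)

    # Compare phases at the strongest frequency bin of the combined spectrum.
    spec_l = np.fft.rfft(left)
    spec_r = np.fft.rfft(right)
    k = np.argmax(np.abs(spec_l) + np.abs(spec_r))
    phase_diff = float(np.angle(spec_l[k] * np.conj(spec_r[k])))
    return float(amplitude_ratio), phase_diff


# Usage: the right fold oscillates with 80% of the left amplitude and lags by 0.3 rad.
fs, f0 = 4000.0, 200.0           # sampling rate and oscillation frequency (Hz)
t = np.arange(400) / fs          # 0.1 s, i.e. exactly 20 cycles
left = np.sin(2 * np.pi * f0 * t)
right = 0.8 * np.sin(2 * np.pi * f0 * t - 0.3)
ratio, phase = symmetry_indices(left, right)
# ratio is close to 0.8 and phase is close to 0.3 rad
```

A recording that contains a whole number of cycles avoids spectral leakage, which is why the example uses exactly 20 cycles; real motion curves would typically be windowed first.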

