Perceptual cues for individual voice quality

Author(s):  
Marianela Fernández Trinidad ◽  
José Manuel Rojo Abuin

The alteration of phonation modes may affect speaker identification, but it is not known to what extent changes in register can compromise discrimination in each case. To answer this question, specifically with respect to falsetto voice, a perception experiment using disguised voice was designed to analyze listeners' responses to stimuli that differed only in phonation mode. The results were statistically significant, suggesting that falsetto voice does not compromise speaker recognition. In addition to the perception experiment, the chapter provides a detailed analysis of the temporal aspects of each mode and of the biomechanical behavior of the vocal folds. Connecting the perception experiment to the production analysis allows us, first, to determine which parameters are altered, and which are not, during the register shift to falsetto voice and, second, to discuss what perceptual weight these parameters may carry in the recognition of individual voice quality.

Author(s):  
A. Nagesh

The feature vectors of a speaker identification (SID) system play a crucial role in the overall performance of the system. Many new feature extraction methods based on Mel-Frequency Cepstral Coefficients (MFCC) have been proposed, but the ultimate goal is to maximize the performance of the SID system. The objective of this paper is to derive a new set of feature vectors based on Gammatone Frequency Cepstral Coefficients (GFCC), modeled with a Gaussian Mixture Model (GMM), for speaker identification. MFCCs are the default feature vectors for speaker recognition, but they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown very good robustness against noise and acoustic changes. The main idea is to use GFCC features with GMM-based modeling to improve overall speaker identification performance under low signal-to-noise ratio (SNR) conditions.
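A GFCC front end is commonly built from a gammatone filterbank whose centre frequencies are equally spaced on the ERB-rate scale (Glasberg and Moore), followed by cube-root compression of the per-band energies and a DCT, where an MFCC front end would use a log. The sketch below shows only those steps, under that assumption; it is not the author's implementation, the gammatone filtering itself is omitted, and the function names and parameter values (64 bands, 50 to 4000 Hz) are hypothetical.

```python
import math


def erb_rate(f_hz: float) -> float:
    """ERB-rate (number of ERBs below f_hz), Glasberg & Moore (1990) formula."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def inverse_erb_rate(e: float) -> float:
    """Frequency in Hz corresponding to an ERB-rate value."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_centre_frequencies(n_filters: int, f_low: float, f_high: float) -> list[float]:
    """Hypothetical helper: gammatone centre frequencies equally spaced in ERB-rate."""
    lo, hi = erb_rate(f_low), erb_rate(f_high)
    step = (hi - lo) / (n_filters - 1)
    return [inverse_erb_rate(lo + i * step) for i in range(n_filters)]


def dct_ii(x: list[float]) -> list[float]:
    """Unnormalised DCT-II of a short vector."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * i * (k + 0.5) / n) for k in range(n))
            for i in range(n)]


def gfcc_from_band_energies(energies: list[float], n_coeffs: int) -> list[float]:
    """Cube-root compress one frame's band energies, then keep the first DCT terms.

    Assumption: `energies` are the per-band energies of one frame after
    gammatone filtering (not computed here).
    """
    compressed = [e ** (1.0 / 3.0) for e in energies]
    return dct_ii(compressed)[:n_coeffs]
```

For example, `erb_centre_frequencies(64, 50.0, 4000.0)` gives 64 centres packed more densely at low frequencies, and `gfcc_from_band_energies(frame_energies, 22)` would return 22 coefficients for one frame; the cube root is typically motivated as being less sensitive than the log to very small, noise-dominated band energies.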


The performance of the Mel scale and the Bark scale is evaluated for a text-independent speaker identification system. Both scales are designed according to the human auditory system: the Mel scale is defined by the human ear's perception of pitch, whereas the Bark scale is based on critical-band selectivity, that is, the bandwidth beyond which perceived loudness becomes significantly different. In speech and speaker recognition systems, the filter bank structure is defined on these scales to extract speaker-specific speech features. It is found that Bark-scale centre frequencies are more effective than Mel-scale centre frequencies for Indian dialect speaker databases. The recognition rate achieved using the Bark-scale filter bank is 96% for the AISSMSIOIT database and 95% for the Marathi database.
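The difference between the two filter banks can be sketched by placing centre frequencies equally spaced on each scale. This sketch assumes the common Mel formula m = 2595 log10(1 + f/700) and Traunmüller's (1990) Bark approximation z = 26.81 f/(1960 + f) − 0.53, used without its low- and high-frequency corrections; the abstract does not say which formulas the authors used, and the helper names, the 20-band count and the 0 to 8000 Hz range are hypothetical.

```python
import math


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_bark(f: float) -> float:
    # Traunmüller (1990) approximation, without its end-range corrections
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z: float) -> float:
    # Algebraic inverse of hz_to_bark (valid for z < 26.28)
    return 1960.0 * (z + 0.53) / (26.28 - z)


def centre_frequencies(n_filters, f_low, f_high, to_scale, from_scale):
    """Interior points of n_filters + 2 band edges equally spaced on the chosen scale."""
    lo, hi = to_scale(f_low), to_scale(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [from_scale(lo + (i + 1) * step) for i in range(n_filters)]


mel_centres = centre_frequencies(20, 0.0, 8000.0, hz_to_mel, mel_to_hz)
bark_centres = centre_frequencies(20, 0.0, 8000.0, hz_to_bark, bark_to_hz)
for m, b in zip(mel_centres, bark_centres):
    print(f"{m:8.1f} Hz (Mel)   {b:8.1f} Hz (Bark)")
```

Taking centres as interior points of equally spaced edges mirrors the usual triangular filter bank layout; only the scale conversion changes between the two banks, which is what such a comparison isolates.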


Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Haithem Abd Al-Raheem Taha ◽  
Mohanad Abd Shehab ◽  
Mohamed A.M. Abdullah

<p><span lang="EN-GB">In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate linear channel effects, Cepstral Mean-Variance Normalization (CMVN) and feature warping are applied. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers (two female and six male) are selected from the GRID-Audiovisual database. The speakers are modeled by coupling a Universal Background Model with Gaussian Mixture Models (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features perform better than MFCC features, especially in identifying female rather than male speakers. Furthermore, feature warping performed better than the CMVN method.</span></p>
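The two normalization methods compared above can be sketched per cepstral coefficient as follows. This is a minimal illustration, not the authors' code: CMVN is applied over a whole utterance, and feature warping replaces each value by the standard-normal quantile of its rank within a sliding window. The 301-frame default window (about 3 s at a 10 ms hop) is an assumed, commonly used value, and the `(rank - 0.5) / N` quantile convention is one of several in use.

```python
from statistics import NormalDist, fmean, pstdev


def cmvn(track: list[float]) -> list[float]:
    """Cepstral mean-variance normalisation of one coefficient over one utterance."""
    mu = fmean(track)
    sd = pstdev(track, mu)
    if sd == 0.0:
        return [0.0] * len(track)
    return [(x - mu) / sd for x in track]


def feature_warp(track: list[float], window: int = 301) -> list[float]:
    """Warp one coefficient track towards N(0, 1) within a sliding window.

    Each value is replaced by the standard-normal quantile of its rank among
    the values in a window centred on it (the window is truncated at the edges).
    The default window length is an assumption, not a value from the paper.
    """
    half = window // 2
    std_normal = NormalDist()
    warped = []
    for t, x in enumerate(track):
        win = track[max(0, t - half): t + half + 1]
        rank = 1 + sum(v < x for v in win)  # 1-based rank; ties share the lowest rank
        warped.append(std_normal.inv_cdf((rank - 0.5) / len(win)))
    return warped
```

Both functions take one coefficient's values across frames, so a 16-coefficient feature matrix would be processed column by column. Because warping depends only on ranks, it is insensitive to monotonic distortions of the coefficient distribution, whereas CMVN only corrects its first two moments.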


Author(s):  
Jesús Bernardino Alonso Hernández ◽  
Patricia Henríquez Rodríguez

Diagnostic support systems that evaluate the phonatory system from the speech signal can be implemented using techniques based on expert systems. These techniques allow, for example, the early detection of alterations in the phonatory system or the monitoring over time of patients undergoing treatment. Measuring a speaker's voice quality from a digital recording consists of quantifying different acoustic characteristics of speech, which can then be compared with reference patterns previously identified by a "clinical expert". A measurement of acoustic speech quality based on auditory assessment is very difficult to use as a comparative reference across different voices and across the different human experts carrying out the assessment. In the literature, several attempts have been made to obtain objective measures of speech quality by means of multidimensional clinical measurements based on auditory methods. Well-known examples are the GRBAS scale from Japan (Hirano, M., 1981) and its extension developed and applied in Europe (Dejonckere, P. H., Remacle, M., Fresnel-Elbaz, E., Woisard, V., Crevier-Buchman, L. & Millet, B., 1996), a set of perceptual and acoustic characteristics from Sweden (Hammarberg, B. & Gauffin, J., 1995), and a set of phonetic characteristics with added information about the excitation of the vocal tract. The aim of these speech quality measurement procedures is to obtain an objective measurement from a subjective evaluation. Several works propose objective measurements of speech quality obtained from a recording (Alonso, J. B., 2006), (Boyanov, B. & Hadjitodorov, S., 1997), (Hansen, J. H. L., Gavidia-Ceballos, L. & Kaiser, J. F., 1998), (Hadjitodorov, S. & Mitev, P., 2002), (Michaelis, D., Frohlich, M. & Strube, H. W., 1998), (Boyanov, B., Doskov, D., Mitev, P., Hadjitodorov, S. & Teston, B., 2000), (Godino-Llorente, J. I., Aguilera-Navarro, S. & Gomez-Vilda, P., 2000).
In these works, a sustained voiced sound (usually a vowel) is recorded and then used to compute speech quality measurements. A sustained voiced sound is used because, while producing it, the speech system uses almost all of its mechanisms (constant glottal airflow, continuous vocal fold vibration, …), which makes it possible to detect any anomaly in these mechanisms. Each of these works proposes a different set of measurements to quantify speech quality objectively, and all of them reveal one important fact: several different measurements of the speech signal are needed to capture the different aspects of its acoustic characteristics.
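Many objective measures computed from a sustained vowel, such as jitter, build on an estimate of the fundamental frequency (F0). As a minimal, hypothetical illustration (not the method of any of the works cited above), F0 can be estimated by picking the autocorrelation peak within a plausible pitch range. The 60 to 500 Hz search range, the function name and the synthetic test vowel are assumptions.

```python
import math


def estimate_f0_autocorr(signal: list[float], fs: float,
                         f0_min: float = 60.0, f0_max: float = 500.0) -> float:
    """Estimate F0 (Hz) of a sustained voiced segment from its autocorrelation peak."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]  # remove DC offset
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(x) - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation: longer lags sum fewer terms, which
        # penalises multiples of the true period (octave errors).
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# Synthetic "vowel": 125 Hz fundamental plus two weaker harmonics, 0.1 s at 8 kHz
fs = 8000.0
vowel = [sum(a * math.sin(2 * math.pi * k * 125.0 * n / fs)
             for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
         for n in range(800)]
print(round(estimate_f0_autocorr(vowel, fs), 1))
```

A sustained vowel suits this approach because its period is nearly constant over the analysis window; running speech would instead require short frames and a voicing decision.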


2007 ◽  
Vol 122 (8) ◽  
pp. 824-828 ◽  
Author(s):  
E J Damrose ◽  
J F Damrose

Abstract
Objective: This study evaluated the role of botulinum toxin type A in the treatment of refractory laryngeal granulomas.
Study design and setting: Retrospective clinical review at a tertiary care hospital. Seven patients with vocal process granulomas underwent percutaneous injection of botulinum toxin into both vocal folds, performed in an office setting. Total doses ranged from 10 to 25 U, divided between both vocal folds.
Results: All patients experienced resolution of their granulomas over two to seven weeks. No patient developed aspiration pneumonia. All patients experienced hoarseness secondary to the injections, but voice quality returned to baseline in all patients as the toxin was degraded.
Conclusions: Botulinum toxin is a safe and effective therapy for resolving vocal process granulomas in patients refractory to traditional therapy. The optimal treatment dose remains to be determined.
Significance: Percutaneous botulinum toxin injection is helpful in resolving laryngeal granulomas.


2020 ◽  
Vol 74 (4) ◽  
Author(s):  
Bożena Kosztyła-Hojna ◽  
Emilia Duchnowska ◽  
Maciej Zdrojkowski ◽  
Anna Łobaczuk-Sitnik ◽  
Jolanta Biszewska

<b>Introduction:</b> The aging process of the voice begins after the age of 60 and follows an individually variable course. Voice quality disorders at this age are called senile voice (Presbyphonia or Vox Senium). Voice pathology is particularly severe in women. The aim of the study was to diagnose the clinical form of Presbyphonia in elderly women using High Speed Digital Imaging (HSDI) and acoustic voice analysis. <br><b>Material and methods:</b> The study included 50 elderly women (average age 69) with dysphonia (Group I). The control group (Group II) included 30 women (average age 71) without voice quality disorders. Visual assessment was conducted with High Speed Digital Imaging (HSDI) using a High Speed (HS) camera. Acoustic evaluation of the voice included analysis of the isolated vowel “a” and of continuous linguistic text with Diagnoscope Specialista software. Maximum Phonation Time (MPT) was determined. <br><b>Results:</b> In Group I, 78% of women showed vocal fold vibration asymmetry, increased vibration amplitude, Mucosal Wave (MW) limitation and Type D glottal insufficiency (GTs). Acoustic voice analysis showed a decrease in F0 and an increase in Jitter, Shimmer and NHR. In 22% of women, in addition to vibration asymmetry, reduced vibration amplitude and MW limitation, Type E glottal insufficiency (GTs) was found. Acoustic voice analysis revealed a slight decrease in F0 and the presence of numerous non-harmonic components in the glottis region. <br><b>Conclusions:</b> Vocal fold visualization with HSDI showed edema and, less often, atrophy in elderly women. Both forms of dysphonia resulted in abnormal values of F0, Jitter, Shimmer and NHR in the acoustic voice evaluation and in a significant reduction of MPT.
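Jitter and Shimmer, reported above, are cycle-to-cycle perturbation measures of the glottal period and of the per-cycle peak amplitude, respectively. The sketch below uses the widely used "local" definitions (mean absolute difference between consecutive cycles divided by the mean, in percent, as in Praat). Diagnoscope Specialista may define or normalise these measures differently, the period and amplitude sequences are assumed to have been extracted beforehand, and the function names and example values are hypothetical.

```python
def local_perturbation(values: list[float]) -> float:
    """Mean absolute difference between consecutive cycles, relative to the mean, in %."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_local(periods_s: list[float]) -> float:
    """Local jitter (%) from successive glottal periods in seconds."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes: list[float]) -> float:
    """Local shimmer (%) from successive per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)


# Hypothetical periods of a roughly 125 Hz voice (seconds) and per-cycle peak amplitudes
periods = [0.0080, 0.0082, 0.0080, 0.0081]
amplitudes = [0.50, 0.48, 0.51, 0.49]
print(f"jitter {jitter_local(periods):.2f} %, shimmer {shimmer_local(amplitudes):.2f} %")
```

Normalising by the mean makes both values dimensionless, so voices with different F0 or recording levels can be compared; this is why increased Jitter and Shimmer can be read as increased irregularity rather than as a change in pitch or loudness.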


2014 ◽  
Vol 136 (11) ◽  
Author(s):  
Jun Yin ◽  
Zhaoyan Zhang

Although it is known that vocal fold adduction is achieved through laryngeal muscle activation, it is still unclear how the interaction between the activations of individual laryngeal muscles affects vocal fold adduction and vocal fold stiffness, both of which are important factors determining vocal fold vibration and the resulting voice quality. In this study, a three-dimensional (3D) finite element model was developed to investigate vocal fold adduction and changes in vocal fold eigenfrequencies due to the interaction between the lateral cricoarytenoid (LCA) and thyroarytenoid (TA) muscles. The results showed that LCA contraction led to a medial and downward rocking motion of the arytenoid cartilage in the coronal plane about the long axis of the cricoid cartilage facet, which adducted the posterior portion of the glottis but had little influence on vocal fold eigenfrequencies. In contrast, TA activation caused a medial rotation of the vocal folds toward the glottal midline, resulting in adduction of the anterior portion of the glottis and a significant increase in vocal fold eigenfrequencies. This vocal fold-stiffening effect of TA activation also reduced the posterior adductory effect of LCA activation. The implications of the results for phonation control are discussed.


2008 ◽  
Vol 122 (4) ◽  
pp. 378-382 ◽  
Author(s):  
K Yelken ◽  
M Guven ◽  
M Topak ◽  
E Gultekin ◽  
F Turan

Abstract
Objectives: To evaluate the effects of antituberculosis treatment on the voice quality of laryngeal tuberculosis patients, measured by patient self-assessment, perceptual analysis and acoustic analysis.
Materials and methods: A total of 14 laryngeal tuberculosis patients were enrolled. Laryngeal tuberculosis was established either by biopsy and histopathological examination or by rapid regression of the laryngeal lesions after antituberculosis medication. Before and after treatment, all patients were evaluated perceptually (on a scale of zero to three), and 12 assessed their own voices using the voice handicap index-10 scale. Acoustic analysis was performed to allow objective evaluation.
Results: Patients' ages ranged from 21 to 72 years (mean, 41). The male to female ratio was 12:2. Eight patients (57 per cent) had tuberculous involvement of the epiglottis, four (28 per cent) had involvement of the aryepiglottic fold and eight (57 per cent) had involvement of the false vocal folds. The glottis was less commonly involved, including the true vocal folds (28 per cent, n = 4) and the posterior commissure (14 per cent, n = 2). Perceptual evaluation, on a scale of zero to three, gave the patients a median score of six; after commencement of treatment, the median score decreased to two. The mean voice handicap index-10 score decreased from 24 to 12 after treatment. A clear improvement in acoustic parameters was also found following treatment.
Conclusions: Antituberculosis treatment clearly improved the voice outcomes of laryngeal tuberculosis patients, according to self-assessment, perceptual analysis and acoustic analysis.

