Automatic Voice Quality Measurement Based on Efficient Combination of Multiple Features

Author(s):  
Ji-Yeoun Lee ◽  
Sangbae Jeong ◽  
Minsoo Hahn ◽  
Hong-Shik Choi
2010 ◽  
Vol 30 (0) ◽  
pp. 104
Author(s):  
Pin-Hsuan Chen ◽  
Che-Nan Yang

2009 ◽  
Vol 19 (1) ◽  
pp. 79-103 ◽  
Author(s):  
Abdulhussain E. Mahdi ◽  
Dorel Picovici

Loquens ◽  
2017 ◽  
Vol 4 (1) ◽  
pp. 040
Author(s):  
Zulema Santana-López ◽  
Óscar Domínguez-Jaén ◽  
Jesús B. Alonso ◽  
María Del Carmen Mato-Carrodeguas

Voice pathologies, whether caused by functional dysphonia, by organic lesions, or simply by inappropriate voice emission, may lead to vocal abuse and significantly affect communication. The present study is based on the case of a single patient diagnosed with myasthenia gravis (Erb-Goldflam syndrome). In this patient the condition has caused, among other disruptions, dysarthria. Treatment used a technique for the education and re-education of the voice based on a resonator element, the cellophane screen. This article reports the results obtained in the patient after applying this vocal re-education technique, called the Cimardi Method: the Cellophane Screen, a pioneering technique in this field. Changes in the patient’s voice signal were studied before and after application of the Cimardi Method in several domains: time-frequency, spectrum, and cepstrum. In addition, voice quality parameters such as shimmer, jitter, and harmonics-to-noise ratio (HNR) were used to quantify the results. The analysis indicates that the Cimardi Method helps to produce a more natural and free vocal emission, which makes it useful as a rehabilitation therapy for people with certain vocal disorders.
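As a rough illustration of two of the parameters named in this abstract, the sketch below computes the commonly used "local" forms of jitter and shimmer. It assumes the glottal cycle periods and per-cycle peak amplitudes have already been extracted from a sustained vowel; the function names are hypothetical, and the abstract does not state which definitions or tools the study actually used. HNR is not covered here.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value (a dimensionless ratio)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the period (hypothetical helper).
    periods_s: durations of consecutive glottal cycles, in seconds."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the amplitude (hypothetical helper).
    peak_amplitudes: peak amplitude of each consecutive cycle."""
    return local_perturbation(peak_amplitudes)
```

A perfectly periodic signal gives a jitter of 0; comparing the values obtained before and after therapy is one way such parameters can quantify a change in voice quality.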


Author(s):  
Jesús Bernardino Alonso Hernández ◽  
Patricia Henríquez Rodríguez

Diagnostic support systems for evaluating the phonatory system from the speech signal can be implemented by means of techniques based on expert systems. Such techniques allow, for example, the early detection of alterations in the phonatory system or the evaluation over time of patients undergoing a given treatment. Measuring the voice quality of a speaker from a digital recording consists of quantifying different acoustic characteristics of speech so that they can be compared with reference patterns previously identified by a “clinical expert”. A quality measurement based on auditory assessment is very hard to use as a comparative reference across different voices and across the different human experts carrying out the evaluation. In the literature, some attempts have been made to obtain objective measures of speech quality by means of multidimensional clinical measurements based on auditory methods. Well-known examples are the GRBAS scale from Japan (Hirano, M., 1981) and its extension developed and applied in Europe (Dejonckere, P. H., Remacle, M., Fresnel-Elbaz, E., Woisard, V., Crevier-Buchman, L. & Millet, B., 1996), a set of perceptual and acoustic characteristics in Sweden (Hammarberg, B. & Gauffin, J., 1995), and a set of phonetic characteristics with added information about the excitation of the vocal tract. The aim of these speech quality measurement procedures is to obtain an objective measurement from a subjective evaluation. Several works propose objective measurements of speech quality obtained from a recording (Alonso, J. B., 2006), (Boyanov, B. & Hadjitodorov, S., 1997), (Hansen, J. H. L., Gavidia-Ceballos, L. & Kaiser, J. F., 1998), (Hadjitodorov, S. & Mitev, P., 2002), (Michaelis, D., Frohlich, M. & Strube, H. W., 1998), (Boyanov, B., Doskov, D., Mitev, P., Hadjitodorov, S. & Teston, B., 2000), (Godino-Llorente, J. I., Aguilera-Navarro, S. & Gomez-Vilda, P., 2000). In these works a sustained voiced sound (usually a vowel) is recorded and then used to compute speech quality measurements. A sustained voiced sound is used because producing it engages almost all the mechanisms of the speech system (constant glottal air flow, continuous vocal fold vibration, …), which makes it possible to detect an anomaly in any of them. Each work suggests a different set of measurements for quantifying speech quality objectively. All of them reveal one important fact: several different measurements of the speech signal are needed to capture the different aspects of its acoustic characteristics.


2011 ◽  
pp. 1008-1016
Author(s):  
Jesús Bernardino Alonso Hernández ◽  
Patricia Henríquez Rodríguez



2013 ◽  
Vol 5 (2) ◽  
pp. 150-154
Author(s):  
Evaldas Stankevičius

The article deals with methods for measuring the quality of voice transmitted over a mobile network, along with the related problems, algorithms, and options. It presents a voice quality measurement system developed by the author and discusses its adequacy and efficiency, together with results obtained with the system under the optimal hardware configuration. Under almost ideal conditions, the system rates voice quality at an average MOS of 3.85, whereas the standardized TEMS Investigation 9.0 gives an average MOS of 4.05. The article then discusses the implementation of a voice quality predictor and investigates linear and nonlinear prediction of voice quality from mobile network settings. Nonlinear prediction using an artificial neural network yielded a correlation coefficient of 0.62, while linear prediction using the least squares method yielded 0.57. An analytical expression of voice quality as a function of three network parameters (BER, C/I, and RSSI) is also given. Article in Lithuanian.
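The linear predictor described in this abstract can be pictured with a minimal sketch: an ordinary least squares fit of a quality score on three network parameters, scored by the Pearson correlation between predicted and observed values. Everything below is an assumption for illustration. The function names are hypothetical, the toy measurements and the coefficients used to generate them are invented, and the article's actual expression and data are not reproduced here.

```python
from math import sqrt


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(features, targets):
    """Return [intercept, w1, w2, ...] minimizing the squared error,
    via the normal equations (A^T A) w = A^T y."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(weights, feature_row):
    """Apply the fitted linear expression to one (BER, C/I, RSSI) row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Invented toy rows: (BER in %, C/I in dB, RSSI in dBm). Not from the article.
TOY_FEATURES = [(0.1, 20, -60), (0.5, 15, -75), (1.0, 12, -80),
                (2.0, 9, -90), (0.2, 18, -70), (1.5, 10, -85)]
# Invented noise-free scores from made-up coefficients, so the fit is checkable.
TOY_MOS = [3.0 - 0.4 * ber + 0.05 * ci + 0.01 * rssi
           for ber, ci, rssi in TOY_FEATURES]
```

Calling `fit_least_squares(TOY_FEATURES, TOY_MOS)` and passing the result to `predict` reproduces the toy scores, so the correlation is essentially 1 here. With real, noisy measurements the correlation would be lower, as the reported 0.57 suggests.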

