A Methodological Study of Perturbation and Additive Noise in Synthetically Generated Voice Signals

1987 ◽  
Vol 30 (4) ◽  
pp. 448-461 ◽  
Author(s):  
James Hillenbrand

There is a relatively large body of research that is aimed at finding a set of acoustic measures of voice signals that can be used to: (a) aid in the detection, diagnosis, and evaluation of voice-quality disorders; (b) identify individual speakers by their voice characteristics; or (c) improve methods of voice synthesis. Three acoustic parameters that have received a relatively large share of attention, especially in the voice-disorders literature, are pitch perturbation, amplitude perturbation, and additive noise. The present study consisted of a series of simulations using a general-purpose formant synthesizer that were designed primarily to determine whether these three parameters could be measured independent of one another. Results suggested that changes in any single dimension can affect measured values of all three parameters. For example, adding noise to a voice signal resulted not only in a change in measured signal-to-noise ratio, but also in measured values of pitch and amplitude perturbation. These interactions were quite large in some cases, especially in view of the fact that the perturbation phenomena that are being measured are generally quite small. For the most part, the interactions appear to be readily explainable when the measurement techniques are viewed in relation to what is known about the acoustics of voice production.
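As a rough illustration of the pitch and amplitude perturbation measures discussed above, the sketch below computes a commonly used "local" jitter and shimmer: the mean absolute difference between consecutive cycles, relative to the mean value. The abstract does not specify which perturbation formulas the study used, so the definition, the function name `local_perturbation`, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    expressed as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical cycle-by-cycle measurements (not data from the study).
periods_ms = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]          # glottal cycle durations
peak_amplitudes = [1.00, 0.98, 1.02, 1.00, 0.97, 1.03]   # per-cycle peak amplitudes

jitter_percent = local_perturbation(periods_ms)        # pitch perturbation
shimmer_percent = local_perturbation(peak_amplitudes)  # amplitude perturbation
print(f"jitter = {jitter_percent:.2f}%, shimmer = {shimmer_percent:.2f}%")
```

Both measures depend on locating each cycle's boundary and peak, which may help explain why additive noise can also shift measured jitter and shimmer.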

Loquens ◽  
2017 ◽  
Vol 4 (1) ◽  
pp. 040
Author(s):  
Zulema Santana-López ◽  
Óscar Domínguez-Jaén ◽  
Jesús B. Alonso ◽  
María Del Carmen Mato-Carrodeguas

Voice pathologies, caused either by functional dysphonia or organic lesions, or even by just an inappropriate emission of the voice, may lead to vocal abuse, significantly affecting the communication process. The present study is based on the case of a single patient diagnosed with myasthenia gravis (Erb-Goldflam syndrome). In this case, the condition has caused, among other disruptions, dysarthria. For its treatment, a technique for the education and re-education of the voice has been used, based on a resonator element: the cellophane screen. This article shows the results obtained in the patient after applying a vocal re-education technique called the Cimardi Method: the Cellophane Screen, which is a pioneering technique in this field. Changes in the patient’s voice signal have been studied before and after the application of the Cimardi Method in different domains of study: time-frequency, spectrum, and cepstrum. Moreover, parameters for voice quality measurement, such as shimmer, jitter and harmonic-to-noise ratio (HNR), have been used to quantify the results obtained with the Cimardi Method. Once the results were analyzed, it was observed that the Cimardi Method helps to produce a more natural and free vocal emission, which is very useful as a rehabilitation therapy for people presenting certain vocal disorders.


2021 ◽  
Vol 1 (1) ◽  
pp. 83-93
Author(s):  
Noor N. Edan ◽  
Nasser N. Khamiss

In mobile communication systems, bit-rate reductions that maintain acceptable voice quality are necessary to achieve efficient channel bandwidth utilization and user satisfaction. As Long-Term Evolution (LTE) converges towards all-IP solutions and supports VoIP service, voice signals are converted into a coded digital bit-stream and sent over the network. This paper proposes the implementation of a codebook excited linear prediction (CELP) voice codec algorithm based on two source rates, low (9.6 kbps) and medium (16 kbps), for achieving a perceptible level of voice quality while efficiently using available bandwidth during transmission over advanced LTE. The architecture of the proposed CELP codec model is implemented to decompose the voice signal into a set of parameters that characterize each particular frame at the encoder; these parameters are quantized and encoded for transmission to the decoder. The investigation showed that the configuration of the link and the applied CELP codec mode mainly influence the obtained voice capacity and quality. The quantification also shows that voice quality can be traded for enhanced capacity, since the low-rate codec will produce lower voice quality than the higher-rate codec. In addition, when the system is configured with a higher channel quality indicator (CQI) index, the capacity gain increases to a saturated value of about 500 and 1000 users per cell over 5 MHz bandwidth for transmit diversity (TD) and Open-Loop Spatial Multiplexing (OLSM), respectively, and up to 1000 and 2000 users per cell over 10 MHz channel bandwidth for TD and OLSM, respectively.


1997 ◽  
Vol 106 (4) ◽  
pp. 279-285 ◽  
Author(s):  
David G. Hanson ◽  
Judy Chen ◽  
Jack J. Jiang ◽  
Barbara Roa Pauloski

Sixteen patients who had symptoms and signs of chronic posterior laryngitis were evaluated before, during, and after treatment with omeprazole and nocturnal antireflux precautions. Data were analyzed for patients who complained of some hoarseness, who had no smoking history, and who completed all of the voice recording protocol. The patients' voices were recorded before, during, and following treatment with omeprazole and nocturnal antireflux precautions. Voice quality was analyzed by perceptual analysis, and acoustic signal data were measured for jitter, shimmer, and signal-to-noise ratio. Measures of jitter, shimmer, and signal-to-noise ratio changed significantly with treatment of posterior laryngitis (p < .01 for change in each of the measures). Acoustic measures showed some trend of deterioration with cessation of treatment, although the overall improvement in acoustic measures of voice quality was still statistically significant after treatment with omeprazole was discontinued. Although perceived abnormality of voice increased and decreased with the magnitude of measured perturbation of the acoustic signal for some patients, the perceptual assessments were not highly correlated with acoustic measures for individual patients, and the perceptual analysis group data did not show a significant change with time during treatment, in contrast to the significance of change in acoustic measures. The data demonstrate that acoustic measures of jitter, shimmer, and signal-to-noise ratio improve significantly with antisecretory and antireflux treatment of chronic posterior laryngitis, and that for individual patients, these are changes that are detected by trained listeners, but not at statistically high levels of confidence.


2020 ◽  
Vol 148 (9-10) ◽  
pp. 560-564
Author(s):  
Sanja Djokovic ◽  
Vladan Plecevic ◽  
Tamara Kovacevic ◽  
Sinisa Solaja ◽  
Bojana Vukovic

Introduction/Objective. Tonsillitis is a very common condition found in the pediatric population but also in adult patients. One of the consequences of such conditions is poor voice quality. Hoarseness, poor voice impostation, interruption, and hypernasalization are just some of the differences in patient voice quality. The objective of this paper was to examine the effects of tonsillectomy on voice quality. Methods. The sample included 37 patients, 17 female and 20 male, ranging in age from 3 to 39 years. The method involved recording patients one month before and one month after tonsillectomy with a digital sound recorder, with recordings analyzed in the Praat program. The variables monitored in the basic voice were as follows: voice pitch, standard deviation of voice, degree of voice interruption, jitter, shimmer, and signal-to-noise ratio. In the statistical analysis, in addition to standard descriptive analyses, t-test and ANCOVA were also used. Results. The results showed that there are effects of tonsillectomy on standard deviation of baseline voice (p = 0.002), shimmer (p = 0.002), baseline voice interruption rate (p = 0.023), and signal-to-noise ratio (p = 0.003). There were no differences in the effects of tonsillectomy with respect to the sex of the subjects. Conclusion. Based on the conducted research, there were some methodological conclusions that could be considered as recommendations for future research: increase the number of persons in the sample, and introduce variables for chronological age, type of surgical intervention, and gradation of size of the tonsil and adenoid tissue.


1990 ◽  
Vol 33 (2) ◽  
pp. 324-334 ◽  
Author(s):  
S. Feijoo ◽  
C. Hernández

The vocal quality of 64 normal subjects and 57 subjects suffering various degrees of glottal cancer was investigated using acoustic measures of six different aspects of the voice signal: tone period perturbation, amplitude perturbation, waveform perturbation, vocal noise, spectral periodicity and spectral distortion. The measures were estimated taking the glottal cycle as temporal reference unit to make the influence of the differences in tone period from one person to another as low as possible. The measures were evaluated with regard to (a) their ability to discriminate between healthy and sick subjects, and (b) their correlation with the perceptual evaluation of four trained listeners. The results suggest that signal processing techniques are unsatisfactory for clinical diagnoses but useful for monitoring voice quality.


2020 ◽  
pp. 2434-2439
Author(s):  
Hani S. Hassan ◽  
Jammila Harbi S. ◽  
Maisa'a Abid Ali Kodher

Voice denoising is the process of removing undesirable voices from the voice signal. In the presence of environmental noise, and after the application of a speech recognition system, the discriminative model finds it difficult to recognize the waveform of the voice signal. This is because the environmental noise requires a suitable filter that does not affect the shape of the waveform from the input microphone. This paper plans to build up a procedure for a discriminative model, using an infinite impulse response filter (Butterworth filter) and a local polynomial approximation (Savitzky-Golay) smoothing filter, which performs a polynomial regression on the signal values. Signal-to-noise ratio (SNR) was calculated after filtering to compare the results before and after adding the Savitzky-Golay smoothing filter. This procedure showed better results for filtering ambient noise and protecting the waveform from distortion, which makes the discriminative model more accurate when recognizing voice. Our procedure for preprocessing was developed and successfully implemented on a discriminative model by using MATLAB.
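A minimal sketch of the smoothing-and-SNR comparison described above, assuming a 5-point quadratic Savitzky-Golay window and a synthetic sine-plus-noise test signal. The paper's actual filter orders, window length, and MATLAB code are not given in the abstract, the Butterworth stage is omitted here, and the names `savgol5` and `snr_db` are hypothetical.

```python
import math
import random

# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(x):
    """Smooth x with a 5-point quadratic Savitzky-Golay filter.
    The two samples at each edge are left unchanged."""
    y = list(x)
    for i in range(2, len(x) - 2):
        y[i] = sum(c * x[i + k - 2] for k, c in enumerate(SG5))
    return y


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((o - s) ** 2 for s, o in zip(clean, observed))
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic test signal (an assumption, not the paper's data):
# a 5 Hz sine sampled at 1 kHz plus white Gaussian noise.
rng = random.Random(0)
fs = 1000
clean = [math.sin(2 * math.pi * 5 * n / fs) for n in range(fs)]
noisy = [s + rng.gauss(0.0, 0.2) for s in clean]
smoothed = savgol5(noisy)

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, smoothed)
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

Because the filter fits a local polynomial rather than averaging, it reproduces slowly varying waveform shapes closely, which is in line with the waveform-preservation goal stated in the abstract.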


2021 ◽  
Author(s):  
Pramod Mehra ◽
Parag Jain

For human interaction with a machine, it is important that the machine understand the mood of the speaker. Until now, machines have been trained on neutral speech or utterances. The mood of a person would affect their performance. Deciphering human mood is challenging for machines, as humans can create fourteen distinct sounds in a second. For a machine to understand human behaviour, it should understand the acoustic abilities of the human ear. Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC) can replicate the human auditory system. The proposed model, Emotion Recognition from Indian Languages (ERIL), extracts emotions like fear, anger, surprise, sadness, happiness, and neutral. ERIL first pre-processes the voice signal, extracts selective MFCC, LPC, pitch, and voice quality features, then classifies the speech using CatBoost. ERIL is a multilingual emotion classifier; it is independent of any language. We checked it on Hindi, Gujarati, Marathi, Punjabi, Bangla, Tamil, Oriya, and Telugu. We recorded a speech dataset of various emotions in these languages. ERIL is compared to other benchmark classifiers.
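To illustrate how MFCC features approximate the ear's frequency resolution, the sketch below uses one widely used mel-scale formula, m = 2595 log10(1 + f/700), to place filter-bank band edges. The abstract does not say which mel variant, frequency range, or filter count ERIL uses, so the formula choice, the function names, and the values 0 to 8000 Hz with 26 filters are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """Edges (in Hz) of n_filters triangular bands equally spaced in mels.
    Filter i spans edges[i] .. edges[i + 2], centred on edges[i + 1]."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


# Hypothetical configuration: 26 filters between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 26)
print([round(e) for e in edges[:5]], "...", round(edges[-1]))
```

Bands are narrow at low frequencies and widen towards high frequencies, mirroring the coarser pitch resolution of hearing in the upper range.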


2021 ◽  
Vol 67 (6) ◽  
pp. 46-51
Author(s):  
P.M. Kovalchuk ◽
T.A. Shydlovska ◽  

We aimed to analyse voice signals in 40 patients with chronic laryngitis elicited by exposure to chemical factors. We examined 20 people with catarrhal chronic laryngitis (group 1), 20 people with subatrophic chronic laryngitis (group 2) and 15 healthy volunteers as controls. All subjects underwent acoustic examination of the voice signal using the software Praat V 4.2.1. We studied acoustic measures as follows: Jitter, Shimmer and NHR (noise-to-harmonics ratio). The analysis of the obtained data revealed statistically significant differences in the average values of Jitter and Shimmer measures, as well as in the ratio of nonharmonic (noise) and harmonic component in the spectrum (NHR) in patients with chronic laryngitis (groups 1 and 2) compared with controls. In group 1 (chronic catarrhal laryngitis), the average values of acoustic measures such as Jitter, Shimmer and NHR were as follows: Jitter - 0.92 ± 0.1%, Shimmer - 5.31 ± 0.5%, NHR - 0.078 ± 0.04. In group 2 (subatrophic laryngitis), the average values of acoustic measures were: Jitter - 0.67 ± 0.6%, Shimmer - 6.57 ± 0.7% and NHR - 0.028 ± 0.003. The obtained data indicate a pronounced instability of the voice in frequency and amplitude, and a significant proportion of the noise component in the spectrum of the voice signal in the examined patients with chronic laryngitis exposed to chemical factors. The most pronounced alterations were found in patients with catarrhal chronic laryngitis. We conclude that the quantitative values of spectral analysis of the voice signal Jitter, Shimmer, NHR may serve as valuable criteria of the degree of voice impairment. This may be helpful in determining the effectiveness of rehabilitation measures.


2020 ◽  
Vol 63 (12) ◽  
pp. 3991-3999
Author(s):  
Benjamin van der Woerd ◽  
Min Wu ◽  
Vijay Parsa ◽  
Philip C. Doyle ◽  
Kevin Fung

Objectives: This study aimed to evaluate the fidelity and accuracy of a smartphone microphone and recording environment on acoustic measurements of voice. Method: A prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) a Rainbow Passage sentence, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA), at a fixed distance (15 in.). Each set of recordings (iPhone–audio booth, Blue Yeti–audio booth, iPhone–office, and Blue Yeti–office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results: The vowel analyses indicated that jitter, shimmer, HNR, and CPP were significantly different based on microphone choice, and shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges. Conclusions: Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
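To show why HNR is sensitive to microphone and room noise, the sketch below estimates it from the normalized autocorrelation r at a known period lag, using HNR = 10 log10(r / (1 - r)). This is a simplified illustration and not Praat's algorithm, which adds windowing, interpolation and pitch-candidate search; the function `hnr_db`, the assumption that the period is known, and the synthetic signals are all hypothetical.

```python
import math
import random


def hnr_db(x, lag):
    """Estimate HNR (dB) from the normalized autocorrelation of x
    at a known period lag (in samples)."""
    a, b = x[:-lag], x[lag:]
    r = sum(p * q for p, q in zip(a, b)) / math.sqrt(
        sum(p * p for p in a) * sum(q * q for q in b)
    )
    if not 0.0 < r < 1.0:
        raise ValueError("autocorrelation peak must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


def noisy_tone(noise_sd, seed, fs=8000, f0=100, n=4000):
    """Synthetic stand-in for a sustained vowel: a sine plus white noise."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * f0 * i / fs) + rng.gauss(0.0, noise_sd)
        for i in range(n)
    ]


period_lag = 8000 // 100  # 80 samples per cycle at fs = 8 kHz, f0 = 100 Hz
hnr_quiet = hnr_db(noisy_tone(0.05, seed=1), period_lag)  # e.g. treated booth
hnr_noisy = hnr_db(noisy_tone(0.30, seed=2), period_lag)  # e.g. noisier room
print(f"quiet: {hnr_quiet:.1f} dB, noisy: {hnr_noisy:.1f} dB")
```

Any noise added by the recording chain lowers the autocorrelation peak directly, so HNR drops even when the underlying voice is unchanged.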


2017 ◽  
Vol 23 (1) ◽  
pp. 1-20
Author(s):  
Kathy Connaughton ◽  
Irena Yanushevskaya

Objective: This study explores the immediate impact of prolonged voice use by professional sports coaches. Method: Speech samples including sustained phonation of vowel /a/ and a short read passage were collected from two professional sports coaches. The audio recordings were made within an hour before and after a coaching session, over three sessions. Perceptual evaluation of voice quality was done using the GRBAS scale. The speech samples were subsequently analyzed using Praat. The acoustic measures included fundamental frequency (f0), jitter, shimmer, Harmonics-to-Noise ratio and Cepstral Peak Prominence. Main results: The results of perceptual and acoustic analysis suggest a slight shift towards a tenser phonation post-coaching session, which is a likely consequence of laryngeal muscle adaptation to prolonged voice use. This tendency was similar in sustained vowels and connected speech. Conclusion: Acoustic measures used in this study can be useful to capture the voice change post-coaching session. It is desirable, however, that more sophisticated and robust and at the same time intuitive and easy-to-use tools for voice assessment and monitoring be made available to clinicians and professional voice users.

