Effects of Fundamental Frequency, Vocal Intensity, Sample Duration, and Vowel Context in Cepstral and Spectral Measures of Dysphonic Voices

Purpose: Smoothed cepstral peak prominence (CPPS) and harmonics-to-noise ratio (HNR) are acoustic measures related to the periodicity, harmonicity, and noise components of an acoustic signal. To date, there is little evidence about the advantages of CPPS over HNR in voice diagnostics. Recent studies indicate that voice fundamental frequency (F0) and intensity (sound pressure level [SPL]), sample duration (DUR), vowel context (speech vs. sustained phonation), and syllable stress (SS) may influence CPPS and HNR results. The aim of this work was to investigate the effects of voice F0 and SPL, DUR, SS, and token on CPPS and HNR in dysphonic voices. Method: In this retrospective study, 27 Brazilian Portuguese speakers with voice disorders were investigated. Recordings of sustained vowels (SVs) /a:/ and manually extracted vowels (EVs) /a/ from Consensus Auditory-Perceptual Evaluation of Voice sentences were acoustically analyzed with the Praat program. Results: There was a highly significant effect of F0, SPL, and DUR on both CPPS and HNR (p < .001), whereas SS and vowel context significantly affected CPPS only (p < .05). Higher SPL, higher F0, and lower DUR were related to higher CPPS and HNR. SVs correlated moderately to highly with EVs for CPPS, whereas HNR showed only a few, moderate correlations. In addition, CPPS and HNR were highly correlated in SVs and seven EVs (p < .05). Conclusion: Prosodic variations in F0, SPL, and DUR during speech influenced both CPPS and HNR measures and led to acoustic differences between sustained and excised vowels, especially in CPPS. Vowel context, prosodic factors, and token type should be controlled for in clinical acoustic voice assessment.
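
As a rough illustration of what HNR expresses, the sketch below converts a normalized autocorrelation peak r (the fraction of signal energy that is periodic) into decibels using the relation HNR = 10·log10(r / (1 − r)) from autocorrelation-based harmonicity analysis. The function name and the sample values are hypothetical; this is not the study's analysis pipeline, only a minimal sketch of the measure.

```python
import math

def hnr_db(r: float) -> float:
    """Convert a normalized autocorrelation peak r (0 < r < 1) to HNR in dB."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    # Ratio of periodic energy (r) to noise energy (1 - r), on a dB scale
    return 10.0 * math.log10(r / (1.0 - r))

# Hypothetical values: equal periodic and noise energy, then a strongly periodic signal
print(round(hnr_db(0.5), 2))
print(round(hnr_db(0.99), 2))
```

A value of r = 0.5 means the periodic and noise parts carry equal energy, so the ratio sits at 0 dB; values of r closer to 1 give progressively higher HNR.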

Test–Retest Reliability of Relative Fundamental Frequency and Conventional Acoustic, Aerodynamic, and Perceptual Measures in Individuals With Healthy Voices

Journal of Speech Language and Hearing Research ◽ 10.1044/2019_jslhr-s-18-0507 ◽ 2019 ◽ Vol 62 (6) ◽ pp. 1707-1718 ◽ Cited By ~ 2

Author(s): Yeonggwang Park ◽ Cara E. Stepp

Keyword(s): Fundamental Frequency ◽ Intraclass Correlation ◽ Airflow Rate ◽ Retest Reliability ◽ Acoustic Measures ◽ Perceptual Evaluation ◽ Subglottal Pressure ◽ Cepstral Peak Prominence ◽ Relative Fundamental Frequency ◽ Test Retest Reliability

Purpose: Recent studies have shown that an acoustic measure, relative fundamental frequency (RFF), has potential for the assessment of excessive laryngeal tension and vocal effort associated with functional and neurological voice disorders. This study presents an analysis of the test–retest reliability of RFF in individuals with healthy voices and a comparison of reliability between RFF and conventional measures of voice. Method: Acoustic and aerodynamic measurements and the Consensus Auditory–Perceptual Evaluation of Voice (CAPE-V) were performed on 28 individuals with healthy voices on 5 consecutive days. Participants produced RFF stimuli, a sustained /ɑ/, and a reading passage to allow for extraction of acoustic measures and CAPE-V ratings; /pa/ trains were produced to allow for extraction of aerodynamic measures. Results: Moderate reliabilities (intraclass correlation coefficient [ICC] = .64–.71) were found for RFF values. Mean vocal fundamental frequency, smoothed cepstral peak prominence, shimmer, harmonics-to-noise ratio, and mean airflow rate exhibited good-to-excellent reliabilities (ICC = .76–.99). Jitter and phonation threshold pressure were moderately reliable (ICC = .67–.74). Subglottal pressure estimates and all CAPE-V parameters showed poor reliabilities (ICC = .31–.58). Conclusion: RFF has comparable reliability to conventional measures of voice. This expands the potential for clinical application of RFF. Supplemental Material: https://doi.org/10.23641/asha.8233376
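
To make the reliability statistic concrete, here is a minimal sketch of an intraclass correlation computed from repeated measurements per participant. The abstract does not state which ICC form the authors used, so the one-way random-effects, single-measure form ICC(1,1) is an assumption here, and the function name and data are hypothetical.

```python
from statistics import mean

def icc_oneway(scores):
    """ICC(1,1) for a list of per-subject lists, each holding k repeated measurements."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # measurements per subject (e.g. recording days)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of measurements")
    grand = mean(x for row in scores for x in row)
    subject_means = [mean(row) for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(scores, subject_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: three speakers measured on two days with identical values each day
print(icc_oneway([[10, 10], [20, 20], [30, 30]]))
```

When every speaker repeats exactly the same value, all variance lies between speakers and the coefficient reaches 1; day-to-day variation within speakers pulls it down.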

Acoustic evaluation of Isshiki type III thyroplasty for treatment of mutational voice disorders

The Journal of Laryngology & Otology ◽ 10.1017/s0022215100143099 ◽ 1999 ◽ Vol 113 (1) ◽ pp. 31-34 ◽ Cited By ~ 17

Author(s): Guo-Dong Li ◽ Liancai Mu ◽ Shilin Yang

Keyword(s): Fundamental Frequency ◽ Voice Disorders ◽ Normal Value ◽ Type III ◽ Acoustic Measures ◽ Acoustic Evaluation ◽ Male Patients ◽ Vocal Intensity ◽ The Voice ◽ Operative Measures

Abstract: The goal of this study was to determine whether there are acoustical differences between pre- and post-surgical voices and to evaluate the effectiveness of Isshiki type III thyroplasty in 11 male patients with mutational voice disorders. Acoustic measures were obtained both pre- and post-operatively. Pre- and post-operative fundamental frequency (Fo), voice frequencies, and vocal intensity obtained from a sustained vowel /i/ during different phonatory tasks were compared. The results demonstrated that the voice frequencies were significantly decreased after the operation (p<0.05). The vocal intensity tended to decrease slightly as the voice frequency lowered. However, there were no statistically significant differences between the pre- and post-operative measures of vocal intensity (p>0.5). The preoperative high-pitched voices of all the male patients were lowered to the normal value by the type III thyroplasty.

Effects of Vocal Intensity and Fundamental Frequency on Cepstral Peak Prominence in Patients with Voice Disorders and Vocally Healthy Controls

Journal of Voice ◽ 10.1016/j.jvoice.2019.11.015 ◽ 2019 ◽ Cited By ~ 5

Author(s): Meike Brockmann-Bauser ◽ Jarrad H. Van Stan ◽ Marilia Carvalho Sampaio ◽ Joerg E. Bohlender ◽ Robert E. Hillman ◽ ...

Keyword(s): Fundamental Frequency ◽ Voice Disorders ◽ Healthy Controls ◽ Cepstral Peak Prominence ◽ Vocal Intensity

Age-Related Changes in Speech and Voice: Spectral and Cepstral Measures

Journal of Speech Language and Hearing Research ◽ 10.1044/2019_jslhr-19-00028 ◽ 2020 ◽ Vol 63 (3) ◽ pp. 647-660

Author(s): Sammi Taylor ◽ Christopher Dromey ◽ Shawn L. Nissen ◽ Kristine Tanner ◽ Dennis Eggett ◽ ...

Keyword(s): Standard Deviation ◽ Fundamental Frequency ◽ Center Of Gravity ◽ Read Aloud ◽ Profound Hearing Loss ◽ Acoustic Measures ◽ Spectral Kurtosis ◽ Age Related ◽ Cepstral Peak Prominence ◽ Speaking Fundamental Frequency

Purpose: This study examined differences in selected acoustic measures of speech and voice according to age and sex and across families. Method: Participants included 169 individuals, 79 men and 90 women, from 18 families, ranging in age from 17 to 87 years. Participants reported no history of articulation disorders, stroke or active neurologic disease, or severe-to-profound hearing loss. They read aloud two passages to facilitate examination of the following speech and voice acoustic parameters: fricative spectral moments (center of gravity, standard deviation, skewness, and kurtosis), the proportion of time spent speaking, mean speaking fundamental frequency, semitone standard deviation (STSD), and cepstral peak prominence smoothed. Results: The results indicated a significant age effect for fricative spectral center of gravity, spectral skewness, and speaking STSD. There was a significant sex effect for spectral center of gravity, spectral kurtosis, and mean fundamental frequency. Familial relationship was significant for spectral skewness, STSD, and cepstral peak prominence smoothed. Conclusions: These findings revealed that certain speech and voice features change with age and some change differently for men and women. Additionally, speakers from the same family units may demonstrate similar patterns for prosody, voicing, and articulatory behavior. The results also demonstrated normal differences in speech and voice variation across age, sex, and family unit. Understanding patterns and differences across these demographic variables in healthy speakers is important to distinguishing more confidently between normal and disordered speech and voice patterns clinically.
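
The abstract does not define how STSD was computed, so the following is only a sketch of one common approach: convert each F0 value to semitones relative to a reference frequency and take the sample standard deviation. The function name, the default reference (the mean F0), and the convention that unvoiced frames are coded as 0 are all assumptions made for illustration.

```python
import math
from statistics import stdev

def semitone_sd(f0_hz, ref_hz=None):
    """Sample standard deviation of F0 in semitones relative to ref_hz (default: mean F0)."""
    voiced = [f for f in f0_hz if f > 0]   # drop unvoiced frames coded as 0 (assumed convention)
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 values")
    if ref_hz is None:
        ref_hz = sum(voiced) / len(voiced)
    # 12 semitones per octave, i.e. per doubling of frequency
    semitones = [12.0 * math.log2(f / ref_hz) for f in voiced]
    return stdev(semitones)

# Hypothetical F0 track in Hz, with one unvoiced frame
print(round(semitone_sd([110.0, 0.0, 120.0, 131.0, 115.0]), 3))
```

Because changing the reference only shifts every semitone value by the same constant, the resulting standard deviation does not depend on which reference frequency is chosen. That property is what makes the measure comparable across speakers with different average pitch.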

Synchronised speech and speech motor control: convergence in voice fundamental frequency during choral speech

10.31234/osf.io/9hc34 ◽ 2021

Author(s): Abigail Bradshaw ◽ Carolyn McGettigan

Keyword(s): Motor Control ◽ Fundamental Frequency ◽ Speech Motor Control ◽ Self And Other ◽ Acoustic Measures ◽ Speech Timing ◽ Video Recordings ◽ Speech Feedback ◽ Voice Fundamental Frequency ◽ Speech Motor

Synchronised speech behaviours such as choral speech (speaking in unison) are found in a variety of everyday settings, and have clinical relevance as a temporary fluency-enhancing technique for people who stutter. It is currently unknown whether such synchronisation of speech timing among two speakers is also accompanied by alignment in their vocal characteristics, for example in acoustic measures such as pitch. The current study investigated this by testing whether convergence in voice fundamental frequency (F0) between speakers could be demonstrated during choral speech. Sixty participants across three online experiments were audio recorded whilst reading a series of sentences, first on their own, and then in synchrony with another speaker (the accompanist) in a number of between-subject conditions. Experiment 1 demonstrated significant convergence in participants’ F0 to a pre-recorded accompanist voice, in the form of both upward (high F0 accompanist condition) and downward (low F0 accompanist condition) changes in F0; however, upward convergence was greater than downward convergence. Experiment 2 found that downward convergent changes in F0 could not be increased by the use of an accompanist voice with an even lower F0. Experiment 3 demonstrated that such convergence was not seen during a visual choral speech condition, in which participants spoke in synchrony with silent video recordings of the accompanist. Further, convergence in F0 was enhanced for a condition where participants could both see and hear the accompanist in pre-recorded videos compared to synchronisation with the pre-recorded voice alone. These findings suggest the need for models of speech motor control to incorporate interactions between self- and other-speech feedback during speech production, and suggest a novel hypothesis for the mechanisms underlying the fluency-enhancing effects of choral speech in people who stutter.

Convergence in voice fundamental frequency during synchronous speech

PLoS ONE ◽ 10.1371/journal.pone.0258747 ◽ 2021 ◽ Vol 16 (10) ◽ pp. e0258747

Author(s): Abigail R. Bradshaw ◽ Carolyn McGettigan

Keyword(s): Fundamental Frequency ◽ Speech Motor Control ◽ Self And Other ◽ Acoustic Measures ◽ Speech Timing ◽ Video Recordings ◽ Speech Feedback ◽ Voice Fundamental Frequency ◽ Speech Motor ◽ Speech Condition

Joint speech behaviours where speakers produce speech in unison are found in a variety of everyday settings, and have clinical relevance as a temporary fluency-enhancing technique for people who stutter. It is currently unknown whether such synchronisation of speech timing among two speakers is also accompanied by alignment in their vocal characteristics, for example in acoustic measures such as pitch. The current study investigated this by testing whether convergence in voice fundamental frequency (F0) between speakers could be demonstrated during synchronous speech. Sixty participants across two online experiments were audio recorded whilst reading a series of sentences, first on their own, and then in synchrony with another speaker (the accompanist) in a number of between-subject conditions. Experiment 1 demonstrated significant convergence in participants’ F0 to a pre-recorded accompanist voice, in the form of both upward (high F0 accompanist condition) and downward (low and extra-low F0 accompanist conditions) changes in F0. Experiment 2 demonstrated that such convergence was not seen during a visual synchronous speech condition, in which participants spoke in synchrony with silent video recordings of the accompanist. An audiovisual condition in which participants were able to both see and hear the accompanist in pre-recorded videos did not result in greater convergence in F0 compared to synchronisation with the pre-recorded voice alone. These findings suggest the need for models of speech motor control to incorporate interactions between self- and other-speech feedback during speech production, and suggest a novel hypothesis for the mechanisms underlying the fluency-enhancing effects of synchronous speech in people who stutter.

Evaluation of Acoustic Analyses of Voice in Nonoptimized Conditions

Journal of Speech Language and Hearing Research ◽ 10.1044/2020_jslhr-20-00212 ◽ 2020 ◽ Vol 63 (12) ◽ pp. 3991-3999

Author(s): Benjamin van der Woerd ◽ Min Wu ◽ Vijay Parsa ◽ Philip C. Doyle ◽ Kevin Fung

Keyword(s): Repeated Measures ◽ Voice Quality ◽ Data Sets ◽ Acoustic Measurements ◽ Sample Collection ◽ Experimental Conditions ◽ Environment Analysis ◽ Acoustic Measures ◽ Recording Conditions ◽ Cepstral Peak Prominence

Objectives: This study aimed to evaluate the effects of a smartphone microphone and the recording environment on the fidelity and accuracy of acoustic measurements of voice. Method: This was a prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) a Rainbow Passage sentence, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA) at a fixed distance (15 in.). Each set of recordings (iPhone–audio booth, Blue Yeti–audio booth, iPhone–office, and Blue Yeti–office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results: The vowel analyses indicated that jitter, shimmer, HNR, and CPP were significantly different based on microphone choice, and that shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges. Conclusions: Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.

Effects of Laryngeal Topical Anesthesia on Voice Fundamental Frequency Perturbation

Journal of Speech Language and Hearing Research ◽ 10.1044/jshr.2302.274 ◽ 1980 ◽ Vol 23 (2) ◽ pp. 274-283 ◽ Cited By ~ 19

Author(s): David Sorensen ◽ Yoshiyuki Horii ◽ Rebecca Leonard

Keyword(s): Fundamental Frequency ◽ High Frequency ◽ Frequency Control ◽ Topical Anesthesia ◽ Proprioceptive Feedback ◽ Adult Males ◽ Voice Fundamental Frequency

Fundamental frequency perturbation (jitter) during sustained vowel phonations of speakers under topical anesthesia of the larynx was investigated for five adult males. The results showed that the average jitter was significantly greater under the anesthesia than normal conditions, and that the jitter difference between the two conditions was more prominent at high frequency phonations. Implications of these data for tactile and proprioceptive feedback in phonatory frequency control are discussed.
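
For readers unfamiliar with jitter, the sketch below computes one widely used variant, local jitter: the mean absolute difference between consecutive glottal periods, expressed as a percentage of the mean period. The abstract does not give the exact formula used in this study, so the choice of variant, the function name, and the example periods are assumptions.

```python
def local_jitter_percent(periods_s):
    """Mean absolute difference of consecutive periods, relative to the mean period (%)."""
    if len(periods_s) < 2:
        raise ValueError("need at least two glottal periods")
    # Cycle-to-cycle differences between neighbouring periods
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return 100.0 * mean_diff / mean_period

# Hypothetical glottal periods in seconds (around 100 Hz)
print(round(local_jitter_percent([0.0100, 0.0101, 0.0099, 0.0100]), 3))
```

A perfectly regular voice, with identical periods, yields 0%; larger cycle-to-cycle irregularity raises the value.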

Measuring Vocal Fatigue in Sports Coaches

Journal of Clinical Speech and Language Studies ◽ 10.3233/acs-2017-23104 ◽ 2017 ◽ Vol 23 (1) ◽ pp. 1-20

Author(s): Kathy Connaughton ◽ Irena Yanushevskaya

Keyword(s): Acoustic Analysis ◽ Voice Quality ◽ Professional Sports ◽ Muscle Adaptation ◽ Voice Change ◽ Acoustic Measures ◽ Perceptual Evaluation ◽ Coaching Session ◽ Sports Coaches ◽ Voice Use

Objective: This study explores the immediate impact of prolonged voice use by professional sports coaches. Method: Speech samples including sustained phonation of vowel /a/ and a short read passage were collected from two professional sports coaches. The audio recordings were made within an hour before and after a coaching session, over three sessions. Perceptual evaluation of voice quality was done using the GRBAS scale. The speech samples were subsequently analyzed using Praat. The acoustic measures included fundamental frequency (f0), jitter, shimmer, Harmonics-to-Noise ratio and Cepstral Peak Prominence. Main results: The results of perceptual and acoustic analysis suggest a slight shift towards a tenser phonation post-coaching session, which is a likely consequence of laryngeal muscle adaptation to prolonged voice use. This tendency was similar in sustained vowels and connected speech. Conclusion: Acoustic measures used in this study can be useful to capture the voice change post-coaching session. It is desirable, however, that more sophisticated and robust and at the same time intuitive and easy-to-use tools for voice assessment and monitoring be made available to clinicians and professional voice users.

Acoustic voice characteristics with and without wearing a facemask

Scientific Reports ◽ 10.1038/s41598-021-85130-8 ◽ 2021 ◽ Vol 11 (1)

Author(s): Duy Duong Nguyen ◽ Patricia McCabe ◽ Donna Thomas ◽ Alison Purcell ◽ Maree Doble ◽ ...

Keyword(s): Personal Protective Equipment ◽ Healthcare Workers ◽ Protective Equipment ◽ Mask Condition ◽ Connected Speech ◽ Significant Attenuation ◽ Cepstral Peak Prominence ◽ Vocal Intensity ◽ Significant Change ◽ The Voice

Abstract: Facemasks are essential for healthcare workers, but the characteristics of the voice whilst wearing this personal protective equipment are not well understood. In the present study, we compared acoustic voice measures in recordings of sixteen adults producing standardised vocal tasks with and without wearing either a surgical mask or a KN95 mask. Data were analysed for mean spectral levels in the 0–1 kHz and 1–8 kHz regions, an energy ratio between 0–1 and 1–8 kHz (LH1000), harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPPS), and vocal intensity. In connected speech, there was significant attenuation of the mean spectral level in the 1–8 kHz region and no significant change in this measure at 0–1 kHz. Mean spectral levels of the vowel did not change significantly in mask-wearing conditions. LH1000 for connected speech significantly increased whilst wearing either a surgical mask or a KN95 mask, but no significant change in this measure was found for the vowel. HNR was higher in the mask-wearing conditions than in the no-mask condition. CPPS and vocal intensity did not change in mask-wearing conditions. These findings imply an attenuation effect of wearing these types of masks on the voice spectra, with the surgical mask showing less impact than the KN95.
