Spectral- and Cepstral-Based Acoustic Features of Dysphonic, Strained Voice Quality

Abstract Filled pauses (FPs) have proved to be more than valuable cues to speech production processes and important units in discourse analysis. Some aspects of their form and occurrence patterns have been shown to be speaker- and language-specific. In the present study, basic acoustic properties of FPs in Polish task-oriented dialogues are explored. A set of FPs was extracted from a corpus of twenty task-oriented dialogues on the basis of available annotations. After initial scrutiny and selection, a subset of the signals underwent a series of pitch, formant frequency and voice quality analyses. A significant amount of variation found in the realisations of FPs justifies their potential application in speaker recognition systems. Regular monosegmental FPs were confirmed to show relatively stable basic acoustic parameters, which allows for their easy identification and measurement, but may result in less significant differences among the speakers.

Individual Differences in Voice Quality Perception

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3503.512 ◽

1992 ◽

Vol 35 (3) ◽

pp. 512-520 ◽

Cited By ~ 157

Author(s):

Jody Kreiman ◽

Bruce R. Gerratt ◽

Kristin Precoda ◽

Gerald S. Berke

Keyword(s):

Individual Differences ◽

Multidimensional Scaling ◽

Voice Quality ◽

Nonmetric Multidimensional Scaling ◽

Acoustic Features ◽

Acoustic Parameters ◽

Voice Perception ◽

Quality Perception ◽

Normal Populations ◽

Homogeneous Set

Sixteen listeners (10 expert, 6 naive) judged the dissimilarity of pairs of voices drawn from pathological and normal populations. Separate nonmetric multidimensional scaling solutions were calculated for each listener and voice set. The correlations between individual listeners’ dissimilarity ratings were low. However, scaling solutions indicated that each subject judged the voices in a reliable, meaningful way. Listeners differed more from one another in their judgments of the pathological voices (which varied widely on a number of acoustic parameters) than they did for the normal voices (which formed a much more homogeneous set acoustically). The acoustic features listeners used to judge dissimilarity were predictable from the characteristics of the stimulus sets: only parameters that showed substantial variability were perceptually salient across listeners. These results are consistent with prototype models of voice perception. They suggest that traditional means of assessing listener reliability in voice perception tasks may not be appropriate, and highlight the importance of using explicit comparisons between stimuli when studying voice quality perception.

Acoustic covariants of length contrast in Japanese stops

Journal of the International Phonetic Association ◽

10.1017/s0025100308003459 ◽

2008 ◽

Vol 38 (2) ◽

pp. 167-186 ◽

Cited By ~ 41

Author(s):

Kaori Idemaru ◽

Susan G. Guion

Keyword(s):

Fundamental Frequency ◽

Voice Quality ◽

Acoustic Features ◽

Acoustic Correlates ◽

Intensity Changes

This study explores acoustic correlates of the singleton vs. geminate stop length contrast in Japanese. The proposal examined is that multiple acoustic features covary with the stop length distinction and that these features are available in the signal as potential secondary cues. The results support the proposal, revealing the presence of several acoustic features covarying with the singleton vs. geminate contrast in both durational and non-durational domains. Specifically, the preceding vowel is longer, the following vowel is shorter, there are greater fundamental frequency and intensity changes from the preceding to the following vowel, and there is evidence of more creakiness in voice quality for geminate than singleton consonants. It is also demonstrated that the vowel durations, as well as fundamental frequency and intensity changes, have fairly strong categorization power.

Factors in voice quality: Acoustic features related to gender

10.1109/icassp.1987.1169707 ◽

2005 ◽

Cited By ~ 3

Author(s):

D. Childers ◽

Ke Wu ◽

D. Hicks

Keyword(s):

Voice Quality ◽

Acoustic Features

Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity

IEICE Transactions on Information and Systems ◽

10.1587/transinf.2014edp7183 ◽

2015 ◽

Vol E98.D (1) ◽

pp. 157-165 ◽

Cited By ~ 2

Author(s):

Yusuke IJIMA ◽

Hideyuki MIZUNO

Keyword(s):

Metric Learning ◽

Voice Quality ◽

Distance Metric Learning ◽

Acoustic Features ◽

Distance Metric ◽

Selection Technique ◽

Highly Correlated ◽

Speaker Selection

What Makes Business Speakers Sound Charismatic?

Cadernos de Linguística ◽

10.25189/2675-4916.2020.v1.n1.id272 ◽

2020 ◽

Vol 1 (1) ◽

pp. 01-40

Author(s):

Oliver Niebuhr ◽

Alexander Brem ◽

Jan Michalsky ◽

Jana Neitsch

Keyword(s):

Acoustic Analysis ◽

Voice Quality ◽

Large Field ◽

Religious Leaders ◽

Acoustic Features ◽

Research Gaps ◽

Steve Jobs ◽

Computer Based ◽

Tone Of Voice ◽

And Training

Phonetic research on the prosodic sources of perceived charisma has taken a big step towards making a speaker’s tone-of-voice a tangible, quantifiable, and trainable matter. However, the tone-of-voice includes a complex bundle of acoustic features, and a lot of parameters have not even been looked at so far. Moreover, all previous studies focused on political or religious leaders and left aside the large field of managers and CEOs in the world of business. These are the two research gaps addressed in the present study. An acoustic analysis of about 1,350 prosodic phrases from keynotes given by a more charismatic CEO (Steve Jobs) and a less charismatic CEO (Mark Zuckerberg) suggests that the same tone-of-voice settings that make political or religious leaders sound more charismatic also work for business speakers. In addition, results point to further charisma-relevant acoustic parameters related to rhythm, emphasis, pausing, and voice quality - as well as to audience type as a significant context factor. The findings are discussed with respect to implications for future perception-oriented studies and perspectives for a computer-based measurement, assessment, and training of a charismatic tone of voice.

Listener Detection of Objectively Validated Acoustic Features of Speech in Huntington’s Disease

Journal of Huntington's Disease ◽

10.3233/jhd-210501 ◽

2021 ◽

pp. 1-9

Author(s):

Jess C.S. Chan ◽

Julie C. Stout ◽

Christopher A. Shirbin ◽

Adam P. Vogel

Keyword(s):

Huntington's Disease ◽

Acoustic Analysis ◽

Early Stage ◽

Speech Rate ◽

Voice Quality ◽

Cognitive Test ◽

Healthy Controls ◽

Acoustic Features ◽

Pitch Level

Background: Subtle progressive changes in speech motor function and cognition begin prior to diagnosis of Huntington’s disease (HD). Objective: To determine the nature of listener-rated speech differences in premanifest and early-stage HD (i.e., PreHD and EarlyHD), compared to neurologically healthy controls. Methods: We administered a speech battery to 60 adults (16 people with PreHD, 14 with EarlyHD, and 30 neurologically healthy controls), and conducted a cognitive test of processing speed/visual attention, the Symbol Digit Modalities Test (SDMT) on participants with HD. Voice recordings were rated by expert listeners and analyzed for acoustic and perceptual speech features. Results: Listeners perceived subtle differences in the speech of PreHD compared to controls, including abnormal pitch level and speech rate, reduced loudness and loudness inflection, altered voice quality, hypernasality, imprecise articulation, and reduced naturalness of speech. Listeners detected abnormal speech rate in PreHD compared to healthy speakers on a reading task, which correlated with slower speech rate from acoustic analysis and a lower cognitive performance score. In early-stage HD, continuous speech was characterized by longer pauses, a higher proportion of silence, and slower rate. Conclusion: Differences in speech and voice acoustic features are detectable in PreHD by expert listeners and align with some acoustically-derived objective speech measures. Slower speech rate in PreHD suggests altered oral motor control and/or subtle cognitive deficits that begin prior to diagnosis. Speakers with EarlyHD exhibited more silences compared to the PreHD and control groups, raising the likelihood of a link between speech and cognition that is not yet well characterized in HD.

Exploring the Acoustic Perceptual Relationship of Speech in Parkinson's Disease

Journal of Speech Language and Hearing Research ◽

10.1044/2021_jslhr-20-00610 ◽

2021 ◽

pp. 1-11

Author(s):

Yi-Fang Chiu ◽

Amy Neel ◽

Travis Loux

Keyword(s):

Parkinson's Disease ◽

Voice Quality ◽

Acoustic Features ◽

Healthy Older Adults ◽

Acoustic Measures ◽

Perceptual Judgments ◽

Second Formant ◽

Relationship Of ◽

Future Work

Purpose: Auditory perceptual judgments are commonly used to diagnose dysarthria and assess treatment progress. The purpose of the study was to examine the acoustic underpinnings of perceptual speech abnormalities in individuals with Parkinson's disease (PD). Method: Auditory perceptual judgments were obtained from sentences produced by 13 speakers with PD and five healthy older adults. Twenty young listeners rated overall ease of understanding, articulatory precision, voice quality, and prosodic adequacy on a visual analog scale. Acoustic measures associated with the speech subsystems of articulation, phonation, and prosody were obtained, including second formant transitions, articulation rate, cepstral and spectral measures of voice, and pitch variations. Regression analyses were performed to assess the relationships between perceptual judgments and acoustic variables. Results: Perceptual impressions of Parkinsonian speech were related to combinations of several acoustic variables. Approximately 36%–49% of the variance in the perceptual ratings was explained by the acoustic measures, indicating a modest acoustic perceptual relationship. Conclusions: The relationships between perceptual ratings and acoustic signals in Parkinsonian speech are multifactorial and involve a variety of acoustic features simultaneously. The modest acoustic perceptual relationships, however, suggest that future work is needed to further examine the acoustic bases of perceptual judgments in dysarthria.

Age Norms for Auditory-Perceptual Neurophonetic Parameters: A Prerequisite for the Assessment of Childhood Dysarthria

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00114 ◽

2020 ◽

Vol 63 (4) ◽

pp. 1071-1082

Author(s):

Theresa Schölderle ◽

Elisabet Haas ◽

Wolfram Ziegler

Keyword(s):

Assessment Tool ◽

Developmental Trajectories ◽

Voice Quality ◽

Typically Developing ◽

Typically Developing Children ◽

Age Norms ◽

Elementary School Age ◽

Speech Characteristics ◽

Substantial Progress ◽

Computer Based

Purpose: The aim of this study was to collect auditory-perceptual data on established symptom categories of dysarthria from typically developing children between 3 and 9 years of age, for the purpose of creating age norms for dysarthria assessment. Method: One hundred forty-four typically developing children (3;0–9;11 [years;months], 72 girls and 72 boys) participated. We used a computer-based game specifically designed for this study to elicit sentence repetitions and spontaneous speech samples. Speech recordings were analyzed using the auditory-perceptual criteria of the Bogenhausen Dysarthria Scales, a standardized German assessment tool for dysarthria in adults. The Bogenhausen Dysarthria Scales (scales and features) cover clinically relevant dimensions of speech and allow for an evaluation of well-established symptom categories of dysarthria. Results: The typically developing children exhibited a number of speech characteristics overlapping with established symptom categories of dysarthria (e.g., breathy voice, frequent inspirations, reduced articulatory precision, decreased articulation rate). Substantial progress was observed between 3 and 9 years of age, but with different developmental trajectories across different dimensions. In several areas (e.g., respiration, voice quality), 9-year-olds still presented with salient developmental speech characteristics, while in other dimensions (e.g., prosodic modulation), features typically associated with dysarthria occurred only exceptionally, even in the 3-year-olds. Conclusions: The acquisition of speech motor functions is a prolonged process that is not yet complete at 9 years of age. Various developmental influences (e.g., anatomic–physiological changes) shape children's speech specifically. Our findings are a first step toward establishing auditory-perceptual norms for dysarthria in children of kindergarten and elementary school age. Supplemental Material: https://doi.org/10.23641/asha.12133380
