Instrumental Case Studies and Computational Simulations of Voice Quality

Voice Quality ◽  
2019 ◽  
pp. 83-122
2019 ◽  
Vol 3 (1) ◽  
pp. 3-27
Author(s):  
Danae Perez ◽  
Lena Zipp

Abstract This paper focuses on the role of voice quality variation in the system of a contact language, Afro-Yungueño Spanish, a restructured variety of Spanish spoken in the Bolivian Yungas valleys. Based on case studies of naturally occurring conversation between multiple speakers, we show that certain non-modal phonation types, in this case falsetto and breathy voice, are used to index expressiveness, intensification, or emphasis. We argue that these practices have discursive meaning that could otherwise also be encoded by means of grammatical and lexical resources, and that they are an integral part of the linguistic system of this variety. We claim that these practices may have resulted from the specific socio-historical context in which this variety evolved. This suggests that voice quality and ecological factors should not be underestimated in order to reach a more complete picture of how meaning is conveyed in apparently simplified contact languages.


2015 ◽  
Vol 9 (1) ◽  
Author(s):  
Reuven Tsur

Abstract This paper presents in a nutshell aspects of the author’s research in poetry reading (rhythmical performance and voice quality). At the beginning it states the impossibility of straightforward instrumental research in poetic rhythm, and suggests a work-around within a comprehensive theory (the Perception-Oriented Theory of Metre). All rules for metrical vs unmetrical are violated by the greatest masters of musicality in English poetry (Milton and Shelley, for instance); instead, the theory places the constraints in the performer’s ability or willingness to perform the verse line rhythmically, a rhythmical performance being one in which conflicting patterns of language and versification are simultaneously perceptible. At a pre-instrumental stage the author applied hypotheses derived from the empirical research of others (stress perception, nonlinguistic tick-tack perception and performance of nonsense lines) to account for the peculiar nature of the trochaic metre; as well as hypotheses derived from the limited-channel-capacity hypothesis and gestalt theory to account for the mental processes that govern the vocal devices used in a rhythmical performance. He put this theory to a non-instrumental test in an experiment with the rhythmical performance of stress maxima in the seventh position in the iambic pentameter. Finally, he presents six case studies illustrating six theoretical issues, through computer analysis of recorded readings and electronic manipulations thereof in order to compare minimal pairs of alternative solutions. These case studies explore enjambment, convergent and divergent delivery style, triple-encodedness, listener response, voice quality and issues of interpretation. Such a variety of effects is achieved by a homogeneous set of vocal manipulations: grouping and overarticulation which, in the final resort, boil down to conflicting phonetic cues for continuity and discontinuity at the same time.
At the end of an utterance in ordinary speech there is, usually, redundancy of cues. We cue discontinuity by a pause, falling intonation contour, prolongation of the last syllable or speech sounds of the utterance, overarticulation of word-final stop releases, if any, overarticulation of the last word boundary, and so forth. In enjambment, for instance, where a syntactic unit overrides the line ending, the performer may have recourse to conflicting cues, indicating at the same time syntactic continuity and discontinuity of the versification unit. When a stressed syllable occurs in a weak position, overarticulation of the phonemes and of the syllable boundaries may save mental processing space, allowing the listener to perceive the conflicting patterns of language and versification. At the same time, continuity must be indicated, to preserve syntactic coherence. A stress maximum (that is, a stressed syllable between two unstressed ones in mid-phrase or mid-word) in the seventh (weak) position of an iambic pentameter line renders it, according to Halle and Keyser, unmetrical. Experienced performers, however, seem to be able to perform such verse lines rhythmically, and tend to have recourse to similar vocal strategies. They are surprised to discover that they over- rather than under-emphasize the deviant stress, isolating the last four syllables as a perceptual unit, and generating a perceptual drive toward the last (tenth) position, where the two patterns have a coinciding downbeat, emphatically closing the verse line. After the sixth position, cues for discontinuity are required to perceptually isolate the last four metric positions, but also cues for syntactic continuity (in mid-phrase). As to triple-encodedness, the same phonetic cues, e.g., overarticulated word-final voiceless plosives, may indicate, at the same time, sentence ending, line ending and, e.g., a dominant, determined personality.
As to convergent and divergent delivery styles, the distinction refers to the performer’s tendency to have recourse where possible to redundant or conflicting phonetic cues to effect a rhythmical performance, within the constraints of the conflicting linguistic and versification patterns of the text.


Pragmatics ◽  
2010 ◽  
Vol 20 (2) ◽  
pp. 229-277 ◽  
Author(s):  
Margret Selting

This paper reports on some recent work on affectivity, or emotive involvement, in conversational storytelling. After presenting the approach, some case studies of the display and management of affectivity in storytelling in telephone and face-to-face conversations are presented. The analysis reconstructs the display and handling of affectivity by both storyteller and story recipient. In particular, I describe the following kinds of resources: - the verbal and segmental display: Rhetorical, lexico-semantic, syntactic, phonetic-phonological resources; - the prosodic and suprasegmental vocal display: Resources from the realms of prosody and voice quality; - visual or "multimodal" resources from the realms of body posture and its changes, head movements, gaze, and hand movements and gestures. It is shown that the display of affectivity is organized in orderly ways in sequences of storytelling in conversation. I reconstruct (a) how verbal, vocal and visual cues are deployed in co-occurrence in order to make affectivity in general and specific affects in particular interpretable for the recipient and (b) how in turn the recipient responds and takes up the displayed affect. As a result, affectivity is shown to be managed by teller and recipient in storytelling sequences in conversation, involving both the reporting of affects from the story world as well as the negotiation of in-situ affects in the here-and-now of the storytelling situation.


2003 ◽  
Vol 9 (1) ◽  
pp. 2-11 ◽  
Author(s):  
Dexter Dunphy

Abstract This paper addresses the issue of corporate sustainability. It examines why achieving sustainability is becoming an increasingly vital issue for society and organisations, defines sustainability and then outlines a set of phases through which organisations can move to achieve increasing levels of sustainability. Case studies are presented of organisations at various phases indicating the benefits, for the organisation and its stakeholders, which can be made at each phase. Finally the paper argues that there is a marked contrast between the two competing philosophies of neo-conservatism (economic rationalism) and the emerging philosophy of sustainability. Management schools have been strongly influenced by economic rationalism, which underpins the traditional orthodoxies presented in such schools. Sustainability represents an urgent challenge for management schools to rethink these traditional orthodoxies and give sustainability a central place in the curriculum.


1978 ◽  
Vol 9 (4) ◽  
pp. 220-235
Author(s):  
David L. Ratusnik ◽  
Carol Melnick Ratusnik ◽  
Karen Sattinger

Short-form versions of the Screening Test of Spanish Grammar (Toronto, 1973) and the Northwestern Syntax Screening Test (Lee, 1971) were devised for use with bilingual Latino children while preserving the original normative data. Application of a multiple regression technique to data collected on 60 lower social status Latino children (four years and six months to seven years and one month) from Spanish Harlem and Yonkers, New York, yielded a small but powerful set of predictor items from the Spanish and English tests. Clinicians may make rapid and accurate predictions of STSG or NSST total screening scores from administration of substantially shortened versions of the instruments. Case studies of Latino children from Chicago and Miami serve to cross-validate the procedure outside the New York metropolitan area.


2020 ◽  
Vol 63 (4) ◽  
pp. 1071-1082
Author(s):  
Theresa Schölderle ◽  
Elisabet Haas ◽  
Wolfram Ziegler

Purpose The aim of this study was to collect auditory-perceptual data on established symptom categories of dysarthria from typically developing children between 3 and 9 years of age, for the purpose of creating age norms for dysarthria assessment. Method One hundred forty-four typically developing children (3;0–9;11 [years;months], 72 girls and 72 boys) participated. We used a computer-based game specifically designed for this study to elicit sentence repetitions and spontaneous speech samples. Speech recordings were analyzed using the auditory-perceptual criteria of the Bogenhausen Dysarthria Scales, a standardized German assessment tool for dysarthria in adults. The Bogenhausen Dysarthria Scales (scales and features) cover clinically relevant dimensions of speech and allow for an evaluation of well-established symptom categories of dysarthria. Results The typically developing children exhibited a number of speech characteristics overlapping with established symptom categories of dysarthria (e.g., breathy voice, frequent inspirations, reduced articulatory precision, decreased articulation rate). Substantial progress was observed between 3 and 9 years of age, but with different developmental trajectories across different dimensions. In several areas (e.g., respiration, voice quality), 9-year-olds still presented with salient developmental speech characteristics, while in other dimensions (e.g., prosodic modulation), features typically associated with dysarthria occurred only exceptionally, even in the 3-year-olds. Conclusions The acquisition of speech motor functions is a prolonged process that is not yet complete at 9 years of age. Various developmental influences (e.g., anatomic–physiological changes) shape children's speech specifically. Our findings are a first step toward establishing auditory-perceptual norms for dysarthria in children of kindergarten and elementary school age. Supplemental Material https://doi.org/10.23641/asha.12133380


2020 ◽  
Vol 63 (12) ◽  
pp. 3991-3999
Author(s):  
Benjamin van der Woerd ◽  
Min Wu ◽  
Vijay Parsa ◽  
Philip C. Doyle ◽  
Kevin Fung

Objectives This study aimed to evaluate the fidelity and accuracy of a smartphone microphone and recording environment on acoustic measurements of voice. Method A prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) Rainbow Passage sentences, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA), at a fixed distance (15 in.). Each set of recordings (iPhone–audio booth, Blue Yeti–audio booth, iPhone–office, and Blue Yeti–office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results The vowel analyses indicated that the jitter, shimmer, HNR, and CPP were significantly different based on microphone choice and shimmer, HNR, and CPP were significantly different based on the recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP.
While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges. Conclusions Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
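
For readers unfamiliar with the perturbation measures named in this abstract, the following is a minimal sketch of the common "local" definitions of jitter and shimmer: the mean absolute difference between consecutive cycles, divided by the mean value across cycles. It is not the study's analysis and does not reproduce Praat's period detection or thresholds. The function name `relative_perturbation` and the sample values are hypothetical, chosen only for illustration.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value is zero; the ratio is undefined")
    # Cycle-to-cycle absolute differences
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical glottal cycle durations (seconds) and peak amplitudes (arbitrary units)
periods = [0.0050, 0.0051, 0.0049, 0.0050]
amplitudes = [0.80, 0.82, 0.79, 0.81]

# Applied to periods the ratio is local jitter; applied to amplitudes, local shimmer
jitter_percent = 100 * relative_perturbation(periods)
shimmer_percent = 100 * relative_perturbation(amplitudes)
print(f"local jitter:  {jitter_percent:.2f}%")
print(f"local shimmer: {shimmer_percent:.2f}%")
```

Because both measures are ratios of cycle-to-cycle variation to the mean, added noise from a microphone or room can plausibly shift them even when the underlying voice signal is identical, which is the kind of effect the study tests.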


2020 ◽  
Vol 63 (12) ◽  
pp. 3974-3981
Author(s):  
Ashwini Joshi ◽  
Isha Baheti ◽  
Vrushali Angadi

Aim The purpose of this study was to develop and assess the reliability of a Hindi version of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). Reliability was assessed by comparing Hindi CAPE-V ratings with English CAPE-V ratings and with ratings on the Grade, Roughness, Breathiness, Asthenia and Strain (GRBAS) scale. Method Hindi sentences were created to match the phonemic load of the corresponding English CAPE-V sentences. The Hindi sentences were adapted for linguistic content. The original English and adapted Hindi CAPE-V and GRBAS were completed for 33 bilingual individuals with normal voice quality. Additionally, the Hindi CAPE-V and GRBAS were completed for 13 Hindi speakers with disordered voice quality. The agreement of CAPE-V ratings was assessed between language versions, GRBAS ratings, and two rater pairs (three raters in total). Pearson product–moment correlation was completed for all comparisons. Results A strong correlation (r > .8, p < .01) was found between the Hindi CAPE-V scores and the English CAPE-V scores for most variables in normal voice participants. A weak correlation was found for the variable of strain (r < .2, p = .400) in the normative group. A strong correlation (r > .6, p < .01) was found between the overall severity/grade, roughness, and breathiness scores in the GRBAS scale and the CAPE-V scale in normal and disordered voice samples. Significant interrater reliability (r > .75) was present in overall severity and breathiness. Conclusions The Hindi version of the CAPE-V demonstrates good interrater reliability and concurrent validity with the English CAPE-V and the GRBAS. The Hindi CAPE-V can be used for the auditory-perceptual voice assessment of Hindi speakers.
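
The Pearson product-moment correlation used for the comparisons in this abstract can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the authors' analysis: the function name `pearson_r` and the rating values are hypothetical, and no p-value is computed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each rating from its own mean
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical overall-severity ratings for the same five speakers,
# rated once with the Hindi sentences and once with the English sentences
hindi_ratings = [5, 12, 8, 20, 15]
english_ratings = [6, 10, 9, 22, 14]

r = pearson_r(hindi_ratings, english_ratings)
print(f"r = {r:.3f}")
```

A value of r near 1 means the two language versions rank and space the speakers similarly; it does not by itself show that the absolute ratings agree.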

