vowel formants
Recently Published Documents

TOTAL DOCUMENTS: 91 (FIVE YEARS: 18)
H-INDEX: 14 (FIVE YEARS: 1)

Author(s):  
Vladimir Kulikov ◽  
Fatemeh M. Mohsenzadeh ◽  
Rawand M. Syam

Emphasis (contrastive pharyngealization of coronals) in Arabic spreads from an emphatic consonant to neighboring segments. Previous research suggests that, in addition to changing the spectral characteristics of adjacent segments, emphasis might affect the voice onset time (VOT) of voiceless stops, because emphatic stops in Arabic dialects have considerably shorter VOT than their plain cognates. No study has investigated whether emphatic co-articulation could shorten VOT in plain stops produced in an emphatic environment. The present study investigates changes in VOT in syllable-initial /t/ using production data from sixteen speakers of Qatari Arabic, who read non-word syllables with initial plain and emphatic stops /t/ and /ṭ/ adjacent to another plain or emphatic consonant. The results show that emphasis spread is a gradient process that affects only the spectral characteristics of segments, causing changes in vowel formants and in the spectral centre of gravity of stops. Long-lag VOT in plain /t/, however, was not shortened in emphatic syllables. The findings suggest that the shorter VOT in voiceless emphatic stops in Qatari Arabic is not a mechanical by-product of pharyngealization but rather a phonological requirement to maintain the contrast between long-lag and short-lag VOT in plain and emphatic stops.
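For readers who want to see how measures of this kind are typically obtained, the sketch below pulls the three acoustic quantities named in the abstract (vowel formants, spectral centre of gravity, VOT) from a recording using the parselmouth interface to Praat. It is a minimal illustration under stated assumptions, not the authors' pipeline: the file name, time windows, and annotation landmarks are all placeholders.

```python
# Hypothetical sketch: extracting the acoustic measures named in the abstract
# with Praat via parselmouth. File names and times are placeholders.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("ta_syllable.wav")  # placeholder recording

# Vowel formants: F1/F2 at the vowel midpoint (Burg method, Praat defaults).
formant = snd.to_formant_burg()
vowel_mid = 0.180  # s; would come from a TextGrid annotation in practice
f1 = formant.get_value_at_time(1, vowel_mid)
f2 = formant.get_value_at_time(2, vowel_mid)

# Spectral centre of gravity of the stop burst: slice out the burst window
# and ask Praat for the power-weighted spectral mean.
burst = snd.extract_part(from_time=0.050, to_time=0.075)  # placeholder window
cog = call(burst.to_spectrum(), "Get centre of gravity", 2.0)

# VOT: interval from burst release to voicing onset; both landmarks are
# assumed hand-annotated, since automatic detection is unreliable.
burst_time, voicing_onset = 0.052, 0.121  # placeholder annotations (s)
vot_ms = (voicing_onset - burst_time) * 1000
print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz, CoG={cog:.0f} Hz, VOT={vot_ms:.1f} ms")
```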


2021 ◽  
Vol 4 ◽  
Author(s):  
Rolando Coto-Solano ◽  
James N. Stanford ◽  
Sravana K. Reddy

In recent decades, computational approaches to sociophonetic vowel analysis have been steadily increasing, and sociolinguists now frequently use semi-automated systems for phonetic alignment and vowel formant extraction, including FAVE (Forced Alignment and Vowel Extraction; Rosenfelder et al., 2011; Evanini et al., Proceedings of Interspeech, 2009), the Penn Aligner (Yuan and Liberman, J. Acoust. Soc. Am., 2008, 123, 3878), and DARLA (Dartmouth Linguistic Automation; Reddy and Stanford, 2015a). Yet these systems still have a major bottleneck: manual transcription. For most modern sociolinguistic vowel alignment and formant extraction, researchers must first create manual transcriptions, a step that is painstaking, time-consuming, and resource intensive. If this manual step could be replaced with completely automated methods, sociolinguists could tap into vast datasets that have previously gone unexplored, including legacy recordings underutilized for lack of transcriptions. Moreover, if sociolinguists could quickly and accurately extract phonetic information from the millions of hours of new audio content posted on the Internet every day, a virtual ocean of speech from newly created podcasts, videos, live-streams, and other audio content could inform research. How close are current technological tools to achieving such groundbreaking changes for sociolinguistics?

Prior work (Reddy et al., Proceedings of the North American Association for Computational Linguistics 2015 Conference, 2015b, 71–75) showed that a hidden-Markov-model-based automatic speech recognition (ASR) system, trained with CMU Sphinx (Lamere et al., 2003), was accurate enough for DARLA to uncover evidence of the US Southern Vowel Shift without any human transcription. Even so, because that ASR system relied on a small training set, it produced numerous transcription errors. Six years have passed since that study, and numerous end-to-end ASR algorithms have since shown considerable improvement in transcription quality. One example is the RNN/CTC-based DeepSpeech from Mozilla (Hannun et al., 2014), in which recurrent neural networks (RNNs) provide the learning mechanism and connectionist temporal classification (CTC) merges phones into words.

The present paper combines DeepSpeech with DARLA to push the technological envelope and determine how well contemporary ASR systems can perform in completely automated vowel analyses with sociolinguistic goals. Specifically, we applied these techniques to audio recordings of 352 North American English speakers in the International Dialects of English Archive (IDEA), extracting 88,500 tokens of vowels in stressed position from spontaneous, free speech passages. With this large dataset we conducted acoustic sociophonetic analyses of the Southern Vowel Shift and the Northern Cities Chain Shift in the North American IDEA speakers. We compared results from three different sources of transcriptions: 1) IDEA's manual transcriptions as the baseline "ground truth", 2) the ASR built on CMU Sphinx used by Reddy et al. (2015b), and 3) the latest publicly available Mozilla DeepSpeech system.

We fed these three transcriptions to DARLA, which automatically aligned them and extracted the vowel formants for the 352 IDEA speakers. Our quantitative results show that newer ASR systems like DeepSpeech hold considerable promise for sociolinguistic applications like DARLA. DeepSpeech's automated transcriptions had a significantly lower character error rate than those from the earlier Sphinx system (from 46% down to 35%). When we performed the sociolinguistic analysis of the vowel formants extracted by DARLA, the DeepSpeech transcriptions matched the ground-truth results for the Southern Vowel Shift (SVS): five vowels showed a shift in both transcriptions, and two vowels showed a shift in neither. The Northern Cities Shift (NCS) was more difficult to detect, but ground truth and DeepSpeech agreed for four vowels: one showed a clear shift, and three showed no shift in either transcription.

Our study therefore shows how technology has progressed toward greater automation in vowel sociophonetics, while also showing what remains to be done. Our statistical modeling provides a quantified view of both the abilities and the limitations of a completely "hands-free" analysis of vowel shifts in a large dataset. Naturally, when comparing a completely automated system against a semi-automated system involving manual human work, there will always be a tradeoff between accuracy on the one hand and speed and replicability on the other (Kendall and Joseph, with DiPaolo, Towards best practices in sociophonetics, 2014). The amount of "noise" that can be tolerated in a given study will depend on the particular research goals and the researchers' preferences. Nonetheless, our study shows that, for certain large-scale applications and research goals, a completely automated approach using publicly available ASR can produce meaningful sociolinguistic results across large datasets, and these results can be generated quickly, efficiently, and with full replicability.
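For context on the transcription comparison, character error rate is a standard edit-distance metric; a minimal sketch follows. This is an illustrative implementation, not DARLA's or Mozilla's code, and the example strings are invented.

```python
# Minimal sketch of the character error rate (CER) metric used to compare
# ASR output against the manual "ground truth" transcriptions.
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

# Toy usage with an invented ASR error: two dropped characters -> CER ~ 0.08.
print(cer("the southern vowel shift", "the suthern vowl shift"))
```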


2021 ◽  
Author(s):  
Lana Hantzsch ◽  
Benjamin Parrell ◽  
Caroline A. Niziolek

Sensory errors caused by perturbations to movement-related feedback induce two types of behavioral changes that oppose the perturbation: rapid compensation within a movement and longer-term adaptation of subsequent movements. Although adaptation is hypothesized to occur whenever a sensory error is perceived (including after a single exposure to altered feedback), adaptation of articulatory movements in speech has only been observed after repeated exposure to auditory perturbations, calling into question both current theories of speech sensorimotor adaptation and the universality of more general theories of adaptation. Positive evidence for the hypothesized single-exposure, or 'one-shot', learning would thus provide critical support for current theories of speech sensorimotor learning and control and would align adaptation in speech more closely with other motor domains. We measured one-shot learning in a large dataset in which participants were exposed to intermittent, unpredictable auditory perturbations of their vowel formants (the resonant frequencies of the vocal tract that distinguish between different vowels). On each trial, participants spoke a word out loud while their first formant was shifted up, shifted down, or left unshifted. We examined whether the perturbation on a given trial affected speech on the subsequent, unperturbed trial. We found that participants adjusted their first formant in the direction opposite the preceding shift, demonstrating that learning occurs even after a single auditory perturbation, as predicted by current theories of sensorimotor adaptation. While adaptation and the preceding compensation responses were correlated, this was largely due to differences across individuals rather than to within-participant variation from trial to trial. These findings are more consistent with theories in which adaptation is driven directly by updates to internal control models than with those in which adaptation results from the incorporation of feedback responses from previous productions.
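The core of this trial-to-trial analysis, comparing unperturbed trials according to the direction of the previous trial's shift, can be sketched in a few lines of numpy. The data below are simulated under an assumed small opposing response; the effect size, trial counts, and variable names are illustrative only, not the study's data.

```python
# Hedged sketch of the one-shot adaptation analysis: does F1 on an
# unperturbed trial move opposite to the perturbation on the trial before it?
import numpy as np

rng = np.random.default_rng(0)
n_trials = 600
# Perturbation applied on each trial: +1 (F1 shifted up), -1 (down), 0 (none).
shift = rng.choice([-1, 0, 1], size=n_trials)
# Produced F1 deviation from baseline (Hz), simulated with a small response
# opposing the previous trial's shift plus production noise.
f1_dev = rng.normal(0, 8, n_trials) - 3.0 * np.roll(shift, 1)
f1_dev[0] = 0.0  # first trial has no predecessor

# Keep only unperturbed trials that immediately follow a perturbed trial.
follows_up = (shift == 0) & (np.roll(shift, 1) == 1)
follows_down = (shift == 0) & (np.roll(shift, 1) == -1)
follows_up[0] = follows_down[0] = False

print("mean F1 deviation after +F1 shift:", f1_dev[follows_up].mean())   # < 0
print("mean F1 deviation after -F1 shift:", f1_dev[follows_down].mean())  # > 0
```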


2020 ◽  
pp. 003151252097351
Author(s):  
Erwan Pépiot ◽  
Aron Arnold

The present study concerns the speech productions of female and male English/French bilingual speakers in both reading and semi-spontaneous speech tasks. We investigated several acoustic parameters in both languages: average fundamental frequency (F0), F0 range, F0 variance (SD), vowel formants (F1, F2, and F3), voice onset time (VOT), and H1-H2 (the intensity difference between the first and second harmonic frequencies, used as a measure of phonation type). Our results revealed a significant effect of gender and language on all parameters. Overall, average F0 was higher in French, while F0 modulation was stronger in English. Regardless of language, female speakers exhibited higher F0 than male speakers. Moreover, the increase in average F0 in French was larger in female speakers, whereas the reduction in F0 modulation in French was stronger in male speakers. The analysis of vowel formants showed that female speakers overall exhibited higher values than males; however, we found a significant cross-gender difference in F2 of the back vowel [u:] in English, but not in the vowel [u] in French. VOT of voiceless stops was longer in female speakers in both languages, with a greater difference in English. The VOT contrast between voiceless stops and their voiced counterparts was also significantly larger in female speakers in both languages, and the scope of this cross-gender difference was again greater in English. H1-H2 was higher in female speakers in both languages, indicating a breathier phonation type. Furthermore, female speakers tended to exhibit a smaller H1-H2 in French, while the opposite was true of males, resulting in a smaller cross-gender difference in French for this parameter. All these data support the idea of language- and gender-specific vocal norms, to which bilingual speakers seem to adapt. This constitutes a further argument for giving social factors, such as gender dynamics, more consideration in phonetic studies.
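Since the abstract defines H1-H2 as the intensity difference between the first two harmonics, here is one plausible way to compute it for a single voiced frame. This is a hedged sketch, not the authors' measurement procedure: the peak-picking bandwidth, frame length, and synthetic test signal are all assumptions.

```python
# Illustrative H1-H2 computation: amplitude difference, in dB, between the
# first and second harmonics of one voiced frame. F0 is assumed known.
import numpy as np

def h1_h2(frame: np.ndarray, sr: int, f0: float) -> float:
    """Return H1-H2 in dB for one speech frame with known F0."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)

    def harmonic_amp(target_hz: float) -> float:
        # Peak magnitude within +/- 20 Hz of the expected harmonic (assumed
        # search bandwidth).
        band = (freqs > target_hz - 20) & (freqs < target_hz + 20)
        return 20 * np.log10(spec[band].max() + 1e-12)

    return harmonic_amp(f0) - harmonic_amp(2 * f0)

# Toy usage: a synthetic vowel-like frame whose second harmonic is weaker
# (as in breathier phonation) yields a positive H1-H2 of about +10.5 dB.
sr, f0 = 16000, 200.0
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
print(round(h1_h2(frame, sr, f0), 1))
```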


2020 ◽  
Vol 148 (4) ◽  
pp. 2714-2715
Author(s):  
Valerie Freeman ◽  
Paul De Decker ◽  
Molly Landers

2020 ◽  
Vol 148 (4) ◽  
pp. 2474-2474
Author(s):  
Daniel Aalto ◽  
Mike Fenner ◽  
Meagan Haarstad ◽  
Amberley Ostevik ◽  
Bill Hodgetts ◽  
...  

2020 ◽  
Vol 29 (3) ◽  
pp. 1749-1778
Author(s):  
Ray D. Kent ◽  
Carrie Rountrey

Purpose: Literature was reviewed on the development of vowels in children's speech and on vowel disorders in children and adults, with an emphasis on studies using acoustic methods. Method: Searches were conducted with PubMed/MEDLINE, Google Scholar, CINAHL, HighWire Press, and legacy sources in retrieved articles. The primary search terms included, but were not limited to: vowels, vowel development, vowel disorders, vowel formants, vowel therapy, vowel inherent spectral change, speech rhythm, and prosody. Results/Discussion: The main conclusions reached in this review are that vowels are (a) important to speech intelligibility; (b) intrinsically dynamic; (c) refined in both perceptual and productive aspects beyond the age typically given for their phonetic mastery; (d) produced to compensate for articulatory and auditory perturbations; (e) influenced by language and dialect even in early childhood; (f) affected by a variety of speech, language, and hearing disorders in children and adults; (g) inadequately assessed by standardized articulation tests; and (h) characterized by at least three factors: articulatory configuration, extrinsic and intrinsic regulation of duration, and role in speech rhythm and prosody. Also discussed are stages in typical vowel ontogeny, acoustic characterization of rhotic vowels, a sensory-motor perspective on vowel production, and implications for the clinical assessment of vowels.


2020 ◽  
Author(s):  
Robin Karlin ◽  
Benjamin Parrell ◽  
Chris Naber

Research using real-time altered auditory feedback has demonstrated a key role for auditory feedback in both online feedback control and in updating feedforward control for future utterances. Much of this research has examined control in the spectral domain, finding that speakers compensate for perturbations to vowel formants, intensity, and fricative center of gravity. The aim of the current study is to examine adaptation in response to temporal perturbation, using real-time perturbation of ongoing speech. Word-initial consonant targets (VOT for /k, g/ and fricative duration for /s, z/) were lengthened, and the following stressed vowel (/æ/) was shortened. Overall, speakers did not adapt to the lengthened consonants, but they did lengthen vowels by nearly 100% of the perturbation magnitude in response to the shortening. Vowel lengthening showed continued aftereffects during a washout phase in which the perturbation was abruptly removed. Although speakers did not actively adapt consonant durations, the adaptation in vowel duration leads the consonant to occupy a smaller overall proportion of the syllable, in line with previous research suggesting that speakers attend to proportional rather than absolute durations. These results indicate that speakers actively monitor duration and update upcoming speech plans accordingly.
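The proportional-duration point can be made concrete with a small worked example. The durations below are hypothetical, chosen only to show how an unadapted consonant combined with a lengthened vowel shrinks the consonant's share of the syllable.

```python
# Worked example (hypothetical durations) of the proportional-duration point:
# speakers did not adapt consonant duration, but lengthening the vowel still
# shrinks the consonant's share of the produced syllable.
baseline_c, baseline_v = 80.0, 160.0  # ms; produced consonant and vowel
vowel_response = 60.0                 # ~100% of the vowel-shortening shift

for label, c, v in [("baseline", baseline_c, baseline_v),
                    ("adapted", baseline_c, baseline_v + vowel_response)]:
    print(f"{label}: consonant = {c / (c + v):.0%} of syllable")
# baseline: 33% of the syllable; adapted: 27%.
```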

