Gated Bilinear Networks for Vowel Formant Estimation

Author(s):  
Wang Dai ◽  
Zheng Hua ◽  
Jinsong Zhang ◽  
Yanlu Xie ◽  
Binghuai Lin
2020 ◽  
Vol 6 (s1)
Author(s):  
Tyler Kendall ◽  
Charlotte Vaughn

Abstract: This paper contributes insight into the sources of variability in vowel formant estimation, a major analytic activity in sociophonetics, by reviewing the outcomes of two simulations that manipulated the settings used for linear predictive coding (LPC)-based vowel formant estimation. Simulation 1 explores the range of frequency differences obtained when minor adjustments are made to LPC settings and measurement timepoints around the settings used by trained analysts, in order to determine the range of variability that should be expected in sociophonetic vowel studies. Simulation 2 examines the variability that emerges when LPC settings are varied combinatorially around constant default settings, rather than around settings chosen by trained analysts. The impacts of different LPC settings are discussed as a way of demonstrating the inherent properties of LPC-based formant estimation. This work suggests that differences more fine-grained than about 10 Hz in F1 and 15–20 Hz in F2 are within the range of LPC-based formant estimation variability.
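To make concrete what an LPC setting is, the following is a minimal sketch of formant estimation by the generic autocorrelation method with a Levinson-Durbin recursion. It is not the authors' procedure or the software they used; the function names (`lpc_coefficients`, `estimate_formants`, `synth_vowel`), the synthetic two-resonance test signal, the 10 kHz sampling rate, and the 100 Hz bandwidth are all assumptions chosen for illustration. The `order` argument is one example of a setting whose value changes the estimated frequencies on real speech.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns the prediction polynomial [1, a1, ..., a_order].
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for step i
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def estimate_formants(frame, fs, order):
    """Formant candidates (Hz) from the angles of the LPC polynomial roots."""
    a = lpc_coefficients(np.asarray(frame, dtype=float), order)
    roots = np.roots(a)
    # Keep one root from each complex-conjugate pair
    roots = roots[np.imag(roots) > 1e-9]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    return np.sort(freqs)


def synth_vowel(formants, fs, bandwidth=100.0, n=2000):
    """Hypothetical test signal: impulse response of an all-pole filter
    with one resonance per requested formant frequency."""
    radius = np.exp(-np.pi * bandwidth / fs)
    poles = []
    for f in formants:
        p = radius * np.exp(2j * np.pi * f / fs)
        poles += [p, np.conj(p)]
    a = np.real(np.poly(poles))
    y = np.zeros(n)
    for t in range(n):
        x = 1.0 if t == 0 else 0.0
        past = sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0)
        y[t] = x - past
    return y


fs = 10000
signal = synth_vowel([500.0, 1500.0], fs)
print(estimate_formants(signal, fs, order=4))
```

On this idealized signal the order matches the number of resonances, so the estimates should sit close to the synthesized frequencies. Real vowel tokens are not exact all-pole signals, which is one reason small changes in order, window placement, or frequency ceiling can shift the estimates.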


1985 ◽  
Vol 26 (3) ◽  
pp. 245-253
Author(s):  
Hiroshi Watanabe ◽  
Takemoto Shin ◽  
Koichi Matsuo ◽  
Junichi Fukaura ◽  
Mariko Tomita

1979 ◽  
Vol 22 (3) ◽  
pp. 627-648
Author(s):  
Ray D. Kent ◽  
Ronald Netsell ◽  
James H. Abbs

The speech of five individuals with cerebellar disease and ataxic dysarthria was studied with acoustic analyses of CVC words, words of varying syllabic structure (stem, stem plus suffix, stem plus two suffixes), simple sentences, the Rainbow Passage, and conversation. The most consistent and marked abnormalities observed in spectrograms were alterations of the normal timing pattern, with prolongation of a variety of segments and a tendency toward equalized syllable durations. Vowel formant structure in the CVC words was judged to be essentially normal except for transitional segments. The greater the severity of the dysarthria, the greater the number of segments lengthened and the degree of lengthening of individual segments. The ataxic subjects were inconsistent in durational adjustments of the stem syllable as the number of syllables in a word was varied and generally made smaller reductions than normal subjects as suffixes were added. Disturbances of syllable timing frequently were accompanied by abnormal contours of fundamental frequency, particularly monotone and syllable-falling patterns. These dysprosodic aspects of ataxic dysarthria are discussed in relation to cerebellar function in motor control.


Author(s):  
Yeptain Leung ◽  
Jennifer Oates ◽  
Siew-Pang Chan ◽  
Viktória Papp

Purpose: The aim of the study was to examine associations between speaking fundamental frequency (fos), vowel formant frequencies (F), listener perceptions of speaker gender, and vocal femininity–masculinity. Method: An exploratory study was undertaken to examine associations between fos, F1–F3, listener perceptions of speaker gender (nominal scale), and vocal femininity–masculinity (visual analog scale). For 379 speakers of Australian English aged 18–60 years, fos mode and F1–F3 (12 monophthongs; total of 36 Fs) were analyzed on a standard reading passage. Seventeen listeners rated speaker gender and vocal femininity–masculinity on randomized audio recordings of these speakers. Results: Model building using principal component analysis suggested the 36 Fs could be succinctly reduced to seven principal components (PCs). Generalized structural equation modeling (with the seven PCs of F and fos as predictors) suggested that only F2 and fos predicted listener perceptions of speaker gender (male, female, unable to decide). However, listener perceptions of vocal femininity–masculinity behaved differently and were predicted by F1, F3, and the contrast between monophthongs at the extremities of the F1 acoustic vowel space, in addition to F2 and fos. Furthermore, listeners' perceptions of speaker gender also influenced ratings of vocal femininity–masculinity substantially. Conclusion: Adjusted odds ratios highlighted the substantially larger contribution of F to listener perceptions of speaker gender and vocal femininity–masculinity relative to fos than has previously been reported.

