Phonetic fusion in Chinese conversational speech

Author(s):  
Shu-Chuan Tseng

Abstract: This paper presents a corpus-based perspective on the phonetic fusion of disyllabic words in a Chinese conversational speech corpus. Four categorical types that reflect the phonological features of reduction degrees are automatically derived from gradient acoustic properties. A transcription experiment was conducted with the most common disyllabic words. Both the automatic derivation from acoustic signals and the human transcription by perceptual judgment refer to the same sound inventory. We have shown that the complete forms of fusion occurring in conversation need not be legitimate syllables and that they appear consistently as syllable mergers, each representing a group of phonetic variants.

Author(s):  
Ali Janalizadeh Choobbasti ◽  
Mohammad Erfan Gholamian ◽  
Amir Vaheb ◽  
Saeid Safavi

2019 ◽  
Vol 11 (02) ◽  
pp. 235-255 ◽  
Author(s):  
PADRAIC MONAGHAN ◽  
MATTHEW FLETCHER

Abstract: The sound of words has been shown to relate to the meaning the words denote, an effect that extends beyond morphological properties of the word. Studies of these sound-symbolic relations have described this iconicity in terms of individual phonemes or, alternatively, in terms of acoustic properties (expressed as phonological features) relating to meaning. In this study, we investigated whether individual phonemes or phoneme features best accounted for iconicity effects. We tested 92 participants' judgements about the appropriateness of 320 nonwords presented in written form, relating to 8 different semantic attributes. For all 8 attributes, individual phonemes fitted participants' responses better than general phoneme features. These results challenge claims that sound-symbolic effects for visually presented words access broad, cross-modal associations between sound and meaning; instead, they indicate the operation of individual phoneme-to-meaning relations. Whether similar effects are found for nonwords presented auditorily remains an open question.
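The comparison reported above, phoneme-identity predictors versus broader feature-class predictors of appropriateness judgements, can be illustrated with a minimal synthetic sketch. The codings, weights, and data below are illustrative assumptions, not the study's materials; the point is only that a feature model, which collapses phonemes into classes, cannot fit better than a phoneme model whose predictors it aggregates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy illustration (synthetic data): compare how well individual-phoneme
# predictors vs. broader feature-class predictors fit appropriateness ratings.
rng = np.random.default_rng(1)
n_items = 200
n_phonemes = 10

# Each nonword is coded by phoneme counts (0-2 occurrences per phoneme);
# "features" collapse phonemes into 3 broad classes (a loss of information).
X_phonemes = rng.integers(0, 3, size=(n_items, n_phonemes)).astype(float)
phoneme_to_feature = rng.integers(0, 3, size=n_phonemes)
X_features = np.stack(
    [X_phonemes[:, phoneme_to_feature == f].sum(axis=1) for f in range(3)],
    axis=1,
)

# Simulated ratings driven by phoneme-level weights, mirroring the
# paper's finding that responses track individual phonemes.
true_weights = rng.normal(size=n_phonemes)
y = X_phonemes @ true_weights + rng.normal(scale=0.5, size=n_items)

r2_phonemes = LinearRegression().fit(X_phonemes, y).score(X_phonemes, y)
r2_features = LinearRegression().fit(X_features, y).score(X_features, y)
print(r2_phonemes, r2_features)
```

Because each feature column is a sum of phoneme columns, the feature model's in-sample fit can never exceed the phoneme model's; the interesting empirical question, which the study addresses with human data, is how large the gap is.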


2004 ◽  
Vol 16 (1) ◽  
pp. 31-39 ◽  
Author(s):  
Jonas Obleser ◽  
Aditi Lahiri ◽  
Carsten Eulitz

This study further elucidates determinants of vowel perception in the human auditory cortex. The vowel inventory of a given language can be classified on the basis of phonological features which are closely linked to acoustic properties. A cortical representation of speech sounds based on these phonological features might explain the surprisingly inverse correlation between the immense variance in the acoustic signal and the high accuracy of speech recognition. We investigated the timing and mapping of the N100m elicited by 42 tokens of seven natural German vowels varying along the phonological features tongue height (corresponding to the frequency of the first formant) and place of articulation (corresponding to the frequencies of the second and third formants). Auditory-evoked fields were recorded using a 148-channel whole-head magnetometer while subjects performed target vowel detection tasks. Source location differences appeared to be driven by place of articulation: vowels with mutually exclusive place of articulation features, namely coronal and dorsal, elicited separate centers of activation along the posterior-anterior axis. Additionally, the time course of activation as reflected in the N100m peak latency distinguished between vowel categories, especially when the spatial distinctiveness of cortical activation was low. In sum, the results suggest that both N100m latency and source location, as well as their interaction, reflect properties of speech stimuli that correspond to abstract phonological features.


2016 ◽  
Vol 140 (1) ◽  
pp. 308-321 ◽  
Author(s):  
Yi-Fen Liu ◽  
Shu-Chuan Tseng ◽  
Jyh-Shing Roger Jang

Author(s):  
Tze Yuang Chong ◽  
Xiong Xiao ◽  
Tien-Ping Tan ◽  
Eng Siong Chng ◽  
Haizhou Li

2008 ◽  
Vol 3 (2) ◽  
pp. 259-278 ◽  
Author(s):  
Monica Tamariz

This paper investigates the existence of systematicity between two similarity-based representations of the lexicon, one focusing on word form and another based on co-occurrence statistics in speech, which captures aspects of syntax and semantics. An analysis of the three most frequent form-homogeneous word groups in a Spanish speech corpus (CVCV, CVCCV, and CVCVCV words) supports the existence of systematicity: words that sound similar tend to occur in the same lexical contexts in speech. A lexicon that is highly systematic in this respect, however, may lead to confusion between similar-sounding words that appear in similar contexts. Exploring the impact of different phonological features on systematicity reveals that while some features (such as sharing consonants or the stress pattern) seem to underlie the measured systematicity, others (particularly sharing the stressed vowel) oppose it, perhaps to help discriminate between words that systematicity might otherwise render ambiguous.


2014 ◽  
Vol 5 (2) ◽  
pp. 231-251 ◽  
Author(s):  
Shu-Chuan Tseng

This paper presents a study of segment duration in Chinese disyllabic words. The study accounts for boundary-related factors at levels of syllable, word, prosodic unit, and discourse unit. Face-to-face conversational speech data annotated with signal-aligned, multi-layer linguistic information was used for the analysis. A series of quantitative results show that Chinese disyllabic words have a long first syllable onset and a long second syllable rhyme, suggesting an edge effect of disyllabic words. This is in line with disyllabic merger in Chinese that preserves the onset of the first syllable and the rhyme of the second syllable. A shortening effect at prosodic and discourse unit initiation locations is due to a duration reduction of the second syllable onset, whereas the common phenomenon of pre-boundary lengthening is mainly a result of the second syllable rhyme prolongation including the glide, nucleus, and coda. Morphologically inseparable disyllabic words in principle follow the “long first onset and long second rhyme” duration pattern. But diverse duration patterns were found in words with a head-complement and a stem-suffix construction, suggesting that word morphology may also play a role in determining the duration pattern of Chinese disyllabic words in conversational speech.


2017 ◽  
Vol 49 (1) ◽  
pp. 33-51
Author(s):  
Jing Yang ◽  
Robert A. Fox

The present study aims to document the developmental profile of static and dynamic acoustic features of vowel productions in monolingual Mandarin-speaking children aged three to six years in comparison to adults. Twenty-nine monolingual Mandarin children and 12 native Mandarin adults were recorded producing ten Mandarin disyllabic words containing the five monophthongal vowel phonemes /a i u y ɤ/. F1 and F2 values were measured at five equidistant temporal locations (the 20–35–50–65–80% points of the vowel's duration) and normalized. Scatter plots showed clear separations between vowel categories, although the size of individual vowel categories exhibited a decreasing trend as age increased. This indicates that speakers as young as three years old could separate these five Mandarin vowels in the acoustic space but were still refining the acoustic properties of their vowel production as they matured. Although the tested vowels were monophthongs, they were still characterized by distinctive formant movement patterns. Mandarin children generally demonstrated formant movement patterns comparable to those of adult speakers. However, children still showed positional variation and differed from adults in the magnitudes of spectral change for certain vowels. This indicates that vowel development is a long-term process which extends beyond three years of age.
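The five-point measurement scheme described above (F1/F2 sampled at the 20–35–50–65–80% points of the vowel's duration) can be sketched as follows. The function name and the array representation of a formant track are illustrative assumptions, not the study's actual analysis code:

```python
import numpy as np

def sample_formant_track(formant_track, points=(0.20, 0.35, 0.50, 0.65, 0.80)):
    """Sample a formant track (Hz values spanning the vowel's duration)
    at the given proportional time points, e.g. the five equidistant
    20-80% locations used for static/dynamic vowel measures."""
    track = np.asarray(formant_track, dtype=float)
    n = len(track)
    # Map each duration proportion to the nearest frame index.
    indices = [min(int(round(p * (n - 1))), n - 1) for p in points]
    return track[indices]

# Hypothetical rising F1 track for a vowel, 101 frames long.
f1 = np.linspace(300.0, 800.0, 101)
print(sample_formant_track(f1))
```

Sampling at proportional rather than absolute times normalizes over vowels of different durations, which is what makes the resulting five-point trajectories comparable across children and adults.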


2021 ◽  
Vol 15 ◽  
Author(s):  
Mathilde Marie Duville ◽  
Luz Maria Alonso-Valerdi ◽  
David I. Ibarra-Zarate

Socio-emotional impairments are key symptoms of Autism Spectrum Disorders. This work proposes to analyze the neuronal activity related to the discrimination of emotional prosodies in autistic children (aged 9 to 11 years) as follows. Firstly, a database of single words uttered in Mexican Spanish by males, females, and children will be created. Then, optimal acoustic features for emotion characterization will be extracted, followed by a cubic-kernel Support Vector Machine (SVM) used to validate the speech corpus. As a result, human-specific acoustic properties of emotional voice signals will be identified. Secondly, those identified acoustic properties will be modified to synthesize the recorded human emotional voices. Thirdly, both human and synthesized utterances will be used to study the electroencephalographic correlates of affective prosody processing in typically developed and autistic children. Finally, on the basis of the outcomes, synthesized voice-enhanced environments will be created to develop an intervention based on a social robot and Social Story™ to help autistic children improve their discrimination of affective prosodies. This protocol has been registered at BioMed Central under the number ISRCTN18117434.
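The corpus-validation step described above, classifying emotions from acoustic features with a cubic-kernel SVM, can be sketched as below. The synthetic two-class data, feature dimensionality, and class names are placeholders, not the protocol's actual features or labels; the cubic kernel corresponds to scikit-learn's polynomial kernel with degree 3:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for per-utterance acoustic feature vectors
# (e.g. pitch, energy, and spectral statistics); dimensions are assumed.
rng = np.random.default_rng(0)
n_per_class = 50
class_a = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 8))  # e.g. "neutral"
class_b = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 8))  # e.g. "angry"
X = np.vstack([class_a, class_b])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Cubic-kernel SVM: polynomial kernel of degree 3.
clf = SVC(kernel="poly", degree=3)

# Cross-validated accuracy serves as the corpus-validation score:
# high accuracy suggests the emotions are acoustically separable.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

In a validation setting like this, a classifier that cannot distinguish the recorded emotions from the extracted features would flag the corpus (or the feature set) as unsuitable before the EEG phase.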


Author(s):  
Pablo Pérez Zarazaga ◽  
Sneha Das ◽  
Tom Bäckström ◽  
V. Vidyadhara Raju V. ◽  
Anil Kumar Vuppala
