Phonetic fusion in Chinese conversational speech

Author(s):  
Shu-Chuan Tseng

Abstract: This paper presents a corpus-based perspective on the phonetic fusion of disyllabic words in a Chinese conversational speech corpus. Four categorical types that reflect the phonological features of reduction degrees are automatically derived from gradient acoustic properties. A transcription experiment was conducted with the most common disyllabic words. Both the automatic derivation from acoustic signals and the human transcription by perceptual judgment refer to the same sound inventory. We have shown that the complete forms of fusion occurring in conversation need not be legitimate syllables and that they appear consistently as syllable mergers, each representing a group of phonetic variants.

Author(s):  
Ali Janalizadeh Choobbasti ◽  
Mohammad Erfan Gholamian ◽  
Amir Vaheb ◽  
Saeid Safavi

2019 ◽  
Vol 11 (02) ◽  
pp. 235-255 ◽  
Author(s):  
PADRAIC MONAGHAN ◽  
MATTHEW FLETCHER

Abstract: The sound of words has been shown to relate to the meaning the words denote, an effect that extends beyond morphological properties of the word. Studies of these sound-symbolic relations have described this iconicity in terms of individual phonemes or, alternatively, in terms of acoustic properties (expressed as phonological features) relating to meaning. In this study, we investigated whether individual phonemes or phoneme features best accounted for iconicity effects. We tested 92 participants' judgements about the appropriateness of 320 nonwords presented in written form, relating to 8 different semantic attributes. For all 8 attributes, individual phonemes fitted participants' responses better than general phoneme features. These results challenge claims that sound-symbolic effects for visually presented words access broad, cross-modal associations between sound and meaning; instead, they indicate the operation of individual phoneme-to-meaning relations. Whether similar effects are found for nonwords presented auditorily remains an open question.
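The comparison reported above, phoneme-identity predictors versus broader feature-class predictors of appropriateness judgements, can be illustrated with a minimal synthetic sketch. The codings, weights, and data below are illustrative assumptions, not the study's materials; the point is only that a feature model, which collapses phonemes into classes, cannot fit better than a phoneme model whose predictors it aggregates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy illustration (synthetic data): compare how well individual-phoneme
# predictors vs. broader feature-class predictors fit appropriateness ratings.
rng = np.random.default_rng(1)
n_items = 200
n_phonemes = 10

# Each nonword is coded by phoneme counts (0-2 occurrences per phoneme);
# "features" collapse phonemes into 3 broad classes (a loss of information).
X_phonemes = rng.integers(0, 3, size=(n_items, n_phonemes)).astype(float)
phoneme_to_feature = rng.integers(0, 3, size=n_phonemes)
X_features = np.stack(
    [X_phonemes[:, phoneme_to_feature == f].sum(axis=1) for f in range(3)],
    axis=1,
)

# Simulated ratings driven by phoneme-level weights, mirroring the
# paper's finding that responses track individual phonemes.
true_weights = rng.normal(size=n_phonemes)
y = X_phonemes @ true_weights + rng.normal(scale=0.5, size=n_items)

r2_phonemes = LinearRegression().fit(X_phonemes, y).score(X_phonemes, y)
r2_features = LinearRegression().fit(X_features, y).score(X_features, y)
print(r2_phonemes, r2_features)
```

Because each feature column is a sum of phoneme columns, the feature model's in-sample fit can never exceed the phoneme model's; the interesting empirical question, which the study addresses with human data, is how large the gap is.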


2004 ◽  
Vol 16 (1) ◽  
pp. 31-39 ◽  
Author(s):  
Jonas Obleser ◽  
Aditi Lahiri ◽  
Carsten Eulitz

This study further elucidates determinants of vowel perception in the human auditory cortex. The vowel inventory of a given language can be classified on the basis of phonological features which are closely linked to acoustic properties. A cortical representation of speech sounds based on these phonological features might explain the surprisingly inverse correlation between the immense variance in the acoustic signal and the high accuracy of speech recognition. We investigated the timing and mapping of the N100m elicited by 42 tokens of seven natural German vowels varying along the phonological features tongue height (corresponding to the frequency of the first formant) and place of articulation (corresponding to the frequencies of the second and third formants). Auditory-evoked fields were recorded using a 148-channel whole-head magnetometer while subjects performed target vowel detection tasks. Source location differences appeared to be driven by place of articulation: vowels with mutually exclusive place of articulation features, namely coronal and dorsal, elicited separate centers of activation along the posterior-anterior axis. Additionally, the time course of activation as reflected in the N100m peak latency distinguished between vowel categories, especially when the spatial distinctiveness of cortical activation was low. In sum, the results suggest that both N100m latency and source location, as well as their interaction, reflect properties of speech stimuli that correspond to abstract phonological features.


2016 ◽  
Vol 140 (1) ◽  
pp. 308-321 ◽  
Author(s):  
Yi-Fen Liu ◽  
Shu-Chuan Tseng ◽  
Jyh-Shing Roger Jang

Author(s):  
Tze Yuang Chong ◽  
Xiong Xiao ◽  
Tien-Ping Tan ◽  
Eng Siong Chng ◽  
Haizhou Li

2008 ◽  
Vol 3 (2) ◽  
pp. 259-278 ◽  
Author(s):  
Monica Tamariz

This paper investigates the existence of systematicity between two similarity-based representations of the lexicon, one focusing on word form and another based on co-occurrence statistics in speech, which captures aspects of syntax and semantics. An analysis of the three most frequent form-homogeneous word groups in a Spanish speech corpus (CVCV, CVCCV, and CVCVCV words) supports the existence of systematicity: words that sound similar tend to occur in the same lexical contexts in speech. A lexicon that is highly systematic in this respect, however, may lead to confusion between similar-sounding words that appear in similar contexts. Exploring the impact of different phonological features on systematicity reveals that while some features (such as sharing consonants or the stress pattern) seem to underlie the measured systematicity, others (particularly sharing the stressed vowel) oppose it, perhaps to help discriminate between words that systematicity might otherwise render ambiguous.


2014 ◽  
Vol 5 (2) ◽  
pp. 231-251 ◽  
Author(s):  
Shu-Chuan Tseng

This paper presents a study of segment duration in Chinese disyllabic words. The study accounts for boundary-related factors at levels of syllable, word, prosodic unit, and discourse unit. Face-to-face conversational speech data annotated with signal-aligned, multi-layer linguistic information was used for the analysis. A series of quantitative results show that Chinese disyllabic words have a long first syllable onset and a long second syllable rhyme, suggesting an edge effect of disyllabic words. This is in line with disyllabic merger in Chinese that preserves the onset of the first syllable and the rhyme of the second syllable. A shortening effect at prosodic and discourse unit initiation locations is due to a duration reduction of the second syllable onset, whereas the common phenomenon of pre-boundary lengthening is mainly a result of the second syllable rhyme prolongation including the glide, nucleus, and coda. Morphologically inseparable disyllabic words in principle follow the “long first onset and long second rhyme” duration pattern. But diverse duration patterns were found in words with a head-complement and a stem-suffix construction, suggesting that word morphology may also play a role in determining the duration pattern of Chinese disyllabic words in conversational speech.


2017 ◽  
Vol 49 (1) ◽  
pp. 33-51
Author(s):  
Jing Yang ◽  
Robert A. Fox

The present study aims to document the developmental profile of static and dynamic acoustic features of vowel productions in monolingual Mandarin-speaking children aged three to six years in comparison to adults. Twenty-nine monolingual Mandarin children and 12 native Mandarin adults were recorded producing ten Mandarin disyllabic words containing the five monophthongal vowel phonemes /a i u y ɤ/. F1 and F2 values were measured at five equidistant temporal locations (the 20–35–50–65–80% points of the vowel's duration) and normalized. Scatter plots showed clear separations between vowel categories, although the size of individual vowel categories exhibited a decreasing trend as age increased. This indicates that speakers as young as three years old could separate these five Mandarin vowels in the acoustic space but were still refining the acoustic properties of their vowel production as they matured. Although the tested vowels were monophthongs, they were still characterized by distinctive formant movement patterns. Mandarin children generally demonstrated formant movement patterns comparable to those of adult speakers. However, children still showed positional variation and differed from adults in the magnitudes of spectral change for certain vowels. This indicates that vowel development is a long-term process which extends beyond three years of age.
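The five-point measurement scheme described above (F1/F2 sampled at the 20–35–50–65–80% points of the vowel's duration) can be sketched as follows. The function name and the array representation of a formant track are illustrative assumptions, not the study's actual analysis code:

```python
import numpy as np

def sample_formant_track(formant_track, points=(0.20, 0.35, 0.50, 0.65, 0.80)):
    """Sample a formant track (Hz values spanning the vowel's duration)
    at the given proportional time points, e.g. the five equidistant
    20-80% locations used for static/dynamic vowel measures."""
    track = np.asarray(formant_track, dtype=float)
    n = len(track)
    # Map each duration proportion to the nearest frame index.
    indices = [min(int(round(p * (n - 1))), n - 1) for p in points]
    return track[indices]

# Hypothetical rising F1 track for a vowel, 101 frames long.
f1 = np.linspace(300.0, 800.0, 101)
print(sample_formant_track(f1))
```

Sampling at proportional rather than absolute times normalizes over vowels of different durations, which is what makes the resulting five-point trajectories comparable across children and adults.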


2021 ◽  
Vol 15 ◽  
Author(s):  
Mathilde Marie Duville ◽  
Luz Maria Alonso-Valerdi ◽  
David I. Ibarra-Zarate

Socio-emotional impairments are key symptoms of Autism Spectrum Disorders. This work proposes to analyze the neuronal activity related to the discrimination of emotional prosodies in autistic children (aged 9 to 11 years) as follows. Firstly, a database of single words uttered in Mexican Spanish by males, females, and children will be created. Then, optimal acoustic features for emotion characterization will be extracted, followed by a cubic-kernel Support Vector Machine (SVM) used to validate the speech corpus. As a result, human-specific acoustic properties of emotional voice signals will be identified. Secondly, those identified acoustic properties will be modified to synthesize the recorded human emotional voices. Thirdly, both human and synthesized utterances will be used to study the electroencephalographic correlates of affective prosody processing in typically developed and autistic children. Finally, on the basis of the outcomes, synthesized voice-enhanced environments will be created to develop an intervention based on a social robot and Social Story™ to help autistic children improve their discrimination of affective prosodies. This protocol has been registered at BioMed Central under the number ISRCTN18117434.
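The corpus-validation step described above, classifying emotions from acoustic features with a cubic-kernel SVM, can be sketched as below. The synthetic two-class data, feature dimensionality, and class names are placeholders, not the protocol's actual features or labels; the cubic kernel corresponds to scikit-learn's polynomial kernel with degree 3:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for per-utterance acoustic feature vectors
# (e.g. pitch, energy, and spectral statistics); dimensions are assumed.
rng = np.random.default_rng(0)
n_per_class = 50
class_a = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 8))  # e.g. "neutral"
class_b = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 8))  # e.g. "angry"
X = np.vstack([class_a, class_b])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Cubic-kernel SVM: polynomial kernel of degree 3.
clf = SVC(kernel="poly", degree=3)

# Cross-validated accuracy serves as the corpus-validation score:
# high accuracy suggests the emotions are acoustically separable.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

In a validation setting like this, a classifier that cannot distinguish the recorded emotions from the extracted features would flag the corpus (or the feature set) as unsuitable before the EEG phase.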


Author(s):  
Pablo Pérez Zarazaga ◽  
Sneha Das ◽  
Tom Bäckström ◽  
V. Vidyadhara Raju V. ◽  
Anil Kumar Vuppala
