Separating the Novel Speech Sound Perception of Lexical Tone Chimeras From Their Auditory Signal Manipulations: Behavioral and Electroencephalographic Evidence

Previous research has shown the novelty of lexical-tone chimeras (artificially constructed speech sounds created by combining normal speech sounds of a given language) to native speakers of the language from which the chimera components were drawn. However, the source of such novelty remains unclear. Our goal in this study was to separate the effects of chimeric tonal novelty in Mandarin speech from the effects of auditory signal manipulations. We recruited 20 native speakers of Mandarin and constructed two sets of lexical-tone chimeras by interchanging the envelopes and fine structures of both a falling/yi4/and a rising/yi2/Mandarin tone through 1, 2, 3, 4, 6, 8, 16, 32, and 64 auditory filter banks. We conducted pitch-perception ability tasks via a two-alternative, forced-choice paradigm to produce behavioral (versus physiological) pitch perception data. We also obtained electroencephalographic measurements through the scalp-recorded frequency-following response (FFR). Analyses of variances and post hoc Greenhouse-Geisser procedures revealed that the differences observed in the participants’ reaction times and FFR measurements were attributable primarily to chimeric novelty rather than signal manipulation effects. These findings can be useful in assessing neuroplasticity and developing speech-processing strategies.

Download Full-text

Speech perception in individuals with auditory dys-synchrony

The Journal of Laryngology & Otology ◽

10.1017/s0022215110001854 ◽

2010 ◽

Vol 125 (3) ◽

pp. 236-245 ◽

Cited By ~ 5

Author(s):

U A Kumar ◽

M Jayaram

Keyword(s):

Speech Processing ◽

Speech Sound ◽

Just Noticeable Difference ◽

Auditory Information ◽

Speech Sounds ◽

Transition Duration ◽

Processing Strategies ◽

Temporal Cues ◽

Speech Segments ◽

Difference Time

AbstractObjective:This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony.Methods:Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant–vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the ‘just noticeable difference’ time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed.Results:Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used.Conclusion:These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.

Download Full-text

Perceptual Fusion Tendency of Speech Sounds

Journal of Cognitive Neuroscience ◽

10.1162/jocn.2010.21470 ◽

2011 ◽

Vol 23 (4) ◽

pp. 1003-1014 ◽

Cited By ~ 11

Author(s):

Ying Huang ◽

Jingyu Li ◽

Xuefei Zou ◽

Tianshu Qu ◽

Xihong Wu ◽

...

Keyword(s):

Speech Processing ◽

Speech Sound ◽

Amplitude Fluctuation ◽

Speech Sounds ◽

Sound Sources ◽

Reflected Waves ◽

Sound Image ◽

Grouping Strategy ◽

Nonspeech Sounds ◽

Reverberant Environment

To discriminate and to recognize sound sources in a noisy, reverberant environment, listeners need to perceptually integrate the direct wave with the reflections of each sound source. It has been confirmed that perceptual fusion between direct and reflected waves of a speech sound helps listeners recognize this speech sound in a simulated reverberant environment with disrupting sound sources. When the delay between a direct sound wave and its reflected wave is sufficiently short, the two waves are perceptually fused into a single sound image as coming from the source location. Interestingly, compared with nonspeech sounds such as clicks and noise bursts, speech sounds have a much larger perceptual fusion tendency. This study investigated why the fusion tendency for speech sounds is so large. Here we show that when the temporal amplitude fluctuation of speech was artificially time reversed, a large perceptual fusion tendency of speech sounds disappeared, regardless of whether the speech acoustic carrier was in normal or reversed temporal order. Moreover, perceptual fusion of normal-order speech, but not that of time-reversed speech, was accompanied by increased coactivation of the attention-control-related, spatial-processing-related, and speech-processing-related cortical areas. Thus, speech-like acoustic carriers modulated by speech amplitude fluctuation selectively activate a cortical network for top–down modulations of speech processing, leading to an enhancement of perceptual fusion of speech sounds. This mechanism represents a perceptual-grouping strategy for unmasking speech under adverse conditions.

Download Full-text

Characteristics of Effective Auditory Training: Implications From Two Training Programs That Successfully Trained Nonnative Cantonese Tone Identification in Monolingual Mandarin and Bilingual Mandarin–Taiwanese Tone Speakers

Journal of Speech Language and Hearing Research ◽

10.1044/2021_jslhr-20-00436 ◽

2021 ◽

pp. 1-23

Author(s):

Puisan Wong ◽

Ka Yu Lam

Keyword(s):

Native Speakers ◽

Training Programs ◽

Reference Group ◽

Training Session ◽

Auditory Training ◽

Speech Sounds ◽

Lexical Tone ◽

Factors Affecting ◽

Identification Test ◽

Tonal System

Purpose Auditory training is important in pedagogical and clinical settings. In search of a more effective perceptual program for training new suprasegmental categories, this study examined the effect of two auditory programs that incorporated five elements that have previously been identified to be effective for training nonnative segmental and suprasegmental speech sounds on the identification of a complex foreign lexical tone system (Cantonese) that contrasts both pitch shapes and pitch heights. To investigate the training outcomes in learners with different tonal systems, monolingual Mandarin-speaking learners who have a smaller native tonal system that contrasts pitch shapes only and bilingual Mandarin-Taiwanese–speaking learners who have a larger native tonal system that contrasts both pitch shapes and pitch heights were recruited for training. Method Thirty Mandarin-speaking monolinguals and 33 Mandarin-Taiwanese–speaking bilinguals in Taiwan were randomly assigned to two training programs, one with different tones and the other with the same tone preceding the target words in the same training block, and received six 90-min training sessions within 2 weeks. They took a Cantonese Tone Identification Test before training and after each training session. Twenty Cantonese native speakers in Hong Kong served as the reference group and took the same Cantonese Tone Identification Test. Results The two training programs were equally effective. Before training, the monolinguals performed poorer than the bilinguals. After training, the monolinguals and bilinguals in both training programs identified the six Cantonese tones in new words, new utterances, and novel speakers with comparable results, and their overall accuracy did not differ from that of the Cantonese native speakers. Conclusions Though learners with a larger and more complex native tonal system have initial advantage in learning nonnative tones, the intensive high-variability full-set training programs that provide explicit phonetic instruction and contrastive feedback of nonnative tones effectively promote nonnative tone acquisition in learners of different tone languages. The findings revealed factors affecting nonnative tone acquisition in tone speakers. The design of the two programs can be adopted in future programs for effective auditory training of segmental and suprasegmental speech sounds.

Download Full-text

Speech perception in individuals with auditory dys-synchrony: effect of lengthening of voice onset time and burst duration of speech segments

The Journal of Laryngology & Otology ◽

10.1017/s0022215113001278 ◽

2013 ◽

Vol 127 (7) ◽

pp. 656-665

Author(s):

U A Kumar ◽

M Jayaram

Keyword(s):

Speech Perception ◽

Speech Processing ◽

Voice Onset Time ◽

Onset Time ◽

Burst Duration ◽

Just Noticeable Difference ◽

Speech Sounds ◽

Transition Duration ◽

Processing Strategies ◽

Synchrony Effect

AbstractObjective:The purpose of this study was to evaluate the effect of lengthening of voice onset time and burst duration of selected speech stimuli on perception by individuals with auditory dys-synchrony. This is the second of a series of articles reporting the effect of signal enhancing strategies on speech perception by such individuals.Methods:Two experiments were conducted: (1) assessment of the ‘just-noticeable difference' for voice onset time and burst duration of speech sounds; and (2) assessment of speech identification scores when speech sounds were modified by lengthening the voice onset time and the burst duration in units of one just-noticeable difference, both in isolation and in combination with each other plus transition duration modification.Results:Lengthening of voice onset time as well as burst duration improved perception of voicing. However, the effect of voice onset time modification was greater than that of burst duration modification. Although combined lengthening of voice onset time, burst duration and transition duration resulted in improved speech perception, the improvement was less than that due to lengthening of transition duration alone.Conclusion:These results suggest that innovative speech processing strategies that enhance temporal cues may benefit individuals with auditory dys-synchrony.

Download Full-text

SPEECH SOUND PERCEPTION MODALITY IN COCHLEAR IMPLANT PATIENTS WITH DIFFERENT PROCESSING STRATEGIES

Nippon Jibiinkoka Gakkai Kaiho ◽

10.3950/jibiinkoka.91.177 ◽

1988 ◽

Vol 91 (2) ◽

pp. 177-184,317

Author(s):

SOTARO FUNASAKA ◽

KUMIKO YUKAWA ◽

OSAMU TAKAHASHI ◽

SHINICHI HATSUSHIKA ◽

MUTSUMI HOSOYA ◽

...

Keyword(s):

Cochlear Implant ◽

Speech Sound ◽

Sound Perception ◽

Processing Strategies ◽

Speech Sound Perception

Download Full-text

Study on the Speech Processing Strategies of Bi-sentences of Thai Native Speakers

Proceedings of the 2016 2nd International Conference on Social Science and Higher Education ◽

10.2991/icsshe-16.2016.120 ◽

2016 ◽

Author(s):

Ruifeng Wang

Keyword(s):

Speech Processing ◽

Native Speakers ◽

Processing Strategies

Download Full-text

The Development of Ilocano Word Lists for Speech Audiometry

Philippine Journal of Otolaryngology Head and Neck Surgery ◽

10.32412/pjohns.v21i1-2.821 ◽

2006 ◽

Vol 21 (1-2) ◽

pp. 11-19

Author(s):

Reynita R. Sagon ◽

Rosalie M. Uchanski

Keyword(s):

Native Speakers ◽

Speech Sound ◽

Word List ◽

Speech Audiometry ◽

Speech Sounds ◽

Transcription Analysis ◽

Phonetic Transcription ◽

Word Lists ◽

Monosyllabic Words ◽

Magazine Articles

Objective: The goal of this work is the creation of word lists, in Ilocano, suitable for use in speech audiometry. Methods: First, estimates of the distribution of speech sounds and of the most common syllable structures in Ilocano were found from a phonetic transcription analysis of nearly 3000 words obtained from three magazine articles. Second, 372 two-syllable words were rated, for commonness, by fifteen native speakers of Ilocano who currently reside in Hawai’i. Finally, various combinations of two-syllable words were made to produce 50-item lists. Results: First, an estimate of the distribution of speech sounds in Ilocano was found, with frequencies of occurrence ranging from 22.4%, for the speech sound /a/, to 0.007%, for the speech sound /v/. The syllable-structure analyses revealed that a very small number of distinct monosyllabic words were used very frequently. Two-syllable words were also used frequently, but were attributed to many distinct words. Second, from the rating results, approximately 70% of the two-syllable words in the rating survey were judged as common by 12 or more of the raters. Finally, four lists of 50 words each were constructed using only common two-syllable words with the most frequent two-syllable structures found in Ilocano. Each word list has a distribution of speech sounds that approximates that found from the phonetic analysis, and hence each list is roughly phonetically-balanced. Conclusions: These word lists may be of value to otolaryngologists and audiologists who work with native speakers of Ilocano. Keywords: Ilocano, Ilokano, phonetically-balanced, speech audiometry, word lists

Download Full-text

Metaphoric Gestures Facilitate Perception of Intonation More than Length in Auditory Judgments of Non-Native Phonemic Contrasts

Collabra Psychology ◽

10.1525/collabra.76 ◽

2017 ◽

Vol 3 (1) ◽

Cited By ~ 7

Author(s):

Spencer Kelly ◽

April Bailey ◽

Yukari Hirata

Keyword(s):

Foreign Language ◽

Speech Processing ◽

Speech Sound ◽

Two Dimensions ◽

Consistent Pattern ◽

Speech Sounds ◽

Hand Gestures ◽

English Speaking ◽

Phonemic Contrasts

It is well established that hand gestures affect comprehension and learning of semantic aspects of a foreign language (FL). However, much less is known about the role of hand gestures in lower-level language processes, such as perception of phonemes. To address this gap, we explored the role that metaphoric gestures play in perceiving FL speech sounds that varied on two dimensions: length and intonation. English speaking adults listened to Japanese length contrasts and sentence-final intonational distinctions in the context of congruent, incongruent and no gestures. For intonational contrasts, identification was more accurate for congruent gestures and less accurate for incongruent gestures relative to the baseline no gesture condition. However, for the length contrasts, there was no such clear and consistent pattern, and in fact, congruent gestures made speech processing more effortful. We conclude that metaphoric gestures help with some—but not all—novel speech sounds in a FL, suggesting that gesture and speech are phonemically integrated to differing extents depending on the nature of the gesture and/or speech sound.

Download Full-text

Easy Screening for Mild Alzheimer's Disease and Mild Cognitive Impairment from Elderly Speech

Current Alzheimer Research ◽

10.2174/1567205014666171120144343 ◽

2018 ◽

Vol 15 (2) ◽

pp. 104-110 ◽

Cited By ~ 3

Author(s):

Shohei Kato ◽

Akira Homma ◽

Takuto Sakuma

Keyword(s):

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Cognitive Impairment ◽

Mild Cognitive Impairment ◽

Characteristic Curve ◽

Linear Regression Analysis ◽

Speech Sound ◽

The Elderly ◽

Speech Sounds ◽

Mel Frequency Cepstral Coefficients

Objective: This study presents a novel approach for early detection of cognitive impairment in the elderly. The approach incorporates the use of speech sound analysis, multivariate statistics, and data-mining techniques. We have developed a speech prosody-based cognitive impairment rating (SPCIR) that can distinguish between cognitively normal controls and elderly people with mild Alzheimer's disease (mAD) or mild cognitive impairment (MCI) using prosodic signals extracted from elderly speech while administering a questionnaire. Two hundred and seventy-three Japanese subjects (73 males and 200 females between the ages of 65 and 96) participated in this study. The authors collected speech sounds from segments of dialogue during a revised Hasegawa's dementia scale (HDS-R) examination and talking about topics related to hometown, childhood, and school. The segments correspond to speech sounds from answers to questions regarding birthdate (T1), the name of the subject's elementary school (T2), time orientation (Q2), and repetition of three-digit numbers backward (Q6). As many prosodic features as possible were extracted from each of the speech sounds, including fundamental frequency, formant, and intensity features and mel-frequency cepstral coefficients. They were refined using principal component analysis and/or feature selection. The authors calculated an SPCIR using multiple linear regression analysis. Conclusion: In addition, this study proposes a binary discrimination model of SPCIR using multivariate logistic regression and model selection with receiver operating characteristic curve analysis and reports on the sensitivity and specificity of SPCIR for diagnosis (control vs. MCI/mAD). The study also reports discriminative performances well, thereby suggesting that the proposed approach might be an effective tool for screening the elderly for mAD and MCI.

Download Full-text

Isomorphism and language-specific devices in comprehension of Korean suffixal passive construction by Mandarin-speaking learners of Korean

Applied Linguistics Review ◽

10.1515/applirev-2020-0036 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Gyu-Ho Shin ◽

Sun Hee Park

Keyword(s):

Native Speakers ◽

Reaction Times ◽

The Other ◽

L2 Acquisition ◽

Verbal Morphology ◽

Passive Voice ◽

Passive Construction ◽

Case Marking

Abstract Across languages, a passive construction is known to manifest a misalignment between the typical order of event composition (agent-before-theme) and the actual order of arguments in the constructions (theme-before-agent), dubbed non-isomorphic mapping. This study investigates comprehension of a suffixal passive construction in Korean by Mandarin-speaking learners of Korean, focusing on isomorphism and language-specific devices in the passive. We measured learners’ judgment of the acceptability of canonical and scrambled suffixal passives as well as their reaction times (relative to a canonical active transitive). Our analysis generated three major findings. First, learners uniformly preferred the canonical passive to the scrambled passive. Second, as proficiency increased, the judgment gap between the canonical active transitive and the canonical suffixal passive narrowed, but the gap between the canonical active transitive and the scrambled suffixal passive did not. Third, learners (and even native speakers) spent more time in judging the acceptability of the canonical suffixal passive than they did in the other two construction types. Implications of these findings are discussed with respect to the mapping nature involving a passive voice, indicated by language-specific devices (i.e., case-marking and verbal morphology dedicated to Korean passives), in L2 acquisition.

Download Full-text