Gradience in morphological decomposability: Evidence from the perception of audiovisually incongruent speech

Author(s):
Azra N. Ali
Michael Ingleby

Abstract: Over the last three decades, priming and masking experiments and corpus frequency studies have dominated attempts to find a ranking in the decomposability of words containing morphological affixes. Here we establish the feasibility of using another experimental probe based on audiovisually incongruent speech stimuli. In response to such stimuli, a proportion of participants report percepts that differ in place of articulation from both the audio and the visual signal, typically reporting the percept /t/ when receiving audio /p/ dubbed onto visual /k/. We study the systematic variation of this proportion, the McGurk fusion rate, using a small corpus with affixes
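The fusion rate described above is a simple proportion of trials on which the reported percept matches neither the audio nor the visual token. A minimal sketch (the function name and response coding are illustrative, not from the paper):

```python
def fusion_rate(responses, audio="p", visual="k"):
    """Proportion of trials whose reported percept matches neither the
    audio nor the visual place of articulation (a McGurk fusion),
    e.g. hearing /t/ for audio /p/ dubbed onto visual /k/."""
    fused = sum(1 for r in responses if r not in (audio, visual))
    return fused / len(responses)

# Hypothetical responses from 10 trials: 4 fused /t/ percepts -> 0.4
rate = fusion_rate(["t", "p", "t", "k", "t", "p", "p", "t", "k", "p"])
```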

2004
Vol 16 (1)
pp. 31-39
Author(s):
Jonas Obleser
Aditi Lahiri
Carsten Eulitz

This study further elucidates determinants of vowel perception in the human auditory cortex. The vowel inventory of a given language can be classified on the basis of phonological features which are closely linked to acoustic properties. A cortical representation of speech sounds based on these phonological features might explain the surprising inverse relation between the immense variance in the acoustic signal and the high accuracy of speech recognition. We investigated the timing and mapping of the N100m elicited by 42 tokens of seven natural German vowels varying along the phonological features tongue height (corresponding to the frequency of the first formant) and place of articulation (corresponding to the frequencies of the second and third formants). Auditory-evoked fields were recorded using a 148-channel whole-head magnetometer while subjects performed target vowel detection tasks. Source location differences appeared to be driven by place of articulation: Vowels with mutually exclusive place of articulation features, namely coronal and dorsal, elicited separate centers of activation along the posterior-anterior axis. Additionally, the time course of activation, as reflected in the N100m peak latency, distinguished between vowel categories, especially when the spatial distinctiveness of cortical activation was low. In sum, the results suggest that N100m latency and source location, as well as their interaction, reflect properties of speech stimuli that correspond to abstract phonological features.


2013
Vol 56 (3)
pp. 779-791
Author(s):
Catherine Mayo
Fiona Gibbon
Robert A. J. Clark

Purpose: In this study, the authors aimed to investigate how listener training and the presence of intermediate acoustic cues influence transcription variability for conflicting cue speech stimuli. Method: Twenty listeners with training in transcribing disordered speech, and 26 untrained listeners, were asked to make forced-choice labeling decisions for synthetic vowel–consonant–vowel (VCV) sequences “a doe” (/ədo/) and “a go” (/əgo/). Both the VC and CV transitions in these stimuli ranged through intermediate positions, from appropriate for /d/ to appropriate for /g/. Results: Both trained and untrained listeners gave more weight to the CV transitions than to the VC transitions. However, listener behavior was not uniform: The results showed a high level of inter- and intratranscriber inconsistency, with untrained listeners showing a nonsignificant tendency to be more influenced than trained listeners by CV transitions. Conclusions: Listeners do not assign consistent categorical labels to the type of intermediate, conflicting transitional cues that were present in the stimuli used in the current study and that are also present in disordered articulations. Although listener inconsistency in assigning labels to intermediate productions is not increased as a result of phonetic training, neither is it reduced by such training.


2021
pp. 1-12
Author(s):
Sandhya
Vinay
Manchaiah, V.

Purpose: Multimodal sensory integration in audiovisual (AV) speech perception is a naturally occurring phenomenon. Modality-specific responses, such as auditory-left, auditory-right, and visual responses to dichotic incongruent AV speech stimuli, help in understanding AV speech processing through each input modality. The distribution of activity in the frontal motor areas involved in speech production has been shown to correlate with how subjects perceive the same syllable differently or perceive different syllables. This study investigated the distribution of modality-specific responses to dichotic incongruent AV speech stimuli by simultaneously presenting consonant–vowel (CV) syllables with different places of articulation to the participant's left and right ears and visually. Design: A dichotic experimental design was adopted. Six stop CV syllables, /pa/, /ta/, /ka/, /ba/, /da/, and /ga/, were assembled to create dichotic incongruent AV speech material. Participants included 40 native speakers of Norwegian (20 women, M age = 22.6 years, SD = 2.43 years; 20 men, M age = 23.7 years, SD = 2.08 years). Results: Under dichotic listening conditions, velar CV syllables yielded the highest scores in the respective ears, which may be explained by the stimulus dominance of velar consonants shown in previous studies. However, this study, with dichotic auditory stimuli accompanied by an incongruent video segment, demonstrated that a visually distinct video segment possibly draws attention to itself in some participants, thereby reducing overall recognition of the dominant syllable. Furthermore, the findings suggest that response times to incongruent AV stimuli may be shorter in females than in males.
Conclusion: The identification of the left-audio, right-audio, and visual segments in dichotic incongruent AV stimuli depends on the place of articulation, stimulus dominance, and voice onset time of the CV syllables.


2021
Vol 15
Author(s):
Mariel G. Gonzales
Kristina C. Backer
Brenna Mandujano
Antoine J. Shahin

The McGurk illusion occurs when listeners hear an illusory percept (e.g., “da”) resulting from mismatched pairings of audiovisual (AV) speech stimuli (e.g., auditory /ba/ paired with visual /ga/). Hearing a third percept—distinct from both the auditory and visual input—has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., “da,” represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant–vowels (CVs) with (AV) and without (V-only) audios of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw “da”/“ta” when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard “da”/“ta” for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard “da”/“ta”/“tha” as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing “da”/“ta”/“tha”, thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.


2020
Vol 29 (3)
pp. 391-403
Author(s):
Dania Rishiq
Ashley Harkrider
Cary Springer
Mark Hedrick

Purpose: The main purpose of this study was to evaluate aging effects on the predominantly subcortical (brainstem) encoding of the second-formant frequency transition, an essential acoustic cue for perceiving place of articulation. Method: Synthetic consonant–vowel syllables varying in second-formant onset frequency (i.e., /ba/, /da/, and /ga/ stimuli) were used to elicit speech-evoked auditory brainstem responses (speech-ABRs) in 16 young adults (M age = 21 years) and 11 older adults (M age = 59 years). Repeated-measures mixed-model analyses of variance were performed on the latencies and amplitudes of the speech-ABR peaks. Fixed factors were phoneme (repeated measures on three levels: /b/ vs. /d/ vs. /g/) and age (two levels: young vs. older). Results: Speech-ABR differences were observed between the two groups (young vs. older adults). Specifically, older listeners showed generalized amplitude reductions for onset and major peaks. Significant Phoneme × Group interactions were not observed. Conclusions: Results showed aging effects in speech-ABR amplitudes that may reflect diminished subcortical encoding of consonants in older listeners. These aging effects were not phoneme dependent as observed using the statistical methods of this study.


1991
Vol 34 (3)
pp. 671-678
Author(s):
Joan E. Sussman

This investigation examined the response strategies and discrimination accuracy of adults and children aged 5–10 as the ratio of same to different trials was varied across three conditions of a “change/no-change” discrimination task. The conditions varied as follows: (a) a ratio of one-third same to two-thirds different trials (33% same), (b) an equal ratio of same to different trials (50% same), and (c) a ratio of two-thirds same to one-third different trials (67% same). Stimuli were synthetic consonant-vowel syllables that changed along a place of articulation dimension by formant frequency transition. Results showed that all subjects changed their response strategies depending on the ratio of same-to-different trials. The most lax response pattern was observed for the 50% same condition, and the most conservative pattern was observed for the 67% same condition. Adult response patterns were the most conservative across conditions. Differences in discrimination accuracy as measured by P(C) were found, with the largest difference in the 5- to 6-year-old group and the smallest change in the adult group. These findings suggest that children’s response strategies, like those of adults, can be manipulated by changing the ratio of same-to-different trials. Furthermore, interpretation of sensitivity measures must be referenced to task variables such as the ratio of same-to-different trials.
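The abstract does not give the exact form of P(C); a common choice for change/no-change tasks, and the one assumed in this sketch, is the bias-free average of the hit rate and the correct-rejection rate, which makes accuracy comparable across conditions with different same:different trial ratios:

```python
def proportion_correct(hits, false_alarms, n_different, n_same):
    """Unbiased proportion correct P(C) for a change/no-change task:
    the mean of the hit rate (correct 'change' responses on different
    trials) and the correct-rejection rate (correct 'no change'
    responses on same trials)."""
    hit_rate = hits / n_different
    correct_rejection_rate = (n_same - false_alarms) / n_same
    return 0.5 * (hit_rate + correct_rejection_rate)

# Hypothetical 33%-same block: 20 same and 40 different trials,
# 30 hits and 5 false alarms -> P(C) = 0.5 * (0.75 + 0.75) = 0.75
pc = proportion_correct(hits=30, false_alarms=5, n_different=40, n_same=20)
```

Because P(C) averages the two trial types rather than pooling them, a listener who simply says “different” more often in the 33%-same condition does not gain accuracy, which is why sensitivity must still be interpreted against the trial-ratio manipulation.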


1988
Vol 31 (2)
pp. 156-165
Author(s):
P. A. Busby
Y. C. Tong
G. M. Clark

The identification of consonants in /a/-C-/a/ nonsense syllables, using a fourteen-alternative forced-choice procedure, was examined in 4 profoundly hearing-impaired children under five conditions: audition alone using hearing aids in free field (A), vision alone (V), auditory-visual using hearing aids in free field (AV1), auditory-visual with linear amplification (AV2), and auditory-visual with syllabic compression (AV3). In the AV2 and AV3 conditions, acoustic signals were presented binaurally by magnetic or acoustic coupling to the subjects' hearing aids. The syllabic compressor had a compression ratio of 10:1 and attack and release times of 1.2 ms and 60 ms, respectively. The confusion matrices were subjected to two analysis methods: hierarchical clustering and information transmission analysis using articulatory features. The same general conclusions were drawn from either analysis method. The results indicated better performance in the V condition than in the A condition. In the three AV conditions, the subjects predominantly combined the acoustic parameter of voicing with the visual signal. No consistent differences were recorded across the three AV conditions. Syllabic compression did not, therefore, appear to have a significant influence on AV perception for these children. A high degree of subject variability was recorded for the A and the three AV conditions, but not for the V condition.
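A syllabic compressor with the parameters reported above can be sketched as a feed-forward envelope-follower design; the threshold value and all implementation details here are assumptions for illustration, not specifics from the study:

```python
import math

def syllabic_compressor(samples, fs, ratio=10.0, threshold=0.1,
                        attack_ms=1.2, release_ms=60.0):
    """Apply a simple feed-forward syllabic compressor to a mono signal
    (list of floats in [-1, 1]). The signal envelope is tracked with
    separate attack/release time constants; the portion of the envelope
    above `threshold` is reduced by `ratio` (10:1, as in the study),
    with 1.2 ms attack and 60 ms release times."""
    atk = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        mag = abs(x)
        # Fast attack when the signal rises, slow release when it falls.
        coeff = atk if mag > env else rel
        env = coeff * env + (1.0 - coeff) * mag
        if env > threshold:
            # Compress only the part of the envelope above threshold.
            target = threshold + (env - threshold) / ratio
            gain = target / env
        else:
            gain = 1.0
        out.append(x * gain)
    return out
```

With a 10:1 ratio, a steady input well above threshold is attenuated to roughly one tenth of its excess over the threshold, while quiet segments pass unchanged; the short attack and longer release are what make the compression follow syllable-level amplitude fluctuations.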


1998
Vol 41 (3)
pp. 538-548
Author(s):
Sean C. Huckins
Christopher W. Turner
Karen A. Doherty
Michael M. Fonte
Nikolaus M. Szeverenyi

Functional Magnetic Resonance Imaging (fMRI) holds exciting potential as a research and clinical tool for exploring the human auditory system. This noninvasive technique allows the measurement of discrete changes in cerebral cortical blood flow in response to sensory stimuli, allowing determination of precise neuroanatomical locations of the underlying brain parenchymal activity. Application of fMRI in auditory research, however, has been limited. One problem is that fMRI utilizing echo-planar imaging technology (EPI) generates intense noise that could potentially affect the results of auditory experiments. Also, issues relating to the reliability of fMRI for listeners with normal hearing need to be resolved before this technique can be used to study listeners with hearing loss. This preliminary study examines the feasibility of using fMRI in auditory research by performing a simple set of experiments to test the reliability of scanning parameters with a higher resolution and signal-to-noise ratio than presently reported in the literature. We used consonant-vowel (CV) speech stimuli to investigate whether we could observe reproducible and consistent changes in cortical blood flow in listeners during a single scanning session, across more than one scanning session, and in more than one listener. In addition, we wanted to determine if there were differences between CV speech and nonspeech complex stimuli across listeners. Our study shows reproducibility within and across listeners for CV speech stimuli. Results were reproducible for CV speech stimuli within fMRI scanning sessions for 5 out of 9 listeners and across fMRI scanning sessions for 6 out of 8 listeners. Results for nonspeech complex stimuli showed activity in 4 out of 9 individuals tested.


2010
Vol 3 (2)
pp. 156-180
Author(s):
Renáta Gregová
Lívia Körtvélyessy
Július Zimmermann

The Universals Archive (Universal #1926) indicates a universal tendency toward sound symbolism in the expression of diminutives and augmentatives. Research on European languages (Štekauer et al. 2009) did not confirm this tendency. Our research was therefore extended to cover three language families: Indo-European, Niger-Congo, and Austronesian. A three-step analysis examining different aspects of phonetic symbolism was carried out on a core vocabulary of 35 lexical items in a research sample of 60 languages. The evaluative markers were analyzed according to both the phonetic classification of vowels and consonants and Ultan's and Niewenhuis' conclusions on the dominance of palatal and post-alveolar consonants in diminutive markers. Finally, the data obtained from our sample languages were evaluated by means of a three-dimensional model illustrating the place of articulation of the individual segments.


2014
Vol 155 (38)
pp. 1524-1529
Author(s):
Ádám Bach
Ferenc Tóth
Vera Matievics
József Géza Kiss
József Jóri
et al.

Introduction: Cortical auditory evoked potentials can provide objective information about the highest level of the auditory system. Aim: The purpose of the authors was to introduce a new tool, the HEARLab, which can be routinely used in clinical practice for the measurement of cortical auditory evoked potentials, and to establish norms for the analyzed parameters in subjects with normal hearing. Method: 25 adults with normal hearing were tested with speech stimuli, and frequency-specific examinations were performed using pure-tone stimuli. Results: The latency and amplitude analyses of the evoked potentials confirm previously published results for this novel method. Conclusions: The HEARLab can be of great help when conventional audiological examinations are difficult to perform. The examination can be performed in uncooperative subjects, even in the presence of hearing aids. The test is frequency specific and does not require anesthesia. Orv. Hetil., 2014, 155(38), 1524–1529.

