Individual Differences in the Use of Acoustic-Phonetic Versus Lexical Cues for Speech Perception

2021 · Vol 6
Author(s): Nikole Giovannone, Rachel M. Theodore

Previous research suggests that individuals with weaker receptive language show increased reliance on lexical information for speech perception relative to individuals with stronger receptive language, which may reflect a difference in how acoustic-phonetic and lexical cues are weighted for speech processing. Here we examined whether this relationship is the consequence of conflict between acoustic-phonetic and lexical cues in speech input, which has been found to mediate lexical reliance in sentential contexts. Two groups of participants completed standardized measures of language ability and a phonetic identification task to assess lexical recruitment (i.e., a Ganong task). In the high conflict group, the stimulus input distribution removed natural correlations between acoustic-phonetic and lexical cues, thus placing the two cues in high competition with each other; in the low conflict group, these correlations were present and thus competition was reduced as in natural speech. The results showed that 1) the Ganong effect was larger in the low compared to the high conflict condition in single-word contexts, suggesting that cue conflict dynamically influences online speech perception, 2) the Ganong effect was larger for those with weaker compared to stronger receptive language, and 3) the relationship between the Ganong effect and receptive language was not mediated by the degree to which acoustic-phonetic and lexical cues conflicted in the input. These results suggest that listeners with weaker language ability down-weight acoustic-phonetic cues and rely more heavily on lexical knowledge, even when stimulus input distributions reflect characteristics of natural speech input.
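For readers unfamiliar with how lexical recruitment is quantified in a Ganong task, the sketch below (not the authors' analysis code) illustrates the standard logic: fit logistic identification functions to responses along a voicing continuum in two lexically biased contexts and take the Ganong effect as the shift in category boundary between them. The continuum length, response proportions, and the `logistic` helper are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): the Ganong effect as a category
# boundary shift between two lexically biased voicing continua.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Proportion of /g/ responses at continuum step x."""
    return 1.0 / (1.0 + np.exp(slope * (x - boundary)))

steps = np.arange(1, 8)  # hypothetical 7-step /g/-/k/ continuum

# Hypothetical proportions of "g" responses per step in each lexical context.
p_g_word_at_g_end = np.array([0.98, 0.95, 0.90, 0.75, 0.50, 0.20, 0.05])  # e.g., gift-kift
p_g_word_at_k_end = np.array([0.95, 0.88, 0.70, 0.45, 0.20, 0.08, 0.02])  # e.g., giss-kiss

(b_gword, _), _ = curve_fit(logistic, steps, p_g_word_at_g_end, p0=[4.0, 1.0])
(b_kword, _), _ = curve_fit(logistic, steps, p_g_word_at_k_end, p0=[4.0, 1.0])

# A larger boundary shift is typically read as heavier reliance on lexical knowledge.
ganong_effect = b_gword - b_kword
print(f"Ganong effect (boundary shift in continuum steps): {ganong_effect:.2f}")
```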

Author(s): Nikole Giovannone, Rachel M. Theodore

Purpose: The extant literature suggests that individual differences in speech perception can be linked to broad receptive language phenotype. For example, a recent study found that individuals with a smaller receptive vocabulary showed diminished lexically guided perceptual learning compared to individuals with a larger receptive vocabulary. Here, we examined (a) whether such individual differences stem from variation in reliance on lexical information or variation in perceptual learning itself and (b) whether a relationship exists between lexical recruitment and lexically guided perceptual learning more broadly, as predicted by current models of lexically guided perceptual learning. Method: In Experiment 1, adult participants (n = 70) completed measures of receptive and expressive language ability, lexical recruitment, and lexically guided perceptual learning. In Experiment 2, adult participants (n = 120) completed the same lexical recruitment and lexically guided perceptual learning tasks to provide a high-powered replication of the primary findings from Experiment 1. Results: In Experiment 1, individuals with weaker receptive language ability showed increased lexical recruitment relative to individuals with stronger receptive language ability; however, receptive language ability did not predict the magnitude of lexically guided perceptual learning. Moreover, the results of both experiments converged to show no evidence of a relationship between lexical recruitment and lexically guided perceptual learning. Conclusion: The current findings suggest that (a) individuals with weaker language ability demonstrate increased reliance on lexical information for speech perception compared to those with stronger receptive language ability; (b) individuals with weaker language ability maintain an intact perceptual learning mechanism; and (c) to the degree that the measures used here accurately capture individual differences in lexical recruitment and lexically guided perceptual learning, there is no graded relationship between these two constructs.


2020 · Vol 63 (1) · pp. 1-13
Author(s): Rachel M. Theodore, Nicholas R. Monto, Stephen Graham

Purpose: Speech perception is facilitated by listeners' ability to dynamically modify the mapping of the acoustic signal to speech sounds given systematic variation in speech input. For example, the degree to which listeners show categorical perception of speech input changes as a function of distributional variability in the input, with perception becoming less categorical as the input becomes more variable. Here, we test the hypothesis that higher-level receptive language ability is linked to the ability to adapt to low-level distributional cues in speech input. Method: Listeners (n = 58) completed a distributional learning task consisting of two blocks of phonetic categorization for words beginning with /g/ and /k/. In one block, the distributions of voice onset time values specifying /g/ and /k/ had narrow variances (i.e., minimal variability). In the other block, the distributions of voice onset times specifying /g/ and /k/ had wider variances (i.e., increased variability). In addition, all listeners completed an assessment battery for receptive language, nonverbal intelligence, and reading fluency. Results: As predicted by an ideal observer computational framework, the participants in aggregate showed identification responses that were more categorical for consistent compared to inconsistent input, indicative of distributional learning. However, the magnitude of learning across participants showed wide individual variability, which was predicted by receptive language ability but not by nonverbal intelligence or by reading fluency. Conclusion: The results suggest that individual differences in distributional learning for speech are linked, at least in part, to receptive language ability, reflecting a decreased ability among those with weaker receptive language to capitalize on consistent input distributions.
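As a hedged illustration of why an ideal observer predicts less categorical identification when input distributions widen (a sketch of the general framework, not the study's model or data), the snippet below computes the posterior probability of /k/ along a voice onset time (VOT) continuum under equal-prior Gaussian categories; the means and standard deviations are invented for illustration.

```python
# Illustrative sketch (not the study's model): an ideal-observer prediction that
# identification becomes less categorical as input VOT distributions widen.
import numpy as np
from scipy.stats import norm

def p_k_given_vot(vot, mu_g, mu_k, sd):
    """Posterior probability of /k/ given VOT, assuming equal-prior Gaussian categories."""
    like_g = norm.pdf(vot, mu_g, sd)
    like_k = norm.pdf(vot, mu_k, sd)
    return like_k / (like_g + like_k)

vot = np.linspace(-20.0, 120.0, 200)   # VOT in ms (illustrative range)
mu_g, mu_k = 15.0, 75.0                # hypothetical category means for /g/ and /k/

narrow = p_k_given_vot(vot, mu_g, mu_k, sd=10.0)   # low-variability block
wide = p_k_given_vot(vot, mu_g, mu_k, sd=25.0)     # high-variability block

# Steeper slope at the category boundary = more categorical identification.
mid = np.argmin(np.abs(vot - (mu_g + mu_k) / 2))
print("identification slope, narrow-variance input:", round(np.gradient(narrow, vot)[mid], 3))
print("identification slope, wide-variance input:  ", round(np.gradient(wide, vot)[mid], 3))
```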


2004 · Vol 16 (3) · pp. 154-159
Author(s): Seung-Hwan Lee, Young-Cho Chung, Jong-Chul Yang, Yong-Ku Kim, Kwang-Yoon Suh

Background: The neurobiological mechanism of auditory hallucination (AH) in schizophrenia remains elusive, but AH can be caused by an abnormality in the speech perception system, according to the speech perception neural network model. Objectives: The purpose of this study was to investigate whether schizophrenic patients with AH show speech processing impairments compared with schizophrenic patients without AH, and whether speech perception ability improves after AH has subsided. Methods: Twenty-four schizophrenic patients with AH were compared with 25 schizophrenic patients without AH. Narrative speech perception was assessed using a masked speech tracking (MST) task with three levels of superimposed phonetic noise. A sentence repetition task (SRT) and an auditory continuous performance task (CPT) were used to assess grammar-dependent verbal working memory and non-language attention, respectively. These tests were administered before and after treatment in both groups. Results: Before treatment, schizophrenic patients with AH showed significant impairments in MST compared with those without AH. There were no significant differences in SRT and CPT correct (CPT-C) rates between the two groups, but CPT incorrect (CPT-I) rates differed significantly. Among participants with low CPT-I scores, MST performance differed significantly between the two groups, whereas it did not among those with high CPT-I scores. After treatment (after AH had subsided), the hallucinating schizophrenic patients still showed significant impairment in MST performance compared with non-hallucinating schizophrenic patients. Conclusions: Our results support the claim that schizophrenic patients with AH are likely to have a disturbance of the speech perception system. Moreover, our data suggest that non-language attention might be a key factor influencing speech perception ability and that speech perception dysfunction might be a trait marker of schizophrenia with AH.


2021
Author(s): Julia Schwarz, Katrina (Kechun) Li, Jasper Hong Sim, Yixin Zhang, Elizabeth Buchanan-Worster, ...

Face masks can cause speech processing difficulties. However, it is unclear to what extent these difficulties are caused by the visual obstruction of the speaker’s mouth or by changes to the acoustic signal, and whether the effects can be found regardless of semantic context. In the present study, children and adults performed a cued shadowing task online, repeating the last word of English sentences. Target words were embedded in sentence-final position and manipulated visually, acoustically, and by semantic context (cloze probability). First results from 16 children and 16 adults suggest that processing language through face masks leads to slower responses in both groups, but visual, acoustic, and semantic cues all significantly reduce the mask effect. Although children were less proficient in predictive speech processing overall, they were still able to use semantic cues to compensate for face mask effects in a similar fashion to adults.


2019
Author(s): Shyanthony R. Synigal, Emily S. Teoh, Edmund C. Lalor

Abstract: The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz; known as high gamma). The aim of this study was to determine if high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Furthermore, we aimed to assess whether any such information might be complementary to that reflected in well-established low frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in a minority of subjects. This same pattern was true for attention decoding using a separate group of subjects who undertook a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects and combining it with low frequency EEG can improve the mapping between natural speech and the resulting neural responses.
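As a rough sketch of the kind of linear-regression decoding described above (not the authors' pipeline), the code below reconstructs a speech envelope from time-lagged multichannel EEG with ridge regression and scores it by its correlation with the true envelope; the sampling rate, lag window, penalty, and simulated signals are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' pipeline): reconstructing a speech
# envelope from time-lagged multichannel EEG with ridge regression.
import numpy as np

rng = np.random.default_rng(0)
fs = 64                                   # Hz; EEG and envelope assumed at a common rate
n_samples, n_channels = 10 * fs, 32
envelope = rng.standard_normal(n_samples)                 # placeholder speech envelope
noise = rng.standard_normal((n_samples, n_channels))
eeg = 0.1 * envelope[:, None] + noise                     # simulated EEG weakly driven by the envelope

def lagged_design(eeg, max_lag):
    """Stack channels at lags 0..max_lag (EEG following the stimulus) into one design matrix."""
    cols = [np.roll(eeg, -lag, axis=0) for lag in range(max_lag + 1)]
    X = np.concatenate(cols, axis=1)
    X[-max_lag:] = 0.0        # zero out samples contaminated by wrap-around
    return X

X = lagged_design(eeg, max_lag=fs // 4)   # lags up to ~250 ms
lam = 1e2                                 # ridge penalty (would be cross-validated in practice)
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

reconstruction = X @ w
r = np.corrcoef(reconstruction, envelope)[0, 1]
print(f"Envelope reconstruction accuracy (Pearson r): {r:.3f}")
```

The same backward-model logic extends to attention decoding by comparing reconstruction accuracy for attended versus unattended speech streams.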


2010 · pp. 447-473
Author(s): Pedro Gómez-Vilda, José Manuel Ferrández-Vicente, Victoria Rodellar-Biarge, Rafael Martínez-Olalla, Víctor Nieto-Lluis, ...

Current efforts to improve well-established technologies that imitate human abilities, such as speech perception, look for inspiration in capabilities of the natural system that are not yet well understood. A typical case is speech recognition, where the semantic gap between spectral time-frequency representations and their symbolic translation into phonemes and words, together with the construction of morpho-syntactic and semantic structures, involves many phenomena that remain poorly understood. The present chapter explores some of these issues at a simplified level from two points of view: top-down analysis informed by speech perception, and the complementary bottom-up synthesis informed by the biological architecture of the auditory pathways. An application-driven design of a Neuromorphic Speech Processing Architecture is presented and its performance analyzed. Simulation details from a parallel implementation of the architecture on a supercomputer are also shown and discussed.


2020 · Vol 10 (1)
Author(s): Raphaël Thézé, Mehdi Ali Gadiri, Louis Albert, Antoine Provost, Anne-Lise Giraud, ...

Abstract: Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and uneven quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and the audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech with higher control over stimulus timing and content.
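To make the condition-wise analysis concrete, a minimal sketch (not the authors' code) of tabulating illusion rates across background-noise and audiovisual-lag conditions might look like the following; the column names, lag value, and trial data are hypothetical.

```python
# Illustrative sketch (not the authors' analysis): illusion rate per
# noise x audiovisual-lag condition from trial-level responses.
import pandas as pd

# Hypothetical trial-level data: 1 = participant reported hearing /v/ (illusion).
trials = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "noise":       ["none", "none", "added", "added"] * 2,
    "lag_ms":      [0, 170, 0, 170] * 2,
    "illusion":    [0, 1, 1, 1, 0, 0, 1, 1],
})

# Mean illusion rate in each noise x lag cell, as one would report per condition.
rates = trials.groupby(["noise", "lag_ms"])["illusion"].mean().unstack("lag_ms")
print(rates)
```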


2018 · Vol 30 (11) · pp. 1704-1719
Author(s): Anna Maria Alexandrou, Timo Saarinen, Jan Kujala, Riitta Salmelin

During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). As is inherent to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio–MEG coherence. Modulations in audio–MEG coherence spatially differentiated between tracking of the global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and by predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
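For readers unfamiliar with audio–MEG coherence, a minimal sketch of the measure (not the study's analysis, and using simulated signals) is shown below: coherence is computed between the speech amplitude envelope and a single cortical channel, then summarized in a band around typical syllabic rates. The sampling rate, duration, and band limits are illustrative assumptions.

```python
# Illustrative sketch (not the study's analysis): coherence between a speech
# amplitude envelope and one cortical (MEG) channel, both simulated here.
import numpy as np
from scipy.signal import hilbert, coherence

rng = np.random.default_rng(1)
fs = 1000                                   # Hz, assumed common sampling rate
t = np.arange(0, 60, 1 / fs)                # 60 s of signal

speech = rng.standard_normal(t.size)        # placeholder acoustic waveform
envelope = np.abs(hilbert(speech))          # amplitude envelope of the speech signal

# Placeholder MEG channel that partially tracks the envelope plus noise.
meg = 0.3 * envelope + rng.standard_normal(t.size)

freqs, coh = coherence(envelope, meg, fs=fs, nperseg=2 * fs)

# Summarize coherence in a band around typical syllabic rates (~2-8 Hz).
band = (freqs >= 2) & (freqs <= 8)
print(f"mean audio-MEG coherence, 2-8 Hz: {coh[band].mean():.3f}")
```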

