Brain potentials indicate immediate use of prosodic cues in natural speech processing

10.1038/5757 ◽  
1999 ◽  
Vol 2 (2) ◽  
pp. 191-196 ◽  
Author(s):  
Karsten Steinhauer ◽  
Kai Alter ◽  
Angela D. Friederici

2019 ◽  
Author(s):  
Shyanthony R. Synigal ◽  
Emily S. Teoh ◽  
Edmund C. Lalor

Abstract The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz; known as high gamma). The aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Furthermore, we aimed to assess whether any such information might be complementary to that reflected in well-established low-frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low-frequency speech tracking was evident for almost all subjects, as expected, high gamma power also showed robust speech tracking in a minority of subjects. The same pattern held for attention decoding in a separate group of subjects who undertook a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects, and that combining it with low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
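The linear-regression decoding described here is commonly implemented as a backward (stimulus-reconstruction) model: time-lagged EEG features are regressed onto the speech envelope, and reconstruction accuracy is the correlation between the predicted and actual envelope. The sketch below illustrates that general idea with ridge regression on synthetic data; the sampling rate, lag window, regularization, and variable names are illustrative assumptions, not the authors' pipeline.

```python
# A minimal sketch, assuming synthetic data: backward-model ("stimulus
# reconstruction") decoding of the speech envelope from EEG with time-lagged
# ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

fs = 64                              # assumed sampling rate after downsampling (Hz)
n_ch, n_t = 64, fs * 60              # 64 channels, 60 s of data
rng = np.random.default_rng(0)
eeg = rng.standard_normal((n_t, n_ch))    # stand-in for low-frequency EEG or high-gamma power
envelope = rng.standard_normal(n_t)       # stand-in for the speech envelope

def lagged_design(x, lags_samples):
    """Column block k holds x[t + lag_k]: the envelope at time t is
    reconstructed from EEG samples that follow it, since the neural
    response lags the stimulus."""
    cols = []
    for lag in lags_samples:
        shifted = np.zeros_like(x)
        shifted[: x.shape[0] - lag, :] = x[lag:, :]
        cols.append(shifted)
    return np.hstack(cols)

X = lagged_design(eeg, range(int(0.25 * fs) + 1))   # 0-250 ms of post-stimulus lags

# Train on the first half, test on the second half; the correlation between
# reconstructed and actual envelopes is the usual "speech tracking" measure.
half = n_t // 2
model = Ridge(alpha=1e3).fit(X[:half], envelope[:half])
r = np.corrcoef(model.predict(X[half:]), envelope[half:])[0, 1]
print(f"reconstruction accuracy r = {r:.3f}")
```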


2004 ◽  
Vol 16 (9) ◽  
pp. 1647-1668 ◽  
Author(s):  
Dirk Koester ◽  
Th. C. Gunter ◽  
S. Wagner ◽  
A. D. Friederici

The morphosyntactic decomposition of German compound words and a proposed function of linking elements were examined during auditory processing using event-related brain potentials. In Experiment 1, syntactic gender agreement was manipulated between a determiner and the initial compound constituent (the “nonhead” constituent), and between a determiner and the last constituent (the “head”). Although only the head is (morpho)syntactically relevant in German, both constituents elicited a left-anterior negativity when their gender was incongruent. This strongly suggests that compounds are morphosyntactically decomposed. Experiment 2 tested the function of those linking elements that are homophonous with plural morphemes. It has previously been suggested that these indicate the number of nonhead constituents. Number agreement was manipulated for both constituents, analogous to Experiment 1. Number-incongruent heads, but not nonhead constituents, elicited an N400 and a subsequent broad negativity, suggesting that linking elements are not processed as plural morphemes. Experiment 3 showed that prosodic cues (duration and fundamental frequency) are employed to differentiate between compounds and single nouns and, thereby, between linking elements and plural morphemes. Number-incongruent words elicited a broad negativity if they were produced with single-noun prosody; the same words elicited no event-related potential effect if produced with compound prosody. A dual-route model can account for the influence of prosody on morphosyntactic processing.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Raphaël Thézé ◽  
Mehdi Ali Gadiri ◽  
Louis Albert ◽  
Antoine Provost ◽  
Anne-Lise Giraud ◽  
...  

Abstract Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e. /v/) with a bilabial occlusive phoneme (i.e. /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech with higher control over stimulus timing and content.
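For readers reproducing this kind of paradigm, the dependent measure is simply the proportion of trials on which the illusory percept (/v/) is reported, broken down by noise level and audiovisual lag. A minimal sketch follows, assuming a hypothetical trial-level data layout (column names and values are illustrative, not the authors' format or results).

```python
# Illustrative sketch only: computing the illusion rate per background-noise
# and audiovisual-lag condition from trial-level responses.
import pandas as pd

trials = pd.DataFrame({
    "noise_db":  [0, 0, 10, 10, 10, 20, 20, 20],   # background-noise level per trial (assumed)
    "av_lag_ms": [0, 80, 0, 0, 80, 0, 80, 80],      # audiovisual lag per trial (assumed)
    "heard_v":   [0, 1, 1, 0, 1, 1, 1, 0],          # 1 = reported hearing /v/ (the illusion)
})

# Overall illusion rate, and the rate broken down by condition.
print("overall:", trials["heard_v"].mean())
print(trials.groupby(["noise_db", "av_lag_ms"])["heard_v"].mean())
```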


2016 ◽  
Author(s):  
Liberty S. Hamilton ◽  
Erik Edwards ◽  
Edward F. Chang

Abstract To derive meaning from speech, we must extract multiple dimensions of concurrent information from incoming speech signals, including phonetic and prosodic cues. Equally important is the detection of acoustic cues that give structure and context to the information we hear, such as sentence boundaries. How the brain organizes this information processing is unknown. Here, using data-driven computational methods on an extensive set of high-density intracranial recordings, we reveal a large-scale partitioning of the entire human speech cortex into two spatially distinct regions that detect important cues for parsing natural speech. These caudal (Zone 1) and rostral (Zone 2) regions work in parallel to detect onsets and prosodic information, respectively, within naturally spoken sentences. In contrast, local processing within each region supports phonetic feature encoding. These findings demonstrate a fundamental organizational property of the human auditory cortex that has been previously unrecognized.
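A minimal sketch of the kind of data-driven analysis alluded to above, under assumptions: one common way to recover spatial groupings like the two zones described is to factorize a matrix of trial-averaged high-gamma responses (electrodes × time) with non-negative matrix factorization and assign each electrode to its dominant component. The code below is illustrative only and is not the paper's exact method or data.

```python
# Illustrative sketch: soft-cluster electrodes by their response profiles with NMF.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
n_elec, n_time = 100, 300
responses = rng.random((n_elec, n_time))     # stand-in for non-negative high-gamma responses

nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(responses)             # electrode weights on each component
H = nmf.components_                          # component time courses (e.g., onset vs. sustained)

zone = W.argmax(axis=1)                      # assign each electrode to its dominant component
print("electrodes per zone:", np.bincount(zone))
```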


2019 ◽  
pp. 002383091989036
Author(s):  
Sasha Calhoun ◽  
Emma Wollum ◽  
Emma Kruse Va’ai

This paper looks at the perception of prosodic prominence and the interpretation of focus position in two unrelated languages, Samoan and English. In many languages, prosodic prominence is a key marker of focus, so it is expected that prosodic prominence would affect judgments of focus position. However, it is shown that focus position, in turn, influences the perception of prosodic prominence according to language-specific expectations about the alignment between focus position and nuclear accent placement. Two sets of parallel perception experiments in Samoan and English are reported. In the first, participants judged the most prosodically prominent word in sentences which varied in syntactic construction (cleft/canonical) and intended stress position (subject/object). In both languages, participants were more likely to choose the intended stressed word if it was in the focus position. However, this effect was much larger in Samoan, which fits with the relatively lower functional load of prosodic prominence in Samoan. In the second experiment, participants were asked to choose which question had been asked, consistent with subject or object focus. It was found that in both languages, participants weighted syntactic and prosodic cues to focus in line with expectations from their language. These findings have implications for how we conceive of the role of prosodic prominence in speech processing across languages.
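As an illustration of how such judgment data are typically modelled, the sketch below fits a logistic regression predicting whether the intended stressed word was chosen as most prominent from syntactic construction and stress position. The data frame and formula are assumptions for illustration; the published analysis may use mixed-effects models and per-language fits.

```python
# Illustrative analysis sketch (assumed toy data, not the authors' data or pipeline).
import pandas as pd
import statsmodels.formula.api as smf

judgments = pd.DataFrame({
    "construction":   ["cleft"] * 8 + ["canonical"] * 8,          # cleft vs. canonical sentence
    "stress_pos":     (["subject"] * 4 + ["object"] * 4) * 2,     # intended stress position
    "chose_stressed": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1],  # chose the stressed word?
})

# Does choosing the stressed word depend on construction, stress position, and their interaction?
model = smf.logit("chose_stressed ~ C(construction) * C(stress_pos)", data=judgments).fit(disp=0)
print(model.params)
```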


2021 ◽  
Author(s):  
Mengzhu Yan

It is well established that focus plays an important role in facilitating language processing, i.e., focused words are recognised faster and remembered better. In addition, more recent research shows that alternatives to a word (e.g., sailor as an alternative to captain) are more activated when listeners hear the word with contrastive prominence (e.g., ‘The CAPTAIN put on the raincoat’, where capitals indicate contrastive prominence). The mechanism behind these processing advantages is focus. Focus has two broad conceptions in relation to its effect on language processing: focus as updating the common ground and focus as indicating alternatives. Considerable psycholinguistic evidence has been obtained for processing advantages consistent with the first conception, and this evidence comes from studies across a reasonably wide range of languages. But the evidence for the second conception comes only from a handful of closely related languages (i.e., English, Dutch and German). Further, it has largely been confined to contrastive accenting as a marker of focus. Therefore, it is not clear if other types of focus marking (e.g., clefts) have similar processing effects. It is also not known if all this is true in Mandarin, as there is very little research in these areas in Mandarin. Mandarin uses pitch expansion to mark contrastive prominence, rather than the pitch accenting found in Germanic languages. Therefore, the investigation of Mandarin expands our knowledge of these speech processing effects to a different language and language family. It also expands our knowledge of the relative roles of prosody and syntax in marking focus and in speech processing in Mandarin, and in general.

This thesis tested how different types of focus marking affect the perception of focus and two aspects of language processing related to focus: the encoding and activation of discourse information (focused words and focus alternatives). The aim was to see whether there is a link between the relative importance of prosodic and syntactic focus marking in Mandarin and their effectiveness in these aspects of language processing. For focus perception, contrastive prominence and clefting have been claimed to mark focus in Mandarin, but it has not been well tested whether listeners perceive them as focus marking. For the first aspect of processing, it is not yet clear what cues listeners use to encode focused information beyond prominence when processing a discourse. For the second aspect, there has been rapidly growing interest in the role of alternatives in language processing, but little is known regarding the effect of clefting. In addition, it is not clear whether prosodic and syntactic cues are equally effective, and again little research has been devoted to Mandarin. Therefore, the following experiments were conducted to look at these cues in Mandarin.

Experiment 1, a norming study, was conducted to help select stimuli for Experiments 2, 3, 4A and 4B. Experiment 2 investigated the relative weights of prosodic and syntactic focus cues in a question-answer appropriateness rating task. The findings show that in canonical word order sentences, the focus was perceived to be on the word that was marked by contrastive prominence. In clefts where the prominence and syntactic cues were on the same word, that word was perceived as being in focus. However, in ‘mismatch’ cases, e.g., 是[船长]F 穿上的[雨衣]F ‘It was the [captain]F who put on the [raincoat]F’ (F indicates focus), the focus was perceived to be on raincoat, the word that had contrastive prominence. In other words, participants weighted prosodic cues more highly. This suggests that prosodic prominence is a stronger focus cue than syntax in Mandarin.

Experiment 3 looked at the role of prosodic and syntactic cues in listeners’ encoding of discourse information in a speeded ‘false alternative’ rejection task. This experiment shows that false alternatives to a word in a sentence (e.g., sailor to captain in ‘The captain put on the raincoat’) were more easily rejected if captain was marked with prosodic cues than with syntactic cues. These results are congruent with those of Experiment 2, in that prosodic cues were more effective than syntactic cues in encoding discourse information. It seems that a more important marker of focus provides more effective encoding of discourse information.

Experiments 4A and 4B investigated the role of prosodic and syntactic focus cues in the activation of discourse information in Mandarin, using the cross-modal lexical priming paradigm. Both studies consistently show that prosodic focus marking, but not syntactic focus marking, facilitates the activation of identical targets (e.g., captain after hearing ‘The captain put on the raincoat’). Similarly, prosodic focus marking, but not syntactic focus marking, primes alternatives (e.g., sailor); focus marking does not prime noncontrastive associates (e.g., deck). These findings, together with previous findings on focus particles (e.g., only), suggest that alternative priming is particularly related to contrastive prominence, at least in the languages examined to date. The relative priming effects of prosodic and syntactic focus cues in Experiments 4A and 4B are in line with their relative weights in Experiments 2 and 3.

This thesis presents a crucial link between the relative weights of prosodic and syntactic cues in marking focus, their degrees of effectiveness in encoding discourse information, and their ability to activate discourse information in Mandarin. This research contributes significantly to our cross-linguistic understanding of prosodic and syntactic focus in speech processing, showing that the processing advantages of focus may be common across languages, but that the cues which trigger these effects differ by language.
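As a sketch of how the priming effects in Experiments 4A and 4B are typically quantified, the code below compares lexical decision reaction times for related versus unrelated primes within each focus-marking condition. The data layout and values are illustrative assumptions, not the thesis data.

```python
# Illustrative sketch (assumed toy data): a cross-modal lexical priming effect is the
# RT advantage for targets (e.g., alternatives such as "sailor") after a related prime
# sentence versus an unrelated one, computed separately for prosodic and syntactic
# focus marking.
import pandas as pd
from scipy.stats import ttest_rel

rt = pd.DataFrame({
    "participant":  [1, 2, 3, 4, 5] * 2,
    "focus_cue":    ["prosodic"] * 5 + ["syntactic"] * 5,
    "rt_related":   [612, 640, 598, 655, 620, 648, 660, 632, 671, 645],   # ms
    "rt_unrelated": [655, 671, 640, 690, 662, 650, 668, 630, 676, 651],   # ms
})

for cue, grp in rt.groupby("focus_cue"):
    effect = (grp["rt_unrelated"] - grp["rt_related"]).mean()
    t, p = ttest_rel(grp["rt_unrelated"], grp["rt_related"])
    print(f"{cue}: priming effect = {effect:.1f} ms, t = {t:.2f}, p = {p:.3f}")
```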


2021 ◽  
Vol 6 ◽  
Author(s):  
Nikole Giovannone ◽  
Rachel M. Theodore

Previous research suggests that individuals with weaker receptive language show increased reliance on lexical information for speech perception relative to individuals with stronger receptive language, which may reflect a difference in how acoustic-phonetic and lexical cues are weighted for speech processing. Here we examined whether this relationship is the consequence of conflict between acoustic-phonetic and lexical cues in speech input, which has been found to mediate lexical reliance in sentential contexts. Two groups of participants completed standardized measures of language ability and a phonetic identification task to assess lexical recruitment (i.e., a Ganong task). In the high conflict group, the stimulus input distribution removed natural correlations between acoustic-phonetic and lexical cues, thus placing the two cues in high competition with each other; in the low conflict group, these correlations were present and thus competition was reduced as in natural speech. The results showed that 1) the Ganong effect was larger in the low compared to the high conflict condition in single-word contexts, suggesting that cue conflict dynamically influences online speech perception, 2) the Ganong effect was larger for those with weaker compared to stronger receptive language, and 3) the relationship between the Ganong effect and receptive language was not mediated by the degree to which acoustic-phonetic and lexical cues conflicted in the input. These results suggest that listeners with weaker language ability down-weight acoustic-phonetic cues and rely more heavily on lexical knowledge, even when stimulus input distributions reflect characteristics of natural speech input.
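As a concrete illustration of how lexical reliance is quantified in this kind of Ganong task, the sketch below computes a simple effect size: the shift in /g/ identification along a voicing continuum between a context in which /g/ forms a word and one in which /k/ does. The continuum values are toy numbers, not the authors' stimuli or results.

```python
# Hedged sketch with assumed toy data: the Ganong effect as the difference in /g/
# responses along a /g/-/k/ continuum when /g/ makes a word ("gift"-"kift")
# versus when /k/ does ("giss"-"kiss").
import numpy as np

p_g_word_bias_g = np.array([.98, .95, .88, .72, .50, .25, .08])   # "gift"-"kift" context
p_g_word_bias_k = np.array([.95, .90, .75, .52, .30, .12, .03])   # "giss"-"kiss" context

# A simple effect size: the mean shift in /g/ responses across the continuum.
ganong_effect = (p_g_word_bias_g - p_g_word_bias_k).mean()
print(f"Ganong effect = {ganong_effect:.3f} (larger = more lexical reliance)")
```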

