Neural markers of speech comprehension: measuring EEG tracking of linguistic speech representations, controlling the speech acoustics

2021 ◽  
Author(s):  
Marlies Gillis ◽  
Jonas Vanthornhout ◽  
Jonathan Z Simon ◽  
Tom Francart ◽  
Christian Brodbeck

When listening to speech, brain responses time-lock to acoustic events in the stimulus. Recent studies have also reported that cortical responses track linguistic representations of speech. However, tracking of these representations is often described without controlling for acoustic properties. Therefore, the response to these linguistic representations might reflect unaccounted-for acoustic processing rather than language processing. Here we tested several recently proposed linguistic representations, using audiobook speech, while controlling for acoustic and other linguistic representations. Indeed, some of these linguistic representations were not significantly tracked after controlling for acoustic properties. However, phoneme surprisal, cohort entropy, word surprisal and word frequency were significantly tracked over and beyond acoustic properties. Additionally, these linguistic representations were tracked similarly across different stories, spoken by different readers. Together, this suggests that these representations characterize processing of the linguistic content of speech and might allow a behaviour-free evaluation of speech intelligibility.
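The control analysis described here amounts to comparing forward (encoding) models with and without the linguistic features. Below is a minimal sketch of that comparison, assuming ridge-regression temporal response functions and cross-validated prediction correlation as the metric; the lag range, regularization strength, and variable names are illustrative assumptions, not the authors' exact pipeline.

```python
# Compare an acoustic-only encoding model with one that adds a linguistic
# feature; a higher cross-validated correlation for the combined model
# indicates tracking over and beyond the acoustics.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def lagged(X, n_lags):
    """Stack time-lagged copies of a feature matrix X (time x features)."""
    T, F = X.shape
    out = np.zeros((T, F * n_lags))
    for l in range(n_lags):
        out[l:, l * F:(l + 1) * F] = X[:T - l]
    return out

def cv_prediction_corr(X, y, n_lags=40, alpha=1e3, n_splits=5):
    """Mean cross-validated correlation between predicted and measured EEG."""
    Xl = lagged(X, n_lags)
    scores = []
    for train, test in KFold(n_splits).split(Xl):
        model = Ridge(alpha=alpha).fit(Xl[train], y[train])
        pred = model.predict(Xl[test])
        scores.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(scores))

# Hypothetical arrays: envelope (T, 1) acoustic envelope, surprisal (T, 1)
# word-onset surprisal impulses, eeg (T,) one EEG channel.
# r_acoustic = cv_prediction_corr(envelope, eeg)
# r_full = cv_prediction_corr(np.hstack([envelope, surprisal]), eeg)
```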

2021 ◽  
pp. 1-34
Author(s):  
Hyein Jeong ◽  
Emiel van den Hoven ◽  
Sylvain Madec ◽  
Audrey Bürki

Usage-based theories assume that all aspects of language processing are shaped by the distributional properties of the language. The frequency not only of words but also of larger chunks plays a major role in language processing. These theories predict that the frequency of phrases influences the time needed to prepare these phrases for production and their acoustic duration. By contrast, dominant psycholinguistic models of utterance production predict no such effects. In these models, the system keeps track of the frequency of individual words but not of co-occurrences. This study investigates the extent to which the frequency of phrases impacts naming latencies and acoustic duration with a balanced design, where the same words are recombined to build high- and low-frequency phrases. The brain signal of participants is recorded so as to obtain information on the electrophysiological bases and functional locus of frequency effects. Forty-seven participants named pictures using high- and low-frequency adjective–noun phrases. Naming latencies were shorter for high-frequency than low-frequency phrases. There was no evidence that phrase frequency impacted acoustic duration. The electrophysiological signal differed between high- and low-frequency phrases in time windows that do not overlap with conceptualization or articulation processes. These findings suggest that phrase frequency influences the preparation of phrases for production, irrespective of the lexical properties of the constituents, and that this effect originates at least partly when speakers access and encode linguistic representations. Moreover, this study provides information on how the brain signal recorded during the preparation of utterances changes with the frequency of word combinations.


2016 ◽  
Vol 116 (6) ◽  
pp. 2497-2512 ◽  
Author(s):  
Anne Kösem ◽  
Anahita Basirat ◽  
Leila Azizi ◽  
Virginie van Wassenhove

During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically reflected an individual's conscious speech percept.
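The percept-dependent temporal shift reported here can be illustrated by estimating the lag at which band-limited neural activity best aligns with the stimulus. The following is a rough sketch under that assumption; the filter band, filter order, and cross-correlation approach are illustrative and not the authors' MEG analysis.

```python
# Estimate the lag at which a band-limited neural signal best matches the
# speech envelope; comparing this lag across the two reported percepts of a
# bistable sequence would reveal a percept-dependent temporal shift.
import numpy as np
from scipy.signal import butter, filtfilt, correlate, correlation_lags

def band_limited(sig, lo, hi, fs):
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, sig)

def tracking_latency(neural, envelope, fs, lo=1.0, hi=8.0):
    """Lag (in seconds) of maximal cross-correlation with the envelope."""
    x = band_limited(neural, lo, hi, fs)
    xc = correlate(x - x.mean(), envelope - envelope.mean(), mode="full")
    lags = correlation_lags(len(x), len(envelope), mode="full")
    return lags[np.argmax(xc)] / fs
```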


2018 ◽  
Author(s):  
Christoph Daube ◽  
Robin A. A. Ince ◽  
Joachim Gross

When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that before giving rise to its final semantic meaning, the signal is transformed into unknown intermediate neuronal representations. Classically, studies of such intermediate representations are guided by linguistically defined concepts such as phonemes. Here we argue that in order to arrive at an unbiased understanding of the mechanisms of speech comprehension, the focus should instead lie on representations obtained directly from the stimulus. We illustrate our view with a strongly data-driven analysis of a dataset of 24 young, healthy humans who listened to a one-hour narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results, a performance gain of an encoding model based on acoustic and annotated linguistic features over a model based on acoustic features alone as well as the decoding of subgroups of phonemes from phoneme-locked responses, can be explained with an encoding model entirely based on acoustic features. These acoustic features capitalise on acoustic edges and outperform Gabor-filtered spectrograms, features with the potential to describe the spectrotemporal characteristics of individual phonemes. We conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we put forward that linguistic concepts are better used when interpreting models, not when building them. In doing so, we find that the results of our analyses favour syllables over phonemes as candidate intermediate speech representations visible with fast non-invasive neuroimaging.
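"Acoustic edge" features are commonly computed as the positive part of the envelope derivative, emphasizing moments of rapid intensity increase. A minimal sketch under that assumption follows; the cutoff frequency and filter order are illustrative, and the paper's exact feature definition may differ.

```python
# Broadband envelope and acoustic-edge (onset) features from an audio signal.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope(audio, fs, cutoff=30.0):
    """Low-pass filtered magnitude of the analytic signal."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, env)

def acoustic_edges(audio, fs):
    """Half-wave rectified envelope derivative: emphasizes acoustic onsets."""
    env = envelope(audio, fs)
    d = np.diff(env, prepend=env[0]) * fs
    return np.clip(d, 0, None)
```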


2019 ◽  
Author(s):  
Peng Zan ◽  
Alessandro Presacco ◽  
Samira Anderson ◽  
Jonathan Z. Simon

Aging is associated with an exaggerated representation of the speech envelope in auditory cortex. The relationship between this age-related exaggerated response and a listener’s ability to understand speech in noise remains an open question. Here, information-theory-based analysis methods are applied to magnetoencephalography (MEG) recordings of human listeners, investigating their cortical responses to continuous speech, using the novel non-linear measure of phase-locked mutual information between the speech stimuli and cortical responses. The cortex of older listeners shows an exaggerated level of mutual information, compared to younger listeners, for both attended and unattended speakers. The mutual information peaks for several distinct latencies: early (∼50 ms), middle (∼100 ms) and late (∼200 ms). For the late component, the neural enhancement of attended over unattended speech is affected by stimulus SNR, but the direction of this dependency is reversed by aging. Critically, in older listeners and for the same late component, greater cortical exaggeration is correlated with decreased behavioral inhibitory control. This negative correlation also carries over to speech intelligibility in noise, where greater cortical exaggeration in older listeners is correlated with worse speech intelligibility scores. Finally, an age-related lateralization difference is also seen for the ∼100 ms latency peaks, where older listeners show a bilateral response compared to younger listeners’ right-lateralization. Thus, this information-theory-based analysis provides new, and less coarse-grained, results regarding age-related change in auditory cortical speech processing, and its correlation with cognitive measures, compared to related linear measures.

New & Noteworthy: Cortical representations of natural speech are investigated using a novel non-linear approach based on mutual information. Cortical responses, phase-locked to the speech envelope, show an exaggerated level of mutual information associated with aging, appearing at several distinct latencies (∼50, ∼100 and ∼200 ms). Critically, for older listeners only, the ∼200 ms latency response components are correlated with specific behavioral measures, including behavioral inhibition and speech comprehension.
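The core quantity here is mutual information between the speech envelope and the cortical response at a set of latencies. A rough sketch of the idea follows, using a simple histogram-based plug-in estimator; the bin count, latency grid, and estimator choice are illustrative assumptions, not the authors' method.

```python
# Histogram-based mutual information between the speech envelope and the
# neural response shifted by a candidate latency (e.g., ~50, ~100, ~200 ms).
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in MI estimate (bits) from a 2-D histogram of two signals."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def mi_by_latency(envelope, response, fs, latencies_ms=(50, 100, 200)):
    """MI at a few response latencies relative to the stimulus envelope."""
    out = {}
    for lat in latencies_ms:
        shift = int(round(lat / 1000 * fs))
        out[lat] = mutual_information(envelope[:-shift], response[shift:])
    return out
```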


2021 ◽  
Author(s):  
Fabian Schmidt ◽  
Ya-Ping Chen ◽  
Anne Keitel ◽  
Sebastian Rösch ◽  
Ronny Hannemann ◽  
...  

The most prominent acoustic features in speech are intensity modulations, represented by the amplitude envelope of speech. Synchronization of neural activity with these modulations is vital for speech comprehension. As the acoustic modulation of speech is related to the production of syllables, investigations of neural speech tracking rarely distinguish between lower-level acoustic (envelope modulation) and higher-level linguistic (syllable rate) information. Here we manipulated speech intelligibility using noise-vocoded speech and investigated the spectral dynamics of neural speech processing, across two studies at cortical and subcortical levels of the auditory hierarchy, using magnetoencephalography. Overall, cortical regions mostly track the syllable rate, whereas subcortical regions track the acoustic envelope. Furthermore, with less intelligible speech, tracking of the modulation rate becomes more dominant. Our study highlights the importance of distinguishing between envelope modulation and syllable rate and provides novel possibilities to better understand differences between auditory processing and speech/language processing disorders.
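Noise vocoding degrades spectral detail while preserving the slow intensity modulations, which is what makes it useful for manipulating intelligibility. Below is a minimal vocoding sketch; the band edges, filter order, and number of channels are illustrative assumptions, not the study's exact parameters.

```python
# Noise-vocode a (float) speech waveform: impose each band's envelope on
# band-limited noise and sum the bands. Fewer bands -> less intelligible.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(sig, lo, hi, fs):
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, sig)

def noise_vocode(speech, fs, edges=(100, 300, 700, 1500, 3000, 6000)):
    vocoded = np.zeros(len(speech))
    noise = np.random.randn(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        env = np.abs(hilbert(band))            # band envelope
        carrier = bandpass(noise, lo, hi, fs)  # band-limited noise carrier
        vocoded += env * carrier
    return vocoded / np.max(np.abs(vocoded))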


2018 ◽  
Vol 23 (1) ◽  
pp. 32-38 ◽  
Author(s):  
Jantien L. Vroegop ◽  
Nienke C. Homans ◽  
André Goedegebure ◽  
J. Gertjan Dingemanse ◽  
Teun van Immerzeel ◽  
...  

Although the benefit of bimodal listening in cochlear implant (CI) users is well established, speech comprehension remains a challenge in acoustically complex real-life environments due to reverberation and disturbing background noises. One way to further improve bimodal auditory performance is the use of directional microphones. The objective of this study was to investigate the effect of a binaural beamformer for bimodal CI users. This prospective study measured speech reception thresholds (SRT) in noise in a repeated-measures design that varied listening modality across static and dynamic listening conditions. A significant improvement in SRT of 4.7 dB was found with the binaural beamformer switched on in the bimodal static listening condition. No significant improvement was found in the dynamic listening condition. We conclude that there is a clear additional advantage of the binaural beamformer in bimodal CI users for predictable/static listening conditions with frontal target speech and spatially separated noise sources.
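Speech reception thresholds such as the 4.7 dB improvement reported here are typically estimated with an adaptive procedure that converges on the SNR yielding 50% correct responses. A simple sketch of a 1-up/1-down track follows; the step size, trial count, and averaging rule are assumptions, not this study's exact protocol.

```python
# Adaptive SRT track: lower the SNR after a correct response, raise it after
# an incorrect one, and average the final reversals/trials as the threshold.
import numpy as np

def run_srt_track(present_trial, n_trials=20, start_snr=0.0, step_db=2.0):
    """present_trial(snr_db) -> True if the sentence was repeated correctly."""
    snr = start_snr
    track = []
    for _ in range(n_trials):
        correct = present_trial(snr)
        track.append(snr)
        snr += -step_db if correct else step_db
    return float(np.mean(track[-10:]))  # SRT: mean SNR over the last trials
```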


2021 ◽  
pp. 1-17
Author(s):  
Avital Sternin ◽  
Lucy M. McGarry ◽  
Adrian M. Owen ◽  
Jessica A. Grahn

We investigated how familiarity alters music and language processing in the brain. We used fMRI to measure brain responses before and after participants were familiarized with novel music and language stimuli. To manipulate the presence of language and music in the stimuli, there were four conditions: (1) whole music (music and words together), (2) instrumental music (no words), (3) a cappella music (sung words, no instruments), and (4) spoken words. To manipulate participants' familiarity with the stimuli, we used novel stimuli and a familiarization paradigm designed to mimic “natural” exposure, while controlling for autobiographical memory confounds. Participants completed two fMRI scans that were separated by a stimulus training period. Behaviorally, participants learned the stimuli over the training period. However, there were no significant neural differences between the familiar and unfamiliar stimuli in either univariate or multivariate analyses. There were differences in neural activity in frontal and temporal regions based on the presence of language in the stimuli, and these differences replicated across the two scanning sessions. These results indicate that the way we engage with music is important for creating a memory of that music, and these aspects, over and above familiarity on its own, may be responsible for the robust nature of musical memory in the presence of neurodegenerative disorders such as Alzheimer's disease.


2019 ◽  
pp. 201-232
Author(s):  
Ray Jackendoff ◽  
Jenny Audring

This chapter asks what is happening to linguistic representations during language use, and how representations are formed in the course of language acquisition. It is shown how Relational Morphology’s theory of representations can be directly embedded into models of processing and acquisition. Central to this account is that the lexicon, complete with schemas and relational links, constitutes the long-term memory network that supports language production and comprehension. The chapter first discusses processing: the nature of working memory; promiscuous (opportunistic) processing; spreading activation; priming; probabilistic parsing; the balance between storage and computation in recognizing morphologically complex words; and the role of relational links and schemas in word retrieval. It then turns to acquisition, which is to be thought of as adding nodes and relational links to the lexical network. The general approach is based on the Propose but Verify procedure of Trueswell et al. (2013), plus conservative generalization, as in usage-based approaches.


2018 ◽  
Vol 41 (2) ◽  
pp. 224-239
Author(s):  
Bartosz Brzoza

Lexical frequency is one of the major variables involved in language processing. It constitutes a cornerstone of psycholinguistic, corpus linguistic as well as applied research. Linguists take frequency counts from corpora and have come to take them for granted. However, concerns have emerged that corpora may not always provide a comprehensive picture of how frequently lexical items appear in a language. In the present contribution I compare corpus frequency counts for English and Polish words to native speakers’ perception of frequency. The analysis shows that, while objective and subjective values are generally related, there is a disparity between the two measures for frequent Polish words. The relationship, though positive in direction, is also not as strong as in previous studies. I suggest linking objective with subjective frequency measures in research.
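The comparison described here boils down to correlating log-transformed corpus counts with mean subjective ratings, both overall and within the high-frequency band where the disparity appears. A minimal sketch follows, assuming per-word corpus frequencies and rating means are already available; the variable names and top-quartile split are illustrative.

```python
# Correlate corpus (objective) and rated (subjective) frequency, overall and
# for the most frequent words only.
import numpy as np
from scipy.stats import spearmanr

def frequency_agreement(corpus_freq, ratings, top_fraction=0.25):
    """Spearman correlation overall and within the most frequent items."""
    logf = np.log10(np.asarray(corpus_freq) + 1)
    ratings = np.asarray(ratings)
    overall, _ = spearmanr(logf, ratings)
    cutoff = np.quantile(logf, 1 - top_fraction)
    top = logf >= cutoff
    frequent_only, _ = spearmanr(logf[top], ratings[top])
    return overall, frequent_only
```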


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Kirsten van den Heuij ◽  
Theo Goverts ◽  
Karin Neijenhuis ◽  
Martine Coene

Purpose: As oral communication in higher education is vital, good classroom acoustics is needed to convey the spoken message to university students. Non-auditory factors such as academic language, a non-native educational context and a diversity of acoustic settings in different types of classrooms affect speech understanding and performance of students. The purpose of this study is to find out whether the acoustic properties of higher educational teaching contexts meet the recommended reference levels.

Design/methodology/approach: Background noise levels and the Speech Transmission Index (STI) were assessed in 45 unoccupied university classrooms (15 lecture halls, 16 regular classrooms and 14 skills laboratories).

Findings: The findings of this study indicate that 41 classrooms surpassed the maximum reference level for background noise of 35 dB(A) and 17 exceeded the reference level of 40 dB(A). At a five-meter distance facing the speaker, six classrooms showed excellent speech intelligibility, while at more representative listening positions, none of the classrooms did. As the acoustic characteristics in a majority of the classrooms exceeded the available reference levels, speech intelligibility was likely to be insufficient.

Originality/value: This study assesses the acoustics in academic classrooms against the available acoustic reference levels. Non-acoustic factors, such as academic language complexity and (non-)nativeness of the students and teaching staff, put higher cognitive demands on listeners in higher education and need to be taken into account when applying the reference levels in daily practice, for regular students and for students with language or hearing disabilities in particular.
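The screening logic described above can be expressed as a simple check of each room's measurements against the quoted reference levels. The thresholds follow the abstract; the STI boundary of 0.75 for "excellent" intelligibility is a common convention and an assumption here, not a value stated by the authors.

```python
# Flag a classroom's background noise and STI against reference levels.
def classify_classroom(noise_dba, sti_at_listener):
    flags = []
    if noise_dba > 35:
        flags.append("exceeds 35 dB(A) background-noise reference")
    if noise_dba > 40:
        flags.append("exceeds 40 dB(A)")
    if sti_at_listener < 0.75:
        flags.append("speech intelligibility below 'excellent' (STI < 0.75)")
    return flags or ["meets reference levels"]
```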

