Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope

2018 ◽  
Vol 19 (2) ◽  
pp. 181-191 ◽  
Author(s):  
Jonas Vanthornhout ◽  
Lien Decruy ◽  
Jan Wouters ◽  
Jonathan Z. Simon ◽  
Tom Francart

Abstract
Speech intelligibility is currently measured by scoring how well a person can identify a speech signal. The results of such behavioral measures reflect neural processing of the speech signal, but are also influenced by language processing, motivation, and memory. Electrophysiological measures of hearing often give insight into the neural processing of sound; however, most methods use non-speech stimuli, making it hard to relate the results to behavioral measures of speech intelligibility. Using natural running speech as a stimulus in electrophysiological measures of hearing is a paradigm shift that allows us to bridge the gap between behavioral and electrophysiological measures. Here, by decoding the speech envelope from the electroencephalogram and correlating it with the stimulus envelope, we demonstrate an electrophysiological measure of the neural processing of running speech. We show that behaviorally measured speech intelligibility is strongly correlated with our electrophysiological measure. Our results pave the way toward an objective and automatic way of assessing neural processing of speech presented through auditory prostheses, reducing confounds such as attention and cognitive capabilities. We anticipate that our electrophysiological measure will allow better differential diagnosis of the auditory system, and will enable the development of closed-loop auditory prostheses that automatically adapt to individual users.
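The decoding approach described above (reconstructing the speech envelope from the EEG with a linear backward model and correlating it with the stimulus envelope) can be sketched as follows. This is a minimal numpy sketch under stated assumptions: the function names, the number of lags, and the ridge regularization are illustrative choices, not the authors' exact pipeline.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of each EEG channel (samples x channels*lags)."""
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_channels:(lag + 1) * n_channels] = eeg[:n_samples - lag]
    return X

def train_decoder(eeg, envelope, n_lags=16, ridge=1.0):
    """Ridge-regularized least squares mapping lagged EEG to the speech envelope."""
    X = lag_matrix(eeg, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)

def reconstruction_score(eeg, envelope, decoder, n_lags=16):
    """Pearson correlation between the reconstructed and actual envelopes."""
    reconstructed = lag_matrix(eeg, n_lags) @ decoder
    return np.corrcoef(reconstructed, envelope)[0, 1]
```

In practice the decoder would be trained and evaluated on separate data (cross-validation), and the resulting correlation compared against a chance level estimated from mismatched envelope/EEG pairs.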


2019 ◽  
Author(s):  
Sankar Mukherjee ◽  
Alice Tomassini ◽  
Leonardo Badino ◽  
Aldo Pastore ◽  
Luciano Fadiga ◽  
...  

Abstract
Cortical entrainment to the (quasi-)rhythmic components of speech seems to play an important role in speech comprehension. It has been suggested that neural entrainment may reflect top-down temporal predictions of sensory signals. Key properties of a predictive model are its anticipatory nature and its ability to reconstruct missing information. Here we put both properties to experimental test. We acoustically presented sentences and measured cortical entrainment to both the acoustic speech envelope and the speaker's lip kinematics, which were not visible to the participants. We then analyzed speech-brain and lips-brain coherence at multiple negative and positive lags. Besides the well-known cortical entrainment to the acoustic speech envelope, we found significant entrainment in the delta range to the (latent) lip kinematics. Most interestingly, the two entrainment phenomena were temporally dissociated. While entrainment to the acoustic speech peaked around a +0.3 s lag (i.e., when the EEG followed the speech by 0.3 s), entrainment to the lips was significantly anticipated and peaked around a 0-0.1 s lag (i.e., when the EEG was virtually synchronous with the putative lip movements). Our results demonstrate that neural entrainment during speech listening involves the anticipatory reconstruction of missing information related to lip movement production, indicating its fundamentally predictive nature and thus supporting analysis-by-synthesis models.
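The lagged-coherence analysis described above can be sketched as follows: one signal is shifted by each candidate lag before band-limited coherence is computed, so a peak at a positive lag means the brain follows the stimulus, and a peak near zero or negative lags indicates anticipation. This is an illustrative sketch, not the authors' pipeline; the lag grid, frequency band, and segment length are assumed parameters.

```python
import numpy as np
from scipy.signal import coherence

def lagged_coherence(brain, stimulus, fs, lags_s, band=(1.0, 4.0)):
    """Mean coherence in a frequency band between a neural signal and a
    stimulus signal, with the pair re-aligned at each lag.
    Positive lag = brain activity follows the stimulus."""
    results = []
    for lag in lags_s:
        shift = int(round(lag * fs))
        if shift >= 0:
            x, y = brain[shift:], stimulus[:len(stimulus) - shift]
        else:
            x, y = brain[:shift], stimulus[-shift:]
        f, coh = coherence(x, y, fs=fs, nperseg=min(len(x), 2 * int(fs)))
        mask = (f >= band[0]) & (f <= band[1])
        results.append(coh[mask].mean())
    return np.array(results)
```

Statistical significance of such coherence peaks is typically assessed against surrogate data (e.g., shuffled sentence pairings), since coherence estimates are positively biased for finite data.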


2020 ◽  
Author(s):  
Di Zhou ◽  
Gaoyan Zhang ◽  
Jianwu Dang ◽  
Shuang Wu ◽  
Zhuo Zhang

2019 ◽  
Vol 9 (3) ◽  
pp. 70 ◽  
Author(s):  
Brett Myers ◽  
Miriam Lense ◽  
Reyna Gordon

Prosodic cues in speech are indispensable for comprehending a speaker’s message, recognizing emphasis and emotion, parsing segmental units, and disambiguating syntactic structures. While it is commonly accepted that prosody provides a fundamental service to higher-level features of speech, the neural underpinnings of prosody processing are not clearly defined in the cognitive neuroscience literature. Many recent electrophysiological studies have examined speech comprehension by measuring neural entrainment to the speech amplitude envelope, using a variety of methods including phase-locking algorithms and stimulus reconstruction. Here we review recent evidence for neural tracking of the speech envelope and demonstrate the importance of prosodic contributions to the neural tracking of speech. Prosodic cues may offer a foundation for supporting neural synchronization to the speech envelope, which scaffolds linguistic processing. We argue that prosody has an inherent role in speech perception, and future research should fill the gap in our knowledge of how prosody contributes to speech envelope entrainment.


2018 ◽  
Author(s):  
Eline Verschueren ◽  
Jonas Vanthornhout ◽  
Tom Francart

Abstract
Objectives: Recently, an objective measure of speech intelligibility based on brain responses derived from the electroencephalogram (EEG) has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of speech intelligibility can also be used with natural speech as a stimulus, as this would be beneficial for clinical applications.
Design: We recorded the EEG in 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of speech intelligibility by adding speech-weighted noise. Speech intelligibility was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively, by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in the EEG channels covering different brain areas.
Results: For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing speech intelligibility. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing speech intelligibility for both stimuli. Remarkably, although speech intelligibility remained unchanged between the no-noise and +2.5 dB SNR conditions, neural speech processing was affected by the addition of this small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased in the presence of noise. TRF latency changes as a function of speech intelligibility appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50-300 ms) increased with increasing speech intelligibility for the Matrix sentences, but remained unchanged for the natural story.
Conclusions: These results show (1) the feasibility of natural speech as a stimulus for the objective measure of speech intelligibility, (2) that neural tracking of speech is enhanced using a natural story compared to Matrix sentences, and (3) that noise and the stimulus type can change the temporal characteristics of the brain responses. These results might reflect the integration of incoming acoustic features and top-down information, suggesting that the choice of stimulus has to be considered based on the intended purpose of the measurement.
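The TRF analysis mentioned above is a forward model: the EEG in each channel is regressed onto time-lagged copies of the speech envelope, and the weights over lags form a response function whose peak amplitudes and latencies can be compared across conditions. A minimal sketch for one channel, assuming a 0-400 ms lag window and an illustrative ridge parameter:

```python
import numpy as np

def estimate_trf(envelope, eeg_channel, fs, tmin=0.0, tmax=0.4, ridge=1.0):
    """Forward (encoding) model: regress one EEG channel onto lagged copies
    of the speech envelope. Returns the lag times and TRF weights."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for i, lag in enumerate(lags):
        X[lag:, i] = envelope[:n - lag]
    XtX = X.T @ X + ridge * np.eye(len(lags))
    trf = np.linalg.solve(XtX, X.T @ eeg_channel)
    return lags / fs, trf
```

Peak latency and amplitude of the estimated TRF can then be read off per channel and per condition, which is the kind of comparison the abstract reports (e.g., latency shifts with intelligibility).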


2021 ◽  
Author(s):  
Na Xu ◽  
Baotian Zhao ◽  
Lu Luo ◽  
Kai Zhang ◽  
Xiaoqiu Shao ◽  
...  

The envelope is essential for speech perception. Recent studies have shown that cortical activity can track the acoustic envelope. However, whether the tracking strength reflects the extent of speech intelligibility processing remains controversial. Here, using stereo-electroencephalography (sEEG), we directly recorded the activity in human auditory cortex while subjects listened to either natural or noise-vocoded speech. These two stimuli have approximately identical envelopes, but the noise-vocoded speech is not intelligible. We found two stages of envelope tracking in auditory cortex: an early high-γ (60-140 Hz) power stage (delay ≈ 49 ms) that preferred the noise-vocoded speech, and a late θ (4-8 Hz) phase stage (delay ≈ 178 ms) that preferred the natural speech. Furthermore, the decoding performance of high-γ power was better in primary auditory cortex than in non-primary auditory cortex, consistent with its short tracking delay. We also found distinct lateralization effects: high-γ power envelope tracking dominated left auditory cortex, while θ phase showed better decoding performance in right auditory cortex. In sum, we suggest a functional dissociation between high-γ power and θ phase: the former reflects fast and automatic processing of brief acoustic features, while the latter correlates with slow build-up processing facilitated by speech intelligibility.
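The two neural features contrasted above, high-γ power and θ phase, are commonly extracted by band-pass filtering followed by the Hilbert transform: the analytic amplitude gives the band power envelope and the analytic angle gives the instantaneous phase. A minimal sketch (the filter order and the "power"/"phase" interface are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_feature(signal, fs, band, feature):
    """Band-limit a recording, then return either its analytic amplitude
    envelope ('power') or its instantaneous phase ('phase')."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    analytic = hilbert(filtfilt(b, a, signal))  # zero-phase filtering
    return np.abs(analytic) if feature == "power" else np.angle(analytic)
```

With features like these, envelope tracking can be quantified separately per band, e.g., `band_feature(seeg, fs, (60, 140), "power")` for high-γ power and `band_feature(seeg, fs, (4, 8), "phase")` for θ phase.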


2019 ◽  
Author(s):  
Peng Zan ◽  
Alessandro Presacco ◽  
Samira Anderson ◽  
Jonathan Z. Simon

Abstract
Aging is associated with an exaggerated representation of the speech envelope in auditory cortex. The relationship between this age-related exaggerated response and a listener’s ability to understand speech in noise remains an open question. Here, information-theory-based analysis methods are applied to magnetoencephalography (MEG) recordings of human listeners, investigating their cortical responses to continuous speech, using the novel non-linear measure of phase-locked mutual information between the speech stimuli and cortical responses. The cortex of older listeners shows an exaggerated level of mutual information, compared to younger listeners, for both attended and unattended speakers. The mutual information peaks at several distinct latencies: early (∼50 ms), middle (∼100 ms) and late (∼200 ms). For the late component, the neural enhancement of attended over unattended speech is affected by stimulus SNR, but the direction of this dependency is reversed by aging. Critically, in older listeners and for the same late component, greater cortical exaggeration is correlated with decreased behavioral inhibitory control. This negative correlation also carries over to speech intelligibility in noise, where greater cortical exaggeration in older listeners is correlated with worse speech intelligibility scores. Finally, an age-related lateralization difference is also seen for the ∼100 ms latency peaks, where older listeners show a bilateral response compared to younger listeners’ right-lateralization. Thus, this information-theory-based analysis provides new, and less coarse-grained, results regarding age-related change in auditory cortical speech processing, and its correlation with cognitive measures, compared to related linear measures.
New & Noteworthy
Cortical representations of natural speech are investigated using a novel non-linear approach based on mutual information. Cortical responses, phase-locked to the speech envelope, show an exaggerated level of mutual information associated with aging, appearing at several distinct latencies (∼50, ∼100 and ∼200 ms). Critically, for older listeners only, the ∼200 ms latency response components are correlated with specific behavioral measures, including behavioral inhibition and speech comprehension.
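A mutual-information analysis of stimulus-response coupling at multiple latencies can be sketched with a simple plug-in (histogram) estimator: shift the response by each candidate latency, then estimate MI between the aligned pair. This is a generic illustration of the approach, not the paper's estimator; the bin count and lag grid are assumptions, and plug-in estimates carry a positive bias that must be corrected or compared against surrogates in real analyses.

```python
import numpy as np

def mutual_information(x, y, n_bins=8):
    """Plug-in (histogram) estimate of mutual information in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0  # avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def lagged_mi(envelope, response, fs, lags_s, n_bins=8):
    """MI between the stimulus envelope and the neural response shifted
    by each (non-negative) candidate latency."""
    out = []
    for lag in lags_s:
        d = int(round(lag * fs))
        out.append(mutual_information(envelope[:len(envelope) - d],
                                      response[d:], n_bins))
    return np.array(out)
```

Peaks of such a lagged-MI curve correspond to the distinct response latencies (∼50, ∼100, ∼200 ms) discussed in the abstract.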


2019 ◽  
Author(s):  
Guangting Mai ◽  
William S-Y. Wang

Abstract
Neural entrainment to acoustic envelopes is important for speech intelligibility in spoken language processing. However, it is unclear how it contributes to processing at different linguistic hierarchical levels. The present EEG study investigated this issue as participants responded to stimuli that dissociated phonological and semantic processing (real-word, pseudo-word, and backward utterances). A multivariate temporal response function (mTRF) model was adopted to map speech envelopes from multiple spectral bands onto EEG signals, providing a direct approach to measuring neural entrainment. We tested the hypothesis that entrainment at the delta (supra-syllabic) and theta (syllabic and sub-syllabic) bands plays distinct roles at different hierarchical levels. Results showed that both types of entrainment involve speech-specific processing, but their underlying mechanisms differ. Theta-band entrainment was modulated by phonological but not semantic content, reflecting a possible mechanism of tracking syllabic and sub-syllabic patterns during phonological processing. Delta-band entrainment, on the other hand, was modulated by semantic information, indexing more attention-demanding, effortful phonological encoding when higher-level (semantic) information is deficient. Interestingly, we further demonstrated that the statistical capacity of mTRFs at the delta and theta bands to classify utterances is affected by their semantic (real-word vs. pseudo-word) and phonological (real-word and pseudo-word vs. backward) contents, respectively. Moreover, analyses of the response weightings of the mTRFs showed that delta-band entrainment was sustained across neural processing stages up to higher-order timescales (~300 ms), while theta-band entrainment occurred mainly at early, perceptual processing stages (<160 ms). This indicates that, compared to theta-band entrainment, delta-band entrainment may reflect increased involvement of higher-order cognitive functions during interactions between phonological and semantic processing. As such, we conclude that neural entrainment is associated not only with speech intelligibility, but also with the hierarchy of linguistic (phonological and semantic) content. The present study thus provides new insight into the cognitive mechanisms of neural entrainment for spoken language processing.
Highlights
- Low-frequency neural entrainment was examined via mTRF models in EEG during phonological and semantic processing.
- Delta entrainment plays a role in effortful listening for phonological recognition.
- Theta entrainment plays a role in tracking syllabic and sub-syllabic patterns for phonological processing.
- Delta and theta entrainment are sustained at different timescales of neural processing.

