General auditory and speech-specific contributions to cortical envelope tracking revealed using auditory chimeras

Author(s):  
Kevin D. Prinsloo ◽  
Edmund C. Lalor

Abstract
In recent years, research on natural speech processing has benefited from recognizing that low-frequency cortical activity tracks the amplitude envelope of natural speech. However, it remains unclear to what extent this tracking reflects speech-specific processing beyond the analysis of the stimulus acoustics. In the present study, we aimed to disentangle contributions to cortical envelope tracking that reflect general acoustic processing from those that are functionally related to processing speech. To do so, we recorded EEG from subjects as they listened to “auditory chimeras” – stimuli composed of the temporal fine structure (TFS) of one speech stimulus modulated by the amplitude envelope (ENV) of another speech stimulus. By varying the number of frequency bands used in making the chimeras, we obtained some control over which speech stimulus was recognized by the listener. No matter which stimulus was recognized, envelope tracking was always strongest for the ENV stimulus, indicating a dominant contribution from acoustic processing. However, there was also a positive relationship between intelligibility and the tracking of the perceived speech, indicating a contribution from speech-specific processing. These findings were supported by a follow-up analysis that assessed envelope tracking as a function of the (estimated) output of the cochlea rather than the original stimuli used in creating the chimeras. Finally, we sought to isolate the speech-specific contribution to envelope tracking using forward encoding models and found that indices of phonetic feature processing tracked reliably with intelligibility. Together, these results show that cortical speech tracking is dominated by acoustic processing but also reflects speech-specific processing.

This work was supported by a Career Development Award from Science Foundation Ireland (CDA/15/3316) and a grant from the National Institute on Deafness and Other Communication Disorders (DC016297). The authors thank Dr. Aaron Nidiffer, Dr. Aisling O’Sullivan, Thomas Stoll, and Lauren Szymula for assistance with data collection, and Dr. Nathaniel Zuk, Dr. Aaron Nidiffer, and Dr. Aisling O’Sullivan for helpful comments on this manuscript.

Significance Statement
Activity in auditory cortex is known to dynamically track the energy fluctuations, or amplitude envelope, of speech. Measures of this tracking are now widely used in research on hearing and language and have had a substantial influence on theories of how auditory cortex parses and processes speech. But how much of this speech tracking is actually driven by speech-specific processing rather than general acoustic processing is unclear, limiting its interpretability and its usefulness. Here, by merging two speech stimuli together to form so-called auditory chimeras, we show that EEG tracking of the speech envelope is dominated by acoustic processing but also reflects linguistic analysis. This has important implications for theories of cortical speech tracking and for using measures of that tracking in applied research.
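The chimera construction described above lends itself to a compact signal-processing illustration. Below is a minimal Python sketch of the general recipe: split two utterances into frequency bands, take the Hilbert envelope of one and the fine structure of the other in each band, and sum the recombined bands. The filter order, logarithmic band spacing, and 80-8000 Hz range are illustrative assumptions rather than the parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def make_chimera(env_source, tfs_source, fs, n_bands=8, f_lo=80.0, f_hi=8000.0):
    """Per band, combine the amplitude envelope (ENV) of one speech signal with
    the temporal fine structure (TFS) of another, then sum across bands."""
    n = min(len(env_source), len(tfs_source))
    x, y = env_source[:n], tfs_source[:n]
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    chimera = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))            # envelope of signal 1
        tfs = np.cos(np.angle(hilbert(sosfiltfilt(sos, y))))  # fine structure of signal 2
        chimera += env * tfs
    return chimera
```

Varying n_bands shifts which of the two source sentences listeners tend to recognize, which is the manipulation the study exploits.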

2019 ◽  
Author(s):  
Shyanthony R. Synigal ◽  
Emily S. Teoh ◽  
Edmund C. Lalor

Abstract
The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz; known as high gamma). The aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Furthermore, we aimed to assess whether any such information might be complementary to that reflected in well-established low-frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low-frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in a minority of subjects. The same pattern held for attention decoding in a separate group of subjects who undertook a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects, and that combining it with low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
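For readers unfamiliar with how a high-gamma power signal is obtained from continuous recordings, the following Python sketch shows one common approach: band-pass filter, Hilbert-transform, and downsample the instantaneous power. The 70-150 Hz band matches the range quoted above, but the filter order and target sampling rate are assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, decimate

def high_gamma_power(eeg, fs, band=(70.0, 150.0), target_fs=128):
    """Return a downsampled high-gamma power envelope for EEG given as a
    (channels x samples) array."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    narrow = sosfiltfilt(sos, eeg, axis=-1)               # isolate the high-gamma band
    power = np.abs(hilbert(narrow, axis=-1)) ** 2         # instantaneous power
    q = int(round(fs / target_fs))
    return decimate(power, q, axis=-1, zero_phase=True)   # anti-alias and downsample
```

The resulting power time series can then be fed to the same linear regression framework used for the low-frequency EEG, which is how the two measures can be combined.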


2020 ◽  
Author(s):  
Eline Verschueren ◽  
Jonas Vanthornhout ◽  
Tom Francart

Abstract
Objectives: In recent years there has been significant interest in attempting to recover the temporal envelope of a speech signal from the neural response in order to investigate neural speech processing. The research focus is now broadening from neural speech processing in normal-hearing listeners towards hearing-impaired listeners. When testing hearing-impaired listeners, speech has to be amplified to resemble the effect of a hearing aid and to compensate for peripheral hearing loss. To date, it is not known with certainty how, or whether, neural speech tracking is influenced by sound amplification. As these higher intensities could influence the outcome, we investigated the influence of stimulus intensity on neural speech tracking.
Design: We recorded the electroencephalogram (EEG) of 20 normal-hearing participants while they listened to a narrated story. The story was presented at intensities from 10 to 80 dB A. To investigate the brain responses, we analyzed neural tracking of the speech envelope by reconstructing the envelope from the EEG using a linear decoder and correlating the reconstructed envelope with the actual one. We investigated the delta (0.5-4 Hz) and theta (4-8 Hz) bands for each intensity. We also investigated the latencies and amplitudes of the responses in more detail using temporal response functions (TRFs), which are the estimated linear response functions between the stimulus envelope and the EEG.
Results: Neural envelope tracking depends on stimulus intensity in both the TRF and the envelope reconstruction analysis. However, provided that the decoder is applied to data of the same stimulus intensity as it was trained on, envelope reconstruction is robust to stimulus intensity. In addition, neural envelope tracking in the delta (but not theta) band appears to relate to speech intelligibility. Similar to the linear decoder analysis, TRF amplitudes and latencies depend on stimulus intensity: the amplitude of peak 1 (30-50 ms) increases and the latency of peak 2 (140-160 ms) decreases with increasing stimulus intensity.
Conclusion: Although brain responses are influenced by stimulus intensity, neural envelope tracking is robust to stimulus intensity when the same intensity is used to train and test the decoder. We can therefore assume that intensity is not a confound when testing hearing-impaired participants with amplified speech using the linear decoder approach. In addition, neural envelope tracking in the delta band appears to be correlated with speech intelligibility, showing the potential of neural envelope tracking as an objective measure of speech intelligibility.
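The decoder-and-correlate procedure described in the Design section can be sketched in a few lines of Python. This is not the authors' code: the lag range, ridge parameter, and the absence of cross-validation are simplifications for illustration only.

```python
import numpy as np

def lag_matrix(eeg, lags):
    """Stack time-lagged copies of the EEG (samples x channels); a positive lag
    means the EEG sample follows the stimulus sample it is used to predict."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(eeg, -lag, axis=0)
        if lag > 0:
            shifted[-lag:] = 0     # zero out wrapped-around samples
        elif lag < 0:
            shifted[:-lag] = 0
        X[:, i * c:(i + 1) * c] = shifted
    return X

def train_decoder(eeg, envelope, lags, lam=1e2):
    """Fit a ridge-regularized backward model mapping lagged EEG to the envelope."""
    X = lag_matrix(eeg, lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def tracking_score(eeg, envelope, weights, lags):
    """Reconstruct the envelope from held-out EEG and correlate it with the true one."""
    reconstruction = lag_matrix(eeg, lags) @ weights
    return np.corrcoef(reconstruction, envelope)[0, 1]
```

In practice the decoder would be trained and tested on data of the same stimulus intensity, as the Results above emphasize, and the lags would span roughly the first few hundred milliseconds of EEG following the stimulus.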


2020 ◽  
Author(s):  
Katsuaki Kojima ◽  
Yulia Oganian ◽  
Chang Cai ◽  
Anne Findlay ◽  
Edward Chang ◽  
...  

Abstract
The amplitude envelope of speech is crucial for accurate comprehension, and several studies have shown that the phase of neural activity in the theta-delta bands (1-10 Hz) tracks the phase of the speech amplitude envelope during listening, a process referred to as envelope tracking. However, the mechanisms underlying envelope tracking have been heavily debated. A dominant model posits that envelope tracking reflects continuous entrainment of endogenous low-frequency oscillations to the speech envelope. However, it has proven challenging to distinguish this from the alternative that envelope tracking reflects evoked responses to acoustic landmarks within the envelope. Here we recorded magnetoencephalography while participants listened to natural and slowed speech to test two critical predictions of the entrainment model: (1) that the frequency range of phase locking reflects the stimulus speech rate, and (2) that an entrained oscillator will resonate for multiple cycles after a landmark-driven phase reset. We found that peaks in the rate of envelope change (acoustic edges) induced evoked responses and theta phase locking. Crucially, this phase locking was transient and its frequency range was independent of the speech rate, in line with the evoked-response account. Further comparisons between regular and slowed speech revealed that the encoding of acoustic edge magnitudes was invariant to contextual speech rate, demonstrating that it was normalized for speech rate. Taken together, our results show that the evoked-response model provides a better account of neural phase locking to the speech envelope than oscillatory entrainment.
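Acoustic edges of the kind referred to above, peaks in the rate of envelope change, can be located with a simple envelope-derivative peak picker. The Python sketch below is illustrative only; the 10 Hz smoothing cutoff, 100 Hz envelope rate, and prominence threshold are arbitrary assumptions rather than values from the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, find_peaks

def acoustic_edges(audio, fs, env_fs=100, prominence=0.05):
    """Locate acoustic edges as peaks in the (positive) rate of change of a
    smoothed, normalized amplitude envelope."""
    env = np.abs(hilbert(audio))                       # broadband Hilbert envelope
    sos = butter(4, 10.0, btype="lowpass", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)                        # smooth below ~10 Hz
    step = int(fs // env_fs)
    env = env[::step] / np.max(env)                    # downsample and normalize
    d_env = np.clip(np.diff(env) * env_fs, 0, None)    # positive envelope derivative
    peaks, _ = find_peaks(d_env, prominence=prominence)
    return peaks / env_fs, d_env[peaks]                # edge times (s) and magnitudes
```

Slowing the speech stretches the spacing of these edges without changing the edge-detection logic, which is what allows speech rate and edge magnitude to be manipulated somewhat independently.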


2012 ◽  
Vol 2012 ◽  
pp. 1-7
Author(s):  
Vijaya Kumar Narne ◽  
C. S. Vanaja

Background. The aim of this study was to investigate the individual effects of envelope enhancement and high-pass filtering (500 Hz) on word identification scores in quiet for individuals with Auditory Neuropathy (AN). Method. Twelve individuals with AN (six males and six females), aged 12 to 40 years, participated in the study. Word identification was assessed using bisyllabic words in each of three speech-processing conditions: unprocessed, envelope-enhanced, and high-pass filtered. All signal processing was carried out using MATLAB 7. Results. Word identification scores showed a mean improvement of 18% for envelope-enhanced versus unprocessed speech. No significant improvement was observed for high-pass filtered versus unprocessed speech. Conclusion. These results suggest that the compression/expansion signal-processing strategy enhances speech identification scores, at least for individuals with mild and moderate AN-related impairment. In contrast, simple high-pass filtering (i.e., eliminating the low-frequency content of the signal) does not improve speech perception in quiet for individuals with AN.
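The abstract does not spell out the envelope-enhancement algorithm, so the following Python sketch should be read as one generic illustration of an expansion-style scheme (per-band envelope raised to a power greater than one and recombined with the original fine structure), not as the authors' MATLAB implementation; every parameter here is an assumption.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def expand_envelope(audio, fs, n_bands=8, exponent=2.0, f_lo=100.0, f_hi=8000.0):
    """Sharpen the temporal envelope by raising each band's envelope to a power
    greater than one and recombining it with that band's fine structure."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    out = np.zeros(len(audio))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, audio)
        env = np.abs(hilbert(band))
        tfs = np.cos(np.angle(hilbert(band)))
        peak = max(env.max(), 1e-12)
        env_exp = (env / peak) ** exponent * peak      # expansion sharpens onsets
        out += env_exp * tfs
    return out / np.max(np.abs(out))                   # normalize to avoid clipping
```

The intuition is that listeners with AN have degraded temporal resolution, so exaggerating envelope modulations can partially restore access to the amplitude cues that word identification relies on.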


2013 ◽  
Vol 25 (2) ◽  
pp. 175-187 ◽  
Author(s):  
Jihoon Oh ◽  
Jae Hyung Kwon ◽  
Po Song Yang ◽  
Jaeseung Jeong

Neural responses in early sensory areas are influenced by top-down processing. In the visual system, early visual areas have been shown to actively participate in top-down processing based on their topographical properties. Although it has been suggested that the auditory cortex is involved in top-down control, functional evidence of topographic modulation is still lacking. Here, we show that mental auditory imagery for familiar melodies induces significant activation in the frequency-responsive areas of the primary auditory cortex (PAC). This activation is related to the characteristics of the imagery: when subjects were asked to imagine high-frequency melodies, we observed increased activation in the high- versus low-frequency response area; when subjects were asked to imagine low-frequency melodies, the opposite was observed. Furthermore, among the tonotopic subfields of the PAC, area A1 was more closely related to the observed frequency-related modulation than area R. Our findings suggest that top-down processing in the auditory cortex relies on a mechanism similar to that used in the perception of external auditory stimuli, comparable to what has been found in early visual systems.


2020 ◽  
Vol 123 (2) ◽  
pp. 695-706
Author(s):  
Lu Luo ◽  
Na Xu ◽  
Qian Wang ◽  
Liang Li

The central mechanisms underlying binaural unmasking for spectrally overlapping concurrent sounds, which are unresolved in the peripheral auditory system, remain largely unknown. In this study, frequency-following responses (FFRs) to two binaurally presented independent narrowband noises (NBNs) with overlapping spectra were recorded simultaneously in the inferior colliculus (IC) and auditory cortex (AC) in anesthetized rats. The results showed that for both IC FFRs and AC FFRs, introducing an interaural time difference (ITD) disparity between the two concurrent NBNs enhanced the representation fidelity, as reflected by increased coherence between the responses evoked by double-NBN stimulation and the responses evoked by single NBNs. The ITD disparity effect varied across frequency bands, being more marked for higher frequency bands in the IC and lower frequency bands in the AC. Moreover, the coherence between IC responses and AC responses was also enhanced by the ITD disparity, and the enhancement was most prominent for low-frequency bands and for the IC and AC on the same side. These results suggest a critical role of the ITD cue in the neural segregation of spectrotemporally overlapping sounds.

NEW & NOTEWORTHY: When two spectrally overlapping narrowband noises are presented at the same time at the same sound-pressure level, they mask each other. Introducing a disparity in interaural time difference between these two narrowband noises improves the accuracy of the neural representation of the individual sounds in both the inferior colliculus and the auditory cortex. The low-frequency signal transformation from the inferior colliculus to the auditory cortex on the same side is also enhanced, showing the effect of binaural unmasking.
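The representation-fidelity measure described above is based on coherence between responses. As a minimal illustration, the Python sketch below computes band-limited coherence between the FFR to the two-noise mixture and the FFR to one noise presented alone; the analysis band and spectral-estimation settings are assumptions, not those of the study.

```python
import numpy as np
from scipy.signal import coherence

def ffr_coherence(resp_double, resp_single, fs, band=(100.0, 800.0)):
    """Mean magnitude-squared coherence, within a frequency band, between the FFR
    evoked by the two-NBN mixture and the FFR evoked by a single NBN."""
    f, cxy = coherence(resp_double, resp_single, fs=fs, nperseg=1024)
    mask = (f >= band[0]) & (f <= band[1])
    return cxy[mask].mean()
```

Higher coherence under an ITD disparity than without one is then read as evidence that the individual noise is represented more faithfully in the mixed-stimulus response.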


2019 ◽  
Author(s):  
Jérémy Giroud ◽  
Agnès Trébuchon ◽  
Daniele Schön ◽  
Patrick Marquis ◽  
Catherine Liegeois-Chauvel ◽  
...  

Abstract
Speech perception is mediated by both left and right auditory cortices, but with differential sensitivity to specific acoustic information contained in the speech signal. A detailed description of this functional asymmetry is missing, and the underlying models are widely debated. We analyzed cortical responses from 96 epilepsy patients with electrode implantation in left or right primary, secondary, and/or association auditory cortex. We presented short acoustic transients to reveal the stereotyped spectro-spatial oscillatory response profile of the auditory cortical hierarchy. We show remarkably similar bimodal spectral response profiles in left and right primary and secondary regions, with preferred processing modes in the theta (∼4-8 Hz) and low gamma (∼25-50 Hz) ranges. These results highlight that the human auditory system employs a two-timescale processing mode. Beyond these first cortical levels of auditory processing, a hemispheric asymmetry emerged, with delta- and beta-band (∼3 and ∼15 Hz) responsivity prevailing in the right hemisphere and theta- and gamma-band (∼6 and ∼40 Hz) activity in the left. These intracranial data provide a more fine-grained and nuanced characterization of cortical auditory processing in the two hemispheres, shedding light on the neural dynamics that potentially shape auditory and speech processing at different levels of the cortical hierarchy.

Author summary
Speech processing is now known to be distributed across the two hemispheres, but the origin and function of this lateralization continue to be vigorously debated. The asymmetric sampling in time (AST) hypothesis predicts that (1) the auditory system employs a two-timescale processing mode, (2) this mode is present in both hemispheres but with a different ratio of fast and slow timescales, and (3) the asymmetry emerges outside of primary cortical regions. Capitalizing on intracranial data from 96 epilepsy patients, we validated each of these predictions and provide a precise estimate of the processing timescales. In particular, we reveal that asymmetric sampling in associative areas is subtended by distinct two-timescale processing modes. Overall, our results shed light on the neurofunctional architecture of cortical auditory processing.


2018 ◽  
Author(s):  
Christian D. Márton ◽  
Makoto Fukushima ◽  
Corrie R. Camalier ◽  
Simon R. Schultz ◽  
Bruno B. Averbeck

Abstract
Predictive coding is a theoretical framework that provides a functional interpretation of top-down and bottom-up interactions in sensory processing. The theory has suggested that specific frequency bands relay bottom-up and top-down information (e.g., “γ up, β down”). But it remains unclear whether this notion generalizes to cross-frequency interactions. Furthermore, most of the evidence so far comes from visual pathways. Here we examined cross-frequency coupling across four sectors of the auditory hierarchy in the macaque. We computed two measures of cross-frequency coupling, phase-amplitude coupling (PAC) and amplitude-amplitude coupling (AAC). Our findings revealed distinct patterns for bottom-up and top-down information processing among cross-frequency interactions. Both top-down and bottom-up processing made prominent use of low frequencies: low-to-low-frequency (θ, α, β) and low-frequency-to-high-γ couplings predominated in the top-down direction, while low-frequency-to-low-γ couplings predominated in the bottom-up direction. These patterns were largely preserved across coupling types (PAC and AAC) and across stimulus types (natural and synthetic auditory stimuli), suggesting they are a general feature of information processing in auditory cortex. Moreover, our findings showed that low-frequency PAC alternated between predominantly top-down and predominantly bottom-up over time. Altogether, this suggests that sensory information need not be propagated along separate frequencies upwards and downwards. Rather, information can be unmixed by having low frequencies couple to distinct frequency ranges in the target region, and by alternating top-down and bottom-up processing over time.

Significance
The brain consists of highly interconnected cortical areas, yet the patterns of directional cortical communication are not fully understood, in particular with regard to interactions between different signal components across frequencies. We employed a unified, computationally advantageous Granger-causal framework to examine bidirectional cross-frequency interactions across four sectors of the auditory cortical hierarchy in macaques. Our findings extend the view of cross-frequency interactions in auditory cortex, suggesting they also play a prominent role in top-down processing. Our findings also suggest that information need not be propagated along separate channels up and down the cortical hierarchy, with important implications for theories of information processing in the brain such as predictive coding.
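The two coupling measures named above can be illustrated with simple single-site estimators. The Python sketch below computes a mean-vector-length style PAC index and an envelope-correlation AAC index; note that the study itself used a directed Granger-causal framework across areas, so this non-directional sketch, with its assumed theta and high-gamma bands, is only meant to convey what PAC and AAC quantify.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(x, fs, lo, hi):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def phase_amplitude_coupling(lfp, fs, phase_band=(4, 8), amp_band=(70, 150)):
    """Normalized mean-vector-length PAC: how strongly high-gamma amplitude
    is modulated by theta phase within one signal."""
    phase = np.angle(hilbert(bandpass(lfp, fs, *phase_band)))
    amp = np.abs(hilbert(bandpass(lfp, fs, *amp_band)))
    return np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)

def amplitude_amplitude_coupling(lfp_a, lfp_b, fs, band_a=(4, 8), band_b=(70, 150)):
    """AAC: correlation between the amplitude envelopes of two band-limited signals."""
    amp_a = np.abs(hilbert(bandpass(lfp_a, fs, *band_a)))
    amp_b = np.abs(hilbert(bandpass(lfp_b, fs, *band_b)))
    return np.corrcoef(amp_a, amp_b)[0, 1]
```

Directionality, the question of which area's low-frequency phase drives which area's amplitude, requires the cross-area, time-resolved modelling the authors describe and is not captured by these simple indices.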


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Raphaël Thézé ◽  
Mehdi Ali Gadiri ◽  
Louis Albert ◽  
Antoine Provost ◽  
Anne-Lise Giraud ◽  
...  

Abstract
Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized to computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and the audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech with higher control over stimulus timing and content.

