Stimulus Reconstruction
Recently Published Documents

Total documents: 33 (last five years: 8)
H-index: 6 (last five years: 3)
2022
Author(s): Simon Geirnaert, Tom Francart, Alexander Bertrand

The goal of auditory attention decoding (AAD) is to determine which of multiple competing speakers a listener is attending to, based on brain signals recorded via, e.g., electroencephalography (EEG). AAD algorithms are a fundamental building block of so-called neuro-steered hearing devices, which would identify, from the brain activity, the speaker that should be amplified. A common approach is to train a subject-specific decoder that reconstructs the amplitude envelope of the attended speech signal. However, training this decoder requires a dedicated 'ground-truth' EEG recording of the subject under test, during which the attended speaker is known. Furthermore, this decoder remains fixed during operation and thus cannot adapt to changing conditions and situations. We therefore propose an online, time-adaptive, unsupervised stimulus reconstruction method that continuously and automatically adapts over time as new EEG and audio data stream in. The adaptive decoder does not require ground-truth attention labels obtained from a training session with the end user, and can instead be initialized with a generic subject-independent decoder or even completely random values. We propose two implementations, a sliding-window and a recursive implementation, which we extensively validate on multiple performance metrics across three independent datasets. We show that the proposed time-adaptive unsupervised decoder outperforms a time-invariant supervised decoder, representing an important step towards practically applicable AAD algorithms for neuro-steered hearing devices.
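The abstract does not include implementation details, but a minimal sketch of what a recursive, self-labeling (unsupervised) update of a linear stimulus-reconstruction decoder could look like is shown below; the lag count, forgetting factor, and regularization constant are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels) into a design matrix."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * C:(lag + 1) * C] = eeg[:T - lag]
    return X

def recursive_update(Rxx, rxy, decoder, eeg_win, env_a, env_b, lam=0.9, n_lags=32):
    """One unsupervised update: self-label the window with the current decoder, then
    refresh the correlation statistics with exponential forgetting and re-solve."""
    X = lag_matrix(eeg_win, n_lags)
    recon = X @ decoder
    # Pseudo-label: the candidate envelope that the reconstruction matches best.
    corr_a = np.corrcoef(recon, env_a)[0, 1]
    corr_b = np.corrcoef(recon, env_b)[0, 1]
    attended = env_a if corr_a >= corr_b else env_b
    Rxx = lam * Rxx + (1 - lam) * (X.T @ X) / len(X)
    rxy = lam * rxy + (1 - lam) * (X.T @ attended) / len(X)
    decoder = np.linalg.solve(Rxx + 1e-3 * np.eye(Rxx.shape[0]), rxy)
    return Rxx, rxy, decoder
```

A sliding-window variant would instead recompute the statistics from only the most recent windows rather than exponentially forgetting older ones.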


2021, Vol. 15
Author(s): Moïra-Phoebé Huet, Christophe Micheyl, Etienne Parizet, Etienne Gaudrain

During the past decade, several studies have identified electroencephalographic (EEG) correlates of selective auditory attention to speech. In these studies, listeners are typically instructed to focus on one of two concurrent speech streams (the “target”) while ignoring the other (the “masker”). EEG signals are recorded while participants perform this task and are subsequently analyzed to recover the attended stream. An assumption often made in these studies is that the participant’s attention remains focused on the target throughout the test. To check this assumption, and to assess when a participant’s attention in a concurrent-speech listening task was directed toward the target, the masker, or neither, we designed a behavioral listen-then-recall task (the Long-SWoRD test). After listening to two simultaneous short stories, participants had to identify, on a computer screen, keywords from the target story randomly interspersed among words from the masker story and words from neither story. To modulate task difficulty, and hence the likelihood of attentional switches, masker stories were originally uttered by the same talker as the target stories, and the masker voice parameters were then manipulated to parametrically control the similarity of the two streams, from clearly dissimilar to almost identical. While participants listened to the stories, EEG signals were measured and subsequently analyzed using a temporal response function (TRF) model to reconstruct the speech stimuli. Responses in the behavioral recall task were used to infer, retrospectively, when attention was directed toward the target, the masker, or neither. During the model-training phase, the results of these behavioral-data-driven inferences were used as inputs to the model in addition to the EEG signals, to determine whether this additional information would improve stimulus reconstruction accuracy relative to models trained under the assumption that the listener’s attention was unwaveringly focused on the target. Results from 21 participants show that information regarding the actual, as opposed to assumed, attentional focus can be used advantageously during model training to enhance the subsequent (test-phase) accuracy of EEG-based auditory stimulus reconstruction. This is especially the case in challenging listening situations, where participants’ attention is less likely to remain focused entirely on the target talker. In situations where the two competing voices are clearly distinct and easily separated perceptually, the assumption that listeners can stay focused on the target is reasonable. The behavioral recall protocol introduced here provides experimenters with a means to behaviorally track fluctuations in auditory selective attention, including in combined behavioral/neurophysiological studies.
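As a rough illustration of the training idea described above (not the authors' code), the sketch below fits a ridge-regularized backward TRF in which the regression target for each training segment is chosen from the behaviorally inferred attention label; the segment structure, lag count, and regularization value are assumptions.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Time-lagged EEG design matrix (samples x channels*lags)."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * C:(lag + 1) * C] = eeg[:T - lag]
    return X

def fit_backward_trf(eeg_segments, target_envs, masker_envs, labels, n_lags=32, alpha=1e2):
    """Backward TRF (EEG -> envelope) trained with behaviorally inferred attention
    labels ('target', 'masker', 'neither') instead of assuming attention never left
    the target; 'neither' segments are dropped from training."""
    Xs, ys = [], []
    for eeg, t_env, m_env, lab in zip(eeg_segments, target_envs, masker_envs, labels):
        if lab == 'neither':
            continue
        Xs.append(lag_matrix(eeg, n_lags))
        ys.append(t_env if lab == 'target' else m_env)
    X, y = np.vstack(Xs), np.concatenate(ys)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
```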


2021, Vol. 15
Author(s): Emina Alickovic, Elaine Hoi Ning Ng, Lorenz Fiedler, Sébastien Santurette, Hamish Innes-Brown, ...

Objectives: Previous research using non-invasive (magnetoencephalography, MEG) and invasive (electrocorticography, ECoG) neural recordings has demonstrated the progressive and hierarchical representation and processing of complex multi-talker auditory scenes in the auditory cortex. Early responses (<85 ms) in primary-like areas appear to represent the individual talkers with almost equal fidelity and are independent of attention in normal-hearing (NH) listeners. However, late responses (>85 ms) in higher-order non-primary areas selectively represent the attended talker with significantly higher fidelity than unattended talkers in NH and hearing-impaired (HI) listeners. Motivated by these findings, the objective of this study was to investigate the effect of a noise reduction scheme (NR) in a commercial hearing aid (HA) on the representation of complex multi-talker auditory scenes in distinct hierarchical stages of the auditory cortex, using high-density electroencephalography (EEG).

Design: We investigated early (<85 ms) and late (>85 ms) EEG responses recorded in 34 HI subjects fitted with HAs. The HA noise reduction (NR) was either on or off while the participants listened to a complex auditory scene. Participants were instructed to attend to one of two simultaneous talkers in the foreground while multi-talker babble noise played in the background (+3 dB SNR). After each trial, a two-choice question about the content of the attended speech was presented.

Results: Using a stimulus reconstruction approach, our results suggest that the attention-related enhancement of the neural representations of the target and masker talkers in the foreground, as well as the suppression of the background noise, is significantly affected by the NR scheme in distinct hierarchical stages. The NR scheme contributed to the enhancement of the foreground and of the entire acoustic scene in the early responses, and this enhancement was driven by a better representation of the target speech. In the late responses, the target talker was selectively represented in HI listeners, and use of the NR scheme resulted in enhanced representations of the target and masker speech in the foreground and a suppressed representation of the noise in the background. We also found a significant effect of EEG time window on the strength of the cortical representations of the target and masker.

Conclusion: Together, our analyses of the early and late responses obtained from HI listeners support the existing view of hierarchical processing in the auditory cortex. Our findings demonstrate the benefits of an NR scheme on the representation of complex multi-talker auditory scenes in different areas of the auditory cortex in HI listeners.
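A hedged sketch of how early versus late responses might be compared with a stimulus reconstruction approach: restrict the decoder to EEG lags inside the window of interest, fit it on the target envelope, and correlate the reconstruction with each labeled stream. The lag bounds echo the <85 ms / >85 ms split mentioned above; everything else (the late-window upper bound, regularization, and the omitted train/test split) is simplified for illustration and is not taken from the study.

```python
import numpy as np

def lag_matrix_window(eeg, fs, lag_min_ms, lag_max_ms):
    """Design matrix using only EEG time lags within [lag_min_ms, lag_max_ms]."""
    T, C = eeg.shape
    lags = range(int(lag_min_ms * fs / 1000), int(lag_max_ms * fs / 1000) + 1)
    X = np.zeros((T, C * len(lags)))
    for i, lag in enumerate(lags):
        X[lag:, i * C:(i + 1) * C] = eeg[:T - lag]
    return X

def window_reconstruction_scores(eeg, envelopes, fs, window, alpha=1e2):
    """Fit a decoder per lag window ('early' ~ 0-85 ms, 'late' ~ 85-250 ms; the upper
    bound of the late window is assumed) on the target envelope and report the
    reconstruction correlation for every stream (e.g. target, masker, background)."""
    lo, hi = (0, 85) if window == 'early' else (85, 250)
    X = lag_matrix_window(eeg, fs, lo, hi)
    y = envelopes['target']
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    recon = X @ w
    return {name: np.corrcoef(recon, env)[0, 1] for name, env in envelopes.items()}
```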


2020
Author(s): Simon Geirnaert, Tom Francart, Alexander Bertrand

Objective: Noise reduction algorithms in current hearing devices lack information about which sound source a user attends to when multiple sources are present. To resolve this issue, they can be complemented with auditory attention decoding (AAD) algorithms, which decode the attention using electroencephalography (EEG) sensors. State-of-the-art AAD algorithms employ a stimulus reconstruction approach, in which the envelope of the attended source is reconstructed from the EEG and correlated with the envelopes of the individual sources. This approach, however, performs poorly on short signal segments, while longer segments yield impractically long detection delays when the user switches attention.

Methods: We propose decoding the directional focus of attention using filterbank common spatial pattern (FB-CSP) filters as an alternative AAD paradigm, which does not require access to the clean source envelopes.

Results: The proposed FB-CSP approach outperforms both the stimulus reconstruction approach on short signal segments and a convolutional neural network approach on the same task. We achieve a high accuracy (80% for 1 s windows and 70% for quasi-instantaneous decisions), which is sufficient to reach minimal expected switch durations below 4 s. We also demonstrate that the decoder can adapt to unlabeled data from an unseen subject and works with only a subset of EEG channels located around the ear, emulating a wearable EEG setup.

Conclusion: The proposed FB-CSP method provides fast and accurate decoding of the directional focus of auditory attention.

Significance: The high accuracy on very short data segments is a major step forward towards practical neuro-steered hearing devices.
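FB-CSP itself is a standard pipeline (band-specific CSP filters, log-variance features, and a linear classifier); the sketch below shows this generic form rather than the paper's exact configuration, and the band layout, filter order, and number of CSP components are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def csp_filters(trials_a, trials_b, n_filt=6):
    """CSP via the generalized eigendecomposition of the two class covariances.
    Each trials_* is a list of (channels x samples) arrays, already band-pass filtered."""
    cov = lambda trials: np.mean([t @ t.T / t.shape[1] for t in trials], axis=0)
    Ca, Cb = cov(trials_a), cov(trials_b)
    _, W = eigh(Ca, Ca + Cb)                                    # sorted by eigenvalue
    return np.hstack([W[:, :n_filt // 2], W[:, -(n_filt // 2):]])  # both extremes

def fbcsp_features(trial, band_filters, csp_per_band):
    """Log-variance of CSP-filtered signals, concatenated over frequency bands."""
    feats = []
    for (b, a), W in zip(band_filters, csp_per_band):
        filtered = W.T @ filtfilt(b, a, trial, axis=1)
        feats.append(np.log(np.var(filtered, axis=1)))
    return np.concatenate(feats)

# Assumed usage: define bands, fit CSP per band on training trials of each class
# (e.g. left- vs right-attended), then train a linear classifier on the features.
# bands = [(1, 4), (4, 8), (8, 12), (12, 30)]
# band_filters = [butter(4, band, btype='band', fs=64) for band in bands]
# clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
```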


2019, Vol. 30 (4), pp. 2600-2614
Author(s): Xiangbin Teng, David Poeppel

Natural sounds contain acoustic dynamics ranging from tens to hundreds of milliseconds. How does the human auditory system encode acoustic information over such wide-ranging timescales to achieve sound recognition? Previous work (Teng et al. 2017) demonstrated a temporal coding preference for the theta and gamma ranges, but it remains unclear how acoustic dynamics between these two ranges are coded. Here, we generated artificial sounds with temporal structures over timescales from ~200 to ~30 ms and investigated temporal coding at different timescales. Participants discriminated sounds with temporal structures at different timescales while undergoing magnetoencephalography recording. Although considerable intertrial phase coherence can be induced by acoustic dynamics at all of these timescales, classification analyses reveal that the acoustic information of all timescales is preferentially differentiated through the theta and gamma bands, but not through the alpha and beta bands; stimulus reconstruction shows that the acoustic dynamics in the theta and gamma ranges are preferentially coded. We demonstrate that the theta and gamma bands show the generality of temporal coding with comparable capacity. Our findings provide a novel perspective: acoustic information at all timescales is discretised into two temporal chunks for further perceptual analysis.
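For reference, intertrial phase coherence of the kind mentioned above is commonly computed from band-limited analytic phases; the sketch below is a generic version (the filter order and band edges are conventional choices, not taken from the study).

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itpc(trials, fs, band):
    """Inter-trial phase coherence for one sensor: band-pass, Hilbert phase, then the
    magnitude of the mean unit phasor across trials (0 = random phase, 1 = perfectly
    locked). trials has shape (n_trials, n_samples)."""
    b, a = butter(4, band, btype='band', fs=fs)
    phases = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))  # one value per time point

# e.g. itpc(trials, fs=600, band=(4, 8)) for theta, band=(30, 45) for low gamma
```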


2019, Vol. 9 (3), p. 70
Author(s): Brett Myers, Miriam Lense, Reyna Gordon

Prosodic cues in speech are indispensable for comprehending a speaker’s message, recognizing emphasis and emotion, parsing segmental units, and disambiguating syntactic structures. While it is commonly accepted that prosody provides a fundamental service to higher-level features of speech, the neural underpinnings of prosody processing are not clearly defined in the cognitive neuroscience literature. Many recent electrophysiological studies have examined speech comprehension by measuring neural entrainment to the speech amplitude envelope, using a variety of methods including phase-locking algorithms and stimulus reconstruction. Here we review recent evidence for neural tracking of the speech envelope and demonstrate the importance of prosodic contributions to the neural tracking of speech. Prosodic cues may offer a foundation for supporting neural synchronization to the speech envelope, which scaffolds linguistic processing. We argue that prosody has an inherent role in speech perception, and future research should fill the gap in our knowledge of how prosody contributes to speech envelope entrainment.
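Envelope-tracking analyses of this kind start from the speech amplitude envelope; a common way to extract it (assumed here as a generic illustration, not specific to this review) is the Hilbert magnitude, low-pass filtered and resampled to the neural sampling rate.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def speech_envelope(audio, fs_audio, fs_neural=64, cutoff=8.0):
    """Amplitude envelope of a speech signal for envelope-tracking analyses:
    Hilbert magnitude, low-pass filtered, resampled to the neural data rate.
    The cutoff frequency and target rate are common choices, assumed here."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff, btype='low', fs=fs_audio)
    env = filtfilt(b, a, env)
    return resample_poly(env, fs_neural, int(fs_audio))
```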


2018
Author(s): Gregory Ciccarelli, Michael Nolan, Joseph Perricone, Paul Calamia, Stephanie Haro, ...

Auditory attention decoding (AAD) through a brain-computer interface has seen a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticographic (ECoG) recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. In this work, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the wet and dry systems can deliver comparable results despite the latter having one third as many EEG channels as the former, and that the new architecture outperforms the baseline stimulus-reconstruction methods for both EEG modalities. The 14-subject wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject dry-electrode AAD dataset, and our software are available for download for further validation, experimentation, and modification.
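For concreteness, the second (similarity) stage of the traditional pipeline reduces to a windowed correlation comparison between the reconstruction and each candidate stream; the sketch below is a generic version of that stage, with the window length as an assumption.

```python
import numpy as np

def two_stage_decisions(recon, env_a, env_b, fs, win_s=10.0):
    """Stage 2 of the classical AAD pipeline: correlate the EEG-reconstructed
    envelope (stage 1 output) with both candidate speech envelopes per window and
    choose the stream with the higher correlation."""
    n = int(win_s * fs)
    decisions = []
    for start in range(0, len(recon) - n + 1, n):
        seg = slice(start, start + n)
        ca = np.corrcoef(recon[seg], env_a[seg])[0, 1]
        cb = np.corrcoef(recon[seg], env_b[seg])[0, 1]
        decisions.append('A' if ca >= cb else 'B')
    return decisions
```

The end-to-end architecture studied in the paper replaces this explicit correlation step with a network trained to output the attended-stream decision directly.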


2018
Author(s): Stef Garasto, Wilten Nicola, Anil A. Bharath, Simon R. Schultz

Deciphering the neural code involves interpreting the responses of sensory neurons from the perspective of a downstream population. Performing such a read-out is an important step towards understanding how the brain processes sensory information and has implications for brain-machine interfaces. While previous work has focused on classification algorithms to identify a stimulus within a predefined set of categories, few studies have approached a full-stimulus reconstruction task, especially from calcium imaging recordings. Here, we attempt a pixel-by-pixel reconstruction of complex natural stimuli from two-photon calcium imaging of mouse primary visual cortex. We decoded the activity of 103 neurons from layer 2/3 using an optimal linear estimator and investigated which factors drive reconstruction performance at the pixel level. We find the density of receptive fields to be the most influential feature. Finally, we use the receptive field data and simulations from a linear-nonlinear Poisson model to extrapolate decoding accuracy as a function of network size. We find that, on this dataset, reconstruction performance could increase by more than 50% if the receptive fields sampled the full visual field more uniformly. These results provide practical experimental guidelines for boosting the accuracy of full-stimulus reconstruction.
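An optimal linear estimator of this kind amounts to a linear (here ridge-regularized) map from population responses to pixel intensities; the sketch below is a generic illustration under assumed data shapes, not the authors' code.

```python
import numpy as np

def fit_linear_decoder(responses, frames, alpha=1.0):
    """Linear stimulus decoder: ridge regression from population responses
    (n_samples x n_neurons) to flattened stimulus frames (n_samples x n_pixels)."""
    R = np.hstack([responses, np.ones((len(responses), 1))])  # append an offset term
    W = np.linalg.solve(R.T @ R + alpha * np.eye(R.shape[1]), R.T @ frames)
    return W

def reconstruct(responses, W):
    R = np.hstack([responses, np.ones((len(responses), 1))])
    return R @ W  # n_samples x n_pixels; reshape each row to the image dimensions

# Per-pixel accuracy, e.g. correlation between decoded and true pixel time courses:
# acc = [np.corrcoef(recon[:, p], frames[:, p])[0, 1] for p in range(frames.shape[1])]
```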

