Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech

2021 ◽  
Author(s):  
Juliette Millet ◽  
Jean-Rémi King

Our ability to comprehend speech remains, to date, unrivaled by deep learning models. This feat could result from the brain’s ability to fine-tune generic sound representations for speech-specific processes. To test this hypothesis, we compare (i) five types of deep neural networks to (ii) human brain responses elicited by spoken sentences and recorded in 102 Dutch subjects using functional Magnetic Resonance Imaging (fMRI). Each network was either trained on acoustic scene classification, trained on a speech-to-text task (based on Bengali, English, or Dutch), or left untrained. The similarity between each model and the brain is assessed by correlating their respective activations after an optimal linear projection. The differences in brain-similarity across networks revealed three main results. First, speech representations in the brain can be accounted for by random deep networks. Second, learning to classify acoustic scenes leads deep nets to increase their brain-similarity. Third, learning to process phonetically related speech inputs (i.e., Dutch vs. English) leads deep nets to reach higher levels of brain-similarity than learning to process phonetically distant speech inputs (i.e., Dutch vs. Bengali). Together, these results suggest that the human brain fine-tunes its heavily trained auditory hierarchy to learn to process speech.
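The brain-score procedure described here (an optimal linear projection from model activations to fMRI responses, scored by correlation) can be sketched as a cross-validated ridge regression. All array sizes, the penalty, and the simulated data below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_voxels = 200, 64, 10     # hypothetical sizes
X = rng.standard_normal((n_samples, n_features))  # network activations per stimulus
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_samples, n_voxels))  # simulated fMRI

# Fit the linear projection on the first half, evaluate on held-out data.
X_tr, X_te, Y_tr, Y_te = X[:100], X[100:], Y[:100], Y[100:]
alpha = 1.0  # ridge penalty (illustrative)
B = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(n_features), X_tr.T @ Y_tr)
Y_hat = X_te @ B

# Brain score: mean Pearson r between predicted and observed voxel responses.
r = [np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
brain_score = float(np.mean(r))
print(round(brain_score, 2))
```

Comparing this score across networks (trained vs. untrained, Dutch vs. Bengali) is what yields the paper's between-model contrasts.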

2020 ◽  
Vol 6 (30) ◽  
pp. eaba7830
Author(s):  
Laurianne Cabrera ◽  
Judit Gervain

Speech perception is constrained by auditory processing. Although at birth infants have an immature auditory system and limited language experience, they show remarkable speech perception skills. To assess neonates’ ability to process the complex acoustic cues of speech, we combined near-infrared spectroscopy (NIRS) and electroencephalography (EEG) to measure brain responses to syllables differing in consonants. The syllables were presented in three conditions preserving (i) original temporal modulations of speech [both amplitude modulation (AM) and frequency modulation (FM)], (ii) both fast and slow AM, but not FM, or (iii) only the slowest AM (<8 Hz). EEG responses indicate that neonates can encode consonants in all conditions, even without the fast temporal modulations, similarly to adults. Yet, the fast and slow AM activate different neural areas, as shown by NIRS. Thus, the immature human brain is already able to decompose the acoustic components of speech, laying the foundations of language learning.


2019 ◽  
Author(s):  
Keiichi Kitajo ◽  
Takumi Sase ◽  
Yoko Mizuno ◽  
Hiromichi Suetani

Abstract. It is an open question whether macroscopic human brain responses to repeatedly presented external inputs show consistent patterns across trials. Here we provide experimental evidence that human brain responses to noisy, time-varying visual inputs, as measured by scalp electroencephalography (EEG), show a signature of consistency. The results indicate that the EEG-recorded responses are robust against fluctuating ongoing activity and respond to visual stimuli in a repeatable manner. This consistency presumably mediates robust information processing in the brain. Moreover, the EEG response waveforms were discriminable between individuals and were invariant over a number of days within individuals. We reveal that time-varying noisy visual inputs can harness macroscopic brain dynamics and manifest hidden individual variations.
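One simple way to quantify the across-trial consistency described above is a leave-one-out correlation: each trial's response is correlated with the average of the remaining trials. The data, sizes, and signal-to-noise ratio below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_times = 40, 500
template = np.sin(np.linspace(0, 8 * np.pi, n_times))  # shared stimulus-driven response
# Each trial = shared response + independent "ongoing activity" noise.
trials = template + 1.0 * rng.standard_normal((n_trials, n_times))

# Leave-one-out consistency: each trial vs. the mean of all other trials.
consistency = []
for i in range(n_trials):
    others = np.delete(trials, i, axis=0).mean(axis=0)
    consistency.append(np.corrcoef(trials[i], others)[0, 1])
mean_consistency = float(np.mean(consistency))
print(round(mean_consistency, 2))
```

A mean consistency reliably above zero, despite the per-trial noise dominating the signal, is the kind of signature the abstract refers to.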


2021 ◽  
Author(s):  
Charlotte Caucheteux ◽  
Alexandre Gramfort ◽  
Jean-Rémi King

Language transformers, like GPT-2, have demonstrated remarkable abilities to process text, and now constitute the backbone of deep translation, summarization and dialogue algorithms. However, whether these models actually understand language is highly controversial. Here, we show that the representations of GPT-2 not only map onto the brain responses to spoken stories, but also predict the extent to which subjects understand the narratives. To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear model to predict brain activity from GPT-2 activations, and correlate this mapping with subjects’ comprehension scores as assessed for each story. The results show that GPT-2’s brain predictions significantly correlate with semantic comprehension. These effects are bilaterally distributed in the language network and peak with a correlation above 30% in the infero-frontal and medio-temporal gyri as well as in the superior frontal cortex, the planum temporale and the precuneus. Overall, this study provides an empirical framework to probe and dissect semantic comprehension in brains and deep learning algorithms.
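The second analysis stage described above (relating per-story brain-prediction quality to comprehension) amounts to a correlation between two score vectors. Everything below is simulated; the sizes, effect strength, and noise level are assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 70  # hypothetical number of (subject, story) pairs
comprehension = rng.uniform(0.0, 1.0, n_items)  # simulated comprehension scores
# Simulated brain scores: partly driven by comprehension, partly noise.
brain_scores = 0.2 + 0.3 * comprehension + 0.1 * rng.standard_normal(n_items)

# The key statistic: does mapping quality track understanding?
r = float(np.corrcoef(brain_scores, comprehension)[0, 1])
print(round(r, 2))
```

In the study, this correlation is computed per voxel, which is what produces the bilateral language-network maps reported.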


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kyle M. Gilbert ◽  
Justine C. Cléry ◽  
Joseph S. Gati ◽  
Yuki Hori ◽  
Kevin D. Johnston ◽  
...  

Abstract. Social cognition is a dynamic process that requires the perception and integration of a complex set of idiosyncratic features between interacting conspecifics. Here we present a method for simultaneously measuring the whole-brain activation of two socially interacting marmoset monkeys using functional magnetic resonance imaging. MRI hardware (a radiofrequency coil and peripheral devices) and image-processing pipelines were developed to assess brain responses to socialization, at both an intra-brain and an inter-brain level. Notably, a marmoset viewing a second marmoset in person, versus viewing a pre-recorded video of the same marmoset (i.e., when either capable or incapable of socially interacting with a visible conspecific), shows increased activation in the face-patch network. This method enables a wide range of possibilities for studying social function and dysfunction in a non-human primate model.


2018 ◽  
Vol 29 (2) ◽  
pp. 89-98
Author(s):  
Zheng Ye ◽  
Bahram Mohammadi ◽  
Robert Kopyciok ◽  
Marcus Heldmann ◽  
Amir Samii ◽  
...  

Abstract. Interpersonal and intrapersonal differences in brain responses to sexual stimuli have been linked with individuals’ testosterone levels. However, it remains unclear how hormones modulate the brain functions underlying sexual arousal. To assess the effects of chronic hormonal treatment, we used functional magnetic resonance imaging in a group of female-to-male transsexuals before and during androgen therapy while they watched a set of pictures representing dressed or nude (erotic content) men or women (sex information). A broad network of cortical and subcortical regions, including the insula, amygdala, and hypothalamus, was activated during the processing of erotic stimuli (nude vs. dressed). The insula activity in response to erotic male stimuli decreased over the initial 4 months of hormonal therapy. In the following 8 months, the insula response to erotic female stimuli increased. In other words, long-term androgen administration makes the brain more “male” by reducing the sexual arousal caused by male stimuli and amplifying that caused by female stimuli.


2020 ◽  
Vol 34 (04) ◽  
pp. 4060-4066
Author(s):  
Yunhui Guo ◽  
Yandong Li ◽  
Liqiang Wang ◽  
Tajana Rosing

There is an increasing number of pre-trained deep neural network models. However, it is still unclear how to use these models effectively for a new task. Transfer learning, which aims to transfer knowledge from source tasks to a target task, is an effective solution to this problem. Fine-tuning is a popular transfer learning technique for deep neural networks in which a few rounds of training are applied to the parameters of a pre-trained model to adapt them to a new task. Despite its popularity, we show in this paper that fine-tuning suffers from several drawbacks. We propose an adaptive fine-tuning approach, called AdaFilter, which selects only a subset of the convolutional filters in the pre-trained model to optimize on a per-example basis. We use a recurrent gated network to selectively fine-tune convolutional filters based on the activations of the previous layer. We experiment with seven public image classification datasets, and the results show that AdaFilter can reduce the average classification error of standard fine-tuning by 2.54%.
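The per-example gating idea can be sketched as follows: a gate in (0, 1) per convolutional filter mixes the frozen pre-trained filter with its fine-tuned copy. The real method uses a recurrent gated network across layers; here the gate is a plain logistic function of the previous layer's activations, and all weights and shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_filters, feat_dim = 8, 16
x = rng.standard_normal(feat_dim)  # previous-layer activations for one example

frozen = rng.standard_normal((n_filters, feat_dim))  # pre-trained filter weights
tuned = frozen + 0.1 * rng.standard_normal((n_filters, feat_dim))  # fine-tuned copies
W_gate = rng.standard_normal((n_filters, feat_dim))  # gate parameters (hypothetical)

# Per-filter gate computed from this example's activations.
gate = 1.0 / (1.0 + np.exp(-W_gate @ x))
# Output mixes fine-tuned and frozen filter responses, per example.
out = gate * (tuned @ x) + (1.0 - gate) * (frozen @ x)
print(out.shape)
```

Since the gate depends on `x`, different inputs fine-tune different filters, which is the sense in which the selection is per-example.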


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yaoda Xu ◽  
Maryam Vaziri-Pashkam

Abstract. Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs’ impressive ability to fully capture lower-level visual representations of real-world objects, we show that CNNs do not fully capture higher-level visual representations of real-world objects, nor those of artificial objects at either lower or higher levels of visual representation. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates that some fundamental differences exist in how the brain and CNNs represent visual information.
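Representational similarity analysis, as used above, compares two systems by correlating their representational dissimilarity matrices (RDMs) rather than their raw activations. A minimal sketch with simulated data (sizes and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n_stimuli, n_units = 20, 100
brain = rng.standard_normal((n_stimuli, n_units))              # fMRI patterns per stimulus
cnn = brain + 0.8 * rng.standard_normal((n_stimuli, n_units))  # noisy model features

def rdm(patterns):
    # Condensed RDM: 1 - Pearson r for each pair of stimulus patterns.
    c = np.corrcoef(patterns)
    iu = np.triu_indices(len(patterns), k=1)
    return 1.0 - c[iu]

def spearman(a, b):
    # Spearman rho via Pearson correlation of ranks (no ties in simulated data).
    rank = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

rho = spearman(rdm(brain), rdm(cnn))
print(round(rho, 2))
```

Because RSA abstracts away from voxel-to-unit alignment, it lets heterogeneous systems (14 CNNs and fMRI) be compared on a common footing.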


2012 ◽  
Author(s):  
Χρυσούλα Λιθαρή

[…] Social drinking, for most people, is an inseparable part of everyday life. Alcohol is used and abused for its ability to modify emotional states and, more precisely, to reduce anxiety [1], [2]. It is therefore essential to study the effects of inebriation in healthy, non-dependent individuals, given the frequency of abuse and binge drinking. A better understanding of the neural underpinnings of alcohol consumption could have a number of social implications, including the origin of inebriation-induced aggressiveness, the tendency to abuse, and driving- or work-related hazards [3]. More precisely, this study aimed to answer the following questions:
- How does acute alcohol intake affect human brain responses to affective pictures?
- Is the effect of alcohol emotion-specific, or is it the same for all kinds of emotion-eliciting images?
- Is the brain's functional organization at rest modulated by inebriation?
- What are the similarities and differences between the EEG and MEG studies conducted?
At a second level, an effort was made to design the optimal experimental procedure to examine as accurately as possible the multi-factorial issue of inebriation effects on the human brain. Regarding the analysis of the recordings, standard analysis techniques at the sensor level were first applied; then, more advanced techniques, such as cortical source estimation and functional connectivity, were used to examine whether they provide any additional information. […]


2021 ◽  
Author(s):  
Kyle M. Gilbert ◽  
Justine C. Cléry ◽  
Joseph S. Gati ◽  
Yuki Hori ◽  
Alexander Mashkovtsev ◽  
...  

Abstract. Social cognition is a dynamic process that requires the perception and integration of a complex set of idiosyncratic features between interacting conspecifics. Here we present a method for simultaneously measuring the whole-brain activation of two socially interacting marmoset monkeys using functional magnetic resonance imaging. MRI hardware (a radiofrequency coil and peripheral devices) and image-processing pipelines were developed to assess brain responses to socialization, at both an intra-brain and an inter-brain level. Notably, brain-activation maps acquired during constant interaction demonstrated neuronal synchrony between marmosets in regions of the brain responsible for processing social interaction. This method enables a wide range of possibilities for studying social function and dysfunction in a non-human primate model, including using transgenic models of neuropsychiatric disorders.

