How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study

2018 ◽  
Author(s):  
Thomas Hueber ◽  
Eric Tatulli ◽  
Laurent Girin ◽  
Jean-luc Schwartz

Sensory processing is increasingly conceived in a predictive framework in which neurons would constantly process the error signal resulting from the comparison of expected and observed stimuli. Surprisingly, few data exist on the amount of prediction that can be computed in real sensory scenes. Here, we focus on the sensory processing of auditory and audiovisual speech. We propose a set of computational models based on artificial neural networks (mixing deep feed-forward and convolutional networks) which are trained to predict future audio observations from 25 ms to 250 ms of past audio or audiovisual observations (i.e., including lip movements). Experiments are conducted on the multispeaker NTCD-TIMIT audiovisual speech database. Predictions are efficient in a short temporal range (25–50 ms), predicting 40% to 60% of the variance of the incoming stimulus, which could result in potentially saving up to two-thirds of the processing power. They then quickly decrease and vanish after 100 ms. Adding information on the lips slightly improves predictions, with a 5% to 10% increase in explained variance. Interestingly, the visual gain vanishes more slowly, and the gain is maximum for a delay of 75 ms between image and predicted sound.
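The key quantity throughout is the fraction of the stimulus variance captured by the predictor. As a minimal sketch (assuming NumPy and generic feature arrays, not the paper's actual evaluation code), explained variance over a set of predicted frames can be computed as follows:

```python
import numpy as np

def explained_variance(observed: np.ndarray, predicted: np.ndarray) -> float:
    """Fraction of the observed signal's variance captured by the prediction.

    observed, predicted: arrays of shape (n_frames, n_features),
    e.g. spectral frames sampled every 25 ms.
    """
    residual = observed - predicted
    return 1.0 - residual.var() / observed.var()

# Illustrative use with random stand-in data (not the NTCD-TIMIT features):
rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 40))
pred = obs + rng.normal(scale=0.8, size=obs.shape)  # imperfect prediction
print(f"explained variance: {explained_variance(obs, pred):.2f}")
```

A value of 0.5 here corresponds to the "50% of the variance" figures reported for short prediction horizons.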

2020 ◽  
Vol 32 (3) ◽  
pp. 596-625
Author(s):  
Thomas Hueber ◽  
Eric Tatulli ◽  
Laurent Girin ◽  
Jean-Luc Schwartz

Sensory processing is increasingly conceived in a predictive framework in which neurons would constantly process the error signal resulting from the comparison of expected and observed stimuli. Surprisingly, few data exist on the accuracy of predictions that can be computed in real sensory scenes. Here, we focus on the sensory processing of auditory and audiovisual speech. We propose a set of computational models based on artificial neural networks (mixing deep feedforward and convolutional networks), which are trained to predict future audio observations from present and past audio or audiovisual observations (i.e., including lip movements). Those predictions exploit purely local phonetic regularities with no explicit call to higher linguistic levels. Experiments are conducted on the multispeaker LibriSpeech audio speech database (around 100 hours) and on the NTCD-TIMIT audiovisual speech database (around 7 hours). They appear to be efficient in a short temporal range (25–50 ms), predicting 50% to 75% of the variance of the incoming stimulus, which could result in potentially saving up to three-quarters of the processing power. Then they quickly decrease and almost vanish after 250 ms. Adding information on the lips slightly improves predictions, with a 5% to 10% increase in explained variance. Interestingly, the visual gain vanishes more slowly, and the gain is maximum for a delay of 75 ms between image and predicted sound.
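The models described here map a short window of past observations to a predicted future frame. Below is a minimal PyTorch sketch of this kind of predictor; the 40-dimensional features, context length, and layer sizes are illustrative assumptions, not the architecture actually used by the authors:

```python
import torch
import torch.nn as nn

class AudioPredictor(nn.Module):
    """Predict one future audio feature frame from a short past context.

    The 40-dim input features, 10-frame context, and layer sizes are
    illustrative choices, not the exact architecture of the paper.
    """
    def __init__(self, n_features: int = 40, context_frames: int = 10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * context_frames, 128),
            nn.ReLU(),
            nn.Linear(128, n_features),  # the predicted future frame
        )

    def forward(self, past: torch.Tensor) -> torch.Tensor:
        # past: (batch, n_features, context_frames)
        return self.head(self.conv(past))

model = AudioPredictor()
past = torch.randn(8, 40, 10)   # 8 examples, 10 past frames each
future_frame = model(past)      # shape (8, 40)
```

Training such a model with a mean-squared-error loss on the future frame, then measuring explained variance as a function of the prediction horizon, reproduces the kind of analysis reported above.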


1998 ◽  
Vol 10 (4) ◽  
pp. 771-805 ◽  
Author(s):  
Jean-Marc Fellous ◽  
Christiane Linster

Computational modeling of neural substrates provides an excellent theoretical framework for understanding the computational roles of neuromodulation. In this review, we illustrate, with a large number of modeling studies, the specific computations performed by neuromodulation in the context of various neural models of invertebrate and vertebrate preparations. We base our characterization of neuromodulation on its computational and functional roles rather than on anatomical or chemical criteria. We review the main frameworks in which neuromodulation has been studied theoretically (central pattern generation and oscillations, sensory processing, memory, and information integration). Finally, we present a detailed mathematical overview of how neuromodulation has been implemented at the single-cell and network levels in modeling studies. Overall, neuromodulation is found to increase and control computational complexity.
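At the single-cell level, one common way neuromodulation is implemented in rate models is as a multiplicative gain on the neuron's input-output function. The sketch below (a generic sigmoidal rate model with illustrative parameter values, not an equation taken from the review) shows how a gain term reshapes the response curve:

```python
import numpy as np

def firing_rate(inputs: np.ndarray, gain: float = 1.0,
                threshold: float = 0.5, r_max: float = 100.0) -> np.ndarray:
    """Sigmoidal rate function with a neuromodulatory gain term.

    gain > 1 mimics a modulator that steepens the input-output curve
    (increased excitability); gain < 1 flattens it.
    """
    return r_max / (1.0 + np.exp(-gain * (inputs - threshold)))

x = np.linspace(-2, 3, 6)
print(firing_rate(x, gain=1.0))   # baseline response curve
print(firing_rate(x, gain=3.0))   # "modulated": sharper threshold
```

Changing a single parameter in this way lets one network produce qualitatively different dynamics, which is the sense in which neuromodulation controls computational complexity.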


2015 ◽  
Vol 26 (2) ◽  
pp. 1-13
Author(s):  
Fabio Porto ◽  
Ramon G. Costa ◽  
Ana Maria de C. Moura ◽  
Bernardo Gonçalves

Computational simulations are important tools that enable scientists to study complex phenomena about which few data are available or that would require dangerous human interventions. They involve complex and heterogeneous components, including mathematical equations, hypotheses, computational models, and data. In order to support in silico scientific research, this complex environment needs to be modeled and have its data and metadata managed, enabling model evolution, prediction analysis, and decision-making. This paper proposes a scientific hypothesis conceptual model that allows scientists to represent the phenomenon being investigated and the hypotheses formulated in the attempt to explain it, and that provides the ability to store the results of simulation experiments with their corresponding provenance metadata. The proposed model supports the scientific life cycle through provenance management, exchange of hypotheses as data, experiment reproducibility, model steering, and simulation result analyses. A cardiovascular numerical simulation illustrates the applicability of the model, and an initial implementation using SciDB is discussed.
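To make the idea of treating hypotheses as first-class, exchangeable data more concrete, here is a minimal sketch using plain Python dataclasses. The entity names, fields, and the scidb:// URI are illustrative assumptions, not the paper's actual schema or its SciDB implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Hypothesis:
    """A tentative explanation of a phenomenon, exchangeable as data."""
    phenomenon: str
    statement: str
    derived_from: list["Hypothesis"] = field(default_factory=list)

@dataclass
class SimulationRun:
    """One experiment, linked to its hypothesis and provenance metadata."""
    hypothesis: Hypothesis
    model_name: str
    parameters: dict
    results_uri: str   # where the output data lives (hypothetical scheme)
    executed_at: datetime = field(default_factory=datetime.utcnow)

h = Hypothesis(phenomenon="arterial blood flow",
               statement="wall elasticity explains the observed pressure waveform")
run = SimulationRun(h, model_name="1D-cardiovascular",
                    parameters={"elasticity": 0.4},
                    results_uri="scidb://runs/42")
```

Linking each run back to the hypothesis it tests is what enables the provenance queries, reproducibility, and hypothesis evolution the paper describes.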


2019 ◽  
Author(s):  
Andrea Alamia ◽  
Canhuang Luo ◽  
Matthew Ricci ◽  
Junkyung Kim ◽  
Thomas Serre ◽  
...  

The development of deep convolutional networks (DCNs) has recently led to great successes in computer vision, and DCNs have become de facto computational models of vision. However, a growing body of work suggests that they exhibit critical limitations beyond image categorization. Here, we study a fundamental limitation of DCNs in judging whether two items are the same or different (SD) compared to a baseline assessment of their spatial relationship (SR). We test the prediction that SD tasks recruit additional cortical mechanisms which underlie critical aspects of visual cognition that are not explained by current computational models. We thus recorded EEG signals from 14 participants engaged in the same tasks as the computational models. Importantly, the two tasks were matched in terms of difficulty by an adaptive psychometric procedure; yet, on top of a modulation of evoked potentials, our results revealed higher activity in the low beta (13–20 Hz) band in the SD compared to the SR condition, which we surmise reflects the crucial involvement of recurrent mechanisms sustaining working memory and attention.

Author Summary
Despite the impressive progress of deep convolutional networks (DCNs) in object recognition, recent studies have demonstrated that state-of-the-art vision algorithms encounter severe limitations when performing certain visual reasoning tasks: for instance, convolutional networks can easily solve problems involving spatial relations but fail to identify whether two items are identical or different (same-different task). This conclusion led us to test the hypothesis that different computational mechanisms are also needed in the visual system to perform these tasks successfully. First, we confirmed in our simulations that DCNs can successfully perform spatial relationship tasks but struggle with same-different ones. Then, we tested 14 participants on the same experimental design while recording their EEG signals. Remarkably, our results revealed a significant difference between the tasks in the occipital brain regions, both in evoked potentials and in the oscillatory dynamics. Specifically, activity increased in the SD condition relative to the SR condition. We interpret these results as reflecting the fundamental involvement of recurrent mechanisms implementing cognitive functions such as working memory and attention.
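The oscillatory effect reported here is an increase of power in the low beta band (13–20 Hz). As a minimal sketch of how band-limited power can be estimated (using SciPy's Welch method on a synthetic stand-in signal; the 500 Hz sampling rate is an assumption, not the study's recording setup):

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg: np.ndarray, fs: float,
               lo: float = 13.0, hi: float = 20.0) -> float:
    """Mean power spectral density in [lo, hi] Hz for one EEG channel."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))  # 2 s windows
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].mean()

fs = 500.0                           # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
# Synthetic stand-in signal: a 16 Hz oscillation buried in noise
eeg = np.sin(2 * np.pi * 16 * t) + np.random.randn(t.size)
print(f"low-beta power: {band_power(eeg, fs):.3f}")
```

Comparing this quantity between the SD and SR conditions, channel by channel, is the kind of contrast underlying the reported beta-band result.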


Author(s):  
Nicola Strisciuglio ◽  
Nicolai Petkov

The study of the visual system of the brain has attracted the attention and interest of many neuroscientists, who have derived computational models of some of the neuron types that compose it. These findings inspired researchers in image processing and computer vision to deploy such models to solve problems of visual data processing.

In this paper, we review approaches to image processing and computer vision whose design is based on neuroscientific findings about the functions of some neurons in the visual cortex. Furthermore, we analyze the connection between the hierarchical organization of the visual system of the brain and the structure of Convolutional Networks (ConvNets). We pay particular attention to the mechanisms of inhibition of the responses of some neurons, which provide the visual system with improved stability to changing input stimuli, and discuss their implementation in image processing operators and in ConvNets.
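A classic example of such an inhibition mechanism is the center-surround organization of early visual receptive fields, commonly modeled as a difference of Gaussians: an excitatory center minus an inhibitory surround. The sketch below (with illustrative sigma values, not parameters from the paper) implements the operator as a simple image filter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(image: np.ndarray, sigma_center: float = 1.0,
                 sigma_surround: float = 3.0) -> np.ndarray:
    """Difference-of-Gaussians: excitatory center minus inhibitory surround.

    The surround subtraction suppresses uniform regions, so the response
    stays stable under global changes in illumination.
    """
    center = gaussian_filter(image.astype(float), sigma_center)
    surround = gaussian_filter(image.astype(float), sigma_surround)
    return center - surround

img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0          # a bright square on a dark background
resp = dog_response(img)
print(resp.max(), resp.min())    # strong responses only near the edges
```

The suppression of uniform regions by the surround term is the same stabilizing role that inhibition plays in the biological visual system.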


Author(s):  
K Das Chowdhury ◽  
R. W. Carpenter ◽  
W. Braue

Research on reaction-bonded SiC (RBSiC) is aimed at developing a reliable structural ceramic with improved mechanical properties. The starting materials for RBSiC were Si, C, and α-SiC powders. The formation of the complex microstructure of RBSiC involves (i) solution of carbon in liquid silicon, (ii) nucleation and epitaxial growth of secondary β-SiC on the original α-SiC grains, followed by (iii) the β→α-SiC phase transformation of the newly formed SiC. Due to their coherent nature, epitaxial SiC/SiC interfaces are considered to be segregation-free and "strong" with respect to their effect on the mechanical properties of RBSiC. But the "weak" Si/SiC interface limits the material's use in high-temperature applications. However, few data exist on the structure and chemistry of these interfaces. Microanalytical results obtained by parallel EELS and HREM imaging are reported here.

