Directly interfacing brain and deep networks exposes non-hierarchical visual processing

2021 ◽  
Author(s):  
Nicholas J Sexton ◽  
Bradley C Love

One reason the mammalian visual system is viewed as hierarchical, such that successive stages of processing contain ever higher-level information, is because of functional correspondences with deep convolutional neural networks (DCNNs). However, these correspondences between brain and model activity involve shared, not task-relevant, variance. We propose a stricter test of correspondence: If a DCNN layer corresponds to a brain region, then replacing model activity with brain activity should successfully drive the DCNN's object recognition decision. Using this approach on three datasets, we found all regions along the ventral visual stream best corresponded with later model layers, indicating all stages of processing contained higher-level information about object category. Time course analyses suggest long-range recurrent connections transmit object class information from late to early visual areas.
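
To make the test concrete, here is a minimal PyTorch sketch of the substitution step. It assumes the model has been split at the layer of interest (`model_tail`, the layers after the injection point) and that a linear map `brain_to_layer` from voxel responses into that layer's activation space has already been fitted; both names are illustrative, not the authors' code.

```python
# Minimal sketch of the brain-to-model substitution test (illustrative
# names; assumes a voxel-to-activation map fitted on held-out data).
import torch
import torch.nn as nn

def substitution_decision(model_tail: nn.Module,
                          brain_to_layer: nn.Linear,
                          voxels: torch.Tensor) -> torch.Tensor:
    """Drive the DCNN's recognition decision with brain activity
    injected in place of the model's own layer-k activity."""
    with torch.no_grad():
        injected = brain_to_layer(voxels)   # voxel responses -> layer-k space
        logits = model_tail(injected)       # run only the remaining layers
    return logits.argmax(dim=-1)            # brain-driven object decision
```

If the brain-driven decisions remain accurate, the region carries the task-relevant information attributed to that layer; the layer yielding the most accurate decisions defines the region's best correspondence.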

2021 ◽  
Author(s):  
Aran Nayebi ◽  
Nathan C. L. Kong ◽  
Chengxu Zhuang ◽  
Justin L. Gardner ◽  
Anthony M. Norcia ◽  
...  

Task-optimized deep convolutional neural networks are the most quantitatively accurate models of the primate ventral visual stream. However, such networks are implausible as a model of the mouse visual system because mouse visual cortex is known to have a shallower hierarchy, and the supervised objectives with which these networks are typically trained are likely not ethologically relevant in either content or quantity. Here we develop shallow network architectures that are more consistent with anatomical and physiological studies of mouse visual cortex than current models. We demonstrate that hierarchically shallow architectures trained using contrastive objective functions applied to visual-acuity-adapted images achieve neural prediction performance that exceeds that of the same architectures trained in a supervised manner, and result in the most quantitatively accurate models of the mouse visual system. Moreover, these models' neural predictivity significantly surpasses that of supervised, deep architectures that are known to correspond well to the primate ventral visual stream. Finally, we derive a novel measure of inter-animal consistency and show that the best models closely match this quantity across visual areas. Taken together, our results suggest that contrastive objectives operating on shallow architectures with ethologically motivated image transformations may be a biologically plausible computational theory of visual coding in mice.
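
As a concrete illustration of such a contrastive objective, the sketch below implements a SimCLR-style NT-Xent loss over two augmented views of the same images. The specific loss variant and temperature are assumptions for illustration; the visual-acuity adaptation the authors describe would enter as a blurring/downsampling step in the augmentation pipeline, not shown here.

```python
# Illustrative contrastive (NT-Xent) loss over two augmented views.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """z1, z2: embeddings (N x D) of two views of the same N images."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # 2N x D, unit norm
    sim = z @ z.t() / temperature                 # pairwise cosine similarity
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float('-inf'))         # exclude self-similarity
    # positive pair for row i is the other view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```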


2005 ◽  
Vol 93 (6) ◽  
pp. 3453-3462 ◽  
Author(s):  
Mary-Ellen Large ◽  
Adrian Aldcroft ◽  
Tutis Vilis

Perceptual continuity is an important aspect of our experience of the visual world. In this study, we focus on an example of perceptual continuity involving the maintenance of figure-ground segregation after removal of the binding cues that initiated the segregation. Fragmented line drawings of objects were superimposed on a background of randomly oriented lines. Global forms could be discriminated from the background based on differences in motion or differences in color/brightness. Perception of a global form persisted after the binding cue had been removed, and forms defined by either motion or color produced comparable persistence once the object-defining cues were removed. Functional imaging showed a gradual increase in the persistence of brain activity in the lower visual areas (V1, V2, VP), which reached significance in V4v and peaked in the lateral occipital area. There was no difference in the location of persistence for color- or motion-defined forms. These results suggest that the retention of a global percept is an emergent property of the ventral visual processing stream and that the maintenance of grouped visual elements is independent of cue type. We postulate that perceptual persistence depends on a system of perceptual memory reflecting the state of perceptual organization.


2000 ◽  
Vol 12 (4) ◽  
pp. 615-621 ◽  
Author(s):  
Glen M. Doniger ◽  
John J. Foxe ◽  
Micah M. Murray ◽  
Beth A. Higgins ◽  
Joan Gay Snodgrass ◽  
...  

Object recognition is achieved even when only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognition to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onset ~230 msec, maximal at ~290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, which are consistent with generators in the human ventral visual stream, and specifically the lateral-occipital (LO) complex as defined by hemodynamic studies of object recognition.


2020 ◽  
Author(s):  
Franziska Geiger ◽  
Martin Schrimpf ◽  
Tiago Marques ◽  
James J. DiCarlo

After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, these models are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the required number of supervised synaptic updates to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we here demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies: First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved “at birth” (i.e. no training at all). Third, we find that, by training only ~5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, we find that these new models achieve ~80% of a fully trained model’s match to the brain, while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be “wired up” by evolution (a model’s “birth” state) and by developmental learning (a model’s updates based on visual experience).
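
In implementation terms, the third strategy amounts to freezing most parameters before training. A minimal PyTorch sketch follows; which parameter groups stay trainable (`trainable_keywords` here) is an illustrative assumption, not the authors' exact recipe.

```python
# Illustrative sketch: train only a small subset of a model's synapses.
import torch.nn as nn

def freeze_most_synapses(model: nn.Module,
                         trainable_keywords=("bn", "downsample")):
    """Mark only parameters whose names contain one of the keywords as
    trainable; freeze everything else. Keyword choice is hypothetical."""
    n_total, n_train = 0, 0
    for name, p in model.named_parameters():
        p.requires_grad = any(k in name for k in trainable_keywords)
        n_total += p.numel()
        n_train += p.numel() if p.requires_grad else 0
    print(f"training {100 * n_train / n_total:.1f}% of {n_total} weights")
```

An optimizer built only over `filter(lambda p: p.requires_grad, model.parameters())` then updates that small fraction while the rest of the network keeps its "birth" state.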


2020 ◽  
Vol 10 (9) ◽  
pp. 602
Author(s):  
Yibo Cui ◽  
Chi Zhang ◽  
Kai Qiao ◽  
Linyuan Wang ◽  
Bin Yan ◽  
...  

Representation invariance plays a significant role in the performance of deep convolutional neural networks (CNNs) and in human visual information processing across a variety of complicated image-based tasks. However, considerable confusion remains concerning the representation invariance mechanisms of these two sophisticated systems. To investigate their relationship under common conditions, we proposed a representation invariance analysis approach based on data augmentation. First, the original image library was expanded by data augmentation. The representation invariances of CNNs and of the ventral visual stream were then studied by comparing, before and after augmentation, the similarities of corresponding layer features in CNNs and the prediction performance of visual encoding models based on functional magnetic resonance imaging (fMRI). Our experimental results suggest that the architecture of CNNs, the combination of convolutional and fully connected layers, gives rise to their representation invariance. Remarkably, we found that representation invariance is present at all successive stages of the ventral visual stream. These findings reveal an internal correspondence between CNNs and the human visual system with respect to representation invariance. Our study advances invariant representation in computer vision and deepens comprehension of the representation invariance mechanisms of human visual information processing.
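
The layer-feature comparison at the heart of this approach can be summarized in a few lines. The sketch below assumes feature matrices (one row per image) have already been extracted from a given CNN layer for the original images and their augmented counterparts; cosine similarity is one reasonable choice of similarity measure, not necessarily the paper's exact one.

```python
# Illustrative invariance score for one CNN layer: mean cosine similarity
# between features of original images and their augmented versions.
import numpy as np

def layer_invariance(feats_orig: np.ndarray, feats_aug: np.ndarray) -> float:
    """feats_orig, feats_aug: (n_images x n_features), row i of each
    corresponds to the same underlying image."""
    a = feats_orig / np.linalg.norm(feats_orig, axis=1, keepdims=True)
    b = feats_aug / np.linalg.norm(feats_aug, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))  # mean per-image cosine
```

A score near 1 for a layer indicates its representation is largely invariant to the augmentation; tracking this score across layers, and comparing it with encoding-model performance before and after augmentation, links the two systems under common conditions.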


2021 ◽  
Vol 11 (8) ◽  
pp. 1004
Author(s):  
Jingwei Li ◽  
Chi Zhang ◽  
Linyuan Wang ◽  
Penghui Ding ◽  
Lulu Hu ◽  
...  

Visual encoding models are important computational models for understanding how information is processed along the visual stream. Many improved visual encoding models have been developed from the perspective of model architecture and learning objective, but most are limited to supervised learning methods. Taking the view of unsupervised learning mechanisms, this paper uses a pre-trained neural network to construct a visual encoding model based on contrastive self-supervised learning for the ventral visual stream measured by functional magnetic resonance imaging (fMRI). We first extracted features using a ResNet50 model pre-trained with contrastive self-supervised learning (the ResNet50-CSL model), then trained a linear regression model for each voxel, and finally calculated the prediction accuracy for each voxel. Compared with a ResNet50 model pre-trained on a supervised classification task, the ResNet50-CSL model achieved equal or even somewhat better encoding performance in multiple visual cortical areas. Moreover, the ResNet50-CSL model forms hierarchical representations of input visual stimuli, similar to the hierarchical information processing of the human visual cortex. Our experimental results suggest that an encoding model based on contrastive self-supervised learning is a strong computational model that can compete with supervised models, and that contrastive self-supervised learning is an effective way to extract human brain-like representations.
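
This encoding pipeline is straightforward to sketch with scikit-learn. Fitting all voxels jointly with `LinearRegression` is equivalent to one ordinary-least-squares model per voxel, and Pearson correlation on held-out data is one common choice of prediction accuracy (the paper's exact metric may differ).

```python
# Voxel-wise linear encoding model: CNN features -> fMRI responses.
import numpy as np
from sklearn.linear_model import LinearRegression

def voxelwise_encoding_accuracy(feat_train, bold_train, feat_test, bold_test):
    """feat_*: (trials x features) CNN layer features;
    bold_*: (trials x voxels) fMRI responses.
    Returns one prediction accuracy (Pearson r) per voxel."""
    pred = LinearRegression().fit(feat_train, bold_train).predict(feat_test)
    pred_z = (pred - pred.mean(0)) / pred.std(0)          # z-score per voxel
    bold_z = (bold_test - bold_test.mean(0)) / bold_test.std(0)
    return (pred_z * bold_z).mean(0)                      # per-voxel Pearson r
```

Repeating this for features from each layer of the ResNet50-CSL model, and comparing the resulting accuracy maps against those from the supervised ResNet50, yields the layer-by-area comparison the abstract describes.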


2017 ◽  
Author(s):  
Radoslaw M. Cichy ◽  
Nikolaus Kriegeskorte ◽  
Kamila M. Jozwik ◽  
Jasper J.F. van den Bosch ◽  
Ian Charest

1AbstractVision involves complex neuronal dynamics that link the sensory stream to behaviour. To capture the richness and complexity of the visual world and the behaviour it entails, we used an ecologically valid task with a rich set of real-world object images. We investigated how human brain activity, resolved in space with functional MRI and in time with magnetoencephalography, links the sensory stream to behavioural responses. We found that behaviour-related brain activity emerged rapidly in the ventral visual pathway within 200ms of stimulus onset. The link between stimuli, brain activity, and behaviour could not be accounted for by either category membership or visual features (as provided by an artificial deep neural network model). Our results identify behaviourally-relevant brain activity during object vision, and suggest that object representations guiding behaviour are complex and can neither be explained by visual features or semantic categories alone. Our findings support the view that visual representations in the ventral visual stream need to be understood in terms of their relevance to behaviour, and highlight the importance of complex behavioural assessment for human brain mapping.
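
The abstract does not spell out the analysis, but one standard way to test whether model features account for a brain-behaviour link is representational similarity analysis with the model's representational dissimilarity matrix (RDM) partialled out. The sketch below is written under that assumption, using a simple linear residualization, and is not the authors' exact pipeline.

```python
# Illustrative partial RSA: brain-behaviour RDM correlation after
# regressing out a DNN-feature RDM (assumed analysis, not the paper's).
import numpy as np
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def partial_spearman(brain_rdm, behav_rdm, dnn_rdm):
    """Each argument is a square (stimuli x stimuli) dissimilarity matrix.
    Returns the Spearman correlation of brain and behaviour RDMs after
    linearly removing the DNN RDM from both (upper triangles only)."""
    b = squareform(brain_rdm, checks=False)   # condensed upper triangle
    h = squareform(behav_rdm, checks=False)
    d = squareform(dnn_rdm, checks=False)
    resid = lambda y, x: y - np.polyval(np.polyfit(x, y, 1), x)
    return spearmanr(resid(b, d), resid(h, d)).correlation
```

A reliably positive partial correlation would indicate behaviour-related structure in the brain data beyond what the model's visual features capture, consistent with the abstract's conclusion.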


2018 ◽  
Author(s):  
Jonas Kubilius ◽  
Martin Schrimpf ◽  
Aran Nayebi ◽  
Daniel Bear ◽  
Daniel L. K. Yamins ◽  
...  

Deep artificial neural networks with spatially repeated processing (a.k.a. deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from the simple feedforward eight-layer architecture of AlexNet to extremely deep and branching NAS-Net architectures, demonstrating increasingly better object categorization performance and increasingly better explanatory power for both neural and behavioral responses. However, from the neuroscientist’s point of view, the relationship between such very deep architectures and the ventral visual pathway is incomplete in at least two ways. On the one hand, current state-of-the-art ANNs appear to be too complex (e.g., now over 100 levels) compared with the relatively shallow cortical hierarchy (4-8 levels), which makes it difficult to map their elements to those in the ventral visual stream and to understand what they are doing. On the other hand, current state-of-the-art ANNs appear to be not complex enough in that they lack the recurrent connections, and the resulting neural response dynamics, that are commonplace in the ventral visual stream. Here we describe our ongoing efforts to resolve both of these issues by developing a “CORnet” family of deep neural network architectures. Rather than simply seeking high object recognition performance (as the state-of-the-art ANNs above do), we instead try to reduce the model family to its most important elements and then gradually build new ANNs with recurrent and skip connections while monitoring both performance and the match between each new CORnet model and a large body of primate brain and behavioral data. We report here that our current best ANN model derived from this approach (CORnet-S) is among the top models on Brain-Score, a composite benchmark for comparing models to the brain, but is simpler than other deep ANNs in terms of the number of convolutions performed along the longest path of information processing in the model. All CORnet models are available at github.com/dicarlolab/CORnet, and we plan to update this manuscript and the available models in this family as they are produced.
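
To illustrate the kind of building block involved, here is a minimal weight-shared recurrent convolutional block with a skip connection. It is deliberately simplified relative to the published CORnet-S block (which, among other differences, uses separate normalization at each time step); see github.com/dicarlolab/CORnet for the real models.

```python
# Simplified recurrent conv block in the spirit of CORnet: the same
# weights are applied over several time steps, with a skip connection.
import torch
import torch.nn as nn

class RecurrentConvBlock(nn.Module):
    def __init__(self, channels: int, times: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.BatchNorm2d(channels)  # shared across steps here
        self.times = times

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        state = x
        for _ in range(self.times):           # weight-shared recurrence
            state = torch.relu(self.norm(self.conv(state)) + x)  # skip
        return state
```

Because the recurrence reuses one set of weights, depth in time substitutes for depth in layers, which is how a model can stay anatomically shallow while capturing response dynamics.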


2008 ◽  
Vol 18 (10) ◽  
pp. 2402-2409 ◽  
Author(s):  
Deepak Sarpal ◽  
Bradley R. Buchsbaum ◽  
Philip D. Kohn ◽  
J. Shane Kippenhan ◽  
Carolyn B. Mervis ◽  
...  

2021 ◽  
Vol 15 ◽  
Author(s):  
Trung Quang Pham ◽  
Shota Nishiyama ◽  
Norihiro Sadato ◽  
Junichi Chikazoe

Multivoxel pattern analysis (MVPA) has become a standard tool for decoding mental states from brain activity patterns. Recent studies have demonstrated that MVPA can also be applied to decode the activity patterns of one region from those of other regions. Applying a similar region-to-region decoding technique, we examined whether the information represented in a visual area can be explained by that represented in the other visual areas. We first predicted the brain activity patterns of an area on the visual pathway from those of the others, then subtracted the predicted patterns from the originals. Subsequently, visual features were derived from these residuals. During the visual perception task, eliminating top-down signals enhanced the simple visual features represented in the early visual cortices. By contrast, eliminating bottom-up signals enhanced the complex visual features represented in the higher visual cortices. The direction of these modulation effects varied across visual perception and imagery tasks, indicating that the information flow across the visual cortices is dynamically altered to reflect the contents of visual processing. These results demonstrate that this distillation approach is a useful tool for estimating the hidden content of information conveyed across brain regions.
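
In implementation terms, the distillation step is predict-and-subtract. A minimal sketch follows, assuming trial-by-voxel matrices and omitting the cross-validation a real analysis would need to avoid fitting and predicting on the same trials.

```python
# Region-to-region "distillation": remove from a target area's patterns
# whatever the other areas can linearly predict, keeping the residuals.
import numpy as np
from sklearn.linear_model import LinearRegression

def residual_patterns(source_areas: np.ndarray,
                      target_area: np.ndarray) -> np.ndarray:
    """source_areas: (trials x source_voxels), target_area: (trials x
    target_voxels). Returns original minus predicted target patterns."""
    model = LinearRegression().fit(source_areas, target_area)
    return target_area - model.predict(source_areas)
```

Feeding the residuals of an early visual area (with higher-area signals removed) into the feature-derivation step isolates what that area represents beyond the top-down contribution, and vice versa for higher areas.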

