Wiring Up Vision: Minimizing Supervised Synaptic Updates Needed to Produce a Primate Ventral Stream

2020
Author(s):
Franziska Geiger
Martin Schrimpf
Tiago Marques
James J. DiCarlo

After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, these are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research pursues unsupervised proxies for labels, we here explore a complementary strategy: reducing the number of supervised synaptic updates required to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies. First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to the adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved "at birth" (i.e., with no training at all). Third, we find that by training only ~5% of model synapses we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, the resulting models achieve ~80% of a fully trained model's match to the brain while using two orders of magnitude fewer supervised synaptic updates. These results are first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be "wired up" by evolution (a model's "birth" state) and by developmental learning (a model's updates based on visual experience).
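
As a rough illustration of the third strategy, the sketch below freezes a standard CNN and leaves only a small fraction of its parameters trainable. The architecture (ResNet-18) and the choice of which parameters remain plastic are placeholder assumptions for illustration, not the paper's actual models or update scheme.

```python
# Sketch: train only a small fraction of a CNN's weights, freezing the rest.
# Which parameters stay trainable is a placeholder choice, not the paper's
# exact "critical training" scheme.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)

# Freeze everything ...
for p in model.parameters():
    p.requires_grad = False

# ... then unfreeze a small subset (here: batch-norm affine parameters plus
# the final layer), which amounts to a few percent of all synapses.
trainable = []
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True
            trainable.append(p)
for p in model.fc.parameters():
    p.requires_grad = True
    trainable.append(p)

n_total = sum(p.numel() for p in model.parameters())
n_train = sum(p.numel() for p in trainable)
print(f"training {n_train / n_total:.1%} of {n_total:,} weights")

optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```

With this particular (assumed) choice of trainable parameters, roughly 4-5% of the weights remain plastic, the same regime as the paper's ~5% figure.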

2014
Vol 111 (1)
pp. 91-102
Author(s):
Leyla Isik
Ethan M. Meyers
Joel Z. Leibo
Tomaso Poggio

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the development of size- and position-invariant visual information in the ventral visual stream. With this method we can read out object identity beginning as early as 60 ms after stimulus onset. Size-invariant and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources in the most posterior occipital regions at the early decoding times and then moves toward temporal regions as invariant information develops. These results provide previously unknown latencies for key stages of invariant object recognition in humans, as well as new and compelling evidence for a feedforward hierarchical model of invariant object recognition in which invariance increases at each successive visual area along the ventral stream.
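
A minimal sketch of the time-resolved decoding logic described above: a separate classifier is evaluated on the sensor pattern at each time point, and the earliest time with above-chance accuracy marks when the decoded information becomes available. The data below are synthetic stand-ins, not MEG recordings, and the accuracy threshold is arbitrary.

```python
# Sketch of time-resolved decoding: train/test a classifier at each time
# point and track when object identity becomes readable from the sensors.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 200, 306, 120   # ~306 MEG sensors
X = rng.standard_normal((n_trials, n_sensors, n_times))
y = rng.integers(0, 2, n_trials)               # two object identities
X[y == 1, :, 60:] += 0.3                       # identity signal appears "late"

accuracy = np.array([
    cross_val_score(LinearSVC(dual=False), X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
])
first = np.argmax(accuracy > 0.6)              # first reliably decodable sample
print(f"identity decodable from sample {first} (accuracy {accuracy[first]:.2f})")
```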


2016
Vol 371 (1705)
pp. 20160278
Author(s):
Nikolaus Kriegeskorte
Jörn Diedrichsen

High-resolution functional imaging is providing increasingly rich measurements of brain activity in animals and humans. A major challenge is to leverage such data to gain insight into the brain's computational mechanisms. The first step is to define candidate brain-computational models (BCMs) that can perform the behavioural task in question. We would then like to infer which of the candidate BCMs best accounts for measured brain-activity data. Here we describe a method that complements each BCM with a measurement model (MM), which simulates the way the brain-activity measurements reflect neuronal activity (e.g. local averaging in functional magnetic resonance imaging (fMRI) voxels or sparse sampling in array recordings). The resulting generative model (BCM-MM) produces simulated measurements. To avoid having to fit the MM to predict each individual measurement channel of the brain-activity data, we compare the measured and predicted data at the level of summary statistics. We describe a novel implementation of this approach, called probabilistic representational similarity analysis (pRSA) with MMs, which uses representational dissimilarity matrices (RDMs) as the summary statistics. We validate this method by simulations of fMRI measurements (locally averaging voxels) based on a deep convolutional neural network for visual object recognition. Results indicate that the way the measurements sample the activity patterns strongly affects the apparent representational dissimilarities. However, modelling of the measurement process can account for these effects, and different BCMs remain distinguishable even under substantial noise. The pRSA method enables us to perform Bayesian inference on the set of BCMs and to recognize the data-generating model in each case. This article is part of the themed issue ‘Interpreting BOLD: a dialogue between cognitive and cellular neuroscience’.
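
The core of the approach can be sketched as follows: model activations are passed through a toy measurement model (a random linear mixing plus noise, standing in here for voxel-level local averaging), both representations are reduced to RDMs, and the comparison happens at that summary-statistic level. Everything below is simulated, not the paper's pRSA implementation.

```python
# Sketch of the RSA-style summary-statistic comparison: reduce model and
# measured activity to representational dissimilarity matrices (RDMs) and
# compare those, rather than fitting every measurement channel.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_stimuli = 50
model_acts = rng.standard_normal((n_stimuli, 512))          # candidate BCM features
voxel_resp = model_acts @ rng.standard_normal((512, 200))   # toy measurement model
voxel_resp += 5.0 * rng.standard_normal(voxel_resp.shape)   # measurement noise

rdm_model = pdist(model_acts, metric="correlation")         # upper-triangle RDM
rdm_brain = pdist(voxel_resp, metric="correlation")

rho, _ = spearmanr(rdm_model, rdm_brain)
print(f"model-to-measurement RDM correlation: {rho:.2f}")
```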


2017
Vol 117 (1)
pp. 388-402
Author(s):
Michael A. Cohen
George A. Alvarez
Ken Nakayama
Talia Konkle

Visual search is a ubiquitous visual behavior, and efficient search is essential for survival. Different cognitive models have explained the speed and accuracy of search based either on the dynamics of attention or on the similarity of item representations. Here, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Participants performed a visual search task with 28 conditions reflecting different pairs of categories (e.g., searching for a face among cars, a body among hammers, etc.). The time it took participants to find the target item varied as a function of category combination. In a separate group of participants, we measured the neural responses to these object categories when items were presented in isolation. Using representational similarity analysis, we then examined whether the similarity of neural responses across different subdivisions of the visual system had the requisite structure to predict visual search performance. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways, at the level of both macroscale sectors and smaller mesoscale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system.

NEW & NOTEWORTHY: Here, we ask which neural regions have response patterns that correlate with behavioral performance in a visual search task. We found that the representational structure across all of high-level visual cortex has the requisite structure to predict behavior. Furthermore, when directly comparing different neural regions, we found that they all had highly similar category-level representational structures. These results point to a ubiquitous and uniform representational structure in high-level visual cortex underlying visual object processing.
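
A sketch of the brain/behavior comparison, assuming 8 categories (hence 28 pairs, matching the 28 search conditions): the neural dissimilarity of each target/distractor category pair is correlated with search speed for that pair. The arrays are illustrative, not the study's data.

```python
# Sketch: correlate pairwise neural dissimilarity with search reaction times.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_categories, n_voxels = 8, 300           # 8 categories -> 28 pairs
neural = rng.standard_normal((n_categories, n_voxels))
neural_dissim = pdist(neural, metric="correlation")   # 28 pairwise values

# Hypothetical rule for the toy data: more dissimilar pairs are found
# faster (shorter RT), plus noise.
search_rt = 1.5 - 0.5 * neural_dissim + 0.1 * rng.standard_normal(28)

r, p = pearsonr(neural_dissim, search_rt)
print(f"neural dissimilarity vs. search RT: r = {r:.2f}, p = {p:.3f}")
```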


Author(s):  
Kohitij Kar
James J DiCarlo

Distributed neural population spiking patterns in macaque inferior temporal (IT) cortex that support core visual object recognition require additional time to develop for specific ("late-solved") images, suggesting the necessity of recurrent processing in these computations. Which brain circuit motifs are most responsible for computing and transmitting these putative recurrent signals to IT? To test whether the ventral prefrontal cortex (vPFC) is a critical recurrent circuit node in this system, here we pharmacologically inactivated parts of the vPFC and simultaneously measured IT population activity while monkeys performed object discrimination tasks. Our results show that vPFC inactivation degraded the quality of the late-phase (>150 ms from image onset) IT population code, with commensurate, specific behavioral deficits for "late-solved" images. Finally, silencing vPFC caused the monkeys' IT activity patterns and behavior to become more like those produced by feedforward artificial neural network models of the ventral stream. Together with prior work, these results argue that fast recurrent processing through the vPFC is critical to the production of behaviorally sufficient object representations in IT.


Science
2020
Vol 367 (6482)
pp. 1112-1119
Author(s):
Gerit Arne Linneweber
Maheva Andriatsilavo
Suchetana Bias Dutta
Mercedes Bengochea
Liz Hellbruegge
...  

The genome versus experience dichotomy has dominated understanding of behavioral individuality. By contrast, the role of nonheritable noise during brain development in behavioral variation is understudied. Using Drosophila melanogaster, we demonstrate a link between stochastic variation in brain wiring and behavioral individuality. A visual system circuit called the dorsal cluster neurons (DCN) shows nonheritable, interindividual variation in right/left wiring asymmetry and controls object orientation in freely walking flies. We show that DCN wiring asymmetry instructs an individual’s object responses: The greater the asymmetry, the better the individual orients toward a visual object. Silencing DCNs abolishes correlations between anatomy and behavior, whereas inducing DCN asymmetry suffices to improve object responses.


2019
Vol 5 (5)
pp. eaav7903
Author(s):
Khaled Nasr
Pooja Viswanathan
Andreas Nieder

Humans and animals have a "number sense," an innate capability to intuitively assess the number of visual items in a set, its numerosity. This capability implies that mechanisms to extract numerosity reside in the brain's visual system, which is primarily concerned with visual object recognition. Here, we show that network units tuned to abstract numerosity, and therefore reminiscent of biological number neurons, spontaneously emerge in a biologically inspired deep neural network that was merely trained on visual object recognition. These numerosity-tuned units underlay the network's number discrimination performance, which showed all the characteristics of human and animal number discrimination predicted by the Weber-Fechner law. These findings explain the spontaneous emergence of the number sense based on mechanisms inherent to the visual system.
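
The Weber-Fechner signature described here can be illustrated with a toy numerosity-tuned unit: tuning is Gaussian on a logarithmic number axis, so discriminability depends on the ratio of two numerosities rather than their difference. The tuning width and preferred numerosities below are illustrative, not values fitted to the network.

```python
# Sketch of ratio-dependent (Weber-Fechner) numerosity tuning.
import numpy as np

def tuning(n, preferred, sigma=0.25):
    """Response of a unit tuned to `preferred`, Gaussian on log(numerosity)."""
    return np.exp(-(np.log(n) - np.log(preferred)) ** 2 / (2 * sigma ** 2))

numerosities = np.arange(1, 31)
for preferred in (2, 4, 8, 16):
    peak = numerosities[np.argmax(tuning(numerosities, preferred))]
    print(f"unit preferring {preferred}: peak response at n = {peak}")

# Same-ratio pairs are equally discriminable: 2 vs 4 behaves like 8 vs 16.
for a, b in [(2, 4), (8, 16)]:
    overlap = tuning(b, a)   # response of an a-tuned unit to numerosity b
    print(f"{a} vs {b}: tuning overlap {overlap:.2f}")
```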


2020
Author(s):
Doris Voina
Stefano Recanatesi
Brian Hu
Eric Shea-Brown
Stefan Mihalas

As animals adapt to their environments, their brains are tasked with processing stimuli in different sensory contexts. Whether these computations are context dependent or independent, they are all implemented in the same neural tissue. A crucial question is what neural architectures can respond flexibly to a range of stimulus conditions and switch between them; this is a particular case of a flexible architecture that permits multiple related computations within a single circuit. Here, we address this question in the specific case of the visual system circuitry, focusing on context integration, defined as the integration of feedforward and surround information across visual space. We show that a biologically inspired microcircuit with multiple inhibitory cell types can switch between visual processing of the static context and the moving context. In our model, the VIP population acts as the switch and modulates the visual circuit through a disinhibitory motif. Moreover, the VIP population is efficient, requiring only a relatively small number of neurons to switch contexts. This circuit eliminates noise in videos by using appropriate lateral connections for contextual spatio-temporal surround modulation, achieving better denoising performance than circuits in which only one context is learned. Our findings shed light on a minimally complex architecture that is capable of switching between two naturalistic contexts using a few switching units.

Author Summary: The brain processes information at all times, and much of that information is context dependent. The visual system presents an important example: processing is ongoing, but the context changes dramatically when an animal is still vs. running. How is context-dependent information processing achieved? We take inspiration from recent neurophysiology studies on the role of distinct cell types in primary visual cortex (V1). We find that relatively few "switching units" (akin to the VIP neuron type in V1, in that they turn on and off between the running and still contexts and have connections to and from the main population) are sufficient to drive context-dependent image processing. We demonstrate this in a model of feature integration and in a test of image denoising. The underlying circuit architecture illustrates a concrete computational role for the multiple cell types under increasing study across the brain and may inspire more flexible neurally inspired computing architectures.
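
A deliberately minimal sketch of the disinhibitory switch: VIP activity suppresses an SST unit, which otherwise gates off the lateral surround input to an excitatory unit, so turning VIP on or off selects whether surround information reaches the circuit. All weights and rates below are toy values, not the paper's fitted model.

```python
# Sketch of a VIP -| SST -| surround disinhibitory switch.

def circuit_response(feedforward, surround, vip_drive):
    """One excitatory unit with an SST-gated surround and a VIP switch."""
    sst = max(0.0, 1.0 - 2.0 * vip_drive)   # VIP silences SST (disinhibition)
    gate = max(0.0, 1.0 - sst)              # SST suppresses the surround path
    return max(0.0, feedforward + gate * surround)

ff, surround = 1.0, 0.8
print("still context (VIP off):", circuit_response(ff, surround, vip_drive=0.0))
print("moving context (VIP on):", circuit_response(ff, surround, vip_drive=1.0))
```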


2021
Author(s):
Aran Nayebi
Nathan C. L. Kong
Chengxu Zhuang
Justin L. Gardner
Anthony M. Norcia
...  

Task-optimized deep convolutional neural networks are the most quantitatively accurate models of the primate ventral visual stream. However, such networks are implausible as models of the mouse visual system, because mouse visual cortex is known to have a shallower hierarchy and the supervised objectives these networks are typically trained with require labeled data that are likely ethologically unrealistic in both content and quantity. Here we develop shallow network architectures that are more consistent with anatomical and physiological studies of mouse visual cortex than current models. We demonstrate that hierarchically shallow architectures trained using contrastive objective functions applied to visual-acuity-adapted images achieve neural prediction performance that exceeds that of the same architectures trained in a supervised manner, and result in the most quantitatively accurate models of the mouse visual system. Moreover, these models' neural predictivity significantly surpasses that of supervised, deep architectures that are known to correspond well to the primate ventral visual stream. Finally, we derive a novel measure of inter-animal consistency and show that the best models closely match this quantity across visual areas. Taken together, our results suggest that contrastive objectives operating on shallow architectures with ethologically motivated image transformations may be a biologically plausible computational theory of visual coding in mice.
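
Contrastive objectives of the kind referred to here pull two augmented views of the same image together in embedding space while pushing apart all other images in the batch; a minimal sketch of one such loss (SimCLR-style NT-Xent) follows. The embeddings are random placeholders, and no claim is made that this is the paper's exact objective or encoder.

```python
# Sketch of an NT-Xent contrastive loss over paired embeddings.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """NT-Xent loss for a batch of paired embeddings z1, z2 of shape (B, D)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2B, D)
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    batch = z1.shape[0]
    targets = torch.cat([torch.arange(batch, 2 * batch),  # positive of i is i+B
                         torch.arange(0, batch)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(32, 128)   # embeddings of view 1 (e.g., one crop of each image)
z2 = torch.randn(32, 128)   # embeddings of view 2 (another augmentation)
print(f"contrastive loss: {nt_xent(z1, z2).item():.3f}")
```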


2021
Vol 118 (3)
pp. e2014196118
Author(s):
Chengxu Zhuang
Siming Yan
Aran Nayebi
Martin Schrimpf
Michael C. Frank
...  

Deep neural networks currently provide the best quantitative models of the response patterns of neurons throughout the primate ventral visual stream. However, such networks have remained implausible as a model of the development of the ventral stream, in part because they are trained with supervised methods requiring many more labels than are accessible to infants during development. Here, we report that recent rapid progress in unsupervised learning has largely closed this gap. We find that neural network models learned with deep unsupervised contrastive embedding methods achieve neural prediction accuracy in multiple ventral visual cortical areas that equals or exceeds that of models derived using today’s best supervised methods and that the mapping of these neural network models’ hidden layers is neuroanatomically consistent across the ventral stream. Strikingly, we find that these methods produce brain-like representations even when trained solely with real human child developmental data collected from head-mounted cameras, despite the fact that these datasets are noisy and limited. We also find that semisupervised deep contrastive embeddings can leverage small numbers of labeled examples to produce representations with substantially improved error-pattern consistency with human behavior. Taken together, these results illustrate a use of unsupervised learning to provide a quantitative model of a multiarea cortical brain system and present a strong candidate for a biologically plausible computational theory of primate sensory learning.
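
A sketch of the semisupervised idea in the last result: a contrastively pretrained encoder is kept frozen and only a readout is fit on a small labeled set. The encoder, data shapes, and label count below are placeholders, not the paper's setup.

```python
# Sketch: frozen unsupervised encoder + supervised readout on few labels.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False                 # stands in for pretrained weights

readout = nn.Linear(256, 10)                # trained on the few labeled images
optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

labeled_x = torch.randn(64, 3, 32, 32)      # small labeled set (toy data)
labeled_y = torch.randint(0, 10, (64,))

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(readout(encoder(labeled_x)), labeled_y)
    loss.backward()
    optimizer.step()
print(f"final supervised loss on the small labeled set: {loss.item():.3f}")
```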


2016
Author(s):
Darren Seibert
Daniel L Yamins
Diego Ardila
Ha Hong
James J DiCarlo
...  

Human visual object recognition is subserved by a multitude of cortical areas. To make sense of this system, one line of research focused on the response properties of primary visual cortex neurons and developed theoretical models of a set of canonical computations, such as convolution, thresholding, exponentiation, and normalization, that could be hierarchically repeated to give rise to more complex representations. Another line of research focused on the response properties of high-level visual cortex and linked these to semantic categories useful for object recognition. Here, we hypothesized that the panoply of visual representations in the human ventral stream may be understood as emergent properties of a system constrained both by simple canonical computations and by top-level object recognition functionality in a single unified framework (Yamins et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Güçlü and van Gerven, 2015). We built a deep convolutional neural network model optimized for object recognition and used representational similarity analysis to compare representations at various model levels with human functional imaging responses elicited by viewing hundreds of image stimuli. Neural network layers developed representations that corresponded in a hierarchically consistent fashion to visual areas from V1 to LOC. This correspondence increased with optimization of the model's recognition performance. These findings support a unified view of the ventral stream in which representations from the earliest to the latest stages can be understood as being built from basic computations inspired by modeling of early visual cortex and shaped by optimization for high-level object-based performance constraints.
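
A sketch of the layer-to-area comparison: one RDM per model layer and per visual area, with each area assigned its best-matching layer. With real data the best-matching layer climbs the network hierarchy from V1 to LOC; the synthetic activations below only show the mechanics of the analysis.

```python
# Sketch: assign each visual area its best-matching model layer via RDMs.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_stimuli = 40
layers = {f"layer{i}": rng.standard_normal((n_stimuli, 256)) for i in range(1, 6)}
areas = {name: rng.standard_normal((n_stimuli, 100))
         for name in ["V1", "V2", "V4", "LOC"]}

for area, resp in areas.items():
    rdm_area = pdist(resp, metric="correlation")
    scores = {name: spearmanr(pdist(act, metric="correlation"), rdm_area)[0]
              for name, act in layers.items()}
    best = max(scores, key=scores.get)
    print(f"{area}: best-matching layer = {best} (rho = {scores[best]:.2f})")
```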

