Visual number sense in untrained deep neural networks

2021 ◽  
Vol 7 (1) ◽  
pp. eabd6127
Author(s):  
Gwangsu Kim ◽  
Jaeson Jang ◽  
Seungdae Baek ◽  
Min Song ◽  
Se-Bum Paik

Number sense, the ability to estimate numerosity, is observed in naïve animals, but how this cognitive function emerges in the brain remains unclear. Here, using an artificial deep neural network that models the ventral visual stream of the brain, we show that number-selective neurons can arise spontaneously, even in the complete absence of learning. We also show that the responses of these neurons can induce the abstract number sense, the ability to discriminate numerosity independent of low-level visual cues. We found number tuning in a randomly initialized network originating from a combination of monotonically decreasing and increasing neuronal activities, which emerges spontaneously from the statistical properties of bottom-up projections. We confirmed that the responses of these number-selective neurons show the single- and multineuron characteristics observed in the brain and enable the network to perform number comparison tasks. These findings provide insight into the origin of innate cognitive functions.
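
To make the probing procedure concrete, here is a minimal sketch, assuming a PyTorch/torchvision setup: present dot-array stimuli of varying numerosity to a randomly initialized AlexNet-style network and flag units whose responses vary with number. The stimulus generator and the one-way ANOVA criterion are simplified illustrations, not the authors' exact pipeline (the published analysis additionally controls low-level cues with matched stimulus sets).

```python
# Sketch: probe an untrained CNN for number-selective units.
# Assumptions: torchvision's AlexNet stands in for the ventral-stream
# model; the dot-array stimuli and the one-way ANOVA criterion are
# simplified for illustration.
import numpy as np
import torch
from scipy.stats import f_oneway
from torchvision.models import alexnet

def dot_stimulus(n_dots, size=224, rng=None):
    """Render n_dots white discs at random positions on a black image."""
    if rng is None:
        rng = np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    ys, xs = np.ogrid[:size, :size]
    for _ in range(n_dots):
        cy, cx, r = rng.integers(20, size - 20), rng.integers(20, size - 20), 8
        img[(ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2] = 1.0
    return torch.from_numpy(img).expand(3, -1, -1)  # grayscale -> 3 channels

net = alexnet(weights=None).eval()        # randomly initialized, never trained
numerosities = [1, 2, 4, 8, 16]
rng = np.random.default_rng(0)

responses = []                            # one (reps, units) array per numerosity
with torch.no_grad():
    for n in numerosities:
        batch = torch.stack([dot_stimulus(n, rng=rng) for _ in range(30)])
        responses.append(net.features(batch).flatten(1).numpy())

# Call a unit number-selective if its response varies significantly with
# numerosity across conditions (one-way ANOVA).
pvals = np.array([f_oneway(*[r[:, u] for r in responses]).pvalue
                  for u in range(2000)])  # subsample units for speed
print(f"{np.mean(pvals < 0.01):.1%} of sampled units are number-selective")
```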

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Seungdae Baek ◽  
Min Song ◽  
Jaeson Jang ◽  
Gwangsu Kim ◽  
Se-Bum Paik

Face-selective neurons are observed in the primate visual pathway and are considered the basis of face detection in the brain. However, it has been debated whether this neuronal selectivity can arise innately or whether it requires training from visual experience. Here, using a hierarchical deep neural network model of the ventral visual stream, we suggest a mechanism by which face-selectivity arises in the complete absence of training. We found that units selective to faces emerge robustly in randomly initialized networks and that these units reproduce many characteristics observed in monkeys. This innate selectivity also enables the untrained network to perform face-detection tasks. Intriguingly, we observed that units selective to various non-face objects can also arise innately in untrained networks. Our results imply that the random feedforward connections in early, untrained deep neural networks may be sufficient for initializing primitive visual selectivity.
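
As an illustration of how such face-selectivity might be quantified in an untrained network, the sketch below computes a simple per-unit face-selectivity index; the `load_faces`/`load_objects` loaders and the 0.33 threshold are hypothetical placeholders, not the paper's criteria.

```python
# Sketch: quantify face-selectivity of units in an untrained network via
# FSI = (mean_face - mean_nonface) / (mean_face + mean_nonface).
# `load_faces` / `load_objects` are hypothetical loaders returning
# (N, 3, 224, 224) image batches; the 0.33 threshold is illustrative.
import torch
from torchvision.models import alexnet

net = alexnet(weights=None).eval()        # untrained: random weights only

def unit_responses(imgs):
    with torch.no_grad():
        return net.features(imgs).flatten(1)      # (N, units)

def face_selectivity(face_imgs, nonface_imgs, threshold=0.33):
    r_face = unit_responses(face_imgs).mean(0)
    r_obj = unit_responses(nonface_imgs).mean(0)
    fsi = (r_face - r_obj) / (r_face + r_obj + 1e-8)
    frac = (fsi > threshold).float().mean().item()
    print(f"{frac:.1%} of units exceed FSI > {threshold}")
    return fsi

# Usage (hypothetical data loaders):
# fsi = face_selectivity(load_faces(), load_objects())
```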


2019 ◽  
Author(s):  
Gwangsu Kim ◽  
Jaeson Jang ◽  
Seungdae Baek ◽  
Min Song ◽  
Se-Bum Paik

Number-selective neurons are observed in numerically naïve animals, but it was not understood how this innate function emerges in the brain. Here, we show that neurons tuned to numbers can arise in random feedforward networks, even in the complete absence of learning. Using a biologically inspired deep neural network, we found that number tuning arises in three kinds of networks: one trained on non-numerical natural images, one randomized after training, and one never trained. Number-tuned neurons showed characteristics observed in the brain, following the Weber-Fechner law. These neurons vanished abruptly when the variation of the feedforward weights decreased below a certain level. These results suggest that number tuning can develop from the statistical variation of bottom-up projections in the visual pathway, initializing innate number sense.
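
A sketch of the weight-variation manipulation, under the assumption that shrinking each convolutional layer's random weights toward their mean is an acceptable stand-in for reducing feedforward weight variation; the probe `count_number_tuned_units` is assumed to be a routine like the ANOVA sketch shown earlier.

```python
# Sketch: shrink the spread of each conv layer's random weights toward its
# mean and re-measure number tuning. The scaling scheme is an illustrative
# stand-in for the paper's manipulation of feedforward weight variation;
# `count_number_tuned_units` is a hypothetical probe.
import torch
from torch import nn
from torchvision.models import alexnet

def shrink_weight_variation(model, factor):
    """Rescale each conv layer's weights about their mean by factor in [0, 1]."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                mu = m.weight.mean()
                m.weight.copy_(mu + factor * (m.weight - mu))
    return model

for factor in [1.0, 0.5, 0.25, 0.1, 0.0]:
    net = shrink_weight_variation(alexnet(weights=None).eval(), factor)
    # frac = count_number_tuned_units(net)   # hypothetical probe, see above
    # print(f"weight variation x{factor}: {frac:.1%} number-tuned units")
```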


2018 ◽  
Vol 8 (4) ◽  
pp. 20180013 ◽  
Author(s):  
Kalanit Grill-Spector ◽  
Kevin S. Weiner ◽  
Jesse Gomez ◽  
Anthony Stigliani ◽  
Vaidehi S. Natu

A central goal in neuroscience is to understand how processing within the ventral visual stream enables rapid and robust perception and recognition. Recent neuroscientific discoveries have significantly advanced understanding of the function, structure and computations along the ventral visual stream that serve as the infrastructure supporting this behaviour. In parallel, significant advances in computational models, such as hierarchical deep neural networks (DNNs), have brought machine performance to a level that is commensurate with human performance. Here, we propose a new framework using the ventral face network as a model system to illustrate how increasing the neural accuracy of present DNNs may allow researchers to test the computational benefits of the functional architecture of the human brain. Thus, the review (i) considers specific neural implementational features of the ventral face network, (ii) describes similarities and differences between the functional architecture of the brain and DNNs, and (iii) provides a hypothesis for the computational value of implementational features within the brain that may improve DNN performance. Importantly, this new framework promotes the incorporation of neuroscientific findings into DNNs in order to test the computational benefits of fundamental organizational features of the visual system.


2020 ◽  
Author(s):  
Franziska Geiger ◽  
Martin Schrimpf ◽  
Tiago Marques ◽  
James J. DiCarlo

After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, these models are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the required number of supervised synaptic updates to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we here demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies: First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved “at birth” (i.e. no training at all). Third, we find that, by training only ~5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, we find that these new models achieve ~80% of a fully trained model’s match to the brain, while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be “wired up” by evolution (a model’s “birth” state) and by developmental learning (a model’s updates based on visual experience).
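
The "train only ~5% of model synapses" strategy can be sketched with fixed random gradient masks, as below. ResNet-18 is a generic stand-in (the study's leading model is CORnet-S), and the masking scheme is an illustration of the idea, not the authors' exact procedure.

```python
# Sketch: let only a fixed random ~5% of synapses receive supervised
# updates by masking gradients; the remaining weights keep their random
# "birth" values throughout training.
import torch
from torchvision.models import resnet18

def make_sparse_trainable(model, frac=0.05, seed=0):
    g = torch.Generator().manual_seed(seed)
    for p in model.parameters():
        mask = (torch.rand(p.shape, generator=g) < frac).float()
        # The hook multiplies every gradient by the fixed mask, so
        # masked-out weights never change.
        p.register_hook(lambda grad, m=mask: grad * m)
    return model

model = make_sparse_trainable(resnet18(weights=None), frac=0.05)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# ...train as usual; only ~5% of weights ever receive updates.
```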


2017 ◽  
Author(s):  
Michael F. Bonner ◽  
Russell A. Epstein

Biologically inspired deep convolutional neural networks (CNNs), trained for computer vision tasks, have been found to predict cortical responses with remarkable accuracy. However, the complex internal operations of these models remain poorly understood, and the factors that account for their success are unknown. Here we developed a set of techniques for using CNNs to gain insights into the computational mechanisms underlying cortical responses. We focused on responses in the occipital place area (OPA), a scene-selective region of dorsal occipitoparietal cortex. In a previous study, we showed that fMRI activation patterns in the OPA contain information about the navigational affordances of scenes: that is, information about where one can and cannot move within the immediate environment. We hypothesized that this affordance information could be extracted using a set of purely feedforward computations. To test this idea, we examined a deep CNN with a feedforward architecture that had been previously trained for scene classification. We found that the CNN was highly predictive of OPA representations and, importantly, that it accounted for the portion of OPA variance that reflected the navigational affordances of scenes. The CNN could thus serve as an image-computable candidate model of affordance-related responses in the OPA. We then ran a series of in silico experiments on this model to gain insights into its internal computations. These analyses showed that the computation of affordance-related features relied heavily on visual information at high spatial frequencies and cardinal orientations, both of which have previously been identified as low-level stimulus preferences of scene-selective visual cortex. These computations also exhibited a strong preference for information in the lower visual field, which is consistent with known retinotopic biases in the OPA. Visualizations of feature selectivity within the CNN suggested that affordance-based responses encoded features that define the layout of the spatial environment, such as boundary-defining junctions and large extended surfaces. Together, these results map the sensory functions of the OPA onto a fully quantitative model that provides insights into its visual computations. More broadly, they advance integrative techniques for understanding visual cortex across multiple levels of analysis: from the identification of cortical sensory functions to the modeling of their underlying algorithmic implementations.

Author summary: How does visual cortex compute behaviorally relevant properties of the local environment from sensory inputs? For decades, computational models have been able to explain only the earliest stages of biological vision, but recent advances in the engineering of deep neural networks have yielded a breakthrough in the modeling of high-level visual cortex. However, these models are not explicitly designed for testing neurobiological theories, and, like the brain itself, their complex internal operations remain poorly understood. Here we examined a deep neural network for insights into the cortical representation of the navigational affordances of visual scenes. In doing so, we developed a set of high-throughput techniques and statistical tools that are broadly useful for relating the internal operations of neural networks to the information processes of the brain. Our findings demonstrate that a deep neural network with purely feedforward computations can account for the processing of navigational layout in high-level visual cortex. We next performed a series of experiments and visualization analyses on this neural network, which characterized a set of stimulus input features that may be critical for computing navigationally related cortical representations and identified a set of high-level, complex scene features that may serve as a basis set for the cortical coding of navigational layout. These findings suggest a computational mechanism through which high-level visual cortex might encode the spatial structure of the local navigational environment, and they demonstrate an experimental approach for leveraging the power of deep neural networks to understand the visual computations of the brain.
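
The core encoding-model analysis can be sketched as cross-validated ridge regression from CNN features to voxel responses. The variable names (`images`, `opa_voxels`) and the untrained placeholder network are illustrative assumptions; the study used a CNN trained for scene classification and fMRI responses measured in the OPA.

```python
# Sketch: cross-validated ridge regression from CNN features to OPA voxel
# responses. `images` (N, 3, 224, 224) and `opa_voxels` (N, n_voxels) are
# assumed inputs; the untrained network is a placeholder for the study's
# scene-classification-trained CNN.
import numpy as np
import torch
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from torchvision.models import alexnet

net = alexnet(weights=None).eval()

def cnn_features(images):
    with torch.no_grad():
        return net.features(images).flatten(1).numpy()   # (N, features)

def opa_prediction_accuracy(images, opa_voxels):
    """Mean cross-validated R^2 across voxels."""
    X = cnn_features(images)
    return np.mean([
        cross_val_score(RidgeCV(alphas=np.logspace(-2, 4, 7)),
                        X, opa_voxels[:, v], cv=5).mean()
        for v in range(opa_voxels.shape[1])
    ])
```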


2017 ◽  
Vol 40 ◽  
Author(s):  
Ivilin Peev Stoianov ◽  
Marco Zorzi

We provide an emergentist perspective on the computational mechanism underlying numerosity perception, its development, and the role of inhibition, based on our deep neural network model. We argue that the influence of continuous visual properties does not challenge the notion of number sense, but reveals limit conditions for the computation that yields invariance in numerosity perception. Alternative accounts should be formalized in a computational model.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Irina Higgins ◽  
Le Chang ◽  
Victoria Langston ◽  
Demis Hassabis ◽  
Christopher Summerfield ◽  
...  

In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together our results imply that optimising the disentangling objective leads to representations that closely resemble those in the IT at the single-unit level. This points to disentangling as a plausible learning objective for the visual brain.
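
For reference, a minimal sketch of the β-VAE objective: a standard variational autoencoder whose KL term is weighted by β > 1, which pressures the latent factors to disentangle. The layer sizes below are illustrative, not the paper's architecture.

```python
# Sketch of the beta-VAE objective: a standard VAE loss with the KL term
# weighted by beta > 1, encouraging disentangled latents.
import torch
import torch.nn.functional as F
from torch import nn

class BetaVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)
        self.logvar = nn.Linear(512, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, 64 * 64))

    def forward(self, x):                      # x: (N, 1, 64, 64) grayscale
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        return self.dec(z), mu, logvar

def beta_vae_loss(x, recon, mu, logvar, beta=4.0):
    recon_loss = F.mse_loss(recon, x.flatten(1), reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl              # beta > 1 pressures disentangling
```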


2014 ◽  
Vol 111 (1) ◽  
pp. 91-102 ◽  
Author(s):  
Leyla Isik ◽  
Ethan M. Meyers ◽  
Joel Z. Leibo ◽  
Tomaso Poggio

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that are in the most posterior occipital regions at the early decoding times and then move temporally as invariant information develops. These results provide previously unknown latencies for key stages of invariant object recognition in humans, as well as new and compelling evidence for a feed-forward hierarchical model of invariant object recognition in which invariance increases at each successive visual area along the ventral stream.
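
The time-resolved decoding analysis can be sketched as training an independent linear classifier at each time point of the sensor data; the data shapes and classifier choice below are illustrative assumptions, not the study's exact readout.

```python
# Sketch: time-resolved decoding. Train a separate linear classifier at
# every time point of the MEG signal; identity becomes "decodable" when
# cross-validated accuracy rises above chance.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def decode_over_time(meg, labels, cv=5):
    """meg: (n_trials, n_sensors, n_times); labels: (n_trials,) object IDs.
    Returns cross-validated decoding accuracy at each time point."""
    return np.array([
        cross_val_score(LinearSVC(), meg[:, :, t], labels, cv=cv).mean()
        for t in range(meg.shape[2])
    ])

# Invariance variant (sketch): train the classifier on one size/position
# condition and test on another; above-chance transfer at a given latency
# indicates an invariant representation by that time.
```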

