CORnet: Modeling the Neural Mechanisms of Core Object Recognition

2018 ◽  
Author(s):  
Jonas Kubilius ◽  
Martin Schrimpf ◽  
Aran Nayebi ◽  
Daniel Bear ◽  
Daniel L. K. Yamins ◽  
...  

Abstract: Deep artificial neural networks with spatially repeated processing (a.k.a. deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from a simple feedforward eight-layer architecture in AlexNet to extremely deep and branching NASNet architectures, demonstrating increasingly better object categorization performance and increasingly better explanatory power of both neural and behavioral responses. However, from the neuroscientist’s point of view, the relationship between such very deep architectures and the ventral visual pathway is incomplete in at least two ways. On the one hand, current state-of-the-art ANNs appear to be too complex (e.g., now over 100 levels) compared with the relatively shallow cortical hierarchy (4–8 levels), which makes it difficult to map their elements to those in the ventral visual stream and to understand what they are doing. On the other hand, current state-of-the-art ANNs appear to be not complex enough in that they lack recurrent connections and the resulting neural response dynamics that are commonplace in the ventral visual stream. Here we describe our ongoing efforts to resolve both of these issues by developing a “CORnet” family of deep neural network architectures. Rather than just seeking high object recognition performance (as the state-of-the-art ANNs above do), we instead try to reduce the model family to its most important elements and then gradually build new ANNs with recurrent and skip connections while monitoring both performance and the match between each new CORnet model and a large body of primate brain and behavioral data. We report here that our current best ANN model derived from this approach (CORnet-S) is among the top models on Brain-Score, a composite benchmark for comparing models to the brain, but is simpler than other deep ANNs in terms of the number of convolutions performed along the longest path of information processing in the model. All CORnet models are available at github.com/dicarlolab/CORnet, and we plan to update this manuscript and the available models in this family as they are produced.
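
To make the architectural idea concrete, here is a minimal sketch of a CORnet-S-style recurrent area, assuming PyTorch. It is simplified relative to the released code at github.com/dicarlolab/CORnet; the class name, layer sizes, unroll count, and shared normalization across time steps are illustrative assumptions, not the paper's exact specification.

```python
import torch
import torch.nn as nn

class RecurrentArea(nn.Module):
    """One cortical 'area' (e.g., V4): a conv block unrolled over time,
    with a skip path so the feedforward drive feeds every recurrent pass."""

    def __init__(self, in_ch, out_ch, times=2):
        super().__init__()
        self.times = times
        self.conv_input = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv_rec = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv_input(x)       # feedforward drive, computed once
        skip = x                     # skip connection reused at each time step
        for _ in range(self.times):  # recurrence: the same weights applied repeatedly
            x = self.relu(self.norm(self.conv_rec(x) + skip))
        return x

# Stacking four such areas mirrors the shallow V1 -> V2 -> V4 -> IT hierarchy,
# keeping the longest feedforward path short while adding response dynamics.
area = RecurrentArea(in_ch=3, out_ch=64, times=2)
out = area(torch.randn(1, 3, 56, 56))  # -> torch.Size([1, 64, 56, 56])
```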

Author(s):  
Albert L. Rothenstein

Most biologically inspired models of object recognition rely on a feed-forward architecture in which abstract representations are gradually built from simple ones, but recognition performance in such systems drops when multiple objects are present in the input. This chapter puts forward the proposal that by using multiple passes through the visual processing hierarchy, both bottom-up and top-down, it is possible to address the limitations of feed-forward architectures and explain the different recognition behaviors that primate vision exhibits. The model relies on the reentrant connections that are ubiquitous in the primate brain to recover spatial information, and thus allows for the selective processing of stimuli. The chapter ends with a discussion of the implications of this work, its explanatory power, and a number of predictions for future experimental work.
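
A toy sketch of the multiple-pass idea, hedged heavily: a feedforward sweep produces a category hypothesis, and a top-down pass re-weights the input features around the winning hypothesis before the next sweep, approximating selective processing of one stimulus among several. The gating rule, the template matching, and all names here are illustrative assumptions, not the chapter's model.

```python
import numpy as np

def recognize(features, templates, passes=3):
    """features: (n_units,) bottom-up activation; templates: dict name -> (n_units,)."""
    gain = np.ones_like(features)                 # top-down gain, initially uniform
    best = None
    for _ in range(passes):
        x = features * gain                       # gated bottom-up (reentrant) pass
        scores = {name: float(t @ x) for name, t in templates.items()}
        best = max(scores, key=scores.get)
        # top-down pass: boost units belonging to the current winning hypothesis
        gain = 0.5 * gain + 0.5 * np.abs(templates[best])
    return best
```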


2016 ◽  
Author(s):  
Darren Seibert ◽  
Daniel L Yamins ◽  
Diego Ardila ◽  
Ha Hong ◽  
James J DiCarlo ◽  
...  

Human visual object recognition is subserved by a multitude of cortical areas. To make sense of this system, one line of research focused on the response properties of primary visual cortex neurons and developed theoretical models of a set of canonical computations, such as convolution, thresholding, exponentiation, and normalization, that could be hierarchically repeated to give rise to more complex representations. Another line of research focused on the response properties of high-level visual cortex and linked these to semantic categories useful for object recognition. Here, we hypothesized that the panoply of visual representations in the human ventral stream may be understood as emergent properties of a system constrained both by simple canonical computations and by top-level object recognition functionality in a single unified framework (Yamins et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Guclu and van Gerven, 2015). We built a deep convolutional neural network model optimized for object recognition and used representational similarity analysis to compare representations at various model levels with human functional imaging responses elicited by viewing hundreds of image stimuli. Neural network layers developed representations that corresponded in a hierarchically consistent fashion to visual areas from V1 to LOC, and this correspondence increased with optimization of the model's recognition performance. These findings support a unified view of the ventral stream in which representations from the earliest to the latest stages can be understood as being built from basic computations inspired by models of early visual cortex and shaped by optimization for high-level, object-based performance constraints.
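
For readers unfamiliar with the comparison method, here is a minimal sketch of representational similarity analysis as described above: build a representational dissimilarity matrix (RDM) for each representation, then correlate the RDMs. The choice of correlation distance for the RDMs and Spearman correlation for the comparison is a common convention and an assumption here, not necessarily the paper's exact pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """responses: (n_stimuli, n_features) -> condensed dissimilarity vector."""
    return pdist(responses, metric="correlation")  # 1 - Pearson r per stimulus pair

def rsa_score(layer_responses, voxel_responses):
    """Spearman correlation between a model-layer RDM and a brain-region RDM."""
    rho, _ = spearmanr(rdm(layer_responses), rdm(voxel_responses))
    return rho

# Example with stand-in data: compare one model layer to one visual area
# over 100 stimuli. Real use substitutes layer activations and fMRI responses.
layer = np.random.randn(100, 512)   # stand-in for layer activations
voxels = np.random.randn(100, 80)   # stand-in for voxel responses
print(rsa_score(layer, voxels))
```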


2000 ◽  
Vol 12 (4) ◽  
pp. 615-621 ◽  
Author(s):  
Glen M. Doniger ◽  
John J. Foxe ◽  
Micah M. Murray ◽  
Beth A. Higgins ◽  
Joan Gay Snodgrass ◽  
...  

Object recognition is achieved even in circumstances where only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognition to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onset ∼230 msec, maximal at ∼290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, which are consistent with generators in the human ventral visual stream, and specifically in the lateral-occipital (LO) complex as defined by hemodynamic studies of object recognition.


2020 ◽  
Author(s):  
Franziska Geiger ◽  
Martin Schrimpf ◽  
Tiago Marques ◽  
James J. DiCarlo

Abstract: After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, they are poor models of the development of the visual system, because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the number of supervised synaptic updates required to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and less energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies. First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to the adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved “at birth” (i.e., with no training at all). Third, we find that, by training only ~5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, the new models achieve ~80% of a fully trained model’s match to the brain while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be “wired up” by evolution (a model’s “birth” state) and by developmental learning (a model’s updates based on visual experience).
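
The third strategy, training only a small fraction of synapses, can be sketched as follows in PyTorch. The abstract supplies only the ~5% figure; the random per-layer selection, the gradient-masking mechanics, and the ResNet-18 stand-in backbone are illustrative assumptions, not the paper's selection rule.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # stand-in backbone, untrained

# Freeze ~95% of entries in every weight tensor by masking their gradients,
# so only ~5% of model synapses ever receive supervised updates.
masks = {}
for name, p in model.named_parameters():
    mask = (torch.rand_like(p) < 0.05).float()        # ~5% trainable entries
    masks[name] = mask
    p.register_hook(lambda grad, m=mask: grad * m)    # zero grads on frozen entries

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Training then proceeds exactly as usual; the hooks silently restrict
# which synapses change, shrinking the supervised update budget.
```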


2021 ◽  
Author(s):  
Moritz Wurm ◽  
Alfonso Caramazza

The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition relies on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are restricted to medial areas of the ventral stream. We argue that the ventral stream bifurcates into lateral and medial pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and medial areas.


2000 ◽  
Vol 6 (4) ◽  
pp. 455-459 ◽  
Author(s):  
ANNA M. BARRETT ◽  
J. BRENT CROSSON ◽  
GREGORY P. CRUCIAN ◽  
KENNETH M. HEILMAN

Whereas the ventral cortical visual stream is important in object recognition, the dorsal stream is specialized for spatial localization. In humans there are also right and left hemisphere asymmetries in visual processing, the left hemisphere being more important in object recognition and the right in specifying spatial locations. Based on these dorsal–ventral and right–left where–what dichotomies, one would expect that dorsal right hemisphere systems would be most activated during spatial localization tasks, and that this activation may induce a leftward spatial bias in lower space. To determine whether visual stimuli in upper and lower body space evoke different hemispheric activation, we had 12 normal participants bisect horizontal lines above and below eye level. Participants erred leftward in lower body space relative to upper body space (M = 1.3345 mm and 0.4225 mm, respectively; p = .011). In upper body space, bisection errors did not differ from zero, but in lower body space, errors tended to deviate leftward (M = 1.3345 mm, differing from the null hypothesis at p = .0755). Our results are consistent with dorsal stream/right hemisphere activation when performing a spatial localization task in lower versus upper body space. (JINS, 2000, 6, 455–459.)


2018 ◽  
Vol 115 (35) ◽  
pp. 8835-8840 ◽  
Author(s):  
Hanlin Tang ◽  
Martin Schrimpf ◽  
William Lotter ◽  
Charlotte Moerman ◽  
Ana Paredes ◽  
...  

Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.
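
The attractor-based recurrence invoked above can be illustrated with a generic Hopfield-style network: whole-object feature patterns are stored as attractors, and a partially visible pattern settles toward the nearest one. This is a textbook attractor model, offered only to make the mechanism concrete; the pattern sizes, update rule, and occlusion fraction are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
patterns = np.sign(rng.standard_normal((5, 200)))   # 5 stored object feature codes
W = patterns.T @ patterns / patterns.shape[1]       # Hebbian recurrent weights
np.fill_diagonal(W, 0.0)                            # no self-connections

partial = patterns[0].copy()
partial[:150] = 0.0                                 # occlude ~75% of the features

x = partial
for _ in range(10):                                 # recurrent settling over time
    x = np.sign(W @ x)                              # each step completes the pattern

match = int(np.argmax(patterns @ x))                # which attractor was reached
print(match)  # typically 0: the occluded object is completed and recognized
```

The extra settling steps are what produce the delayed, selective responses to occluded objects that the physiology reports: completion takes recurrent iterations that whole objects do not need.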


2008 ◽  
Vol 18 (10) ◽  
pp. 2402-2409 ◽  
Author(s):  
Deepak Sarpal ◽  
Bradley R. Buchsbaum ◽  
Philip D. Kohn ◽  
J. Shane Kippenhan ◽  
Carolyn B. Mervis ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Irina Higgins ◽  
Le Chang ◽  
Victoria Langston ◽  
Demis Hassabis ◽  
Christopher Summerfield ◽  
...  

Abstract: In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together, our results imply that optimising the disentangling objective leads to representations that closely resemble those in IT at the single-unit level. This points to disentangling as a plausible learning objective for the visual brain.
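
The disentangling objective at issue is the standard β-VAE loss: the usual VAE reconstruction term plus a KL term weighted by β > 1, which pressures the latent code toward independent, interpretable factors. A minimal sketch follows, assuming PyTorch, a Bernoulli reconstruction likelihood, and β = 4; these choices are common defaults, not necessarily the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """x, x_recon: (batch, dims) in [0, 1]; mu, logvar: (batch, n_latents)."""
    # Reconstruction term: how well the decoder reproduces the input
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I);
    # weighting it by beta > 1 is what encourages disentangled latents
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```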

