CORnet: Modeling the Neural Mechanisms of Core Object Recognition

2018 ◽  
Author(s):  
Jonas Kubilius ◽  
Martin Schrimpf ◽  
Aran Nayebi ◽  
Daniel Bear ◽  
Daniel L. K. Yamins ◽  
...  

Abstract: Deep artificial neural networks with spatially repeated processing (a.k.a. deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from a simple feedforward eight-layer architecture in AlexNet to extremely deep and branching NASNet architectures, demonstrating increasingly better object categorization performance and increasingly better explanatory power of both neural and behavioral responses. However, from the neuroscientist’s point of view, the relationship between such very deep architectures and the ventral visual pathway is incomplete in at least two ways. On the one hand, current state-of-the-art ANNs appear to be too complex (e.g., now over 100 levels) compared with the relatively shallow cortical hierarchy (4–8 levels), which makes it difficult to map their elements to those in the ventral visual stream and to understand what they are doing. On the other hand, current state-of-the-art ANNs appear to be not complex enough in that they lack recurrent connections and the resulting neural response dynamics that are commonplace in the ventral visual stream. Here we describe our ongoing efforts to resolve both of these issues by developing a “CORnet” family of deep neural network architectures. Rather than just seeking high object recognition performance (as the state-of-the-art ANNs above do), we instead try to reduce the model family to its most important elements and then gradually build new ANNs with recurrent and skip connections while monitoring both performance and the match between each new CORnet model and a large body of primate brain and behavioral data. We report here that our current best ANN model derived from this approach (CORnet-S) is among the top models on Brain-Score, a composite benchmark for comparing models to the brain, but is simpler than other deep ANNs in terms of the number of convolutions performed along the longest path of information processing in the model. All CORnet models are available at github.com/dicarlolab/CORnet, and we plan to update this manuscript and the available models in this family as they are produced.
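
To make the architectural idea concrete, here is a minimal sketch of a CORnet-S-style recurrent area, assuming PyTorch. It is simplified relative to the released code at github.com/dicarlolab/CORnet; the class name, layer sizes, unroll count, and shared normalization across time steps are illustrative assumptions, not the paper's exact specification.

```python
import torch
import torch.nn as nn

class RecurrentArea(nn.Module):
    """One cortical 'area' (e.g., V4): a conv block unrolled over time,
    with a skip path so the feedforward drive feeds every recurrent pass."""

    def __init__(self, in_ch, out_ch, times=2):
        super().__init__()
        self.times = times
        self.conv_input = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv_rec = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv_input(x)       # feedforward drive, computed once
        skip = x                     # skip connection reused at each time step
        for _ in range(self.times):  # recurrence: the same weights applied repeatedly
            x = self.relu(self.norm(self.conv_rec(x) + skip))
        return x

# Stacking four such areas mirrors the shallow V1 -> V2 -> V4 -> IT hierarchy,
# keeping the longest feedforward path short while adding response dynamics.
area = RecurrentArea(in_ch=3, out_ch=64, times=2)
out = area(torch.randn(1, 3, 56, 56))  # -> torch.Size([1, 64, 56, 56])
```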

Author(s):  
Albert L. Rothenstein

Most biologically inspired models of object recognition rely on a feed-forward architecture in which abstract representations are gradually built from simple ones, but recognition performance in such systems drops when multiple objects are present in the input. This chapter puts forward the proposal that by using multiple passes through the visual processing hierarchy, both bottom-up and top-down, it is possible to address the limitations of feed-forward architectures and explain the different recognition behaviors that primate vision exhibits. The model relies on the reentrant connections that are ubiquitous in the primate brain to recover spatial information, and thus allows for the selective processing of stimuli. The chapter ends with a discussion of the implications of this work, its explanatory power, and a number of predictions for future experimental work.
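
A toy sketch of the multiple-pass idea, hedged heavily: a feedforward sweep produces a category hypothesis, and a top-down pass re-weights the input features around the winning hypothesis before the next sweep, approximating selective processing of one stimulus among several. The gating rule, the template matching, and all names here are illustrative assumptions, not the chapter's model.

```python
import numpy as np

def recognize(features, templates, passes=3):
    """features: (n_units,) bottom-up activation; templates: dict name -> (n_units,)."""
    gain = np.ones_like(features)                 # top-down gain, initially uniform
    best = None
    for _ in range(passes):
        x = features * gain                       # gated bottom-up (reentrant) pass
        scores = {name: float(t @ x) for name, t in templates.items()}
        best = max(scores, key=scores.get)
        # top-down pass: boost units belonging to the current winning hypothesis
        gain = 0.5 * gain + 0.5 * np.abs(templates[best])
    return best
```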


2016 ◽  
Author(s):  
Darren Seibert ◽  
Daniel L Yamins ◽  
Diego Ardila ◽  
Ha Hong ◽  
James J DiCarlo ◽  
...  

Human visual object recognition is subserved by a multitude of cortical areas. To make sense of this system, one line of research focused on the response properties of primary visual cortex neurons and developed theoretical models of a set of canonical computations, such as convolution, thresholding, exponentiation, and normalization, that could be hierarchically repeated to give rise to more complex representations. Another line of research focused on the response properties of high-level visual cortex and linked these to semantic categories useful for object recognition. Here, we hypothesized that the panoply of visual representations in the human ventral stream may be understood as emergent properties of a system constrained both by simple canonical computations and by top-level object recognition functionality in a single unified framework (Yamins et al., 2014; Khaligh-Razavi and Kriegeskorte, 2014; Guclu and van Gerven, 2015). We built a deep convolutional neural network model optimized for object recognition and used representational similarity analysis to compare representations at various model levels with human functional imaging responses elicited by viewing hundreds of image stimuli. Neural network layers developed representations that corresponded in a hierarchically consistent fashion to visual areas from V1 to LOC, and this correspondence increased with optimization of the model's recognition performance. These findings support a unified view of the ventral stream in which representations from the earliest to the latest stages can be understood as being built from basic computations inspired by models of early visual cortex and shaped by optimization for high-level, object-based performance constraints.
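
For readers unfamiliar with the comparison method, here is a minimal sketch of representational similarity analysis as described above: build a representational dissimilarity matrix (RDM) for each representation, then correlate the RDMs. The choice of correlation distance for the RDMs and Spearman correlation for the comparison is a common convention and an assumption here, not necessarily the paper's exact pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """responses: (n_stimuli, n_features) -> condensed dissimilarity vector."""
    return pdist(responses, metric="correlation")  # 1 - Pearson r per stimulus pair

def rsa_score(layer_responses, voxel_responses):
    """Spearman correlation between a model-layer RDM and a brain-region RDM."""
    rho, _ = spearmanr(rdm(layer_responses), rdm(voxel_responses))
    return rho

# Example with stand-in data: compare one model layer to one visual area
# over 100 stimuli. Real use substitutes layer activations and fMRI responses.
layer = np.random.randn(100, 512)   # stand-in for layer activations
voxels = np.random.randn(100, 80)   # stand-in for voxel responses
print(rsa_score(layer, voxels))
```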


2000 ◽  
Vol 12 (4) ◽  
pp. 615-621 ◽  
Author(s):  
Glen M. Doniger ◽  
John J. Foxe ◽  
Micah M. Murray ◽  
Beth A. Higgins ◽  
Joan Gay Snodgrass ◽  
...  

Object recognition is achieved even in circumstances where only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognition to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onset ∼230 msec, maximal at ∼290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, which are consistent with generators in the human ventral visual stream, and specifically in the lateral-occipital (LO) complex as defined by hemodynamic studies of object recognition.


2020 ◽  
Author(s):  
Franziska Geiger ◽  
Martin Schrimpf ◽  
Tiago Marques ◽  
James J. DiCarlo

Abstract: After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, they are poor models of the development of the visual system, because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the number of supervised synaptic updates required to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and less energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies. First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to the adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved “at birth” (i.e., with no training at all). Third, we find that, by training only ~5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, the new models achieve ~80% of a fully trained model’s match to the brain while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be “wired up” by evolution (a model’s “birth” state) and by developmental learning (a model’s updates based on visual experience).
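
The third strategy, training only a small fraction of synapses, can be sketched as follows in PyTorch. The abstract supplies only the ~5% figure; the random per-layer selection, the gradient-masking mechanics, and the ResNet-18 stand-in backbone are illustrative assumptions, not the paper's selection rule.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # stand-in backbone, untrained

# Freeze ~95% of entries in every weight tensor by masking their gradients,
# so only ~5% of model synapses ever receive supervised updates.
masks = {}
for name, p in model.named_parameters():
    mask = (torch.rand_like(p) < 0.05).float()        # ~5% trainable entries
    masks[name] = mask
    p.register_hook(lambda grad, m=mask: grad * m)    # zero grads on frozen entries

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Training then proceeds exactly as usual; the hooks silently restrict
# which synapses change, shrinking the supervised update budget.
```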


2021 ◽  
Author(s):  
Moritz Wurm ◽  
Alfonso Caramazza

The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition relies on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are restricted to medial areas of the ventral stream. We argue that the ventral stream bifurcates into lateral and medial pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and medial areas.


2000 ◽  
Vol 6 (4) ◽  
pp. 455-459 ◽  
Author(s):  
ANNA M. BARRETT ◽  
J. BRENT CROSSON ◽  
GREGORY P. CRUCIAN ◽  
KENNETH M. HEILMAN

Whereas the ventral cortical visual stream is important in object recognition, the dorsal stream is specialized for spatial localization. In humans there are also right and left hemisphere asymmetries in visual processing, the left hemisphere being more important in object recognition and the right in specifying spatial locations. Based on these dorsal–ventral and right–left where–what dichotomies, one would expect that dorsal right hemisphere systems would be most activated during spatial localization tasks, and that this activation may induce a leftward spatial bias in lower space. To determine whether visual stimuli in upper and lower body space evoke different hemispheric activation, we had 12 normal participants bisect horizontal lines above and below eye level. Participants erred leftward in lower body space relative to upper body space (M = 1.3345 mm and 0.4225 mm, respectively; p = .011). In upper body space, bisection errors did not differ from zero, but in lower body space, errors tended to deviate leftward (M = 1.3345 mm, differing from the null hypothesis at p = .0755). Our results are consistent with dorsal stream/right hemisphere activation when performing a spatial localization task in lower versus upper body space. (JINS, 2000, 6, 455–459.)


2018 ◽  
Vol 115 (35) ◽  
pp. 8835-8840 ◽  
Author(s):  
Hanlin Tang ◽  
Martin Schrimpf ◽  
William Lotter ◽  
Charlotte Moerman ◽  
Ana Paredes ◽  
...  

Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.
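
The attractor-based recurrence invoked above can be illustrated with a generic Hopfield-style network: whole-object feature patterns are stored as attractors, and a partially visible pattern settles toward the nearest one. This is a textbook attractor model, offered only to make the mechanism concrete; the pattern sizes, update rule, and occlusion fraction are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
patterns = np.sign(rng.standard_normal((5, 200)))   # 5 stored object feature codes
W = patterns.T @ patterns / patterns.shape[1]       # Hebbian recurrent weights
np.fill_diagonal(W, 0.0)                            # no self-connections

partial = patterns[0].copy()
partial[:150] = 0.0                                 # occlude ~75% of the features

x = partial
for _ in range(10):                                 # recurrent settling over time
    x = np.sign(W @ x)                              # each step completes the pattern

match = int(np.argmax(patterns @ x))                # which attractor was reached
print(match)  # typically 0: the occluded object is completed and recognized
```

The extra settling steps are what produce the delayed, selective responses to occluded objects that the physiology reports: completion takes recurrent iterations that whole objects do not need.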


2008 ◽  
Vol 18 (10) ◽  
pp. 2402-2409 ◽  
Author(s):  
Deepak Sarpal ◽  
Bradley R. Buchsbaum ◽  
Philip D. Kohn ◽  
J. Shane Kippenhan ◽  
Carolyn B. Mervis ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Irina Higgins ◽  
Le Chang ◽  
Victoria Langston ◽  
Demis Hassabis ◽  
Christopher Summerfield ◽  
...  

Abstract: In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together, our results imply that optimising the disentangling objective leads to representations that closely resemble those in IT at the single-unit level. This points to disentangling as a plausible learning objective for the visual brain.
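
The disentangling objective at issue is the standard β-VAE loss: the usual VAE reconstruction term plus a KL term weighted by β > 1, which pressures the latent code toward independent, interpretable factors. A minimal sketch follows, assuming PyTorch, a Bernoulli reconstruction likelihood, and β = 4; these choices are common defaults, not necessarily the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """x, x_recon: (batch, dims) in [0, 1]; mu, logvar: (batch, n_latents)."""
    # Reconstruction term: how well the decoder reproduces the input
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I);
    # weighting it by beta > 1 is what encourages disentangled latents
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```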

