Goal-Driven Recurrent Neural Network Models of the Ventral Visual Stream

2021 · Author(s): Aran Nayebi, Javier Sagastuy-Brena, Daniel M. Bear, Kohitij Kar, Jonas Kubilius, et al.

The ventral visual stream (VVS) is a hierarchically connected series of cortical areas known to underlie core object recognition behaviors, enabling humans and non-human primates to effortlessly recognize objects across a multitude of viewing conditions. While recent feedforward convolutional neural networks (CNNs) provide quantitatively accurate predictions of temporally-averaged neural responses throughout the ventral pathway, they lack two ubiquitous neuroanatomical features: local recurrence within cortical areas and long-range feedback from downstream areas to upstream areas. As a result, such models are unable to account for the temporally-varying dynamical patterns thought to arise from recurrent visual circuits, nor can they provide insight into the behavioral goals that these recurrent circuits might help support. In this work, we augment CNNs with local recurrence and long-range feedback, developing convolutional RNN (ConvRNN) network models that more faithfully mimic the gross neuroanatomy of the ventral pathway. Moreover, when the form of the recurrent circuit is chosen properly, ConvRNNs with comparatively small numbers of layers can achieve high performance on a core recognition task, comparable to that of much deeper feedforward networks. We then compared these models with temporally fine-grained neural and behavioral recordings from primates presented with thousands of images. We found that ConvRNNs better matched these data than alternative models, including the deepest feedforward networks, on two metrics: 1) neural dynamics in V4 and inferotemporal (IT) cortex at late timepoints after stimulus onset, and 2) the varying times at which object identity can be decoded from IT, including more challenging images that take longer to decode. Moreover, these results differentiate within the class of ConvRNNs, suggesting that there are strong functional constraints on the recurrent connectivity needed to match these phenomena.
Finally, we find that recurrent circuits that attain high task performance while having a smaller network size, as measured by the number of units rather than by another metric such as the number of parameters, are overall most consistent with these data. Taken together, our results evince the role of recurrence and feedback in the ventral pathway in reliably performing core object recognition under a strong total network size constraint.
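The core architectural idea, a feedforward drive combined with local recurrence within a cortical area, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the single-channel cell, the 3x3 kernels, and the mean-activation "dynamics" readout below are illustrative assumptions only.

```python
import numpy as np

def conv2d(x, kernel):
    """Single-channel 'same' cross-correlation (the deep-learning convention)."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
    return out

def convrnn_step(x, h, w_in, w_rec, b):
    """One step of a vanilla ConvRNN cell: feedforward drive plus local
    recurrence within the 'area', passed through a ReLU."""
    return np.maximum(0.0, conv2d(x, w_in) + conv2d(h, w_rec) + b)

def unroll(x, w_in, w_rec, b, n_steps=8):
    """Unroll the cell over time with a fixed input image, returning the
    final state and the mean activation at each timepoint (a crude stand-in
    for temporally-varying neural dynamics)."""
    h = np.zeros_like(x)
    trace = []
    for _ in range(n_steps):
        h = convrnn_step(x, h, w_in, w_rec, b)
        trace.append(h.mean())
    return h, np.array(trace)
```

Unlike a feedforward CNN, the same input produces a response at every timestep, so the model can be compared to neural recordings at late times after stimulus onset.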

2021 · Vol 118 (3) · pp. e2014196118 · Author(s): Chengxu Zhuang, Siming Yan, Aran Nayebi, Martin Schrimpf, Michael C. Frank, et al.

Deep neural networks currently provide the best quantitative models of the response patterns of neurons throughout the primate ventral visual stream. However, such networks have remained implausible as a model of the development of the ventral stream, in part because they are trained with supervised methods requiring many more labels than are accessible to infants during development. Here, we report that recent rapid progress in unsupervised learning has largely closed this gap. We find that neural network models learned with deep unsupervised contrastive embedding methods achieve neural prediction accuracy in multiple ventral visual cortical areas that equals or exceeds that of models derived using today’s best supervised methods and that the mapping of these neural network models’ hidden layers is neuroanatomically consistent across the ventral stream. Strikingly, we find that these methods produce brain-like representations even when trained solely with real human child developmental data collected from head-mounted cameras, despite the fact that these datasets are noisy and limited. We also find that semisupervised deep contrastive embeddings can leverage small numbers of labeled examples to produce representations with substantially improved error-pattern consistency to human behavior. Taken together, these results illustrate a use of unsupervised learning to provide a quantitative model of a multiarea cortical brain system and present a strong candidate for a biologically plausible computational theory of primate sensory learning.


2000 · Vol 12 (4) · pp. 615-621 · Author(s): Glen M. Doniger, John J. Foxe, Micah M. Murray, Beth A. Higgins, Joan Gay Snodgrass, et al.

Object recognition is achieved even when only partial information is available to the observer. Perceptual closure processes are essential in enabling such recognition to occur. We presented successively less fragmented images while recording high-density event-related potentials (ERPs), which permitted us to monitor brain activity during the perceptual closure processes leading up to object recognition. We reveal a bilateral ERP component (Ncl) that tracks these processes (onset ∼230 msec, maximal at ∼290 msec). Scalp-current density mapping of the Ncl revealed bilateral occipito-temporal scalp foci, consistent with generators in the human ventral visual stream, specifically the lateral-occipital (LO) complex as defined by hemodynamic studies of object recognition.


2020 · Author(s): Chengxu Zhuang, Siming Yan, Aran Nayebi, Martin Schrimpf, Michael C. Frank, et al.

Deep neural networks currently provide the best quantitative models of the response patterns of neurons throughout the primate ventral visual stream. However, such networks have remained implausible as a model of the development of the ventral stream, in part because they are trained with supervised methods requiring many more labels than are accessible to infants during development. Here, we report that recent rapid progress in unsupervised learning has largely closed this gap. We find that neural network models learned with deep unsupervised contrastive embedding methods achieve neural prediction accuracy in multiple ventral visual cortical areas that equals or exceeds that of models derived using today’s best supervised methods, and that the mapping of these neural network models’ hidden layers is neuroanatomically consistent across the ventral stream. Moreover, we find that these methods produce brain-like representations even when trained on noisy and limited data measured from real children’s developmental experience. We also find that semi-supervised deep contrastive embeddings can leverage small numbers of labelled examples to produce representations with substantially improved error-pattern consistency to human behavior. Taken together, these results suggest that deep contrastive embedding objectives may be a biologically-plausible computational theory of primate visual development.


2021 · Author(s): Moritz Wurm, Alfonso Caramazza

The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition relies on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are restricted to medial areas of the ventral stream. We argue that the ventral stream bifurcates into lateral and medial pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and medial areas.


2018 · Author(s): Jonas Kubilius, Martin Schrimpf, Aran Nayebi, Daniel Bear, Daniel L. K. Yamins, et al.

Deep artificial neural networks with spatially repeated processing (a.k.a. deep convolutional ANNs) have been established as the best class of candidate models of visual processing in the primate ventral visual stream. Over the past five years, these ANNs have evolved from a simple feedforward eight-layer architecture in AlexNet to extremely deep and branching NAS-Net architectures, demonstrating increasingly better object categorization performance and increasingly better explanatory power of both neural and behavioral responses. However, from the neuroscientist’s point of view, the relationship between such very deep architectures and the ventral visual pathway is incomplete in at least two ways. On the one hand, current state-of-the-art ANNs appear to be too complex (e.g., now over 100 levels) compared with the relatively shallow cortical hierarchy (4-8 levels), which makes it difficult to map their elements to those in the ventral visual stream and to understand what they are doing. On the other hand, current state-of-the-art ANNs appear to be not complex enough in that they lack recurrent connections and the resulting neural response dynamics that are commonplace in the ventral visual stream. Here we describe our ongoing efforts to resolve both of these issues by developing a “CORnet” family of deep neural network architectures. Rather than just seeking high object recognition performance (as the state-of-the-art ANNs above do), we instead try to reduce the model family to its most important elements and then gradually build new ANNs with recurrent and skip connections while monitoring both performance and the match between each new CORnet model and a large body of primate brain and behavioral data. We report here that our current best ANN model derived from this approach (CORnet-S) is among the top models on Brain-Score, a composite benchmark for comparing models to the brain, yet is simpler than other deep ANNs in terms of the number of convolutions performed along the longest path of information processing in the model. All CORnet models are available at github.com/dicarlolab/CORnet, and we plan to update this manuscript and the available models in this family as they are produced.
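The trade-off this approach exploits, weight-shared recurrence in a shallow "area" versus a deep stack of distinct layers, can be sketched with linear blocks standing in for convolutions. All names below are illustrative assumptions, not the CORnet code.

```python
import numpy as np

def relu_block(x, w):
    """One 'area': a linear map plus ReLU (a stand-in for a conv block)."""
    return np.maximum(0.0, x @ w)

def recurrent_area(x, w, n_steps):
    """Weight-shared recurrence: the same block is applied n_steps times,
    so effective depth grows while the parameter count stays fixed."""
    h = x
    for _ in range(n_steps):
        h = relu_block(h, w)
    return h

def feedforward_stack(x, ws):
    """A deep feedforward stack: one distinct weight matrix per level."""
    h = x
    for w in ws:
        h = relu_block(h, w)
    return h
```

With `n_steps == len(ws)` both compute a depth-n composition along the longest path, but the recurrent version stores 1/n of the parameters, mirroring a shallow anatomical hierarchy that still behaves like a deep network when unrolled in time.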


2019 · Author(s): Laura Cabral, Leire Zubiaurre, Conor Wild, Annika Linke, Rhodri Cusack

The development of the ventral visual stream is shaped both by an innate proto-organization and by experience. The fusiform face area (FFA), for example, has stronger connectivity to early visual regions representing the fovea and lower spatial frequencies. In adults, category-selective regions in the ventral stream (e.g., the FFA) also have distinct signatures of connectivity to widely distributed brain regions, which are thought to encode rich cross-modal, motoric, and affective associations (e.g., tool regions to the motor cortex). It is unclear whether this long-range connectivity is also innate, or whether it develops with experience. We used MRI diffusion-weighted imaging with tractography to characterize the connectivity of face, place, and tool category-selective regions in neonates (N=445), 1- to 9-month-old infants (N=11), and adults (N=14). Using a set of linear-discriminant classifiers, category-selective connectivity was found to be both innate and shaped by experience. Connectivity for faces was the most developed, with no evidence of significant change in the time period studied. Place and tool networks were present at birth but also showed evidence of development with experience, with tool connectivity developing over a more protracted period (9 months). Taken together, the results support an extended proto-organization that includes long-range connectivity, which could provide additional constraints on experience-dependent development.
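A two-class Fisher linear discriminant of the kind used to classify connectivity fingerprints can be written in closed form. This is a generic textbook sketch on synthetic data, not the authors' analysis pipeline; all variable names are illustrative.

```python
import numpy as np

def fit_lda(X0, X1):
    """Two-class Fisher linear discriminant: w = Sw^-1 (mu1 - mu0) with a
    midpoint threshold. X0, X1 are (trials x features) connectivity
    fingerprints for the two categories."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += 1e-6 * np.eye(Sw.shape[0])      # small ridge for numerical stability
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0
    return w, threshold

def predict(X, w, threshold):
    """Label 1 if the projection exceeds the midpoint, else 0."""
    return (X @ w > threshold).astype(int)
```

Applied per region and per age group, a classifier like this asks whether a region's long-range connectivity pattern already predicts its category at birth.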


2014 · Vol 111 (1) · pp. 91-102 · Author(s): Leyla Isik, Ethan M. Meyers, Joel Z. Leibo, Tomaso Poggio

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to particular transformations, however, has not been mapped in humans. Here we employ magnetoencephalography decoding analysis to measure the dynamics of size- and position-invariant visual information development in the ventral visual stream. With this method we can read out the identity of objects beginning as early as 60 ms. Size- and position-invariant visual information appear around 125 ms and 150 ms, respectively, and both develop in stages, with invariance to smaller transformations arising before invariance to larger transformations. Additionally, the magnetoencephalography sensor activity localizes to neural sources that lie in the most posterior occipital regions at the early decoding times and then move anteriorly toward temporal regions as invariant information develops. These results provide previously unknown latencies for key stages of invariant object recognition in humans, as well as new and compelling evidence for a feedforward hierarchical model of invariant object recognition in which invariance increases at each successive visual area along the ventral stream.
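Time-resolved decoding of this sort can be sketched with a classifier trained and tested independently at each timepoint; the accuracy-versus-time curve then reveals when identity information becomes available. The nearest-centroid version below, run on synthetic data with a known "onset", is an illustrative sketch, not the authors' MEG pipeline.

```python
import numpy as np

def time_resolved_decoding(train_X, train_y, test_X, test_y):
    """Nearest-centroid decoding at each timepoint.

    X arrays: (trials, sensors, timepoints); y: integer class labels.
    Returns decoding accuracy as a function of time."""
    n_time = train_X.shape[2]
    classes = np.unique(train_y)
    accs = np.zeros(n_time)
    for t in range(n_time):
        # class centroids in sensor space at this timepoint
        centroids = np.stack([train_X[train_y == c, :, t].mean(axis=0) for c in classes])
        # classify each test trial by its nearest centroid
        dists = np.linalg.norm(test_X[:, :, t][:, None, :] - centroids[None, :, :], axis=2)
        preds = classes[np.argmin(dists, axis=1)]
        accs[t] = (preds == test_y).mean()
    return accs
```

Accuracy should sit at chance before any class-specific signal reaches the sensors and rise above chance once it does, which is how decoding latencies such as the ~60 ms identity readout are estimated.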


2019 · Author(s): Sushrut Thorat

A mediolateral gradation in neural responses for images spanning animals to artificial objects is observed in the ventral temporal cortex (VTC). Which information streams drive this organisation is an ongoing debate. Recently, in Proklova et al. (2016), the visual shape and category (“animacy”) dimensions in a set of stimuli were dissociated using a behavioural measure of visual feature information. fMRI responses revealed a neural cluster (extra-visual animacy cluster - xVAC) which encoded category information unexplained by visual feature information, suggesting extra-visual contributions to the organisation in the ventral visual stream. We reassess these findings using Convolutional Neural Networks (CNNs) as models for the ventral visual stream. The visual features developed in the CNN layers can categorise the shape-matched stimuli from Proklova et al. (2016) in contrast to the behavioural measures used in the study. The category organisations in xVAC and VTC are explained to a large degree by the CNN visual feature differences, casting doubt over the suggestion that visual feature differences cannot account for the animacy organisation. To inform the debate further, we designed a set of stimuli with animal images to dissociate the animacy organisation driven by the CNN visual features from the degree of familiarity and agency (thoughtfulness and feelings). Preliminary results from a new fMRI experiment designed to understand the contribution of these non-visual features are presented.
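Comparisons between CNN feature organisation and neural category organisation are typically made with representational similarity analysis: build a representational dissimilarity matrix (RDM) over the stimuli for each system, then rank-correlate the two. The numpy sketch below is a generic illustration of that comparison, not the study's code.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the feature vectors of every pair of stimuli (rows)."""
    return 1.0 - np.corrcoef(features)

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs,
    the usual way model and brain representational geometries are compared."""
    iu = np.triu_indices_from(rdm_a, k=1)
    a, b = rdm_a[iu], rdm_b[iu]
    ranks = lambda v: np.argsort(np.argsort(v)).astype(float)
    return np.corrcoef(ranks(a), ranks(b))[0, 1]
```

If CNN-layer RDMs already reproduce the animacy organisation of a region's RDM, visual feature differences suffice to explain it; residual structure would point to extra-visual contributions.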


Author(s): Sigrid Hegna Ingvaldsen, Tora Sund Morken, Dordi Austeng, Olaf Dammann

Research on retinopathy of prematurity (ROP) focuses mainly on the abnormal vascularization patterns that are directly visible for ophthalmologists. However, recent findings indicate that children born prematurely also exhibit changes in the retinal cellular architecture and along the dorsal visual stream, such as structural changes between and within cortical areas. Moreover, perinatal sustained systemic inflammation (SSI) is associated with an increased risk for ROP and the visual deficits that follow. In this paper, we propose that ROP might just be the tip of an iceberg we call visuopathy of prematurity (VOP). The VOP paradigm comprises abnormal vascularization of the retina, alterations in retinal cellular architecture, choroidal degeneration, and abnormalities in the visual pathway, including cortical areas. Furthermore, VOP itself might influence the developmental trajectories of cerebral structures and functions deemed responsible for visual processing, thereby explaining visual deficits among children born preterm.

