Invariant Representation of Physical Stability in the Human Brain

2021 ◽  
Author(s):  
RT Pramod ◽  
Michael A. Cohen ◽  
Joshua B. Tenenbaum ◽  
Nancy G. Kanwisher

Successful engagement with the world requires the ability to predict what will happen next. Here we investigate how the brain makes the most basic prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future. Specifically, we ask if judgements of stability can be supported by the kinds of representations that have proven highly effective at visual object recognition in both machines and brains, or whether determining the physical stability of natural scenes instead requires generative algorithms that simulate the physics of the world. To find out, we measured responses in both convolutional neural networks (CNNs) and the brain (using fMRI) to natural images of physically stable versus unstable scenarios. We find no evidence for generalizable representations of physical stability in either standard CNNs trained on visual object and scene classification (ImageNet) or in the human ventral visual pathway, which has long been implicated in visual object recognition. However, in fronto-parietal regions previously implicated in intuitive physical reasoning we find both scenario-invariant representations of physical stability and higher univariate responses to unstable than to stable scenes. These results demonstrate abstract representations of physical stability in the dorsal but not the ventral pathway, consistent with the hypothesis that the computations underlying stability entail not just pattern classification but forward physical simulation.
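To make the decoding logic concrete, the sketch below (simulated data, not the authors' code) shows how cross-scenario generalization of stability information can be tested: a linear classifier is trained on stable/unstable CNN features from one scenario type and tested on another. The arrays `features_a`, `features_b`, and their labels are hypothetical stand-ins for features extracted from real images.

```python
# Hypothetical sketch of cross-scenario decoding of physical stability.
# Assumes CNN features have already been extracted for two scenario types
# (e.g., with a pretrained ImageNet model); arrays here are simulated.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Simulated CNN feature vectors: n_images x n_features, with binary
# stable (0) / unstable (1) labels for two different scenario types.
features_a = rng.standard_normal((100, 512))   # e.g., "block towers"
labels_a = rng.integers(0, 2, size=100)
features_b = rng.standard_normal((100, 512))   # e.g., "objects near an edge"
labels_b = rng.integers(0, 2, size=100)

# Train on scenario A, test on scenario B: above-chance accuracy here
# would indicate a scenario-invariant representation of stability.
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(features_a, labels_a)
print("cross-scenario accuracy:", clf.score(features_b, labels_b))
```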

2016 ◽  
Vol 371 (1705) ◽  
pp. 20160278 ◽  
Author(s):  
Nikolaus Kriegeskorte ◽  
Jörn Diedrichsen

High-resolution functional imaging is providing increasingly rich measurements of brain activity in animals and humans. A major challenge is to leverage such data to gain insight into the brain's computational mechanisms. The first step is to define candidate brain-computational models (BCMs) that can perform the behavioural task in question. We would then like to infer which of the candidate BCMs best accounts for measured brain-activity data. Here we describe a method that complements each BCM with a measurement model (MM), which simulates the way the brain-activity measurements reflect neuronal activity (e.g. local averaging in functional magnetic resonance imaging (fMRI) voxels or sparse sampling in array recordings). The resulting generative model (BCM-MM) produces simulated measurements. To avoid having to fit the MM to predict each individual measurement channel of the brain-activity data, we compare the measured and predicted data at the level of summary statistics. We describe a particular implementation of this approach, called probabilistic representational similarity analysis (pRSA) with MMs, which uses representational dissimilarity matrices (RDMs) as the summary statistics. We validate this method by simulations of fMRI measurements (locally averaging voxels) based on a deep convolutional neural network for visual object recognition. Results indicate that the way the measurements sample the activity patterns strongly affects the apparent representational dissimilarities. However, modelling of the measurement process can account for these effects, and different BCMs remain distinguishable even under substantial noise. The pRSA method enables us to perform Bayesian inference on the set of BCMs and to recognize the data-generating model in each case. This article is part of the themed issue ‘Interpreting BOLD: a dialogue between cognitive and cellular neuroscience’.
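To illustrate the measurement-model idea, the toy sketch below (invented dimensions, not the pRSA implementation itself) simulates fMRI voxels as local averages of neuronal units and compares the RDM summary statistics computed before and after measurement.

```python
# Toy illustration of a measurement model: voxels as local averages of
# neuronal activity, with RDMs as the summary statistic. Dimensions and
# stimuli are invented for the example.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

n_stimuli, n_neurons, neurons_per_voxel = 20, 400, 10

# Simulated neuronal response patterns (stimuli x neurons).
neural = rng.standard_normal((n_stimuli, n_neurons))

# Measurement model: each voxel locally averages a block of neurons.
voxels = neural.reshape(n_stimuli, -1, neurons_per_voxel).mean(axis=2)

# Summary statistics: representational dissimilarity matrices
# (condensed form), computed with correlation distance.
rdm_neural = pdist(neural, metric="correlation")
rdm_voxel = pdist(voxels, metric="correlation")

# How strongly does the measurement process distort the representational
# geometry? Correlate the two RDMs.
print("RDM correlation:", np.corrcoef(rdm_neural, rdm_voxel)[0, 1])
```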


2020 ◽  
pp. 40-175 ◽
Author(s):  
Edmund T. Rolls

The brain processes involved in visual object recognition are described. Evidence is presented that what is computed are sparse distributed representations of objects that are invariant with respect to transforms including position, size, and even view, in the ventral stream towards the inferior temporal visual cortex. Biologically plausible unsupervised learning mechanisms that can perform this computation are then described; these use a synaptic modification rule that utilises a memory trace. They are compared with deep learning and other machine learning approaches that require supervision.
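One common formulation of such a trace rule (an illustrative sketch; the chapter's exact rule and parameter values may differ) replaces the instantaneous postsynaptic activity in a Hebbian update with an exponentially decaying memory trace, so that temporally adjacent inputs, such as successive views of the same object, become associated with the same output neuron.

```python
# Sketch of a trace learning rule for transform-invariant learning.
# Formulation and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

n_inputs, n_steps = 50, 200
eta, trace_decay = 0.01, 0.8   # learning rate, trace persistence

w = rng.random(n_inputs) * 0.1  # synaptic weights
y_trace = 0.0                   # memory trace of postsynaptic activity

for t in range(n_steps):
    x = rng.random(n_inputs)            # presynaptic input (e.g., one view)
    y = float(w @ x)                    # postsynaptic activation
    # Update the trace: a running mixture of past and current activity.
    y_trace = trace_decay * y_trace + (1 - trace_decay) * y
    # Hebb-like update using the trace instead of the instantaneous rate,
    # so inputs occurring close together in time (successive transforms
    # of an object) are bound onto the same output neuron.
    w += eta * y_trace * x
    w /= np.linalg.norm(w)              # weight normalization for stability

print("final weight norm:", np.linalg.norm(w))
```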


Author(s):  
Antonio Mastrogiorgio ◽  
Enrico Petracca ◽  
Riccardo Palumbo

Innovations advance into the ‘adjacent possible’, enabled and constrained by the current state of the world, in a way that is unpredictable and not law-entailed. Unpredictability is the hallmark of the idea that innovation processes are contingent and embodied in the interaction between individuals and artefacts in the environment. In this chapter, we explore the cognitive and behavioural factors involved in exaptive innovation processes by using the notion of ‘extended cognition’. Extended cognition builds on the hypothesis that cognitive processes are not limited to the brain but also extend into the physical world as the objects of the environment facilitate, integrate with, and even constitute specific cognitive processes. We argue that exaptive innovations can be better understood by focusing on practicality and procedural knowledge from an extended cognition perspective. Artefact manipulation is not merely pragmatic but also epistemic as it enables specific reasoning processes that lead to the discovery of new uses.


2008 ◽  
Vol 100 (4) ◽  
pp. 2038-2047 ◽  
Author(s):  
Evelyn Eger ◽  
Christian A. Kell ◽  
Andreas Kleinschmidt

A central issue for understanding visual object recognition is how the cortical hierarchy represents incoming sensory information and transforms it across successive processing stages. The format of object representation in the human brain has thus far mostly been studied using adaptation paradigms because the neuronal layout of object selectivities was thought to be beyond the resolution of conventional functional MRI (fMRI). Recently, however, multivariate pattern recognition succeeded in discriminating fMRI responses of object-selective cortex to different object exemplars within a given category. Here, we use increased spatial fMRI resolution to explore size sensitivity and tolerance to size change of response patterns evoked by object exemplars across a range of three sizes. Results from Support Vector Classification on responses of the human lateral occipital complex (LOC) show that discrimination of size (for a given object) and discrimination of objects across changes in size depended on the amount of size difference. Even across the largest amount of size change, accuracy for generalization was still significant in LOC, whereas the same comparison was at chance performance in early visual (calcarine) cortex. Analyzing subregions, we further found an anterior-posterior gradient in the degree of size sensitivity and size generalization within the posterior-dorsal and anterior-ventral parts of LOC. These results speak against a fully size-invariant representation of object information in human LOC and are hence congruent with findings in monkeys showing object identity and size information in population activity of inferotemporal cortex. Moreover, these results provide evidence for a fine-grained functional heterogeneity within human LOC beyond the commonly used LO/fusiform subdivision.
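In schematic form, the generalization analysis looks like this (simulated voxel patterns; only the train-on-one-size, test-on-another logic mirrors the study):

```python
# Schematic sketch of cross-size generalization decoding; voxel patterns
# are simulated, so only the analysis logic mirrors the study.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)

n_trials, n_voxels = 60, 200
object_ids = rng.integers(0, 2, size=n_trials)    # two object exemplars

# Simulated LOC response patterns for the same objects at two sizes:
# a shared object signal plus independent trial noise.
signal = rng.standard_normal((2, n_voxels))
patterns_small = signal[object_ids] + 0.8 * rng.standard_normal((n_trials, n_voxels))
patterns_large = signal[object_ids] + 0.8 * rng.standard_normal((n_trials, n_voxels))

# Train on one size, test on the other: above-chance accuracy implies
# (partial) size tolerance of the object representation.
clf = SVC(kernel="linear")
clf.fit(patterns_small, object_ids)
print("cross-size accuracy:", clf.score(patterns_large, object_ids))
```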


2015 ◽  
Author(s):  
Radoslaw Cichy ◽  
Dimitrios Pantazis ◽  
Aude Oliva

Every human cognitive function, such as visual object recognition, is realized in a complex spatio-temporal activity pattern in the brain. Current brain imaging techniques in isolation cannot resolve the brain's spatio-temporal dynamics because they provide either high spatial or high temporal resolution, but not both. To overcome this limitation, we developed a new integration approach that uses representational similarities to combine measurements from different imaging modalities - magnetoencephalography (MEG) and functional MRI (fMRI) - to yield a spatially and temporally integrated characterization of neuronal activation. Applying this approach to two independent MEG-fMRI data sets, we observed that neural activity first emerged in the occipital pole at 50-80 ms, before spreading rapidly and progressively in the anterior direction along the ventral and dorsal visual streams. These results provide a novel, comprehensive, spatio-temporally resolved view of the rapid neural dynamics during the first few hundred milliseconds of object vision. They further demonstrate the feasibility of spatially unbiased, representational-similarity-based fusion of MEG and fMRI, promising new insights into how the brain computes complex cognitive functions.
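The fusion logic can be sketched in a few lines (toy data throughout; dimensions are invented): compute an RDM for each MEG time point and for an fMRI region, then correlate them, yielding a time course of when that region's representational geometry appears in the MEG signal.

```python
# Toy sketch of representational-similarity fusion of MEG and fMRI.
# All data are simulated; only the correlate-the-RDMs logic is real.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(4)

n_conditions, n_sensors, n_times = 12, 30, 50
n_voxels = 100

# Simulated MEG patterns (conditions x sensors x time) and one fMRI
# region's patterns (conditions x voxels).
meg = rng.standard_normal((n_conditions, n_sensors, n_times))
fmri = rng.standard_normal((n_conditions, n_voxels))
rdm_fmri = pdist(fmri, metric="correlation")

# For each time point, correlate the MEG RDM with the region's fMRI RDM;
# the resulting time course says *when* this region's representational
# geometry is expressed in the MEG signal.
fusion = [
    spearmanr(pdist(meg[:, :, t], metric="correlation"), rdm_fmri).correlation
    for t in range(n_times)
]
print("peak fusion correlation at t =", int(np.argmax(fusion)))
```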


2020 ◽  
Author(s):  
Franziska Geiger ◽  
Martin Schrimpf ◽  
Tiago Marques ◽  
James J. DiCarlo

After training on large datasets, certain deep neural networks are surprisingly good models of the neural mechanisms of adult primate visual object recognition. Nevertheless, these models are poor models of the development of the visual system because they posit millions of sequential, precisely coordinated synaptic updates, each based on a labeled image. While ongoing research is pursuing the use of unsupervised proxies for labels, we here explore a complementary strategy of reducing the required number of supervised synaptic updates to produce an adult-like ventral visual stream (as judged by the match to V1, V2, V4, IT, and behavior). Such models might require less precise machinery and energy expenditure to coordinate these updates and would thus move us closer to viable neuroscientific hypotheses about how the visual system wires itself up. Relative to the current leading model of the adult ventral stream, we here demonstrate that the total number of supervised weight updates can be substantially reduced using three complementary strategies: First, we find that only 2% of supervised updates (epochs and images) are needed to achieve ~80% of the match to adult ventral stream. Second, by improving the random distribution of synaptic connectivity, we find that 54% of the brain match can already be achieved “at birth” (i.e. no training at all). Third, we find that, by training only ~5% of model synapses, we can still achieve nearly 80% of the match to the ventral stream. When these three strategies are applied in combination, we find that these new models achieve ~80% of a fully trained model’s match to the brain, while using two orders of magnitude fewer supervised synaptic updates. These results reflect first steps in modeling not just primate adult visual processing during inference, but also how the ventral visual stream might be “wired up” by evolution (a model’s “birth” state) and by developmental learning (a model’s updates based on visual experience).
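The third strategy, training only a small fraction of synapses, can be sketched generically (a PyTorch illustration with a stand-in model, not the authors' code or architecture): freeze updates everywhere except under a random binary mask of weights by zeroing out masked gradients.

```python
# Generic sketch of training only a small fraction of a network's
# weights: gradients are multiplied by a fixed random binary mask, so
# only the selected ~5% of synapses ever receive updates. The model and
# the exact fraction are illustrative, not the paper's setup.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
train_fraction = 0.05  # train ~5% of weights, freeze the rest

for p in model.parameters():
    mask = (torch.rand_like(p) < train_fraction).float()
    # The hook multiplies each incoming gradient by the mask during
    # backprop, silencing updates for masked-out weights.
    p.register_hook(lambda grad, m=mask: grad * m)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print("loss:", loss.item())
```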


2001 ◽  
Vol 13 (6) ◽  
pp. 793-799 ◽  
Author(s):  
Moshe Bar

The nature of visual object representation in the brain is the subject of a prolonged debate. One set of theories asserts that objects are represented by their structural description and the representation is “object-centered.” Theories from the other side of the debate suggest that humans store multiple “snapshots” for each object, depicting it as seen under various conditions, and the representation is therefore “viewer-centered.” The principal tool that has been used to support and criticize each of these hypotheses is subjects' performance in recognizing objects under novel viewing conditions. For example, if subjects take more time in recognizing an object from an unfamiliar viewpoint, it is common to claim that the representation of that object is viewpoint-dependent and therefore viewer-centered. It is suggested here, however, that performance cost in recognition of objects under novel conditions may be misleading when studying the nature of object representation. Specifically, it is argued that viewpoint-dependent performance is not necessarily an indication of viewer-centered representation. An account of the neural basis of perceptual priming is first provided. In light of this account, it is conceivable that viewpoint dependency reflects the utilization of neural paths with different levels of sensitivity en route to the same representation, rather than the existence of viewpoint-specific representations. New experimental paradigms are required to study the validity of the viewer-centered approach.


2019 ◽  
Author(s):  
Laurent Caplette ◽  
Robin A. A. Ince ◽  
Karim Jerbi ◽  
Frédéric Gosselin

Visual object recognition seems to occur almost instantaneously. However, not only does it require hundreds of milliseconds of processing, but our eyes also typically fixate the object for hundreds of milliseconds. Consequently, information reaching our eyes at different moments is processed in the brain together. Moreover, information received at different moments during fixation is likely to be processed differently, notably because different features might be selectively attended at different moments. Here, we introduce a novel reverse correlation paradigm that allows us to uncover with millisecond precision the processing time course of specific information received on the retina at specific moments. Using faces as stimuli, we observed that processing at several electrodes and latencies was different depending on the moment at which information was received. Some of these variations were caused by a disruption occurring 160-200 ms after the face onset, suggesting a role of the N170 ERP component in gating information processing; others hinted at temporal compression and integration mechanisms. Importantly, the observed differences were not explained by simple adaptation or repetition priming, they were modulated by the task, and they were correlated with differences in behavior. These results suggest that top-down routines of information sampling are applied to the continuous visual input, even within a single eye fixation.
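The core of a temporal reverse-correlation analysis can be sketched as follows (simulated observer with an invented temporal weighting, not the study's pipeline): stimulus information is revealed according to random temporal sampling profiles, and contrasting the profiles behind each response type recovers which presentation moments drove behavior.

```python
# Toy sketch of temporal reverse correlation: random sampling profiles
# determine when information reaches the eye, and correlating profiles
# with responses reveals the influential moments. The simulated observer
# and its hidden weighting are assumptions for the example.
import numpy as np

rng = np.random.default_rng(5)

n_trials, n_timepoints = 5000, 40   # e.g., 40 frames within a fixation

# Random temporal sampling: how much information is shown at each moment.
samples = rng.random((n_trials, n_timepoints))

# Hypothetical observer that relies mostly on information presented
# early in the fixation (a hidden temporal weighting to be recovered).
true_weights = np.exp(-np.arange(n_timepoints) / 8.0)
score = samples @ true_weights
responses = (score + rng.standard_normal(n_trials) > np.median(score))

# Classification image: mean sampling profile on "yes" trials minus
# "no" trials approximates the observer's temporal weighting.
ci = samples[responses].mean(0) - samples[~responses].mean(0)
print("recovered peak moment:", int(np.argmax(ci)))
```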


Neuron ◽  
2012 ◽  
Vol 73 (3) ◽  
pp. 415-434 ◽  
Author(s):  
James J. DiCarlo ◽  
Davide Zoccolan ◽  
Nicole C. Rust
