How does the brain rapidly learn and reorganize view-invariant and position-invariant object representations in the inferotemporal cortex?

2011 ◽  
Vol 24 (10) ◽  
pp. 1050-1061 ◽  
Author(s):  
Yongqiang Cao ◽  
Stephen Grossberg ◽  
Jeffrey Markowitz


2017 ◽  
Vol 118 (1) ◽  
pp. 353-362
Author(s):  
N. Apurva Ratan Murty ◽  
S. P. Arun

We effortlessly recognize objects across changes in viewpoint, but we know relatively little about the features that underlie viewpoint invariance in the brain. Here, we set out to characterize how viewpoint invariance in monkey inferior temporal (IT) neurons is influenced by two image manipulations: silhouetting and inversion. Reducing an object to its silhouette removes internal detail, so this manipulation reveals how much viewpoint invariance depends on the external contours. Inverting an object retains but rearranges its features, so this manipulation reveals how much viewpoint invariance depends on the arrangement and orientation of features. Our main findings are 1) view invariance was weakened by silhouetting but not by inversion; 2) view invariance was stronger in neurons that generalized across silhouetting and inversion; 3) neuronal responses to natural objects matched those to silhouettes early and those to inverted objects only later, indicative of coarse-to-fine processing; and 4) the impact of silhouetting and inversion depended on object structure. Taken together, our results elucidate the underlying features and dynamics of view-invariant object representations in the brain. NEW & NOTEWORTHY We easily recognize objects across changes in viewpoint, but the underlying features are unknown. Here, we show that view invariance in the monkey inferotemporal cortex is driven mainly by external object contours and is not specialized for object orientation. We also find that responses to natural objects match those to their silhouettes early in the response and those to inverted versions later in the response, indicative of a coarse-to-fine processing sequence in the brain.
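A minimal sketch of one way to quantify this kind of view invariance, assuming simulated firing rates rather than the authors' data or code: correlate each neuron's responses to the same objects across two viewpoints, separately for natural, silhouetted, and inverted images.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_objects = 50, 20

def simulate_condition(shared_weight):
    """Two viewpoints sharing a fraction of their object tuning (hypothetical)."""
    base = rng.gamma(2.0, 5.0, (n_neurons, n_objects))  # tuning shared across views
    view1 = shared_weight * base + (1 - shared_weight) * rng.gamma(2.0, 5.0, (n_neurons, n_objects))
    view2 = shared_weight * base + (1 - shared_weight) * rng.gamma(2.0, 5.0, (n_neurons, n_objects))
    return view1, view2

def view_invariance(view1, view2):
    """Per-neuron Pearson correlation of responses across the two viewpoints."""
    return np.array([np.corrcoef(view1[i], view2[i])[0, 1] for i in range(view1.shape[0])])

# Silhouetting is assumed here to weaken shared tuning more than inversion does
conditions = {"natural": simulate_condition(0.8),
              "silhouette": simulate_condition(0.4),
              "inverted": simulate_condition(0.8)}

for name, (v1, v2) in conditions.items():
    vi = view_invariance(v1, v2)
    print(f"{name:10s} mean view-invariance index: {np.nanmean(vi):+.2f}")
```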


2019 ◽  
Author(s):  
David A. Tovar ◽  
Micah M. Murray ◽  
Mark T. Wallace

Abstract: Objects are the fundamental building blocks of how we create a representation of the external world. One major distinction amongst objects is between those that are animate versus inanimate. Many objects are specified by more than a single sense, yet the nature by which multisensory objects are represented by the brain remains poorly understood. Using representational similarity analysis of human EEG signals, we show enhanced encoding of audiovisual objects when compared to their corresponding visual and auditory objects. Surprisingly, we discovered that the often-found processing advantage for animate objects was not evident in a multisensory context, owing to greater neural enhancement of inanimate objects, the more weakly encoded objects under unisensory conditions. Further analysis showed that the selective enhancement of inanimate audiovisual objects corresponded with an increase in shared representations across brain areas, suggesting that neural enhancement was mediated by multisensory integration. Moreover, a distance-to-bound analysis provided critical links between neural findings and behavior. Improvements in neural decoding at the individual exemplar level for audiovisual inanimate objects predicted reaction time differences between multisensory and unisensory presentations during a go/no-go animate categorization task. Interestingly, links between neural activity and behavioral measures were most prominent 100 to 200 ms and 350 to 500 ms after stimulus presentation, corresponding to time periods associated with sensory evidence accumulation and decision-making, respectively. Collectively, these findings provide key insights into a fundamental process the brain uses to maximize the information it captures across sensory systems to perform object recognition. Significance Statement: Our world is filled with an ever-changing milieu of sensory information that we are able to seamlessly transform into meaningful perceptual experience. We accomplish this feat by combining different features from our senses to construct objects. However, despite the fact that our senses do not work in isolation but rather in concert with each other, little is known about how the brain combines the senses together to form object representations. Here, we used EEG and machine learning to study how the brain processes auditory, visual, and audiovisual objects. Surprisingly, we found that non-living objects, the objects which were more difficult to process with one sense alone, benefited the most from engaging multiple senses.
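A schematic sketch of the core analysis step described here, time-resolved representational similarity analysis, using simulated EEG patterns rather than the authors' data; the array sizes and condition labels are placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n_exemplars, n_channels, n_times = 12, 64, 100

def rdm_timecourse(patterns):
    """exemplars x channels x time -> condensed correlation-distance RDM per time point."""
    return np.stack([pdist(patterns[:, :, t], metric="correlation")
                     for t in range(patterns.shape[2])])

# Simulated trial-averaged EEG patterns for two conditions
audiovisual = rng.normal(size=(n_exemplars, n_channels, n_times))
visual_only = rng.normal(size=(n_exemplars, n_channels, n_times))

rdm_av = rdm_timecourse(audiovisual)
rdm_v = rdm_timecourse(visual_only)

# Larger between-exemplar dissimilarity is one crude proxy for stronger encoding
print(f"mean exemplar dissimilarity: AV = {rdm_av.mean():.3f}, V = {rdm_v.mean():.3f}")
```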


2014 ◽  
Vol 26 (1) ◽  
pp. 132-142 ◽  
Author(s):  
Thomas A. Carlson ◽  
J. Brendan Ritchie ◽  
Nikolaus Kriegeskorte ◽  
Samir Durvasula ◽  
Junsheng Ma

How does the brain translate an internal representation of an object into a decision about the object's category? Recent studies have uncovered the structure of object representations in inferior temporal cortex (IT) using multivariate pattern analysis methods. These studies have shown that representations of individual object exemplars in IT occupy distinct locations in a high-dimensional activation space, with object exemplar representations clustering into distinguishable regions based on category (e.g., animate vs. inanimate objects). In this study, we hypothesized that a representational boundary between category representations in this activation space also constitutes a decision boundary for categorization. We show that behavioral RTs for categorizing objects are well described by our activation space hypothesis. Interpreted in terms of classical and contemporary models of decision-making, our results suggest that the process of settling on an internal representation of a stimulus is itself partially constitutive of decision-making for object categorization.
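A minimal illustration of the distance-to-bound logic, using simulated activation patterns and hypothetical reaction times rather than the study's recordings: exemplars that sit farther from a linear category boundary should be categorized faster, so distance should correlate negatively with RT.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n_exemplars, n_features = 40, 100

X = rng.normal(size=(n_exemplars, n_features))   # simulated activation patterns
y = np.repeat([0, 1], n_exemplars // 2)          # animate vs. inanimate labels
X[y == 1] += 0.5                                 # separate the categories slightly

clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
distance = np.abs(clf.decision_function(X))      # distance from the category boundary

# Hypothetical RTs that shrink with distance from the boundary, plus noise
rts = 600 - 40 * distance + rng.normal(0, 20, n_exemplars)

rho, p = spearmanr(distance, rts)
print(f"distance-to-bound vs. RT: Spearman rho = {rho:.2f}, p = {p:.3g}")
```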


2008 ◽  
Vol 31 (3) ◽  
pp. 321-331 ◽  
Author(s):  
Sylvain Sirois ◽  
Michael Spratling ◽  
Michael S. C. Thomas ◽  
Gert Westermann ◽  
Denis Mareschal ◽  
...  

Abstract: Neuroconstructivism: How the Brain Constructs Cognition proposes a unifying framework for the study of cognitive development that brings together (1) constructivism (which views development as the progressive elaboration of increasingly complex structures), (2) cognitive neuroscience (which aims to understand the neural mechanisms underlying behavior), and (3) computational modeling (which proposes formal and explicit specifications of information processing). The guiding principle of our approach is context dependence, within and (in contrast to Marr [1982]) between levels of organization. We propose that three mechanisms guide the emergence of representations: competition, cooperation, and chronotopy, which themselves allow for two central processes: proactivity and progressive specialization. We suggest that the main outcome of development is partial representations, distributed across distinct functional circuits. This framework is derived by examining development at the level of single neurons, brain systems, and whole organisms. We use the terms encellment, embrainment, and embodiment to describe the higher-level contextual influences that act at each of these levels of organization. To illustrate these mechanisms in operation, we provide case studies in early visual perception, infant habituation, phonological development, and object representations in infancy. Three further case studies are concerned with interactions between levels of explanation: social development, atypical development, and, within that, developmental dyslexia. We conclude that cognitive development arises from a dynamic, contextual change in embodied neural structures leading to partial representations across multiple brain regions and timescales, in response to a proactively specified physical and social environment.


2018 ◽  
Author(s):  
Sasa L. Kivisaari ◽  
Marijn van Vliet ◽  
Annika Hultén ◽  
Tiina Lindh-Knuutila ◽  
Ali Faisal ◽  
...  

Abstract: We can easily identify a dog merely by the sound of barking or an orange by its citrus scent. In this work, we study the neural underpinnings of how the brain combines bits of information into meaningful object representations. Modern theories of semantics posit that the meaning of words can be decomposed into a unique combination of individual semantic features (e.g., "barks", "has citrus scent"). Here, participants received clues about individual objects in the form of three isolated semantic features, given as verbal descriptions. We used machine-learning-based neural decoding to learn a mapping between individual semantic features and BOLD activation patterns. We discovered that the recorded brain patterns were best decoded using a combination of not only the three semantic features that were presented as clues but a far richer set of semantic features typically linked to the target object. We conclude that our experimental protocol allowed us to observe how fragmented information is combined into a complete semantic representation of an object, and we suggest neuroanatomical underpinnings for this process.
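A simplified sketch of this style of decoding, assuming simulated data and hypothetical binary semantic features: a ridge model maps feature vectors to voxel patterns, and held-out objects are reconstructed better from their full feature set than from the three cued features alone.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_objects, n_features, n_voxels = 60, 30, 500

# Binary semantic features (e.g., "barks", "has citrus scent"); purely illustrative
features = rng.binomial(1, 0.2, (n_objects, n_features)).astype(float)
weights = rng.normal(size=(n_features, n_voxels))
bold = features @ weights + rng.normal(0, 1.0, (n_objects, n_voxels))  # simulated patterns

train, test = slice(0, 50), slice(50, 60)
model = Ridge(alpha=10.0).fit(features[train], bold[train])

full_pred = model.predict(features[test])
cued = features[test].copy()
for row in cued:                       # keep only three "cue" features per object
    active = np.flatnonzero(row)
    row[active[3:]] = 0
cued_pred = model.predict(cued)

def match(pred):
    """Mean voxelwise correlation between predicted and true held-out patterns."""
    return np.mean([np.corrcoef(p, b)[0, 1] for p, b in zip(pred, bold[test])])

print(f"full features r = {match(full_pred):.2f}, cued-only features r = {match(cued_pred):.2f}")
```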


Author(s):  
Farran Briggs

Many mammals, including humans, rely primarily on vision to sense the environment. While a large proportion of the brain is devoted to vision in highly visual animals, there are not enough neurons in the visual system to support a neuron-per-object look-up table. Instead, visual animals evolved ways to rapidly and dynamically encode an enormous diversity of visual information using minimal numbers of neurons (merely hundreds of millions of neurons and billions of connections!). In the mammalian visual system, a visual image is essentially broken down into simple elements that are reconstructed through a series of processing stages, most of which occur beneath consciousness. Importantly, visual information processing is not simply a serial progression along the hierarchy of visual brain structures (e.g., retina to visual thalamus to primary visual cortex to secondary visual cortex, etc.). Instead, connections within and between visual brain structures exist in all possible directions: feedforward, feedback, and lateral. Additionally, many mammalian visual systems are organized into parallel channels, presumably to enable efficient processing of information about different and important features in the visual environment (e.g., color, motion). The overall operations of the mammalian visual system are to: (1) combine unique groups of feature detectors in order to generate object representations and (2) integrate visual sensory information with cognitive and contextual information from the rest of the brain. Together, these operations enable individuals to perceive, plan, and act within their environment.


2009 ◽  
Vol 101 (4) ◽  
pp. 1867-1875 ◽  
Author(s):  
David B. T. McMahon ◽  
Carl R. Olson

How does the brain represent a red circle? One possibility is that there is a specialized and possibly time-consuming process whereby the attributes of shape and color, carried by separate populations of neurons in low-order visual cortex, are bound together into a unitary neural representation. Another possibility is that neurons in high-order visual cortex are selective, by virtue of their bottom-up input from low-order visual areas, for particular conjunctions of shape and color. A third possibility is that they simply sum shape and color signals linearly. We tested these ideas by measuring the responses of inferotemporal cortex neurons to sets of stimuli in which two attributes—shape and color—varied independently. We find that a few neurons exhibit conjunction selectivity but that in most neurons the influences of shape and color sum linearly. Contrary to the idea of conjunction coding, few neurons respond selectively to a particular combination of shape and color. Contrary to the idea that binding requires time, conjunction signals, when present, occur as early as feature signals. We argue that neither conjunction selectivity nor a specialized feature binding process is necessary for the effective representation of shape–color combinations.
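A minimal sketch of how the linear-summation account can be tested against conjunction coding, using simulated single-trial firing rates rather than the recorded IT data: cross-validated fits of an additive (shape plus color) model are compared with a model that assigns one free parameter to every shape-color conjunction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(4)
n_shapes, n_colors, n_reps = 8, 8, 10

shape_idx = np.repeat(np.arange(n_shapes), n_colors * n_reps)
color_idx = np.tile(np.repeat(np.arange(n_colors), n_reps), n_shapes)

# Simulated single-trial rates: additive shape and color tuning plus noise
shape_tuning = rng.gamma(2.0, 5.0, n_shapes)
color_tuning = rng.gamma(2.0, 5.0, n_colors)
rates = shape_tuning[shape_idx] + color_tuning[color_idx] + rng.normal(0, 3.0, shape_idx.size)

def one_hot(idx, n):
    return np.eye(n)[idx]

X_additive = np.hstack([one_hot(shape_idx, n_shapes), one_hot(color_idx, n_colors)])
X_conjunction = one_hot(shape_idx * n_colors + color_idx, n_shapes * n_colors)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, X in [("additive", X_additive), ("conjunction", X_conjunction)]:
    r2 = cross_val_score(LinearRegression(), X, rates, cv=cv, scoring="r2").mean()
    print(f"{name:12s} cross-validated R^2 = {r2:.2f}")
```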


2019 ◽  
Vol 31 (3) ◽  
pp. 412-430 ◽  
Author(s):  
Pawel J. Matusz ◽  
Nora Turoman ◽  
Ruxandra I. Tivadar ◽  
Chrysa Retsa ◽  
Micah M. Murray

In real-world environments, information is typically multisensory, and objects are a primary unit of information processing. Object recognition and action necessitate attentional selection of task-relevant from among task-irrelevant objects. However, the brain and cognitive mechanisms governing these processes remain poorly understood. Here, we demonstrate that attentional selection of visual objects is controlled by integrated top–down audiovisual object representations ("attentional templates") while revealing a new brain mechanism through which they can operate. In multistimulus (visual) arrays, attentional selection of objects in humans and animal models is traditionally quantified via "the N2pc component": spatially selective enhancements of neural processing of objects within ventral visual cortices at approximately 150–300 msec poststimulus. In our adaptation of Folk et al.'s [Folk, C. L., Remington, R. W., & Johnston, J. C. Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030–1044, 1992] spatial cueing paradigm, visual cues elicited weaker behavioral attention capture and an attenuated N2pc during audiovisual versus visual search. To provide direct evidence for the brain, and thus cognitive, mechanisms underlying top–down control in multisensory search, we analyzed global features of the electrical field at the scalp across our N2pc measurements. In the N2pc time window (170–270 msec), color cues elicited brain responses differing in strength and topography. This latter finding is indicative of changes in active brain sources. Thus, in multisensory environments, attentional selection is controlled via integrated top–down object representations and not only by separate sensory-specific top–down feature templates (as suggested by traditional N2pc analyses). We discuss how the electrical neuroimaging approach can aid research on top–down attentional control in naturalistic, multisensory settings and on other neurocognitive functions in the growing area of real-world neuroscience.
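For readers unfamiliar with the measure, the sketch below shows how an N2pc is conventionally quantified: the contralateral-minus-ipsilateral voltage difference at posterior electrodes, averaged over roughly 170-270 msec. The ERPs, amplitudes, and condition labels are simulated placeholders, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(5)
sfreq = 500                                   # sampling rate in Hz
times = np.arange(-0.1, 0.5, 1 / sfreq)       # seconds relative to cue onset
window = (times >= 0.170) & (times <= 0.270)  # conventional N2pc window

def simulate_erp(n2pc_amplitude_uv):
    """Toy posterior ERP with an N2pc-like negativity plus noise."""
    erp = rng.normal(0, 0.3, times.size)
    erp[window] -= n2pc_amplitude_uv
    return erp

# Contralateral / ipsilateral waveforms (microvolts) for two hypothetical search conditions
conditions = {"visual search": (simulate_erp(1.2), simulate_erp(0.2)),
              "audiovisual search": (simulate_erp(0.6), simulate_erp(0.2))}

for name, (contra, ipsi) in conditions.items():
    n2pc = (contra - ipsi)[window].mean()
    print(f"{name:18s} mean N2pc amplitude: {n2pc:+.2f} uV")
```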


2014 ◽  
Vol 26 (10) ◽  
pp. 2135-2162 ◽  
Author(s):  
Sidney R. Lehky ◽  
Roozbeh Kiani ◽  
Hossein Esteky ◽  
Keiji Tanaka

We have calculated the intrinsic dimensionality of visual object representations in anterior inferotemporal (AIT) cortex, based on responses of a large sample of cells stimulated with photographs of diverse objects. Because dimensionality was dependent on data set size, we determined asymptotic dimensionality as both the number of neurons and the number of stimulus images approached infinity. Our final dimensionality estimate was 93 (SD: ±11), indicating that there is a basis set of approximately 100 independent features that characterizes the dimensions of neural object space. We believe this is the first estimate of the dimensionality of neural visual representations based on single-cell neurophysiological data. The dimensionality of AIT object representations was much lower than the dimensionality of the stimuli. We suggest that there may be a gradual reduction in the dimensionality of object representations in neural populations going from the retina to inferotemporal cortex as receptive fields become increasingly complex.
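An illustrative sketch of the general strategy, not the authors' exact estimator: compute an effective dimensionality (here, the participation ratio of the population covariance spectrum) for increasing numbers of stimuli, then extrapolate toward an asymptote with a simple saturating fit, d(n) = d_inf * n / (n + k). All data are simulated.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(6)
true_dim, n_neurons, n_stimuli = 100, 600, 800

# Simulated population responses with ~100 underlying latent dimensions plus noise
latent = rng.normal(size=(n_stimuli, true_dim))
responses = latent @ rng.normal(size=(true_dim, n_neurons)) + rng.normal(0, 3.0, (n_stimuli, n_neurons))

def participation_ratio(X):
    """Effective dimensionality of the neuron-by-neuron covariance spectrum."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return lam.sum() ** 2 / (lam ** 2).sum()

sizes = np.array([50, 100, 200, 400, 800])
dims = np.array([participation_ratio(responses[:n]) for n in sizes])

(d_inf, k), _ = curve_fit(lambda n, d_inf, k: d_inf * n / (n + k), sizes, dims, p0=(100.0, 100.0))
print("dimensionality vs. number of stimuli:", dict(zip(sizes.tolist(), np.round(dims, 1).tolist())))
print(f"extrapolated asymptotic dimensionality ~ {d_inf:.0f}")
```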


2019 ◽  
Author(s):  
Georgin Jacob ◽  
R. T. Pramod ◽  
Harish Katti ◽  
S. P. Arun

Abstract: Deep neural networks have revolutionized computer vision, and their object representations match coarsely with those in the brain. As a result, it is widely believed that any fine-scale differences between deep networks and brains can be fixed with increased training data or minor changes in architecture. But what if there are qualitative differences between brains and deep networks? Do deep networks even see the way we do? To answer this question, we chose a deep neural network optimized for object recognition and asked whether it exhibits well-known perceptual and neural phenomena despite not being explicitly trained to do so. To our surprise, many phenomena were present in the network, including the Thatcher effect, mirror confusion, Weber's law, relative size, multiple object normalization, and sparse coding along multiple dimensions. However, some perceptual phenomena were notably absent, including processing of 3D shape, patterns on surfaces, occlusion, natural parts, and a global advantage. Our results elucidate the computational challenges of vision by showing that learning to recognize objects suffices to produce some perceptual phenomena but not others, and they reveal the perceptual properties that could be incorporated into deep networks to improve their performance.
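As one concrete example of how such phenomena can be probed in an off-the-shelf network (a generic sketch, not the authors' pipeline), the code below checks for mirror confusion with a pretrained torchvision ResNet-18: if mirror images are confused, an image and its left-right flip should yield more similar features than the image and its upside-down version. The image path is a placeholder.

```python
import torch
import torchvision
from PIL import Image, ImageOps

weights = torchvision.models.ResNet18_Weights.IMAGENET1K_V1
model = torchvision.models.resnet18(weights=weights).eval()
features = torch.nn.Sequential(*list(model.children())[:-1])  # keep up to the pooled features
preprocess = weights.transforms()

img = Image.open("object.jpg").convert("RGB")  # placeholder path to any object photograph

def embed(image):
    """Penultimate-layer feature vector for one image."""
    with torch.no_grad():
        return features(preprocess(image).unsqueeze(0)).flatten()

orig = embed(img)
mirror = embed(ImageOps.mirror(img))   # left-right flip
inverted = embed(ImageOps.flip(img))   # top-bottom flip (inversion)

cos = torch.nn.functional.cosine_similarity
print(f"original vs. mirror:   {cos(orig, mirror, dim=0).item():.3f}")
print(f"original vs. inverted: {cos(orig, inverted, dim=0).item():.3f}")
```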

