Coordination of What and Where in Visual Attention

Perception ◽  
1993 ◽  
Vol 22 (11) ◽  
pp. 1261-1270 ◽  
Author(s):  
John Duncan

Performance often suffers when two visual discriminations must be made concurrently (‘divided attention’). In the modular primate visual system, different cortical areas analyse different kinds of visual information. Especially important is a distinction between an occipitoparietal ‘where?’ system, analysing spatial relations, and an occipitotemporal ‘what?’ system responsible for object recognition. Though such visual subsystems are anatomically parallel, their functional relationship when ‘what?’ and ‘where?’ discriminations are made concurrently is unknown. In the present experiments, human subjects made concurrent discriminations concerning a brief visual display. Discriminations were either similar (two ‘what?’ or two ‘where?’ discriminations) or dissimilar (one of each), and concerned the same or different objects. When discriminations concerned different objects, there was strong interference between them. This was equally severe whether discriminations were similar—and therefore dependent on the same cortical system—or dissimilar. When concurrent ‘what?’ and ‘where?’ discriminations concerned the same object, however, all interference disappeared. Such results suggest that ‘what?’ and ‘where?’ systems are coordinated in visual attention: their separate outputs can be used simultaneously without cost, but only when they concern one object.

2002 ◽  
Vol 88 (2) ◽  
pp. 1051-1058 ◽  
Author(s):  
M. Tettamanti ◽  
E. Paulesu ◽  
P. Scifo ◽  
A. Maravita ◽  
F. Fazio ◽  
...  

Normal human subjects underwent functional magnetic resonance imaging (fMRI) while performing a simple visual manual reaction-time (RT) task with lateralized brief stimuli, the so-called Poffenberger paradigm. This paradigm was employed to measure interhemispheric transmission (IT) time by subtracting mean RT for the uncrossed hemifield-hand conditions, that is, those conditions not requiring an IT, from mean RT for the crossed hemifield-hand conditions, that is, those conditions requiring an IT to relay visual information from the hemisphere of entry to the hemisphere subserving the response. The obtained difference is widely believed to reflect callosal conduction time, but so far there is no direct physiological evidence in humans. The aim of our experiment was twofold: first, to test the hypothesis that IT of visuomotor information requires the corpus callosum and to identify the cortical areas specifically activated during IT; second, to discover whether IT occurs mainly at premotor or perceptual stages of information processing. We found significant activations in a number of frontal, parietal, and temporal cortical areas and in the genu of the corpus callosum. These activations were present only in the crossed conditions and therefore were specifically related to IT. No selective activation was present in the uncrossed conditions. The location of the activated callosal and cortical areas suggests that IT occurs mainly, but not exclusively, at the premotor level. These results provide clear-cut evidence in favor of the hypothesis that the crossed-uncrossed difference in the Poffenberger paradigm depends on IT rather than on differential hemispheric activation.
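
As a concrete illustration of the crossed-uncrossed difference described above, the following minimal sketch computes the behavioural estimate of interhemispheric transmission time from per-condition reaction times. The array names and millisecond values are hypothetical, chosen only to show the subtraction; they are not data from the study.

```python
import numpy as np

# Hypothetical reaction times (ms) for one subject; values are illustrative only.
# "Uncrossed": stimulus hemifield and responding hand on the same side (no callosal relay).
# "Crossed": stimulus and responding hand on opposite sides (callosal relay required).
rt_uncrossed = np.array([248, 252, 250, 247, 251, 249], dtype=float)
rt_crossed = np.array([251, 255, 254, 250, 253, 252], dtype=float)

# Crossed-uncrossed difference (CUD): the behavioural estimate of
# interhemispheric transmission time in the Poffenberger paradigm.
cud = rt_crossed.mean() - rt_uncrossed.mean()
print(f"Estimated interhemispheric transfer time: {cud:.1f} ms")
```

In the behavioural literature this difference is typically only a few milliseconds, which is why the fMRI evidence above is needed to tie it to callosal relay rather than to differential hemispheric activation.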


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1534 ◽  
Author(s):  
Ghazal Rouhafzay ◽  
Ana-Maria Cretu

Drawing inspiration from human haptic exploration of objects, the current work proposes a novel framework for robotic tactile object recognition in which visual information, in the form of a set of visually interesting points, is employed to guide the process of tactile data acquisition. Neuroscience research confirms that, for object recognition, humans integrate cutaneous data sensed in response to surface changes with data from joints, muscles, and bones (kinesthetic cues). Psychological studies further demonstrate that humans tend to follow object contours to perceive global shape, which leads to object recognition. In compliance with these findings, a series of contours is determined around each of a set of 24 virtual objects, from which bimodal tactile data (kinesthetic and cutaneous) are obtained sequentially while adaptively changing the size of the sensor surface according to the object geometry. A virtual Force Sensing Resistor (FSR) array is employed to capture cutaneous cues. Two different methods for sequential data classification are then implemented, using Convolutional Neural Networks (CNN) and conventional classifiers, including support vector machines and k-nearest neighbors. In the case of the conventional classifiers, the contourlet transform is exploited to extract features from tactile images. In the case of the CNN, two networks are trained for cutaneous and kinesthetic data and a novel hybrid decision-making strategy is proposed for object recognition. The proposed framework is tested both for contours determined blindly (randomly determined contours of objects) and for contours determined using a model of visual attention. The trained classifiers are tested on 4560 new sequential tactile data samples; the CNN trained on tactile data from object contours selected by the model of visual attention yields an accuracy of 98.97%, the highest among the implemented approaches.
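
The abstract does not spell out the hybrid decision-making strategy, so the sketch below shows one common way such a step can be realized: late fusion of the class probabilities produced by the cutaneous and kinesthetic networks. The function name, the weighting scheme, and the 24-class setup are assumptions for illustration only and may differ from the paper's actual strategy.

```python
import numpy as np

def fuse_predictions(p_cutaneous, p_kinesthetic, w_cutaneous=0.5):
    """Late fusion of per-class probabilities from two modality-specific
    classifiers. A generic sketch only; the paper's hybrid decision-making
    strategy is not specified in the abstract and may differ."""
    p_cutaneous = np.asarray(p_cutaneous, dtype=float)
    p_kinesthetic = np.asarray(p_kinesthetic, dtype=float)
    fused = w_cutaneous * p_cutaneous + (1.0 - w_cutaneous) * p_kinesthetic
    return int(np.argmax(fused)), fused

# Illustrative softmax outputs over 24 object classes from the two networks.
rng = np.random.default_rng(0)
p_cut = rng.dirichlet(np.ones(24))
p_kin = rng.dirichlet(np.ones(24))
label, fused = fuse_predictions(p_cut, p_kin)
print("Predicted object class:", label)
```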


1998 ◽  
Vol 10 (4) ◽  
pp. 445-463 ◽  
Author(s):  
Sabine Gillner ◽  
Hanspeter A. Mallot

Spatial behavior in humans and animals includes a wide variety of behavioral competences and makes use of a large number of sensory cues. Here we studied the ability of human subjects to search for locations, to find shortcuts and novel paths, to estimate distances between remembered places, and to draw sketch maps of the explored environment; these competences are related to goal-independent memory of space, or cognitive maps. Information on spatial relations was restricted to two types: a visual motion sequence generated by simulated movements in a virtual maze and the subject's own movement decisions defining the path through the maze. Visual information was local (i.e., no global landmarks or compass information was provided), and other position and movement information (vestibular or proprioceptive) was excluded. The amount of visual information provided was varied over four experimental conditions. The results indicate that human subjects are able to learn a virtual maze from sequences of local views and movements. The information acquired is local, consisting of recognized positions and the movement decisions associated with them. Although simple associations of this type can be shown to be present in some subjects, more complete configurational knowledge is acquired as well. The results are discussed in a view-based framework of navigation and the representation of spatial knowledge by means of a view graph.
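
A view graph of the kind invoked above can be pictured as a directed graph whose nodes are recognized local views and whose edges are the movement decisions linking them. The toy sketch below follows that reading; the view labels and decision names are hypothetical and are not taken from the maze used in the study.

```python
# Minimal view-graph sketch: nodes are recognized local views, edges are the
# movement decisions that lead from one view to another.
from collections import defaultdict

view_graph = defaultdict(dict)

def add_transition(view_from, decision, view_to):
    """Record that taking 'decision' (e.g. 'turn_left') at 'view_from' leads to 'view_to'."""
    view_graph[view_from][decision] = view_to

add_transition("view_A", "go_straight", "view_B")
add_transition("view_B", "turn_left", "view_C")
add_transition("view_B", "turn_right", "view_D")

def follow(start_view, decisions):
    """Replay a sequence of movement decisions from a starting view."""
    view = start_view
    for decision in decisions:
        view = view_graph[view].get(decision)
        if view is None:
            return None  # unknown transition: no stored association for this decision
    return view

print(follow("view_A", ["go_straight", "turn_left"]))  # -> view_C
```

Simple view-decision associations correspond to individual edges; more complete configurational knowledge corresponds to being able to chain edges into novel paths, as in the `follow` call above.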


2017 ◽  
Vol 118 (4) ◽  
pp. 2458-2469 ◽  
Author(s):  
Wei Song Ong ◽  
Koorosh Mirpour ◽  
James W. Bisley

We can search for and locate specific objects in our environment by looking for objects with similar features. Object recognition involves stimulus similarity responses in ventral visual areas and task-related responses in prefrontal cortex. We tested whether neurons in the lateral intraparietal area (LIP) of posterior parietal cortex could form an intermediary representation, collating information from object-specific similarity map representations to allow general decisions about whether a stimulus matches the object being looked for. We hypothesized that responses to stimuli would correlate with how similar they are to a sample stimulus. When animals compared two peripheral stimuli to a sample at their fovea, the response to the matching stimulus was similar, independent of the sample identity, but the response to the nonmatch depended on how similar it was to the sample: the more similar, the greater the response to the nonmatch stimulus. These results could not be explained by task difficulty or confidence. We propose that LIP uses its known mechanistic properties to integrate incoming visual information, including that from the ventral stream about object identity, to create a dynamic representation that is concise, low dimensional, and task relevant and that signifies the choice priorities in mental matching behavior.

New & Noteworthy: Studies in object recognition have focused on the ventral stream, in which neurons respond as a function of how similar a stimulus is to their preferred stimulus, and on prefrontal cortex, where neurons indicate which stimulus is being looked for. We found that parietal area LIP uses its known mechanistic properties to form an intermediary representation in this process. This creates a perceptual similarity map that can be used to guide decisions in prefrontal areas.
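
To make the proposed similarity map concrete, the toy sketch below scores peripheral stimuli by their feature similarity to a foveal sample, so that a nonmatch close to the sample receives a higher "priority" than a dissimilar one. The cosine-similarity metric and the feature vectors are assumptions chosen for illustration; the abstract states only that nonmatch responses grow with similarity to the sample.

```python
import numpy as np

def lip_priority(sample_features, stimulus_features):
    """Toy similarity signal: cosine similarity between a foveal sample and a
    peripheral stimulus. The metric is an assumption, not the study's model."""
    a = np.asarray(sample_features, dtype=float)
    b = np.asarray(stimulus_features, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sample = [1.0, 0.2, 0.0]            # illustrative feature vector for the sample
match = [0.95, 0.25, 0.05]          # peripheral stimulus matching the sample
nonmatch_similar = [0.7, 0.4, 0.3]
nonmatch_dissimilar = [0.0, 0.1, 1.0]

for name, stim in [("match", match),
                   ("similar nonmatch", nonmatch_similar),
                   ("dissimilar nonmatch", nonmatch_dissimilar)]:
    print(f"{name}: priority = {lip_priority(sample, stim):.2f}")
```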


2017 ◽  
Author(s):  
Krishna C. Puvvada ◽  
Jonathan Z. Simon

The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources from peripheral, tonotopically based representations in the auditory nerve into representations based on perceptually distinct auditory objects in auditory cortex. Here, using magnetoencephalography (MEG) recordings from human subjects, both men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in auditory cortex contain dominantly spectro-temporal representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than one in which individual streams are represented separately. In contrast, we also show that higher-order auditory cortical areas represent the attended stream separately from, and with significantly higher fidelity than, unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Taken together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of human auditory cortex.

Significance Statement: Using magnetoencephalography (MEG) recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of auditory cortex. We show that the primary-like areas in auditory cortex use a dominantly spectro-temporal representation of the entire auditory scene, with both attended and ignored speech streams represented with almost equal fidelity. In contrast, we show that higher-order auditory cortical areas represent an attended speech stream separately from, and with significantly higher fidelity than, unattended speech streams. Furthermore, the unattended background streams are represented as a single undivided background object rather than as distinct background objects.
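
Stimulus reconstruction of the kind referred to above is commonly implemented as a time-lagged linear decoder that maps MEG channels back to a speech envelope, with reconstruction fidelity then compared between attended and ignored streams. The sketch below follows that generic recipe; the channel count, lag window, ridge regularization, and synthetic data are assumptions, not the study's actual estimator or parameters.

```python
import numpy as np

def train_reconstruction_decoder(meg, envelope, lags=10, alpha=1.0):
    """Fit a time-lagged linear (ridge) decoder that reconstructs a speech
    envelope from multichannel MEG. A generic sketch of stimulus
    reconstruction; the study's exact method may differ."""
    n_samples, n_channels = meg.shape
    # Design matrix of lagged MEG samples: each row stacks 'lags' past frames.
    X = np.zeros((n_samples - lags, n_channels * lags))
    for lag in range(lags):
        X[:, lag * n_channels:(lag + 1) * n_channels] = meg[lag:n_samples - lags + lag]
    y = envelope[lags:]
    # Ridge solution: w = (X^T X + alpha I)^-1 X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    return w

# Illustrative synthetic data: 32 MEG channels and one speech envelope.
rng = np.random.default_rng(1)
meg = rng.standard_normal((2000, 32))
envelope = rng.standard_normal(2000)
w = train_reconstruction_decoder(meg, envelope)
print("Decoder weights shape:", w.shape)
```

In practice, separate decoders would be fit for the full scene, the attended stream, and the background, and the correlation between reconstructed and actual envelopes would serve as the fidelity measure contrasted across cortical areas.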


2019 ◽  
Author(s):  
Ahmad Yousef

We have learned from cognitive vision research that involuntary visual awareness should be generated by exogenous stimuli, not endogenous ones. Given the complexity of understanding the reasons behind the rapid eye movements that occur during vivid dreams, dreams that carry highly bizarre information and that deny the human subject control over what is seen, these dreams should therefore reside under the umbrella of “involuntary human awareness”. We therefore suggest the possibility of physical particles that could carry the visual information of these extraordinary exogenous stimuli: particles able to invade the human eye while it is closed, and able to move the eye rapidly, aiming for a perfect transfer of the visual information. The present research aims to discuss these particles and to propose scenarios for how the human eye and retina deal with them.


2021 ◽  
pp. 1-55
Author(s):  
Jeffrey Frederic Queisser ◽  
Minju Jung ◽  
Takazumi Matsumoto ◽  
Jun Tani

Generalization by learning is an essential cognitive competency for humans. For example, we can manipulate even unfamiliar objects and can generate mental images before enacting a preplan. How is this possible? Our study investigated this problem by revisiting our previous study (Jung, Matsumoto, & Tani, 2019), which examined the problem of vision-based, goal-directed planning by robots performing a block-stacking task. Extending that study, our work introduces a large network comprising dynamically interacting submodules, including visual working memory modules (VWMs), a visual attention module, and an executive network. The executive network predicts motor signals, visual images, and various controls for attention, as well as masking of visual information. The most significant difference from the previous study is that our current model contains an additional VWM. The entire network is trained using predictive coding, and an optimal visuomotor plan to achieve a given goal state is inferred using active inference. Results indicate that our current model performs significantly better than that used in Jung et al. (2019), especially when manipulating blocks with unlearned colors and textures. Simulation results revealed that the observed generalization was achieved because content-agnostic information processing developed through synergistic interaction between the second VWM and other modules during the course of learning, in which memorizing image contents and transforming them are dissociated. This letter verifies this claim through both qualitative and quantitative analysis of simulation results.
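
The following PyTorch-style skeleton sketches how the described modules might be composed: two visual working memories updated recurrently and an executive network that reads both to predict motor signals, the next visual features, and an attention control signal. All sizes, heads, and class names are illustrative assumptions; the actual model additionally involves predictive-coding training and active-inference planning, which are omitted here.

```python
import torch
import torch.nn as nn

class VisualWorkingMemory(nn.Module):
    """Recurrent store for visual content. A schematic stand-in for the VWM
    modules described in the paper; sizes and internals are illustrative."""
    def __init__(self, dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(dim, dim)

    def forward(self, x, h):
        return self.rnn(x, h)

class ExecutiveNetwork(nn.Module):
    """Predicts motor signals, the next visual feature vector, and an attention
    control signal from the current VWM states. Only a sketch of the module
    interplay, not the paper's architecture."""
    def __init__(self, dim=64, motor_dim=7):
        super().__init__()
        self.motor_head = nn.Linear(2 * dim, motor_dim)
        self.vision_head = nn.Linear(2 * dim, dim)
        self.attention_head = nn.Linear(2 * dim, 2)  # e.g. an attention centre (x, y)

    def forward(self, h_vwm1, h_vwm2):
        h = torch.cat([h_vwm1, h_vwm2], dim=-1)
        return self.motor_head(h), self.vision_head(h), self.attention_head(h)

# One illustrative step: encode a visual feature vector, update both VWMs,
# then let the executive network emit its predictions.
dim = 64
vwm1, vwm2, executive = VisualWorkingMemory(dim), VisualWorkingMemory(dim), ExecutiveNetwork(dim)
visual_features = torch.randn(1, dim)
h1 = vwm1(visual_features, torch.zeros(1, dim))
h2 = vwm2(visual_features, torch.zeros(1, dim))
motor, predicted_vision, attention = executive(h1, h2)
print(motor.shape, predicted_vision.shape, attention.shape)
```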

