The structure of 3-D shape representations in human vision revealed by eye movements during object recognition

2006 ◽  
Author(s):  
Charles Leek


2021 ◽
Author(s):  
Gaurav Malhotra ◽  
Marin Dujmovic ◽  
John Hummel ◽  
Jeffrey S Bowers

The success of Convolutional Neural Networks (CNNs) in classifying objects has led to a surge of interest in using these systems to understand human vision. Recent studies have argued that when CNNs are trained in the correct learning environment, they can emulate a key property of human vision -- learning to classify objects based on their shape. While showing a shape-bias is indeed a desirable property for any model of human object recognition, it is unclear whether the resulting shape representations learned by these networks are human-like. We explored this question in the context of a well-known observation from psychology showing that humans encode the shape of objects in terms of relations between object features. To check whether this is also true for the representations of CNNs, we ran a series of simulations where we trained CNNs on datasets of novel shapes and tested them on a set of controlled deformations of these shapes. We found that CNNs do not show any enhanced sensitivity to deformations which alter relations between features, even when explicitly trained on such deformations. This behaviour contrasts with that of human participants in previous studies, as well as in a new experiment reported here. We argue that these results are a consequence of a fundamental difference between how humans and CNNs learn to recognise objects: while CNNs select features that allow them to optimally classify the proximal stimulus, humans select features that they infer to be properties of the distal stimulus. This makes human representations more generalisable to novel contexts and tasks.
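A minimal sketch of the deformation logic, under illustrative assumptions (toy three-part shapes rendered as discs, and pixel correlation as a crude stand-in for a CNN's input-level overlap; none of these are the paper's actual stimuli or models): a metric deformation shifts every part by the same vector and so preserves relations between parts, while a relational deformation moves a single part across another and changes the categorical structure.

```python
import numpy as np

def render(parts, size=64, radius=3):
    """Render part centroids (x, y) as filled discs in a binary image."""
    img = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    for (x, y) in parts:
        img[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = 1.0
    return img

def pixel_similarity(a, b):
    """Correlation of flattened images: a crude proxy for input-level overlap."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

base = [(20, 20), (40, 20), (30, 40)]            # a toy three-part shape

# Metric deformation: all parts shift by the same vector, so categorical
# relations between parts (left-of, above) are preserved.
metric = [(x + 6, y + 6) for (x, y) in base]

# Relational deformation: one part crosses to the other side of the shape,
# changing relations between parts while leaving the other parts untouched.
relational = [(20, 20), (40, 20), (30, 4)]

img0 = render(base)
print("metric     :", pixel_similarity(img0, render(metric)))
print("relational :", pixel_similarity(img0, render(relational)))
# An image-overlap measure rates the relational change as the *smaller* one
# (two of three parts are pixel-identical); humans show the opposite pattern.
```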


2017 ◽  
Vol 14 (03) ◽  
pp. 1750006
Author(s):  
Xin Wang ◽  
Pieter Jonker

Using active vision to perceive their surroundings instead of passively receiving information, humans develop the ability to explore unknown environments. Research on active vision for humanoid robots already spans half a century and covers a broad range of areas. The current trend is to use a stereo setup or a Kinect with neck movements to realize active vision. However, human perception combines eye and neck movements. This paper presents an advanced active vision system that works in a similar way to human vision. The main contributions are: a set of controllers that mimic eye and neck movements, including saccadic, pursuit, vestibulo-ocular reflex and vergence eye movements; an adaptive selection mechanism that automatically chooses an optimal tracking algorithm based on the properties of the tracked object; and a novel Multimodal Visual Odometry Perception method that combines stereopsis and convergence, enabling robots to perform both precise action in action space and scene exploration in personal space. Experimental results demonstrate the effectiveness and robustness of the system. Moreover, the system meets real-time constraints with low-cost cameras and motors, providing an affordable solution for industrial applications.
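As one concrete piece of the stereopsis-plus-convergence idea, here is a minimal sketch assuming a symmetric stereo head whose two cameras rotate inward to fixate a point on the midline (the function and parameter values are illustrative; the paper's actual controllers are not specified in the abstract):

```python
import math

def depth_from_vergence(baseline_m: float, vergence_rad: float) -> float:
    """Distance to the fixated point for a symmetric vergence configuration.

    With each camera rotated inward by vergence/2 toward a point on the
    midline, simple triangulation gives:
        depth = (baseline / 2) / tan(vergence / 2)
    """
    return (baseline_m / 2.0) / math.tan(vergence_rad / 2.0)

# Example: a 12 cm camera baseline and 4 degrees of total convergence
# place the fixated point roughly 1.7 m away.
print(f"{depth_from_vergence(0.12, math.radians(4.0)):.2f} m")
```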


2009 ◽  
Vol 49 (18) ◽  
pp. 2241-2253 ◽  
Author(s):  
Alexander C. Schütz ◽  
Doris I. Braun ◽  
Karl R. Gegenfurtner

2019 ◽  
Author(s):  
Vladislav Ayzenberg ◽  
Frederik S. Kamps ◽  
Daniel D. Dilks ◽  
Stella F. Lourenco

Shape perception is crucial for object recognition. However, it remains unknown exactly how shape information is represented, and, consequently, used by the visual system. Here, we hypothesized that the visual system represents “shape skeletons” to both (1) perceptually organize contours and component parts into a shape percept, and (2) compare shapes to recognize objects. Using functional magnetic resonance imaging (fMRI) and representational similarity analysis (RSA), we found that a model of skeletal similarity explained significant unique variance in the response profiles of V3 and LO, regions known to be involved in perceptual organization and object recognition, respectively. Moreover, the skeletal model remained predictive in these regions even when controlling for other models of visual similarity that approximate low- to high-level visual features (i.e., Gabor-jet, GIST, HMAX, and AlexNet), and across different surface forms, a manipulation that altered object contours while preserving the underlying skeleton. Together, these findings shed light on the functional roles of shape skeletons in human vision, as well as the computational properties of V3 and LO.
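A minimal sketch of the representational similarity logic, assuming precomputed pairwise dissimilarity matrices (random placeholders below; names like skeleton_rdm are illustrative, not the authors' data): correlate the skeletal model's RDM with the neural RDM, then residualize a control model out of both to approximate the "unique variance" test.

```python
import numpy as np
from scipy.stats import spearmanr

def upper(rdm):
    """Vectorize the upper triangle of a dissimilarity matrix."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def residualize(y, x):
    """Remove the least-squares fit of x (plus intercept) from y."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n = 30                                   # number of object conditions

def random_rdm():
    m = rng.random((n, n))
    return (m + m.T) / 2                 # symmetric placeholder RDM

neural_rdm, skeleton_rdm, gabor_rdm = random_rdm(), random_rdm(), random_rdm()

# Rank correlation between the skeletal model and the neural RDM.
rho, p = spearmanr(upper(skeleton_rdm), upper(neural_rdm))
print(f"skeleton vs neural: rho={rho:.3f}")

# 'Unique variance' style test: partial a control model (e.g. Gabor-jet)
# out of both RDMs, then correlate the residuals.
r_skel = residualize(upper(skeleton_rdm), upper(gabor_rdm))
r_neur = residualize(upper(neural_rdm), upper(gabor_rdm))
rho_partial, _ = spearmanr(r_skel, r_neur)
print(f"partial rho: {rho_partial:.3f}")
```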


Author(s):  
Anders Petersen ◽  
Søren Kyllingsbæk

In the attentional dwell time paradigm of Duncan, Ward, and Shapiro (1994), two backward-masked targets are presented at different spatial locations, separated by a varying time interval. Results show that report of the second target is severely impaired when the time interval is less than 500 ms, which has been taken as a direct measure of attentional dwell time in human vision. However, we show that eye movements may have confounded the estimate of the dwell time and that the measure may not be as robust as previously suggested. The latter is supported by evidence that intensive training strongly attenuates the dwell time because of habituation to the masks. Thus, this article points to eye movements and masking as two potential methodological pitfalls that should be considered when using the attentional dwell time paradigm to investigate the temporal dynamics of attention.
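A minimal sketch of the paradigm's trial structure, with illustrative SOA values and locations (the original studies' display parameters differ):

```python
from dataclasses import dataclass
import random

@dataclass
class Trial:
    soa_ms: int        # onset asynchrony between target 1 and target 2
    t1_loc: str        # spatial location of the first masked target
    t2_loc: str        # spatial location of the second masked target

def make_trials(soas=(100, 200, 300, 500, 900), reps=20):
    locations = ["left", "right", "top", "bottom"]
    trials = []
    for soa in soas:
        for _ in range(reps):
            t1, t2 = random.sample(locations, 2)   # two distinct locations
            trials.append(Trial(soa, t1, t2))
    random.shuffle(trials)
    return trials

# The dwell-time signature is read off by plotting T2 report accuracy
# against soa_ms; impairment is typically reported below ~500 ms.
trials = make_trials()
print(len(trials), "trials; first:", trials[0])
```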


Author(s):  
Fiona Mulvey

This chapter introduces the basics of eye anatomy, eye movements and vision. It will explain the concepts behind human vision sufficiently for the reader to understand later chapters in the book on human perception and attention, and their relationship to (and potential measurement with) eye movements. We will first describe the path of light from the environment through the structures of the eye and on to the brain, as an introduction to the physiology of vision. We will then describe the image registered by the eye, and the types of movements the eye makes in order to perceive the environment as a cogent whole. This chapter explains how eye movements can be thought of as the interface between the visual world and the brain, and why eye movement data can be analysed not only in terms of the environment, or what is looked at, but also in terms of the brain, or subjective cognitive and emotional states. These two aspects broadly define the scope and applicability of eye movement technology in research and in human-computer interaction in later sections of the book.


Author(s):  
SUNGHO KIM ◽  
GIJEONG JANG ◽  
WANG-HEON LEE ◽  
IN SO KWEON

This paper presents a combined model-based 3D object recognition method motivated by the robust properties of human vision. The human visual system (HVS) is very efficient and robust at identifying and grasping objects, in part because of its properties of visual attention, contrast mechanisms, feature binding, multiresolution and part-based representation. In addition, the HVS combines bottom-up and top-down information effectively using a combined model representation. We propose a method for integrating these aspects within a Monte Carlo framework. In this scheme, object recognition is regarded as a parameter optimization problem: the bottom-up process initializes parameters, and the top-down process optimizes them. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation.
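A minimal sketch of the bottom-up initialization / top-down optimization loop as a generic Monte Carlo search (the scoring function and one-dimensional pose below are toy stand-ins, not the paper's model representation or matching score):

```python
import math
import random

def score(pose, true_pose=0.7):
    """Top-down evaluation: higher when the hypothesized pose matches the image."""
    return math.exp(-(pose - true_pose) ** 2 / 0.05)

def recognize(n_samples=500, n_iters=20, sigma=0.3):
    # Bottom-up process: initialize pose hypotheses; here uniformly at
    # random, standing in for feature-driven proposals.
    particles = [random.uniform(-math.pi, math.pi) for _ in range(n_samples)]
    for _ in range(n_iters):
        # Top-down process: weight each hypothesis by model-image agreement,
        # then resample and perturb around the well-scoring ones.
        weights = [score(p) for p in particles]
        particles = random.choices(particles, weights=weights, k=n_samples)
        particles = [p + random.gauss(0.0, sigma) for p in particles]
        sigma *= 0.8                      # anneal the search over iterations
    return max(particles, key=score)

print(f"estimated pose: {recognize():.3f}")   # should approach 0.7
```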

