The structure of 3-D shape representations in human vision revealed by eye movements during object recognition

2006 ◽  
Author(s):  
Charles Leek


2021 ◽
Author(s):  
Gaurav Malhotra ◽  
Marin Dujmovic ◽  
John Hummel ◽  
Jeffrey S Bowers

The success of Convolutional Neural Networks (CNNs) in classifying objects has led to a surge of interest in using these systems to understand human vision. Recent studies have argued that when CNNs are trained in the correct learning environment, they can emulate a key property of human vision -- learning to classify objects based on their shape. While showing a shape-bias is indeed a desirable property for any model of human object recognition, it is unclear whether the resulting shape representations learned by these networks are human-like. We explored this question in the context of a well-known observation from psychology showing that humans encode the shape of objects in terms of relations between object features. To check whether this is also true for the representations of CNNs, we ran a series of simulations where we trained CNNs on datasets of novel shapes and tested them on a set of controlled deformations of these shapes. We found that CNNs do not show any enhanced sensitivity to deformations which alter relations between features, even when explicitly trained on such deformations. This behaviour contrasts with that of human participants in previous studies, as well as in a new experiment reported here. We argue that these results are a consequence of a fundamental difference between how humans and CNNs learn to recognise objects: while CNNs select features that allow them to optimally classify the proximal stimulus, humans select features that they infer to be properties of the distal stimulus. This makes human representations more generalisable to novel contexts and tasks.
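A minimal sketch of the deformation logic, under illustrative assumptions (toy three-part shapes rendered as discs, and pixel correlation as a crude stand-in for a CNN's input-level overlap; none of these are the paper's actual stimuli or models): a metric deformation shifts every part by the same vector and so preserves relations between parts, while a relational deformation moves a single part across another and changes the categorical structure.

```python
import numpy as np

def render(parts, size=64, radius=3):
    """Render part centroids (x, y) as filled discs in a binary image."""
    img = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    for (x, y) in parts:
        img[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = 1.0
    return img

def pixel_similarity(a, b):
    """Correlation of flattened images: a crude proxy for input-level overlap."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

base = [(20, 20), (40, 20), (30, 40)]            # a toy three-part shape

# Metric deformation: all parts shift by the same vector, so categorical
# relations between parts (left-of, above) are preserved.
metric = [(x + 6, y + 6) for (x, y) in base]

# Relational deformation: one part crosses to the other side of the shape,
# changing relations between parts while leaving the other parts untouched.
relational = [(20, 20), (40, 20), (30, 4)]

img0 = render(base)
print("metric     :", pixel_similarity(img0, render(metric)))
print("relational :", pixel_similarity(img0, render(relational)))
# An image-overlap measure rates the relational change as the *smaller* one
# (two of three parts are pixel-identical); humans show the opposite pattern.
```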


2017 ◽  
Vol 14 (03) ◽  
pp. 1750006
Author(s):  
Xin Wang ◽  
Pieter Jonker

Using active vision to perceive their surroundings instead of passively receiving information, humans develop the ability to explore unknown environments. Research on active vision for humanoid robots already spans half a century and covers a broad range of areas. The current trend is to use a stereo setup or a Kinect with neck movements to realize active vision. However, human perception combines eye and neck movements. This paper presents an advanced active vision system that works in a similar way to human vision. The main contributions are: a set of controllers that mimic eye and neck movements, including saccadic, pursuit, vestibulo-ocular reflex and vergence eye movements; an adaptive selection mechanism that automatically chooses an optimal tracking algorithm based on the properties of the tracked object; and a novel Multimodal Visual Odometry Perception method that combines stereopsis and convergence, enabling robots to perform both precise action in action space and scene exploration in personal space. Experimental results demonstrate the effectiveness and robustness of the system. Moreover, the system meets real-time constraints with low-cost cameras and motors, providing an affordable solution for industrial applications.
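As one concrete piece of the stereopsis-plus-convergence idea, here is a minimal sketch assuming a symmetric stereo head whose two cameras rotate inward to fixate a point on the midline (the function and parameter values are illustrative; the paper's actual controllers are not specified in the abstract):

```python
import math

def depth_from_vergence(baseline_m: float, vergence_rad: float) -> float:
    """Distance to the fixated point for a symmetric vergence configuration.

    With each camera rotated inward by vergence/2 toward a point on the
    midline, simple triangulation gives:
        depth = (baseline / 2) / tan(vergence / 2)
    """
    return (baseline_m / 2.0) / math.tan(vergence_rad / 2.0)

# Example: a 12 cm camera baseline and 4 degrees of total convergence
# place the fixated point roughly 1.7 m away.
print(f"{depth_from_vergence(0.12, math.radians(4.0)):.2f} m")
```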


2009 ◽  
Vol 49 (18) ◽  
pp. 2241-2253 ◽  
Author(s):  
Alexander C. Schütz ◽  
Doris I. Braun ◽  
Karl R. Gegenfurtner

2019 ◽  
Author(s):  
Vladislav Ayzenberg ◽  
Frederik S. Kamps ◽  
Daniel D. Dilks ◽  
Stella F. Lourenco

Shape perception is crucial for object recognition. However, it remains unknown exactly how shape information is represented, and, consequently, used by the visual system. Here, we hypothesized that the visual system represents “shape skeletons” to both (1) perceptually organize contours and component parts into a shape percept, and (2) compare shapes to recognize objects. Using functional magnetic resonance imaging (fMRI) and representational similarity analysis (RSA), we found that a model of skeletal similarity explained significant unique variance in the response profiles of V3 and LO, regions known to be involved in perceptual organization and object recognition, respectively. Moreover, the skeletal model remained predictive in these regions even when controlling for other models of visual similarity that approximate low- to high-level visual features (i.e., Gabor-jet, GIST, HMAX, and AlexNet), and across different surface forms, a manipulation that altered object contours while preserving the underlying skeleton. Together, these findings shed light on the functional roles of shape skeletons in human vision, as well as the computational properties of V3 and LO.
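A minimal sketch of the representational similarity logic, assuming precomputed pairwise dissimilarity matrices (random placeholders below; names like skeleton_rdm are illustrative, not the authors' data): correlate the skeletal model's RDM with the neural RDM, then residualize a control model out of both to approximate the "unique variance" test.

```python
import numpy as np
from scipy.stats import spearmanr

def upper(rdm):
    """Vectorize the upper triangle of a dissimilarity matrix."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def residualize(y, x):
    """Remove the least-squares fit of x (plus intercept) from y."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n = 30                                   # number of object conditions

def random_rdm():
    m = rng.random((n, n))
    return (m + m.T) / 2                 # symmetric placeholder RDM

neural_rdm, skeleton_rdm, gabor_rdm = random_rdm(), random_rdm(), random_rdm()

# Rank correlation between the skeletal model and the neural RDM.
rho, p = spearmanr(upper(skeleton_rdm), upper(neural_rdm))
print(f"skeleton vs neural: rho={rho:.3f}")

# 'Unique variance' style test: partial a control model (e.g. Gabor-jet)
# out of both RDMs, then correlate the residuals.
r_skel = residualize(upper(skeleton_rdm), upper(gabor_rdm))
r_neur = residualize(upper(neural_rdm), upper(gabor_rdm))
rho_partial, _ = spearmanr(r_skel, r_neur)
print(f"partial rho: {rho_partial:.3f}")
```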


Author(s):  
Anders Petersen ◽  
Søren Kyllingsbæk

In the attentional dwell time paradigm of Duncan, Ward, and Shapiro (1994), two backward-masked targets are presented at different spatial locations, separated by a varying time interval. Results show that report of the second target is severely impaired when the time interval is less than 500 ms, which has been taken as a direct measure of attentional dwell time in human vision. However, we show that eye movements may have confounded the estimate of the dwell time and that the measure may not be as robust as previously suggested. The latter is supported by evidence that intensive training strongly attenuates the dwell time because of habituation to the masks. Thus, this article points to eye movements and masking as two potential methodological pitfalls that should be considered when using the attentional dwell time paradigm to investigate the temporal dynamics of attention.
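A minimal sketch of the paradigm's trial structure, with illustrative SOA values and locations (the original studies' display parameters differ):

```python
from dataclasses import dataclass
import random

@dataclass
class Trial:
    soa_ms: int        # onset asynchrony between target 1 and target 2
    t1_loc: str        # spatial location of the first masked target
    t2_loc: str        # spatial location of the second masked target

def make_trials(soas=(100, 200, 300, 500, 900), reps=20):
    locations = ["left", "right", "top", "bottom"]
    trials = []
    for soa in soas:
        for _ in range(reps):
            t1, t2 = random.sample(locations, 2)   # two distinct locations
            trials.append(Trial(soa, t1, t2))
    random.shuffle(trials)
    return trials

# The dwell-time signature is read off by plotting T2 report accuracy
# against soa_ms; impairment is typically reported below ~500 ms.
trials = make_trials()
print(len(trials), "trials; first:", trials[0])
```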


Author(s):  
Fiona Mulvey

This chapter introduces the basics of eye anatomy, eye movements and vision. It will explain the concepts behind human vision sufficiently for the reader to understand later chapters in the book on human perception and attention, and their relationship to (and potential measurement with) eye movements. We will first describe the path of light from the environment through the structures of the eye and on to the brain, as an introduction to the physiology of vision. We will then describe the image registered by the eye, and the types of movements the eye makes in order to perceive the environment as a cogent whole. This chapter explains how eye movements can be thought of as the interface between the visual world and the brain, and why eye movement data can be analysed not only in terms of the environment, or what is looked at, but also in terms of the brain, or subjective cognitive and emotional states. These two aspects broadly define the scope and applicability of eye movement technology in research and in human-computer interaction in later sections of the book.


Author(s):  
SUNGHO KIM ◽  
GIJEONG JANG ◽  
WANG-HEON LEE ◽  
IN SO KWEON

This paper presents a combined model-based 3D object recognition method motivated by the robust properties of human vision. The human visual system (HVS) is very efficient and robust at identifying and grasping objects, in part because of its properties of visual attention, contrast mechanisms, feature binding, multiresolution and part-based representation. In addition, the HVS combines bottom-up and top-down information effectively using a combined model representation. We propose a method for integrating these aspects within a Monte Carlo framework. In this scheme, object recognition is regarded as a parameter optimization problem: the bottom-up process initializes parameters, and the top-down process optimizes them. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation.
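A minimal sketch of the bottom-up initialization / top-down optimization loop as a generic Monte Carlo search (the scoring function and one-dimensional pose below are toy stand-ins, not the paper's model representation or matching score):

```python
import math
import random

def score(pose, true_pose=0.7):
    """Top-down evaluation: higher when the hypothesized pose matches the image."""
    return math.exp(-(pose - true_pose) ** 2 / 0.05)

def recognize(n_samples=500, n_iters=20, sigma=0.3):
    # Bottom-up process: initialize pose hypotheses; here uniformly at
    # random, standing in for feature-driven proposals.
    particles = [random.uniform(-math.pi, math.pi) for _ in range(n_samples)]
    for _ in range(n_iters):
        # Top-down process: weight each hypothesis by model-image agreement,
        # then resample and perturb around the well-scoring ones.
        weights = [score(p) for p in particles]
        particles = random.choices(particles, weights=weights, k=n_samples)
        particles = [p + random.gauss(0.0, sigma) for p in particles]
        sigma *= 0.8                      # anneal the search over iterations
    return max(particles, key=score)

print(f"estimated pose: {recognize():.3f}")   # should approach 0.7
```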

