3D gaze tracking with easy calibration using stereo cameras for robot and human communication

Author(s):  
Takashi Nagamatsu ◽  
Junzo Kamahara ◽  
Naoki Tanaka

2018 ◽
Vol 8 (10) ◽  
pp. 1769
Author(s):  
Zijing Wan ◽  
Xiangjun Wang ◽  
Lei Yin ◽  
Kai Zhou

This paper proposes a 3D point-of-regard estimation method based on a 3D eye model, together with a corresponding head-mounted gaze tracking device. First, the head-mounted gaze tracking system is presented: the device uses two pairs of stereo cameras to capture the left and right eye images, respectively, and a pair of scene cameras to capture the scene images. Second, a 3D eye model and its calibration process are established; common eye features are used to estimate the eye model parameters. Third, a 3D point-of-regard estimation algorithm is proposed. The method has three main parts: (1) the spatial coordinates of the eye features are computed directly by stereo triangulation; (2) the pupil center normal is used as the initial value for estimating the optical axis; (3) the pair of scene cameras is used to recover the actual positions of the objects being watched during calibration, so that calibration of the proposed eye model does not require an auxiliary light source. Experimental results show that the proposed method outputs the coordinates of the 3D point-of-regard more accurately.
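To illustrate the geometry behind parts (1) and (3), here is a minimal sketch of two steps such a pipeline relies on: linear (DLT) triangulation of an eye feature from a calibrated stereo pair, and estimation of the 3D point-of-regard as the near-intersection of the two optical-axis rays. The function names and the assumption of already-calibrated 3x4 projection matrices are illustrative; this is not the authors' implementation.

import numpy as np

def triangulate_point(P_left, P_right, uv_left, uv_right):
    # Linear (DLT) triangulation: each pixel observation (u, v) under a
    # 3x4 projection matrix P contributes two linear constraints on the
    # homogeneous 3D point X; the SVD gives the least-squares solution.
    u1, v1 = uv_left
    u2, v2 = uv_right
    A = np.array([
        u1 * P_left[2] - P_left[0],
        v1 * P_left[2] - P_left[1],
        u2 * P_right[2] - P_right[0],
        v2 * P_right[2] - P_right[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean

def point_of_regard(o_l, d_l, o_r, d_r):
    # Estimate the 3D point-of-regard as the midpoint of the common
    # perpendicular between the left and right gaze rays (origin o,
    # direction d), since two noisy rays rarely intersect exactly.
    d_l = d_l / np.linalg.norm(d_l)
    d_r = d_r / np.linalg.norm(d_r)
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b  # near zero when the rays are almost parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o_l + s * d_l) + (o_r + t * d_r))

In practice the pupil centers fed to the triangulation would come from eye-image feature detection, and the ray directions from the fitted eye model; the midpoint construction is a common choice precisely because noisy left- and right-eye optical axes almost never intersect exactly.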


2011 ◽  
Vol 31 (4) ◽  
pp. 0415002
Author(s):  
张琼 Zhang Qiong ◽  
王志良 Wang Zhiliang ◽  
迟健男 Chi Jiannan ◽  
史雪飞 Shi Xuefei

2009 ◽  
Vol 23 (2) ◽  
pp. 63-76 ◽  
Author(s):  
Silke Paulmann ◽  
Sarah Jessen ◽  
Sonja A. Kotz

The multimodal nature of human communication is well established, yet few empirical studies have systematically examined the widely held belief that multimodal perception is facilitated in comparison to unimodal or bimodal perception. In the current experiment we first explored the processing of unimodally presented facial expressions. Auditory (prosodic and/or lexical-semantic) information was then presented together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexical, and prosodic cues) human communication. Participants engaged in an identity identification task while event-related potentials (ERPs) were recorded to examine early processing mechanisms as reflected in the P200 and N300 components. The former component has repeatedly been linked to the processing of physical stimulus properties, whereas the latter has been linked to more evaluative, "meaning-related" processing. A direct relationship between P200 and N300 amplitude and the number of information channels was found: the multimodal condition elicited the smallest P200 and N300 amplitudes, followed by larger amplitudes in each component for the bimodal condition, and the largest amplitudes were observed in the unimodal condition. These data suggest that multimodal information induces clear facilitation in comparison to unimodal or bimodal information. The advantage of multimodal perception, as reflected in the P200 and N300 components, may thus reflect one of the mechanisms that allow fast and accurate information processing in human communication.


1988 ◽  
Vol 33 (10) ◽  
pp. 920-921
Author(s):  
L. Kristine Pond

Author(s):  
Patricia L. McDermott ◽  
Jason Luck ◽  
Laurel Allender ◽  
Alia Fisher
