A panoramic vision system for human-robot interaction

Author(s):  
Ester Martinez ◽  
Angel P. del Pobil


Author(s):  
Soo-Han Kang ◽  
Ji-Hyeong Han

Robot vision provides the most important information to robots so that they can read the context and interact with human partners successfully. Moreover, to allow humans to recognize the robot’s visual understanding during human-robot interaction (HRI), the best way is for the robot to explain its understanding in natural language. In this paper, we propose a new approach to interpret robot vision from an egocentric standpoint and to generate descriptions that explain egocentric videos, particularly for HRI. Because robot vision is, from the robot’s side, an egocentric video, it contains exocentric-view information as well as egocentric-view information. Thus, we propose a new dataset, referred to as the global, action, and interaction (GAI) dataset, which consists of egocentric video clips paired with natural-language GAI descriptions representing both egocentric and exocentric information. An encoder-decoder based deep learning model is trained on the GAI dataset and its description generation performance is evaluated. We also conduct experiments in actual environments to verify whether the GAI dataset and the trained deep learning model can improve a robot vision system.
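The abstract names an encoder-decoder deep learning model for describing egocentric video but gives no architectural details. Below is a minimal sketch of such a model in PyTorch, under the assumption (not stated in the paper) that frame features are pre-extracted by a CNN, a GRU encodes the frame sequence, and a GRU decoder emits word tokens; all dimensions and names are illustrative.

```python
# Minimal sketch of an encoder-decoder video description model (PyTorch).
# Assumptions (not from the paper): pre-extracted CNN frame features,
# a GRU encoder over frames, and a GRU decoder over word tokens.
import torch
import torch.nn as nn


class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000, embed_dim=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_frames, feat_dim); captions: (batch, seq_len) token ids
        _, h = self.encoder(frame_feats)      # summarize the clip into a hidden state
        emb = self.embed(captions)            # (batch, seq_len, embed_dim)
        dec_out, _ = self.decoder(emb, h)     # condition decoding on the video summary
        return self.out(dec_out)              # (batch, seq_len, vocab_size) logits


# Toy usage: random features and tokens, one teacher-forced training step.
model = VideoCaptioner()
feats = torch.randn(2, 16, 2048)              # 2 clips, 16 frames each
tokens = torch.randint(0, 10000, (2, 12))     # 2 captions, 12 tokens each
logits = model(feats, tokens[:, :-1])         # predict the next token at each step
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
```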


Author(s):  
Yasutake Takahashi ◽  
Kyohei Yoshida ◽  
Fuminori Hibino ◽  
Yoichiro Maeda

Human-robot interaction requires an intuitive interface, which is not possible with devices such as a joystick or teaching pendant that also require some training. Instruction by gesture is one example of an intuitive interface requiring no training, and pointing is one of the simplest gestures. We propose simple pointing recognition for a mobile robot with an upward-directed camera system. Using it, the robot recognizes pointing and navigates to where the user points through simple visual feedback control. This paper explores the feasibility and utility of our proposal, as shown by the results of a questionnaire comparing the proposed and conventional interfaces.
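The "simple visual feedback control" toward the pointed-at location is not specified in the abstract; a common realization is proportional steering that centers the detected target in the image. The sketch below assumes (purely for illustration) that the pointing target has already been localized as a pixel column and that the robot takes linear/angular velocity commands.

```python
# Minimal sketch of visual-feedback steering toward a pointed-at target.
# Assumptions (not from the paper): the pointing direction has already been
# resolved to a target pixel column in the upward-directed camera image, and
# the robot accepts (linear, angular) velocity commands.
def steer_toward(target_x, image_width, k_angular=0.005, linear_speed=0.2):
    """Proportional control: turn so the target column moves to the image center."""
    error = target_x - image_width / 2.0   # pixels off-center
    angular = -k_angular * error           # turn rate proportional to the error
    return linear_speed, angular


# Example: target detected 80 px right of center in a 640-px-wide image.
v, w = steer_toward(target_x=400, image_width=640)
print(f"linear={v:.2f} m/s, angular={w:.3f} rad/s")
```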


2012 ◽  
Vol 09 (03) ◽  
pp. 1250024 ◽  
Author(s):  
MARTIN HÜLSE ◽  
SEBASTIAN McBRIDE ◽  
MARK LEE

Eye fixation and gaze fixation patterns in general play an important part when humans interact with each other. Moreover, human gaze fixation patterns are largely determined by the task being performed. Our assumption is that meaningful human–robot interaction with robots having active vision components (such as humanoids) is strongly supported if the robot system can create task-modulated fixation patterns. We present an architecture for a robot active vision system equipped with one manipulator, in which we demonstrate the generation of task-modulated gaze control, meaning that fixation patterns accord with the specific task the robot has to perform. Experiments demonstrate different strategies of multi-modal task modulation for robotic active vision, where visual and nonvisual features (tactile feedback) determine gaze fixation patterns. The results are discussed in comparison to purely saliency-based strategies for visual attention and gaze control. The major advantages of our approach to multi-modal task modulation are that the active vision system can generate, first, active avoidance of objects, and second, active engagement with objects. Such behaviors cannot be generated by current approaches to visual attention based on saliency models alone, but they are important for mimicking human-like gaze fixation patterns.
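To make the contrast with purely saliency-based attention concrete, here is a minimal sketch of task-modulated fixation selection. The weighting scheme, the tactile mask, and the boost/suppress factors are assumptions for illustration, not the paper's architecture: a bottom-up saliency combination is re-weighted by the task, and tactile contact either boosts (engagement) or suppresses (avoidance) the touched region before the fixation point is chosen.

```python
# Minimal sketch of task-modulated gaze selection.
# Assumptions (not from the paper): per-feature saliency maps, task weights,
# and a tactile mask marking the currently touched region.
import numpy as np


def select_fixation(feature_maps, task_weights, tactile_mask=None, engage=True):
    """Combine feature maps with task weights, modulate by touch, pick a fixation."""
    salience = sum(w * m for w, m in zip(task_weights, feature_maps))
    if tactile_mask is not None:
        if engage:
            salience = salience * (1.0 + 1.5 * tactile_mask)   # active engagement
        else:
            salience = salience * (1.0 - 0.9 * tactile_mask)   # active avoidance
    return np.unravel_index(np.argmax(salience), salience.shape)


# Toy usage: two 64x64 feature maps (e.g., color, motion) and a touched region.
maps = [np.random.rand(64, 64), np.random.rand(64, 64)]
touch = np.zeros((64, 64))
touch[20:30, 20:30] = 1.0
print(select_fixation(maps, task_weights=[0.7, 0.3], tactile_mask=touch, engage=False))
```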


2015 ◽  
Vol 12 (04) ◽  
pp. 1550019
Author(s):  
Liyuan Li ◽  
Qianli Xu ◽  
Gang S. Wang ◽  
Xinguo Yu ◽  
Yeow Kee Tan ◽  
...  

Computational systems for human–robot interaction (HRI) could benefit from visual perception of the social cues that are commonly employed in human–human interactions. However, existing systems focus on one or two cues for attention or intention estimation. This research investigates how social robots may exploit a wide spectrum of visual cues for multiparty interactions. It is proposed that the vision system for social cue perception should be supported by two dimensions of functionality, namely, vision functionality and cognitive functionality. A vision-based system is proposed for a robot receptionist that embraces both functionalities for multiparty interactions. The vision functionality module consists of a suite of methods that computationally recognize potential visual cues related to social behavior understanding; the performance of these models is validated against a ground-truth annotation dataset. The cognitive functionality module consists of two computational models that (1) quantify users’ attention saliency and engagement intentions, and (2) facilitate engagement-aware behaviors for the robot to adjust its direction of attention and manage the conversational floor. The performance of the robot’s engagement-aware behaviors is evaluated in a multiparty dialog scenario. The results show that the robot’s engagement-aware behavior based on visual perception significantly improves the effectiveness of communication and positively affects user experience.
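The abstract does not specify how the cognitive functionality module quantifies engagement from multiple cues. A common and simple formulation is a weighted combination of normalized cue scores per user, with the most engaged user above a threshold receiving the robot's attention; the cue names, weights, and threshold below are purely illustrative assumptions.

```python
# Minimal sketch of engagement estimation from multiple visual cues.
# Assumptions (not from the paper): cue detectors already output normalized
# per-user scores; engagement is a weighted sum of these scores.
from dataclasses import dataclass


@dataclass
class UserCues:
    user_id: str
    facing: float      # 0..1, how directly the face is oriented toward the robot
    proximity: float   # 0..1, closer is higher
    gesture: float     # 0..1, e.g. waving detected


def engagement(c: UserCues, w=(0.5, 0.3, 0.2)) -> float:
    return w[0] * c.facing + w[1] * c.proximity + w[2] * c.gesture


def pick_addressee(users, threshold=0.4):
    """Return the user the robot should attend to, or None if nobody is engaged."""
    best = max(users, key=engagement, default=None)
    return best if best and engagement(best) >= threshold else None


# Toy usage: two users with different cue profiles.
users = [UserCues("A", 0.9, 0.6, 0.0), UserCues("B", 0.2, 0.9, 1.0)]
target = pick_addressee(users)
print(target.user_id if target else "no engaged user")
```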


2007 ◽  
Vol 04 (02) ◽  
pp. 161-183 ◽  
Author(s):  
F. GUAN ◽  
L. Y. LI ◽  
S. S. GE ◽  
A. P. LOH

In this paper, robust human detection is investigated by fusing stereo and infrared thermal images for effective interaction between humans and socially interactive robots. A scale-adaptive filter is first designed for the stereo vision system to detect human candidates. To overcome the vision system's difficulty in distinguishing human beings from human-like objects, the infrared thermal image is used to resolve the ambiguity and reduce illumination effects. Experimental results show that the fusion of these two types of images gives an improved vision system for robust human detection and identification, which is the most important and essential component of human-robot interaction.
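The abstract describes the fusion only at a high level. One plausible reading, sketched below under stated assumptions, is a two-stage pipeline: the stereo detector proposes candidate bounding boxes, and each candidate is confirmed only if the co-registered thermal image shows body-like temperatures inside it. The temperature band, ratio threshold, and box format are illustrative, not taken from the paper.

```python
# Minimal sketch of stereo/thermal fusion for human detection.
# Assumptions (not from the paper): candidate boxes come from a stereo-based
# detector, and the thermal image is co-registered with the stereo view.
import numpy as np


def confirm_with_thermal(candidates, thermal_img, t_low=30.0, t_high=40.0, min_ratio=0.3):
    """Keep stereo candidates whose thermal region looks like a warm body."""
    humans = []
    for (x, y, w, h) in candidates:
        roi = thermal_img[y:y + h, x:x + w]
        warm_ratio = np.mean((roi >= t_low) & (roi <= t_high))
        if warm_ratio >= min_ratio:
            humans.append((x, y, w, h))
    return humans


# Toy usage: a synthetic thermal image with one warm (human-like) region.
thermal = np.full((240, 320), 22.0)                      # ambient ~22 C
thermal[60:180, 100:160] = 36.0                          # warm body region
candidates = [(100, 60, 60, 120), (200, 60, 60, 120)]    # from the stereo detector
print(confirm_with_thermal(candidates, thermal))         # only the first box survives
```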

