Natural language guided object retrieval in images

2021 ◽  
Vol 58 (4) ◽  
pp. 243-261
Author(s):  
Ahmad Ostovar ◽  
Suna Bensch ◽  
Thomas Hellström

Abstract: Understanding the surrounding environment and communicating with interacting humans are important functionalities for many automated systems in which visual input (e.g., images, video) and natural language input (speech or text) have to be related to each other. Possible applications are automatic image caption generation, interactive surveillance systems, and human-robot interaction. In this paper, we propose algorithms for automatic responses to natural language queries about an image. Our approach uses a predefined neural net to detect bounding boxes and objects in images, models spatial relations between bounding boxes with a neural net, and analyzes the queries with a syntactic parser. We also introduce algorithms that map natural language to properties in the images; these algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users who assess the quality of our system’s generated answers.
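As a rough illustration of how a spatial query such as "the cup to the right of the laptop" could be mapped onto detected bounding boxes, the following minimal sketch may help. It is not the authors' implementation: the paper models spatial relations with a neural net, whereas this sketch swaps in a simple geometric rule on box centers. The `Box` class, the `ANTONYMS` table, and the `retrieve` function are hypothetical names introduced only for this example, and exact label matching stands in for the semantic similarity step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected object: a class label plus an axis-aligned bounding box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2.0

    @property
    def center_y(self) -> float:
        return (self.y_min + self.y_max) / 2.0


# Hypothetical antonym table: each relation on the left is answered by
# evaluating its antonym with the two boxes swapped.
ANTONYMS = {"right": "left", "below": "above"}


def _canonical_holds(relation: str, subject: Box, reference: Box) -> bool:
    """Geometric rule for the canonical relations (image y grows downward)."""
    if relation == "left":
        return subject.center_x < reference.center_x
    if relation == "above":
        return subject.center_y < reference.center_y
    raise ValueError(f"unsupported relation: {relation}")


def relation_holds(relation: str, subject: Box, reference: Box) -> bool:
    """Check a relation, reducing antonyms to their canonical counterpart."""
    if relation in ANTONYMS:
        # "A is right of B" is treated as "B is left of A".
        return _canonical_holds(ANTONYMS[relation], reference, subject)
    return _canonical_holds(relation, subject, reference)


def retrieve(boxes, target_label, relation, reference_label):
    """Return target boxes that satisfy the relation to some reference box."""
    references = [b for b in boxes if b.label == reference_label]
    return [
        b
        for b in boxes
        if b.label == target_label
        and any(relation_holds(relation, b, r) for r in references)
    ]


if __name__ == "__main__":
    detections = [
        Box("cup", 0, 0, 10, 10),
        Box("laptop", 20, 0, 40, 10),
        Box("cup", 50, 0, 60, 10),
    ]
    for box in retrieve(detections, "cup", "right", "laptop"):
        print(box)
```

Handling "right" and "below" through their antonyms keeps the set of geometric rules small; a learned relation model, as described in the abstract, would replace `_canonical_holds`.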

2019 ◽  
Author(s):  
Cinzia Di Dio ◽  
Federico Manzi ◽  
Giulia Peretti ◽  
Angelo Cangelosi ◽  
Paul L. Harris ◽  
...  

Studying trust within human-robot interaction is of great importance given the social relevance of robotic agents in a variety of contexts. We investigated the acquisition, loss and restoration of trust when preschool and school-age children played with either a human or a humanoid robot in vivo. The relationship between trust and the quality of attachment relationships, Theory of Mind, and executive function skills was also investigated. No differences were found in children’s trust in the play-partner as a function of agency (human or robot). Nevertheless, 3-year-olds showed a trend toward trusting the human more than the robot, while 7-year-olds displayed the reverse behavioral pattern, thus highlighting the developing interplay between affective and cognitive correlates of trust.


Author(s):  
Sergio Guadarrama ◽  
Lorenzo Riano ◽  
Dave Golland ◽  
Daniel Gouhring ◽  
Yangqing Jia ◽  
...  

2015 ◽  
Vol 13 (4) ◽  
pp. 267-278 ◽  
Author(s):  
Jiongkun Xie ◽  
Xiaoping Chen ◽  
Jianmin Ji

Author(s):  
Soo-Han Kang ◽  
Ji-Hyeong Han

Abstract: Robot vision provides the most important information to robots so that they can read the context and interact with human partners successfully. Moreover, to allow humans to recognize the robot’s visual understanding during human-robot interaction (HRI), the best way is for the robot to provide an explanation of its understanding in natural language. In this paper, we propose a new approach to interpreting robot vision from an egocentric standpoint and generating descriptions that explain egocentric videos, particularly for HRI. Because robot vision is equivalent to egocentric video from the robot’s side, it contains as much egocentric view information as exocentric view information. Thus, we propose a new dataset, referred to as the global, action, and interaction (GAI) dataset, which consists of egocentric video clips and GAI descriptions in natural language to represent both egocentric and exocentric information. An encoder-decoder based deep learning model is trained on the GAI dataset, and its performance on description generation assessments is evaluated. We also conduct experiments in actual environments to verify whether the GAI dataset and the trained deep learning model can improve a robot vision system.


2002 ◽  
Vol 2 (2/3) ◽  
Author(s):  
Jean Ruegg ◽  
Valerie November ◽  
Francisco Klauser

This paper focuses on the relations between the different types of actors involved in both conceiving and using video-surveillance systems. More specifically, it deals with the reasons that support the growing use of video-surveillance systems, and with the organisational structures and implementation schemes that are designed to cope with them. The analysis raises issues linked to the complexity of the social and spatial relations that CCTV tends to produce. It is based on four Swiss case studies chosen according to different objectives (risks), different types of public spaces under surveillance (city centre, motorway, industrial zone, public transport), and different stages of completion of a CCTV project. The main results are threefold. New categories of actors: the definition of the relationship between CCTV providers and end-users must be enlarged, because many more actors play important roles in risk management and decision making while CCTV systems are designed and implemented. Risks under surveillance: different types of risks are under surveillance, and the study underlines that different forms of surveillance must be distinguished, given the spatial characteristics of every risk (diffuse, located, specific and/or territorialized). The 'distancing effect': CCTV obviously creates distance between the object and the place where surveillance is actually carried out. Going a step further, the paper claims that several kinds of distancing effects should be considered. These distancing effects modify both the quality of the places under surveillance and the general context in which mechanisms can be designed and implemented for a better public regulation of CCTV uses.

