Natural language guided object retrieval in images

Ahmad Ostovar; Suna Bensch; Thomas Hellström

doi:10.1007/s00236-021-00400-2

Acta Informatica ◽

10.1007/s00236-021-00400-2 ◽

2021 ◽

Vol 58 (4) ◽

pp. 243-261

Author(s):

Ahmad Ostovar ◽

Suna Bensch ◽

Thomas Hellström

Keyword(s):

Natural Language ◽

Spatial Relations ◽

Human Robot Interaction ◽

Surveillance Systems ◽

Neural Net ◽

Robot Interaction ◽

Surrounding Environment ◽

Bounding Boxes ◽

Image Caption

The ability to understand the surrounding environment and to communicate with interacting humans are important functionalities for many automated systems in which visual input (e.g., images, video) and natural language input (speech or text) have to be related to each other. Possible applications are automatic image caption generation, interactive surveillance systems, and human-robot interaction. In this paper, we propose algorithms for automatic responses to natural language queries about an image. Our approach uses a predefined neural net to detect bounding boxes and objects in images, models spatial relations between bounding boxes with a neural net, analyzes the queries with a syntactic parser, and introduces algorithms that map natural language to properties in the images. The algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users assessing the quality of our system’s generated answers.
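As a rough illustration of what a spatial relation between two bounding boxes means, the sketch below labels one box relative to another by comparing box centers. This is a hand-written geometric stand-in and not the paper's method, which models spatial relations with a neural net; the `Box` type, the `spatial_relation` function, and the relation labels are hypothetical names chosen for this example.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box in image coordinates (hypothetical type).

    The origin is assumed to be the top-left corner, with y growing downward.
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center point of a box."""
    return ((box.x_min + box.x_max) / 2, (box.y_min + box.y_max) / 2)


def spatial_relation(target: Box, reference: Box) -> str:
    """Coarse relation of `target` relative to `reference`.

    Assumption: the dominant axis of the center offset decides the label.
    A learned model, as described in the abstract, would replace this rule.
    """
    tx, ty = center(target)
    rx, ry = center(reference)
    dx, dy = tx - rx, ty - ry
    if dx == 0 and dy == 0:
        return "at the same position as"
    if abs(dx) >= abs(dy):
        return "right of" if dx > 0 else "left of"
    # y grows downward, so a positive offset means the target is lower.
    return "below" if dy > 0 else "above"


cup = Box(10, 10, 20, 20)
table = Box(40, 12, 60, 22)
lamp = Box(45, 0, 55, 8)
print("cup is", spatial_relation(cup, table), "table")
print("lamp is", spatial_relation(lamp, table), "table")
```

A rule like this ignores box size, overlap, and depth, which is one reason a learned model may be preferable for relations such as "on" or "next to".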

Grounding spatial relations in natural language by fuzzy representation for human-robot interaction

2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz-ieee.2014.6891797 ◽

2014 ◽

Cited By ~ 7

Author(s):

Jiacheng Tan ◽

Zhaojie Ju ◽

Honghai Liu

Keyword(s):

Natural Language ◽

Spatial Relations ◽

Human Robot Interaction ◽

Robot Interaction ◽

Fuzzy Representation

Modeling dynamic spatial relations with global properties for natural language-based human-robot interaction

2013 IEEE RO-MAN ◽

10.1109/roman.2013.6628521 ◽

2013 ◽

Cited By ~ 5

Author(s):

Juan Fasola ◽

Maja J. Mataric

Keyword(s):

Natural Language ◽

Spatial Relations ◽

Human Robot Interaction ◽

Robot Interaction ◽

Global Properties

Shall I trust you? From child human-robot interaction to trusting relationships

10.31234/osf.io/jqfwp ◽

2019 ◽

Author(s):

Cinzia Di Dio ◽

Federico Manzi ◽

Giulia Peretti ◽

Angelo Cangelosi ◽

Paul L. Harris ◽

...

Keyword(s):

Behavioral Pattern ◽

Human Robot Interaction ◽

School Age Children ◽

Social Relevance ◽

Robot Interaction ◽

Cognitive Correlates ◽

The Social ◽

The Relationship

Studying trust within human-robot interaction is of great importance given the social relevance of robotic agents in a variety of contexts. We investigated the acquisition, loss and restoration of trust when preschool and school-age children played with either a human or a humanoid robot in vivo. The relationship between trust and the quality of attachment relationships, Theory of Mind, and executive function skills was also investigated. No differences were found in children’s trust in the play-partner as a function of agency (human or robot). Nevertheless, 3-year-olds showed a trend toward trusting the human more than the robot, while 7-year-olds displayed the reverse behavioral pattern, thus highlighting the developing interplay between affective and cognitive correlates of trust.

Object Learning with Natural Language in a Distributed Intelligent System: A Case Study of Human-Robot Interaction

Advances in Intelligent Systems and Computing - Foundations and Practical Applications of Cognitive Systems and Information Processing ◽

10.1007/978-3-642-37835-5_70 ◽

2013 ◽

pp. 811-819 ◽

Cited By ~ 1

Author(s):

Stefan Heinrich ◽

Pascal Folleher ◽

Peer Springstübe ◽

Erik Strahl ◽

Johannes Twiefel ◽

...

Keyword(s):

Natural Language ◽

Intelligent System ◽

Human Robot Interaction ◽

Robot Interaction ◽

System A ◽

Object Learning

Exploiting deep semantics and compositionality of natural language for Human-Robot-Interaction

2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros.2016.7759133 ◽

2016 ◽

Cited By ~ 9

Author(s):

Manfred Eppe ◽

Sean Trott ◽

Jerome Feldman

Keyword(s):

Natural Language ◽

Human Robot Interaction ◽

Robot Interaction

Grounding spatial relations for human-robot interaction

2013 IEEE/RSJ International Conference on Intelligent Robots and Systems ◽

10.1109/iros.2013.6696569 ◽

2013 ◽

Cited By ~ 50

Author(s):

Sergio Guadarrama ◽

Lorenzo Riano ◽

Dave Golland ◽

Daniel Göhring ◽

Yangqing Jia ◽

...

Keyword(s):

Spatial Relations ◽

Human Robot Interaction ◽

Robot Interaction

Multi-mode Natural Language Processing for human-robot interaction

Web Intelligence ◽

10.3233/web-150325 ◽

2015 ◽

Vol 13 (4) ◽

pp. 267-278 ◽

Cited By ~ 1

Author(s):

Jiongkun Xie ◽

Xiaoping Chen ◽

Jianmin Ji

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Human Robot Interaction ◽

Robot Interaction ◽

Multi Mode

Human-robot interaction with minimal spanning natural language template for autonomous and tele-operated control

Proceedings of the 1997 IEEE/RSJ International Conference on Intelligent Robot and Systems. Innovative Robotics for Real-World Applications. IROS '97 ◽

10.1109/iros.1997.649069 ◽

2002 ◽

Cited By ~ 6

Author(s):

J.S. Zelek

Keyword(s):

Natural Language ◽

Human Robot Interaction ◽

Robot Interaction

Video Captioning Based on Both Egocentric and Exocentric Views of Robot Vision for Human-Robot Interaction

International Journal of Social Robotics ◽

10.1007/s12369-021-00842-1 ◽

2021 ◽

Author(s):

Soo-Han Kang ◽

Ji-Hyeong Han

Keyword(s):

Deep Learning ◽

Natural Language ◽

Vision System ◽

Robot Vision ◽

Learning Model ◽

Human Robot Interaction ◽

Robot Interaction ◽

Video Captioning ◽

Global Action ◽

Deep Learning Model

Robot vision provides the most important information to robots so that they can read the context and interact with human partners successfully. Moreover, to allow humans to recognize the robot’s visual understanding during human-robot interaction (HRI), the best way is for the robot to provide an explanation of its understanding in natural language. In this paper, we propose a new approach by which to interpret robot vision from an egocentric standpoint and generate descriptions to explain egocentric videos, particularly for HRI. Because robot vision is equivalent to egocentric video on the robot’s side, it contains as much egocentric view information as exocentric view information. Thus, we propose a new dataset, referred to as the global, action, and interaction (GAI) dataset, which consists of egocentric video clips and GAI descriptions in natural language to represent both egocentric and exocentric information. The encoder-decoder based deep learning model is trained on the GAI dataset, and its performance on description generation assessments is evaluated. We also conduct experiments in actual environments to verify whether the GAI dataset and the trained deep learning model can improve a robot vision system.

CCTV, Risk Management and Regulation Mechanisms in Publicly-Used Places: a Discussion Based on Swiss Examples

Surveillance & Society ◽

10.24908/ss.v2i2/3.3386 ◽

2002 ◽

Vol 2 (2/3) ◽

Cited By ~ 1

Author(s):

Jean Ruegg ◽

Valerie November ◽

Francisco Klauser

Keyword(s):

Risk Management ◽

Video Surveillance ◽

Spatial Relations ◽

City Centre ◽

General Context ◽

Surveillance Systems ◽

Different Types ◽

Regulation Mechanisms ◽

Definition Of

This paper focuses on the relations between the different types of actors involved in both conceiving and using video-surveillance systems. More specifically, it deals with the reasons that support the growing use of video-surveillance systems, and the organisation structures and implementation schemes that are designed to cope with them. The analysis raises issues linked to the complexity of social and spatial relations that CCTV tends to produce. The study is based on four Swiss case studies chosen according to different objectives (risks), different types of public spaces under surveillance (city centre, motorway, industrial zone, public transport), and different stages of completion of a CCTV project. It yields three main results. New categories of actors: the definition of the relationship between CCTV providers and end-users must be enlarged, because many more actors play important roles in risk management and decision making while CCTV systems are designed and implemented. Risks under surveillance: different types of risks are under surveillance, and the study underlines that different forms of surveillance must be distinguished, given the spatial characteristics of every risk (diffuse, located, specific and/or territorialized). The 'distancing effect': CCTV obviously creates distance between the object and the place where surveillance is actually carried out. Going a bit further, the paper claims that several kinds of distancing effects should be considered. These distancing effects modify both the quality of places under surveillance and the general context where mechanisms can be designed and implemented for a better public regulation of CCTV uses.
