Visual Robotic Perception System with Incremental Learning for Child–Robot Interaction Scenarios

Technologies ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 86
Author(s):  
Niki Efthymiou ◽  
Panagiotis Paraskevas Filntisis ◽  
Gerasimos Potamianos ◽  
Petros Maragos

This paper proposes a novel lightweight visual perception system with Incremental Learning (IL), tailored to child–robot interaction scenarios. Specifically, the system comprises an action recognition module and an emotion recognition module, with the former wrapped in an IL system that allows novel actions to be added easily. The IL system enables tutors who wish to use robotic agents in interaction scenarios to further customize the system to children’s needs. We perform extensive evaluations of the developed modules, achieving state-of-the-art results on both the BabyRobot dataset of children’s actions and the EmoReact dataset of children’s emotions. Finally, we demonstrate the robustness and effectiveness of the IL system for action recognition through a thorough experimental analysis across various conditions and parameters.
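To make the incremental-learning idea concrete, here is a minimal sketch of one common IL scheme for such a setting: a nearest-class-mean classifier over frozen backbone features, where registering a novel action only requires storing its mean embedding. This is an illustrative assumption, not the paper's actual implementation; the feature dimensions and labels are hypothetical.

```python
# Hedged sketch: nearest-class-mean incremental classifier.
# Adding a novel action does not require retraining the backbone.
import numpy as np

class IncrementalActionClassifier:
    def __init__(self):
        self.prototypes = {}  # action label -> normalized mean feature vector

    def add_action(self, label, feature_batch):
        """Register a novel action from a small batch of backbone features."""
        mean = np.asarray(feature_batch).mean(axis=0)
        self.prototypes[label] = mean / np.linalg.norm(mean)

    def predict(self, feature):
        """Return the action whose prototype has highest cosine similarity."""
        f = feature / np.linalg.norm(feature)
        return max(self.prototypes,
                   key=lambda lbl: float(f @ self.prototypes[lbl]))

# Usage: features would come from a frozen video backbone (shapes illustrative).
clf = IncrementalActionClassifier()
clf.add_action("wave", np.random.randn(16, 512))
clf.add_action("clap", np.random.randn(16, 512))
print(clf.predict(np.random.randn(512)))
```

A tutor-facing workflow maps naturally onto `add_action`: a handful of demonstration clips suffices to register a new action class without touching the rest of the system.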

Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 325
Author(s):  
Zhihao Wu ◽  
Baopeng Zhang ◽  
Tianchen Zhou ◽  
Yan Li ◽  
Jianping Fan

In this paper, we develop a practical approach for the automatic detection of discrimination actions in social images. Firstly, an image set is established in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to automatically detect and identify discrimination actions and relationships in social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into a single network, the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and the experimental results demonstrate that it significantly outperforms them.
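The key architectural idea, integrating two tasks into one network, can be sketched as a shared backbone with two co-operating heads, where the relation head also conditions on the action predictions. The layer sizes and fusion scheme below are illustrative assumptions, not the published CVTransE++ architecture.

```python
# Hedged sketch: one network, two co-operating heads (action + relation).
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, feat_dim=512, n_actions=10, n_relations=5):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(2048, feat_dim), nn.ReLU())
        self.action_head = nn.Linear(feat_dim, n_actions)
        # The relation head also sees the action logits, so the two tasks
        # inform each other instead of being solved in isolation.
        self.relation_head = nn.Linear(feat_dim + n_actions, n_relations)

    def forward(self, x):
        h = self.backbone(x)
        action_logits = self.action_head(h)
        relation_logits = self.relation_head(
            torch.cat([h, action_logits], dim=-1))
        return action_logits, relation_logits

net = TwoHeadNet()
a, r = net(torch.randn(4, 2048))   # e.g. pooled CNN features per image
print(a.shape, r.shape)            # torch.Size([4, 10]) torch.Size([4, 5])
```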


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4233
Author(s):  
Bogdan Mocanu ◽  
Ruxandra Tapu ◽  
Titus Zaharia

Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning, and human–machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method based on the Squeeze and Excitation ResNet (SE-ResNet) model fed with spectrogram inputs. To overcome the limitations of state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves 83.35% and 64.92% global accuracy rates on the publicly available RAVDESS and CREMA-D datasets, respectively. Compared with the results of human observers, the gains in global accuracy exceed 24%. Finally, an objective comparative evaluation against state-of-the-art techniques demonstrates accuracy gains of more than 3%.
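As a point of reference, the building block behind the emotionally constrained loss is the standard triplet loss over utterance embeddings: same-emotion pairs are pulled together, different-emotion pairs pushed apart by a margin. The sketch below shows only this generic form; the paper's emotion-relation constraints and the fixed margin value here are not taken from the source.

```python
# Hedged sketch: plain triplet loss over utterance-level embeddings,
# the generic form underlying an emotionally constrained variant.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull same-emotion utterances together, push different ones apart."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Embeddings as produced by, e.g., a CNN plus an utterance-level pooling
# layer such as GhostVLAD (batch and dimension are illustrative).
emb = lambda: torch.randn(8, 256)
print(triplet_loss(emb(), emb(), emb()))
```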


Author(s):  
Xinmeng Li ◽  
Mamoun Alazab ◽  
Qian Li ◽  
Keping Yu ◽  
Quanjun Yin

Knowledge graph question answering is an important technology in intelligent human–robot interaction; it aims to automatically answer natural language questions over a given knowledge graph. For multi-relation questions, which exhibit higher variety and complexity, the tokens of the question carry different priorities for triple selection across the reasoning steps. Most existing models treat the question as a whole and ignore this priority information. To solve this problem, we propose a question-aware memory network for multi-hop question answering, named QA2MN, which dynamically updates the attention over the question during the reasoning process. In addition, we incorporate graph context information into a knowledge graph embedding model to strengthen its ability to represent entities and relations; we use it to initialize QA2MN and fine-tune it during training. We evaluate QA2MN on PathQuestion and WorldCup2014, two representative datasets for complex multi-hop question answering. The results demonstrate that QA2MN achieves state-of-the-art Hits@1 accuracy on both datasets, validating the effectiveness of our model.
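The core mechanism, re-attending to the question at every reasoning hop so that different tokens can dominate different hops, can be sketched as follows. The shapes, scoring functions, and update rule are illustrative assumptions, not the published QA2MN model.

```python
# Hedged sketch: per-hop question attention in a multi-hop memory network.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_hop(question_tokens, memory, hops=2):
    """question_tokens: (T, d) token embeddings; memory: (M, d) triple embeddings."""
    state = question_tokens.mean(axis=0)          # initial question summary
    for _ in range(hops):
        q_att = softmax(question_tokens @ state)  # re-attend to the question
        q_vec = q_att @ question_tokens           # hop-specific question view
        m_att = softmax(memory @ q_vec)           # select relevant triples
        state = state + m_att @ memory            # update the reasoning state
    return state

state = multi_hop(np.random.randn(6, 64), np.random.randn(20, 64))
print(state.shape)  # (64,)
```

The point of the per-hop `q_att` is that a two-hop question like "who coaches the team that won X" needs different tokens emphasized at each hop, which a single static question encoding cannot express.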


2020 ◽  
Vol 12 (1) ◽  
pp. 58-73
Author(s):  
Sofia Thunberg ◽  
Tom Ziemke

Interaction between humans and robots will benefit if people have at least a rough mental model of what a robot knows about the world and what it plans to do. But how do we design human-robot interactions to facilitate this? Previous research has shown that one can change people’s mental models of robots by manipulating the robots’ physical appearance. However, this has mostly not been done in a user-centred way, i.e. without a focus on what users need and want. Starting from theories of how humans form and adapt mental models of others, we investigated how the participatory design method PICTIVE can be used to generate design ideas about how a humanoid robot could communicate. Five participants went through three phases based on eight scenarios from the state-of-the-art tasks in the RoboCup@Home social robotics competition. The results indicate that participatory design can be a suitable method to generate design concepts for robots’ communication in human-robot interaction.


Nanophotonics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 41-74
Author(s):  
Bernard C. Kress ◽  
Ishan Chatterjee

This paper is a review and analysis of the various implementation architectures of diffractive waveguide combiners for augmented reality (AR) and mixed reality (MR) headsets and smart glasses. Extended reality (XR) is another acronym frequently used to refer to all variants across the MR spectrum. Such devices have the potential to revolutionize how we work, communicate, travel, learn, teach, shop, and are entertained. Already, market analysts show very optimistic expectations on return on investment in MR, for both enterprise and consumer applications. Hardware architectures and technologies for AR and MR have made tremendous progress over the past five years, fueled by recent investment hype in start-ups and accelerated mergers and acquisitions by larger corporations. In order to meet such high market expectations, several challenges must be addressed: first, cementing primary use cases for each specific market segment and, second, achieving greater MR performance out of increasingly size-, weight-, cost- and power-constrained hardware. One such crucial component is the optical combiner. Combiners are often considered critical optical elements in MR headsets, as they are the direct window to both the digital content and the real world for the user’s eyes.

Two main pillars defining the MR experience are comfort and immersion. Comfort comes in various forms:
– wearable comfort: reducing weight and size, pushing back the center of gravity, addressing thermal issues, and so on
– visual comfort: providing accurate and natural 3-dimensional cues over a large field of view and a high angular resolution
– vestibular comfort: providing stable and realistic virtual overlays that spatially agree with the user’s motion
– social comfort: allowing for true eye contact, in a socially acceptable form factor

Immersion can be defined as the multisensory perceptual experience (including audio, display, gestures, and haptics) that conveys to the user a sense of realism and envelopment. In order to effectively address both comfort and immersion challenges through improved hardware architectures and software developments, a deep understanding of the specific features and limitations of the human visual perception system is required. We emphasize the need for a human-centric optical design process, which would allow for the most comfortable headset design (wearable, visual, vestibular, and social comfort) without compromising the user’s sense of immersion (display, sensing, and interaction). Matching the specifics of the display architecture to the human visual perception system is key to bounding the hardware constraints, allowing for headset development and mass production at reasonable cost while providing a delightful experience to the end user.

