Toward a Vietnamese facial expression recognition system for human-robot interaction

An intriguing challenge in the human–robot interaction field is the prospect of endowing robots with emotional intelligence to make the interaction more genuine, intuitive, and natural. A crucial aspect in achieving this goal is the robot’s capability to infer and interpret human emotions. Thanks to its design and open programming platform, the NAO humanoid robot is one of the most widely used agents for human interaction. As with person-to-person communication, facial expressions are the privileged channel for recognizing the interlocutor’s emotional expressions. Although NAO is equipped with a facial expression recognition module, specific use cases may require additional features and affective computing capabilities that are not currently available. This study proposes a highly accurate convolutional-neural-network-based facial expression recognition model that further enhances the NAO robot’s awareness of human facial expressions and provides the robot with the capability to detect an interlocutor’s arousal level. When tested during human–robot interactions, the model was 91% and 90% accurate in recognizing happy and sad facial expressions, respectively; 75% accurate in recognizing surprised and scared expressions; and less accurate in recognizing neutral and angry expressions. Finally, the model was successfully integrated into the NAO SDK, allowing high-performing facial expression classification with an inference time of 0.34 ± 0.04 s.

cGAN Based Facial Expression Recognition for Human-Robot Interaction

IEEE Access, 2019, Vol 7, pp. 9848-9859. DOI: 10.1109/access.2019.2891668. Cited by 19.

Author(s): Jia Deng, Gaoyang Pang, Zhiyu Zhang, Zhibo Pang, Huayong Yang, ...

Keyword(s): Facial Expression, Facial Expression Recognition, Human Robot Interaction, Expression Recognition, Robot Interaction

CNN-Based Facial Expression Recognition from Annotated RGB-D Images for Human–Robot Interaction

International Journal of Humanoid Robotics, 2019, Vol 16 (04), pp. 1941002. DOI: 10.1142/s0219843619410020. Cited by 7.

Author(s): Jing Li, Yang Mi, Gongfa Li, Zhaojie Ju

Keyword(s): Facial Expression, Facial Expression Recognition, Recognition Task, Recognition System, Human Robot Interaction, Microsoft Kinect, Depth Information, Expression Recognition, Stream Network, Depth Images

Facial expression recognition has been widely used in human-computer interaction (HCI) systems. Over the years, researchers have proposed different feature descriptors, implemented different classification methods, and carried out a number of experiments on various datasets for automatic facial expression recognition. However, most of them used 2D static images or 2D video sequences for the recognition task. The main limitation of 2D-based analysis is its sensitivity to variations in pose and illumination, which reduce recognition accuracy. An alternative is to incorporate depth information acquired by a 3D sensor, because it is invariant to both pose and illumination. In this paper, we present a two-stream convolutional neural network (CNN)-based facial expression recognition system and test it on our own RGB-D facial expression dataset, collected with Microsoft Kinect for XBOX in non-spontaneous scenarios; Kinect was chosen because it is an inexpensive and portable device that captures both RGB and depth information. Our fully annotated dataset includes seven expressions (i.e., neutral, sadness, disgust, fear, happiness, anger, and surprise) for 15 subjects (9 males and 6 females) aged from 20 to 25. The two individual CNNs are identical in architecture but do not share parameters. To combine the detection results produced by these two CNNs, we propose a late fusion approach. The experimental results demonstrate that the proposed two-stream network using RGB-D images outperforms networks that use only RGB images or only depth images.
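As a rough illustration of late fusion, the sketch below combines the outputs of an RGB stream and a depth stream after each has produced its own class scores. The abstract does not state the fusion rule, so the weighted average of per-stream softmax probabilities, the `rgb_weight` parameter, and the example logits are all assumptions made for illustration; only the seven expression labels come from the abstract.

```python
import math

# The seven expression classes listed in the abstract.
EXPRESSIONS = ["neutral", "sadness", "disgust", "fear",
               "happiness", "anger", "surprise"]

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(rgb_logits, depth_logits, rgb_weight=0.5):
    """Fuse two independently computed predictions.

    Assumed rule: weighted average of the per-stream class probabilities.
    Returns the winning label and the fused probability vector.
    """
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    fused = [rgb_weight * a + (1.0 - rgb_weight) * b
             for a, b in zip(p_rgb, p_depth)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EXPRESSIONS[best], fused

# Hypothetical logits: the RGB stream mildly prefers "happiness",
# while the depth stream strongly prefers "surprise".
rgb_logits = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.8]
depth_logits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0]
label, fused = late_fusion(rgb_logits, depth_logits)
print(label)
```

Because fusion happens on the outputs rather than on intermediate features, each stream can be trained and evaluated separately, which matches the description of two CNNs that share an architecture but not parameters.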

Combining 2D Gabor and Local Binary Pattern for Facial Expression Recognition Using Extreme Learning Machine

Journal of Advanced Computational Intelligence and Intelligent Informatics, 2019, Vol 23 (3), pp. 444-455. DOI: 10.20965/jaciii.2019.p0444. Cited by 5.

Author(s): Zhen-Tao Liu, Si-Han Li, Wei-Hua Cao, Dan-Yun Li, Man Hao, ...

Keyword(s): Facial Expression, Extreme Learning Machine, Facial Expression Recognition, Local Binary Pattern, Human Robot Interaction, Support Vector, Expression Recognition, Robot Interaction, Interaction Detection, Learning Machine

The efficiency of facial expression recognition (FER) is important for human-robot interaction. Detection of the facial region, extraction of discriminative facial expression features, and identification of facial expression categories all affect recognition accuracy and time efficiency. An FER framework is proposed in which 2D Gabor filters and the local binary pattern (LBP) are combined to extract discriminative features from salient facial expression patches, and an extreme learning machine (ELM) is adopted to identify facial expression categories. The combination of 2D Gabor and LBP not only describes multiscale and multidirectional textural features but also captures small local details. FER with ELM and with a support vector machine (SVM) is evaluated on the Japanese Female Facial Expression (JAFFE) database and the extended Cohn-Kanade database, where both ELM and SVM achieve an accuracy of more than 85%, and ELM is computationally more efficient than SVM. The proposed framework has been used in a human-robot interaction system based on multimodal emotional communication, in which FER completes within 2 seconds, enabling real-time human-robot interaction.
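For readers unfamiliar with ELMs, the following is a minimal sketch of the classifier stage only. It follows the standard ELM recipe (a fixed random hidden layer, with output weights solved in closed form via the Moore-Penrose pseudoinverse) and is not the authors' implementation. The Gabor and LBP feature extraction is replaced here by synthetic feature vectors, and the class name `ELM`, the hidden-layer size, and the random seeds are illustrative assumptions.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine for classification (illustrative)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Hidden weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Synthetic stand-in for Gabor+LBP feature vectors: two separated clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 0.5, size=(30, 8)),
               rng.normal(2.0, 0.5, size=(30, 8))])
y = np.array([0] * 30 + [1] * 30)

model = ELM(n_hidden=20).fit(X, y)
train_accuracy = float(np.mean(model.predict(X) == y))
print(train_accuracy)
```

Training reduces to a single pseudoinverse computation with no iterative optimization, which is the usual explanation for the speed advantage over SVM that the abstract reports.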

ExGenNet: Learning to Generate Robotic Facial Expression Using Facial Expression Recognition

Frontiers in Robotics and AI, 2022, Vol 8. DOI: 10.3389/frobt.2021.730317.

Author(s): Niyati Rawal, Dorothea Koert, Cigdem Turan, Kristian Kersting, Jan Peters, ...

Keyword(s): Facial Expression, Facial Expressions, Facial Expression Recognition, Human Subjects, Humanoid Robots, Human Robot Interaction, Expression Recognition, Robot Interaction, Facial Images, Intended Expression

The ability of a robot to generate appropriate facial expressions is a key aspect of perceived sociability in human-robot interaction. Yet many existing approaches rely on a set of fixed, preprogrammed joint configurations for expression generation. Automating this process could scale better to different robot types and a wider range of expressions. To this end, we introduce ExGenNet, a novel deep generative approach for facial expressions on humanoid robots. ExGenNets connect a generator network, which reconstructs simplified facial images from robot joint configurations, with a classifier network for state-of-the-art facial expression recognition. The robots’ joint configurations are optimized for various expressions by backpropagating the loss between the predicted expression and the intended expression through the classification network and the generator network. To improve the transfer between human training images and images of different robots, we propose to use extracted features in the classifier as well as in the generator network. Unlike most studies on facial expression generation, ExGenNets can produce multiple configurations for each facial expression and can be transferred between robots. Experimental evaluations on two robots with highly human-like faces, Alfie (Furhat Robot) and the android robot Elenoide, show that ExGenNet can successfully generate sets of joint configurations for predefined facial expressions on both robots. This ability of ExGenNet to generate realistic facial expressions was further validated in a pilot study in which the majority of human subjects could accurately recognize most of the generated facial expressions on both robots.
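The core idea of optimizing an input by backpropagating a classification loss through two frozen networks can be sketched on a toy problem. Everything below is an assumption made for illustration and not the ExGenNet architecture: the "generator" is a single tanh layer with a random matrix `A`, the "classifier" is a softmax over a random matrix `C`, the gradient is derived by hand, and a simple accept-if-better step rule is added to keep the descent stable.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_and_grad(q, A, C, target):
    """Cross-entropy for the intended expression and its gradient w.r.t. q."""
    x = np.tanh(A @ q)        # toy generator: joint configuration -> image features
    p = softmax(C @ x)        # toy classifier: image features -> class probabilities
    loss = -np.log(p[target])
    dz = p.copy()
    dz[target] -= 1.0         # gradient at the logits
    dx = C.T @ dz             # back through the classifier
    du = dx * (1.0 - x ** 2)  # back through the tanh nonlinearity
    return loss, A.T @ du     # back through the generator's linear map

def optimize_joints(A, C, target, steps=200, lr=0.5, seed=0):
    """Gradient descent on the joint configuration; both networks stay frozen."""
    rng = np.random.default_rng(seed)
    q = rng.normal(scale=0.1, size=A.shape[1])
    loss, grad = loss_and_grad(q, A, C, target)
    history = [loss]
    for _ in range(steps):
        candidate = q - lr * grad
        new_loss, new_grad = loss_and_grad(candidate, A, C, target)
        if new_loss < loss:   # accept only steps that reduce the loss
            q, loss, grad = candidate, new_loss, new_grad
        else:
            lr *= 0.5         # otherwise shrink the step size
        history.append(loss)
    return q, history

# Hypothetical sizes: 6 joints, 16 image features, 4 expression classes.
rng = np.random.default_rng(1)
A = rng.normal(size=(16, 6))
C = rng.normal(size=(4, 16))
q_opt, history = optimize_joints(A, C, target=2)
print(history[0], history[-1])
```

Starting the search from different random seeds would typically lead to different final configurations, which is one plausible way to obtain several joint configurations for the same expression.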
