Context-aware Cascade Attention-based RNN for Video Emotion Recognition

Emotion recognition plays an important role in human–computer interactions. Recent studies have focused on video emotion recognition in the wild and have run into difficulties related to occlusion, illumination, complex behavior over time, and auditory cues. State-of-the-art methods use multiple modalities, such as frame-level, spatiotemporal, and audio approaches. However, such methods have difficulties in exploiting long-term dependencies in temporal information, capturing contextual information, and integrating multi-modal information. In this paper, we introduce a multi-modal flexible system for video-based emotion recognition in the wild. Our system tracks and votes on significant faces corresponding to persons of interest in a video to classify seven basic emotions. The key contribution of this study is that it proposes the use of face feature extraction with context-aware and statistical information for emotion recognition. We also build two model architectures to effectively exploit long-term dependencies in temporal information with a temporal-pyramid model and a spatiotemporal model with “Conv2D+LSTM+3DCNN+Classify” architecture. Finally, we propose the best selection ensemble to improve the accuracy of multi-modal fusion. The best selection ensemble selects the best combination from spatiotemporal and temporal-pyramid models to achieve the best accuracy for classifying the seven basic emotions. In our experiment, we take benchmark measurement on the AFEW dataset with high accuracy.

Download Full-text

Context-Aware Based Visual-Audio Feature Fusion for Emotion Recognition

10.1109/ijcnn52387.2021.9533473 ◽

2021 ◽

Author(s):

Huijie Cheng ◽

Yun Tie ◽

Lin Qi ◽

Cong Jin

Keyword(s):

Emotion Recognition ◽

Feature Fusion ◽

Context Aware ◽

Audio Feature

Download Full-text

Sequential Interactive Biased Network for Context-Aware Emotion Recognition

2021 IEEE International Joint Conference on Biometrics (IJCB) ◽

10.1109/ijcb52358.2021.9484370 ◽

2021 ◽

Author(s):

Xinpeng Li ◽

Xiaojiang Peng ◽

Changxing Ding

Keyword(s):

Emotion Recognition ◽

Context Aware

Download Full-text

Context-Aware Attention Network for Human Emotion Recognition in Video

Advances in Multimedia ◽

10.1155/2020/8843413 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Xiaodong Liu ◽

Miao Wang

Keyword(s):

Facial Expression ◽

Emotion Recognition ◽

Recognition Accuracy ◽

Experimental Results ◽

The Other ◽

Context Information ◽

Context Aware ◽

Attention Network ◽

Human Emotion ◽

Video Frames

Recognition of human emotion from facial expression is affected by distortions of pictorial quality and facial pose, which is often ignored by traditional video emotion recognition methods. On the other hand, context information can also provide different degrees of extra clues, which can further improve the recognition accuracy. In this paper, we first build a video dataset with seven categories of human emotion, named human emotion in the video (HEIV). With the HEIV dataset, we trained a context-aware attention network (CAAN) to recognize human emotion. The network consists of two subnetworks to process both face and context information. Features from facial expression and context clues are fused to represent the emotion of video frames, which will be then passed through an attention network and generate emotion scores. Then, the emotion features of all frames will be aggregated according to their emotional score. Experimental results show that our proposed method is effective on HEIV dataset.

Download Full-text