A multimodal emotion recognition system from video

Emotion recognition is attracting the attention of the research community because of the many areas where it can be applied, such as healthcare and road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 model of the PANNs framework, confirming that training is more robust when it does not start from scratch and the tasks are similar. For the facial emotion recognizer, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when used directly to solve a video-based task, even after domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the knowledge embedded in these pre-trained models. Finally, by combining the two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-fold cross-validation (5-CV), classifying eight emotions. The results show that both modalities carry relevant information for detecting users' emotional state and that combining them improves system performance.
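The abstract does not state which late fusion rule was used. As a minimal sketch, assuming each modality outputs a posterior distribution over the eight RAVDESS classes and that fusion is a weighted average of those posteriors (one common form of late fusion, not necessarily the authors' method), the combination step could look like this. The function name `late_fusion`, the `speech_weight` parameter, and the example probabilities are hypothetical.

```python
from typing import List, Sequence, Tuple

# The eight emotion classes of the RAVDESS dataset.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]


def late_fusion(speech_probs: Sequence[float],
                face_probs: Sequence[float],
                speech_weight: float = 0.5) -> Tuple[int, List[float]]:
    """Fuse two per-class posterior vectors by weighted averaging.

    Assumption: both inputs are probability distributions over the same
    classes in the same order. Returns (predicted class index, fused vector).
    """
    if len(speech_probs) != len(face_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")

    face_weight = 1.0 - speech_weight
    fused = [speech_weight * s + face_weight * f
             for s, f in zip(speech_probs, face_probs)]
    # The decision is taken only after the per-modality scores are combined.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Hypothetical posteriors for one video clip.
    speech = [0.10, 0.10, 0.50, 0.05, 0.05, 0.10, 0.05, 0.05]
    face = [0.05, 0.05, 0.30, 0.40, 0.05, 0.05, 0.05, 0.05]
    index, fused = late_fusion(speech, face)
    print(EMOTIONS[index], [round(p, 3) for p in fused])
```

Because each modality is scored independently before the combination, the speech and facial models can be trained and evaluated separately, and the weight can be tuned on held-out subjects.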


A Multimodal Emotion Recognition System Using Facial Landmark Analysis

Iranian Journal of Science and Technology, Transactions of Electrical Engineering ◽ 10.1007/s40998-018-0142-9 ◽ 2018 ◽ Vol 43 (S1) ◽ pp. 171-189 ◽ Cited By ~ 1

Author(s): Farhad Rahdari ◽ Esmat Rashedi ◽ Mahdi Eftekhari

Keyword(s): Emotion Recognition ◽ Recognition System ◽ Landmark Analysis ◽ Facial Landmark ◽ Multimodal Emotion Recognition


Multimodal Emotion Recognition System Using Machine Learning and Psychological Signals: A Review

Advances in Intelligent Systems and Computing - Soft Computing: Theories and Applications ◽ 10.1007/978-981-16-1740-9_54 ◽ 2021 ◽ pp. 657-666

Author(s): Rishu ◽ Jaiteg Singh ◽ Rupali Gill

Keyword(s): Machine Learning ◽ Emotion Recognition ◽ Recognition System ◽ Multimodal Emotion Recognition


Implementation of A New Speech Negative Emotion Recognition System

Journal of Physics: Conference Series ◽ 10.1088/1742-6596/1757/1/012021 ◽ 2021 ◽ Vol 1757 (1) ◽ pp. 012021

Author(s): Yuqiong Wang ◽ Zehui Zhao ◽ Zhiwei Huang

Keyword(s): Emotion Recognition ◽ Negative Emotion ◽ Recognition System


Emotion Recognition System featuring a fusion of Electrocardiogram and Photoplethysmogram Features

2020 14th International Conference on Open Source Systems and Technologies (ICOSST) ◽ 10.1109/icosst51357.2020.9333021 ◽ 2020

Author(s): Hira Shahid ◽ Aqsa Butt ◽ Sumair Aziz ◽ Muhammad Umar Khan ◽ Syed Zohaib Hassan Naqvi

Keyword(s): Emotion Recognition ◽ Recognition System


Multimodal emotion recognition with hierarchical memory networks

Intelligent Data Analysis ◽ 10.3233/ida-205183 ◽ 2021 ◽ Vol 25 (4) ◽ pp. 1031-1045

Author(s): Helang Lai ◽ Keke Wu ◽ Lingli Li

Keyword(s): Emotion Recognition ◽ The Self ◽ Global Memory ◽ Human Computer Interactions ◽ Accuracy Improvement ◽ Local Memory ◽ Hierarchical Memory ◽ Multimodal Emotion Recognition ◽ Novel Model ◽ Computer Interactions

Emotion recognition in conversations is crucial for improving the overall experience of human-computer interaction. A promising direction in this field is to develop a model that can effectively extract adequate context for a test utterance. We introduce a novel model, termed hierarchical memory networks (HMN), to address the problem of recognizing utterance-level emotions. HMN divides the context into different aspects and employs different step lengths to represent the weights of these aspects. To model self-dependencies, HMN uses independent local memory networks to model these aspects. Further, to capture interpersonal dependencies, HMN employs global memory networks to integrate the local outputs into global storages. These storages generate contextual summaries and help find the emotionally dependent utterance that is most relevant to the test utterance. Using an attention-based multi-hop scheme, the storages are then merged with the test utterance by an addition operation at each iteration. Experiments on the IEMOCAP dataset show that our model outperforms the compared methods in accuracy.
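The abstract gives no equations for the attention-based multi-hop merge. As a rough sketch, assuming a generic memory-network read (dot-product scores, softmax weights, and the weighted memory summary added to the query on every hop), the iteration could be written as below. This is not the HMN authors' implementation; `multi_hop_read`, `softmax`, and the toy vectors are hypothetical names and values chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hop_read(query: Sequence[float],
                   memory: Sequence[Sequence[float]],
                   hops: int = 2) -> List[float]:
    """Refine a query vector by attending over memory slots for several hops.

    Assumption: every memory slot has the same dimension as the query.
    """
    if not memory:
        raise ValueError("memory must contain at least one slot")
    q = list(query)
    for _ in range(hops):
        # Attention: score each stored context vector against the query.
        scores = [sum(a * b for a, b in zip(q, slot)) for slot in memory]
        weights = softmax(scores)
        # Contextual summary: attention-weighted sum of the memory slots.
        summary = [sum(w * slot[d] for w, slot in zip(weights, memory))
                   for d in range(len(q))]
        # Merge by addition, so the next hop sees the updated query.
        q = [a + b for a, b in zip(q, summary)]
    return q


if __name__ == "__main__":
    # Two toy context vectors standing in for stored utterance summaries.
    memory = [[1.0, 0.0], [0.0, 1.0]]
    print(multi_hop_read([2.0, 0.0], memory, hops=1))
```

Each hop shifts the query toward the memory slots it already resembles, which is one way to read the abstract's claim that the storages help find the utterance most relevant to the test utterance.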
