Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Emotion recognition is attracting the attention of the research community because of the many areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. For the facial emotion recognizer, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when used directly to solve a video-based task, despite the domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the knowledge embedded in these pre-trained models. Finally, by combining these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information for detecting users’ emotional state and that their combination improves system performance.
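
The abstract names a late fusion strategy but does not say how the two modalities are combined. As a minimal sketch, assuming each recognizer outputs a posterior over the eight RAVDESS emotion classes and that fusion is a weighted average of those posteriors, the idea could look like the following. The function names, the `speech_weight` parameter, and the averaging rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of late (decision-level) fusion.
# Assumption: fusion is a weighted average of per-modality posteriors;
# the paper may use a different combiner (e.g. a trained classifier).

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]


def late_fusion(speech_probs, face_probs, speech_weight=0.5):
    """Return the renormalized weighted average of two posteriors."""
    if len(speech_probs) != len(face_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    face_weight = 1.0 - speech_weight
    fused = [speech_weight * s + face_weight * f
             for s, f in zip(speech_probs, face_probs)]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero")
    return [p / total for p in fused]


def predict_emotion(speech_probs, face_probs, speech_weight=0.5):
    """Pick the emotion label with the highest fused probability."""
    fused = late_fusion(speech_probs, face_probs, speech_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Toy posteriors (made-up numbers, for illustration only).
speech = [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05]
face = [0.10, 0.10, 0.20, 0.10, 0.30, 0.10, 0.05, 0.05]
label = predict_emotion(speech, face)
```

In a setup like this, `speech_weight` would normally be tuned on validation folds rather than fixed by hand.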

A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

Applied Sciences ◽

10.3390/app12010327 ◽

2021 ◽

Vol 12 (1) ◽

pp. 327

Author(s):

Cristina Luna-Jiménez ◽

Ricardo Kleinlein ◽

David Griol ◽

Zoraida Callejas ◽

Juan M. Montero ◽

...

Keyword(s):

Emotion Recognition ◽

Autonomous Driving ◽

Relevant Information ◽

Fine Tuning ◽

Facial Emotion ◽

Action Units ◽

Learning Techniques ◽

Static Models ◽

Multimodal Emotion Recognition ◽

Sequential Models

Emotion recognition is attracting the attention of the research community because of its many applications in fields such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognition system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the whole model with a multilayer perceptron appended on top of it, confirming that training was more robust when it did not start from scratch and the network's prior knowledge was close to the target task. For the facial emotion recognizer, we extracted the Action Units from the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow margin. Error analysis indicated that the visual systems could improve with a detector of frames carrying a high emotional load, which opened a new line of research into ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carried relevant information for detecting users’ emotional state and that their combination improved the final system performance.
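
The subject-wise 5-CV evaluation mentioned above means that no speaker appears in both the training and the test partition of a fold. A minimal sketch of such a split, assuming RAVDESS's 24 actors are distributed round-robin over five folds, is shown below. The abstract does not give the authors' actual fold assignment, and the helper names and clip identifiers are hypothetical.

```python
# Hypothetical sketch of a subject-wise (speaker-independent) 5-fold split.
# Assumption: actors are assigned to folds round-robin by sorted ID.


def subject_wise_folds(actor_ids, n_folds=5):
    """Map every distinct actor to exactly one fold index."""
    actors = sorted(set(actor_ids))
    if n_folds < 2 or n_folds > len(actors):
        raise ValueError("n_folds must be between 2 and the number of actors")
    return {actor: i % n_folds for i, actor in enumerate(actors)}


def split(samples, fold_of_actor, test_fold):
    """Split (actor_id, item) pairs so the test actors are never in train."""
    train = [s for s in samples if fold_of_actor[s[0]] != test_fold]
    test = [s for s in samples if fold_of_actor[s[0]] == test_fold]
    return train, test


# Toy data: 24 actors with three placeholder clips each.
samples = [(actor, f"clip_{actor:02d}_{k}")
           for actor in range(1, 25) for k in range(3)]
folds = subject_wise_folds(actor for actor, _ in samples)

for fold in range(5):
    train, test = split(samples, folds, fold)
    train_actors = {actor for actor, _ in train}
    test_actors = {actor for actor, _ in test}
    # The defining property of a subject-wise split.
    assert train_actors.isdisjoint(test_actors)
```

Splitting by clip instead of by actor would let the model see test speakers during training and would typically inflate the reported accuracy.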

Audio-Visual Emotion Recognition System for Variable Length Spatio-Temporal Samples Using Deep Transfer-Learning

Business Information Systems - Lecture Notes in Business Information Processing ◽

10.1007/978-3-030-53337-3_32 ◽

2020 ◽

pp. 434-446

Author(s):

Antonio Cano Montes ◽

Luis A. Hernández Gómez

Keyword(s):

Emotion Recognition ◽

Transfer Learning ◽

Recognition System ◽

Variable Length ◽

Spatio Temporal

A multimodal emotion recognition system from video

2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) ◽

10.1109/iccpct.2016.7530161 ◽

2016 ◽

Cited By ~ 5

Author(s):

S Thushara ◽

S Veni

Keyword(s):

Emotion Recognition ◽

Recognition System ◽

Multimodal Emotion Recognition

Transfer Learning and Deep Domain Adaptation

Advances and Applications in Deep Learning ◽

10.5772/intechopen.94072 ◽

2020 ◽

Author(s):

Wen Xu ◽

Jing He ◽

Yanfeng Shu

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Transfer Learning ◽

Real World ◽

Deep Neural Networks ◽

Domain Adaptation ◽

Fine Tuning ◽

Real World Applications ◽

Comprehensive Survey ◽

Sample Reconstruction

Transfer learning is an emerging technique in machine learning that solves a new task with knowledge obtained from an old task, in order to address a lack of labeled data. Deep domain adaptation, a branch of transfer learning, has received the most attention in recently published articles. The intuition is that deep neural networks usually have a large capacity to learn representations from one dataset, and part of that information can be reused for a new task. In this research, we first present the complete set of transfer learning scenarios according to domains and tasks. Second, we conduct a comprehensive survey of deep domain adaptation and categorize recent advances into three types based on how they are implemented: fine-tuning networks, adversarial domain adaptation, and sample-reconstruction approaches. Third, we discuss the details of these methods and introduce some typical real-world applications. Finally, we conclude our work and explore some potential issues to be addressed in the future.

Does BERT need domain adaptation for clinical negation detection?

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa001 ◽

2020 ◽

Vol 27 (4) ◽

pp. 584-591 ◽

Cited By ~ 2

Author(s):

Chen Lin ◽

Steven Bethard ◽

Dmitriy Dligach ◽

Farig Sadeque ◽

Guergana Savova ◽

...

Keyword(s):

Transfer Learning ◽

Domain Adaptation ◽

Fine Tuning ◽

Adaptation Algorithm ◽

Learning Methods ◽

Clinical Text ◽

Unsupervised Domain Adaptation ◽

Adversarial Training ◽

Negation Detection ◽

Adaptation Method

Introduction: Classifying whether concepts in an unstructured clinical text are negated is an important unsolved task. New domain adaptation and transfer learning methods can potentially address this issue.

Objective: We examine neural unsupervised domain adaptation methods, introducing a novel combination of domain adaptation with transformer-based transfer learning methods to improve negation detection. We also want to better understand the interaction between the widely used bidirectional encoder representations from transformers (BERT) system and domain adaptation methods.

Materials and Methods: We use 4 clinical text datasets that are annotated with negation status. We evaluate a neural unsupervised domain adaptation algorithm and BERT, a transformer-based model that is pretrained on massive general text datasets. We develop an extension to BERT that uses domain adversarial training, a neural domain adaptation method that adds an objective to the negation task: the classifier should not be able to distinguish between instances from 2 different domains.

Results: The domain adaptation methods we describe show positive results, but, on average, the best performance is obtained by plain BERT (without the extension). We provide evidence that the gains from BERT are likely not additive with the gains from domain adaptation.

Discussion: Our results suggest that, at least for the task of clinical negation detection, BERT subsumes domain adaptation, implying that BERT is already learning very general representations of negation phenomena such that fine-tuning even on a specific corpus does not lead to much overfitting.

Conclusion: Despite being trained on nonclinical text, the large training sets of models like BERT lead to large gains in performance for the clinical negation detection task.
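
The extra objective described under Materials and Methods is commonly written as a saddle-point problem. The form below is the generic domain-adversarial formulation, given only as an illustrative sketch; the abstract does not state the exact loss used. All symbols are notation assumed here: $\theta_f$ for the shared encoder (BERT in this setting), $\theta_y$ for the negation classifier, $\theta_d$ for the domain classifier, and $\lambda$ for the trade-off weight.

```latex
% Generic domain-adversarial objective (assumed notation, not from the paper).
% L_neg: negation classification loss on labeled source-domain instances.
% L_dom: domain classification loss on instances from both domains.
\[
  E(\theta_f, \theta_y, \theta_d)
    = \mathcal{L}_{\mathrm{neg}}(\theta_f, \theta_y)
    - \lambda \, \mathcal{L}_{\mathrm{dom}}(\theta_f, \theta_d)
\]
\[
  (\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y}
      E(\theta_f, \theta_y, \hat{\theta}_d),
  \qquad
  \hat{\theta}_d = \arg\max_{\theta_d}
      E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
\]
```

The domain classifier is trained to minimize $\mathcal{L}_{\mathrm{dom}}$, while the encoder is pushed to increase it, so the learned features become harder to attribute to either domain.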

Real-time multimodal emotion recognition system based on elderly accompanying robot

Journal of Physics Conference Series ◽

10.1088/1742-6596/1453/1/012093 ◽

2020 ◽

Vol 1453 ◽

pp. 012093

Author(s):

Shaosong Dou ◽

Zhiquan Feng ◽

Xiaohui Yang ◽

Jinglan Tian

Keyword(s):

Emotion Recognition ◽

Real Time ◽

Recognition System ◽

Multimodal Emotion Recognition

Multimodal Emotion Recognition Using Transfer Learning on Audio and Text Data

10.1007/978-3-030-86970-0_39 ◽

2021 ◽

pp. 552-563

Author(s):

James J. Deng ◽

Clement H. C. Leung ◽

Yuanxi Li

Keyword(s):

Emotion Recognition ◽

Transfer Learning ◽

Text Data ◽

Multimodal Emotion Recognition

Smart IoT Multimodal Emotion Recognition System Using Deep Learning Networks

Studies in Big Data - Artificial Intelligence and IoT ◽

10.1007/978-981-33-6400-4_1 ◽

2021 ◽

pp. 3-19

Author(s):

V. J. Aiswaryadevi ◽

G. Priyanka ◽

S. Sathya Bama ◽

S. Kiruthika ◽

S. Soundarya ◽

...

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Recognition System ◽

Learning Networks ◽

Multimodal Emotion Recognition

Real Time Multimodal Emotion Recognition System using Facial Landmarks and Hand over Face Gestures

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2017.7.2.615 ◽

2017 ◽

Vol 7 (2) ◽

pp. 30-34 ◽

Cited By ~ 3

Author(s):

Mahesh Krishnananda Prabhu ◽

Dinesh Babu Jayagopi

Keyword(s):

Emotion Recognition ◽

Real Time ◽

Recognition System ◽

Facial Landmarks ◽

Multimodal Emotion Recognition

Transfer learning using AlexNet Convolutional Neural Network for Face Recognition

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k7776.0991120 ◽

2020 ◽

Vol 9 (11) ◽

pp. 285-294

Keyword(s):

Neural Network ◽

Face Recognition ◽

Transfer Learning ◽

Data Augmentation ◽

Recognition System ◽

Training Data ◽

Fine Tuning ◽

Data Sets ◽

Learning Method ◽

Face Recognition System

This research aims to achieve high accuracy for a face recognition system. The convolutional neural network (CNN) is a deep learning approach that has demonstrated excellent performance in many fields, including image recognition with large amounts of training data (such as ImageNet). In practice, hardware limitations and insufficient training data sets make high performance hard to achieve. Therefore, this work proposes a deep transfer learning method that uses a pre-trained AlexNet CNN to improve the performance of the face recognition system even with a small number of images. Transfer learning is used to fine-tune the last layer of the AlexNet CNN model for new classification tasks. A data augmentation (DA) technique is also proposed to reduce over-fitting during deep transfer learning training and to improve accuracy. The results showed reduced over-fitting and improved performance after applying data augmentation. All experiments were run on the small UTeMFD, GTFD, and CASIA-Face V5 data sets. The proposed system achieved accuracies of 100% on UTeMFD, 96.67% on GTFD, and 95.60% on CASIA-Face V5, with a recognition time of less than 0.05 seconds.
