scholarly journals Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7665
Author(s):  
Cristina Luna-Jiménez ◽  
David Griol ◽  
Zoraida Callejas ◽  
Ricardo Kleinlein ◽  
Juan M. Montero ◽  
...  

Emotion Recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as in healthcare or in road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and Fine-Tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that the training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when they were used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover new ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, from the combination of these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and their combination enables improvement of system performance.

2021 ◽  
Vol 12 (1) ◽  
pp. 327
Author(s):  
Cristina Luna-Jiménez ◽  
Ricardo Kleinlein ◽  
David Griol ◽  
Zoraida Callejas ◽  
Juan M. Montero ◽  
...  

Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow difference. Error analysis reported that the visual systems could improve with a detector of high-emotional load frames, which opened a new line of research to discover new ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carried relevant information to detect users’ emotional state and their combination allowed to improve the final system performance.


Author(s):  
Wen Xu ◽  
Jing He ◽  
Yanfeng Shu

Transfer learning is an emerging technique in machine learning, by which we can solve a new task with the knowledge obtained from an old task in order to address the lack of labeled data. In particular deep domain adaptation (a branch of transfer learning) gets the most attention in recently published articles. The intuition behind this is that deep neural networks usually have a large capacity to learn representation from one dataset and part of the information can be further used for a new task. In this research, we firstly present the complete scenarios of transfer learning according to the domains and tasks. Secondly, we conduct a comprehensive survey related to deep domain adaptation and categorize the recent advances into three types based on implementing approaches: fine-tuning networks, adversarial domain adaptation, and sample-reconstruction approaches. Thirdly, we discuss the details of these methods and introduce some typical real-world applications. Finally, we conclude our work and explore some potential issues to be further addressed.


2020 ◽  
Vol 27 (4) ◽  
pp. 584-591 ◽  
Author(s):  
Chen Lin ◽  
Steven Bethard ◽  
Dmitriy Dligach ◽  
Farig Sadeque ◽  
Guergana Savova ◽  
...  

Abstract Introduction Classifying whether concepts in an unstructured clinical text are negated is an important unsolved task. New domain adaptation and transfer learning methods can potentially address this issue. Objective We examine neural unsupervised domain adaptation methods, introducing a novel combination of domain adaptation with transformer-based transfer learning methods to improve negation detection. We also want to better understand the interaction between the widely used bidirectional encoder representations from transformers (BERT) system and domain adaptation methods. Materials and Methods We use 4 clinical text datasets that are annotated with negation status. We evaluate a neural unsupervised domain adaptation algorithm and BERT, a transformer-based model that is pretrained on massive general text datasets. We develop an extension to BERT that uses domain adversarial training, a neural domain adaptation method that adds an objective to the negation task, that the classifier should not be able to distinguish between instances from 2 different domains. Results The domain adaptation methods we describe show positive results, but, on average, the best performance is obtained by plain BERT (without the extension). We provide evidence that the gains from BERT are likely not additive with the gains from domain adaptation. Discussion Our results suggest that, at least for the task of clinical negation detection, BERT subsumes domain adaptation, implying that BERT is already learning very general representations of negation phenomena such that fine-tuning even on a specific corpus does not lead to much overfitting. Conclusion Despite being trained on nonclinical text, the large training sets of models like BERT lead to large gains in performance for the clinical negation detection task.


Author(s):  
V. J. Aiswaryadevi ◽  
G. Priyanka ◽  
S. Sathya Bama ◽  
S. Kiruthika ◽  
S. Soundarya ◽  
...  

This research is aimed to achieve high-precision accuracy and for face recognition system. Convolution Neural Network is one of the Deep Learning approaches and has demonstrated excellent performance in many fields, including image recognition of a large amount of training data (such as ImageNet). In fact, hardware limitations and insufficient training data-sets are the challenges of getting high performance. Therefore, in this work the Deep Transfer Learning method using AlexNet pre-trained CNN is proposed to improve the performance of the face-recognition system even for a smaller number of images. The transfer learning method is used to fine-tuning on the last layer of AlexNet CNN model for new classification tasks. The data augmentation (DA) technique also proposed to minimize the over-fitting problem during Deep transfer learning training and to improve accuracy. The results proved the improvement in over-fitting and in performance after using the data augmentation technique. All the experiments were tested on UTeMFD, GTFD, and CASIA-Face V5 small data-sets. As a result, the proposed system achieved a high accuracy as 100% on UTeMFD, 96.67% on GTFD, and 95.60% on CASIA-Face V5 in less than 0.05 seconds of recognition time.


Sign in / Sign up

Export Citation Format

Share Document