A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

Cristina Luna-Jiménez; Ricardo Kleinlein; David Griol; Zoraida Callejas; Juan M. Montero; Fernando Fernández-Martínez

doi:10.3390/app12010327


Applied Sciences ◽

10.3390/app12010327 ◽

2021 ◽

Vol 12 (1) ◽

pp. 327

Author(s):

Cristina Luna-Jiménez ◽

Ricardo Kleinlein ◽

David Griol ◽

Zoraida Callejas ◽

Juan M. Montero ◽

...

Keyword(s):

Emotion Recognition ◽

Autonomous Driving ◽

Relevant Information ◽

Fine Tuning ◽

Facial Emotion ◽

Action Units ◽

Learning Techniques ◽

Static Models ◽

Multimodal Emotion Recognition ◽

Sequential Models

Emotion recognition is attracting the attention of the research community due to its many applications in fields such as medicine and autonomous driving. In this paper, we proposed an automatic emotion recognition system consisting of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy was achieved by fine-tuning the whole model with a multilayer perceptron appended on top, confirming that training was more robust when it did not start from scratch and the network's prior knowledge was close to the target task. For the FER, we extracted the Action Units of the videos and compared the performance of static and sequential models. Sequential models outperformed static models by a narrow margin. Error analysis indicated that the visual systems could be improved with a detector of frames with a high emotional load, which opens a new line of research on how to learn from videos. Finally, combining the two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset in a subject-wise 5-fold cross-validation (5-CV), classifying eight emotions. The results demonstrate that these modalities carry relevant information for detecting users’ emotional state and that combining them improves the final system performance.
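The abstract does not say how the speech and facial outputs are fused. As a minimal sketch, assuming each recognizer outputs one probability per RAVDESS emotion, a common late-fusion variant takes a weighted average of the two distributions and picks the most probable class. The function `late_fusion` and the weight `w_speech` are hypothetical and not necessarily the authors' fusion method.

```python
# Hypothetical late-fusion sketch (not necessarily the authors' method):
# each unimodal recognizer returns one probability per RAVDESS emotion, and the
# fused prediction is the argmax of a weighted average of the two distributions.

EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]


def late_fusion(p_speech, p_face, w_speech=0.5):
    """Return (fused probabilities, predicted emotion) for one sample.

    w_speech is an assumed weight for the speech modality; the facial
    modality receives 1 - w_speech.
    """
    if len(p_speech) != len(EMOTIONS) or len(p_face) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion class")
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    fused = [w_speech * s + (1.0 - w_speech) * f for s, f in zip(p_speech, p_face)]
    best = max(range(len(fused)), key=fused.__getitem__)  # index of the largest score
    return fused, EMOTIONS[best]


if __name__ == "__main__":
    p_speech = [0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05]  # made-up SER output
    p_face = [0.10, 0.10, 0.20, 0.10, 0.30, 0.10, 0.05, 0.05]    # made-up FER output
    print(late_fusion(p_speech, p_face)[1])
```

In practice the weight would be tuned on validation folds, or replaced by a small classifier trained on the concatenated posteriors.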

Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning

Sensors ◽

10.3390/s21227665 ◽

2021 ◽

Vol 21 (22) ◽

pp. 7665

Author(s):

Cristina Luna-Jiménez ◽

David Griol ◽

Zoraida Callejas ◽

Ricardo Kleinlein ◽

Juan M. Montero ◽

...

Keyword(s):

Emotion Recognition ◽

Transfer Learning ◽

Domain Adaptation ◽

Relevant Information ◽

Recognition System ◽

Fine Tuning ◽

Saliency Maps ◽

Embedded Knowledge ◽

Multimodal Emotion Recognition ◽

Safety Systems

Emotion recognition is attracting the attention of the research community due to the many areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the source and target tasks were similar. For the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when used directly to solve a video-based task, despite domain adaptation, which opens a new line of research on how to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, combining the two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-fold cross-validation (5-CV), classifying eight emotions. The results reveal that these modalities carry relevant information for detecting users’ emotional state and that their combination improves system performance.
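A subject-wise 5-CV keeps every actor's clips entirely in either the training or the test part of each fold, so the models are always evaluated on unseen speakers. A minimal sketch, assuming RAVDESS's 24 actors are dealt round-robin into five folds (the abstracts do not give the actual fold assignment) and that the actor ID is read from the last of the seven dash-separated fields of a RAVDESS file name; all helper names are hypothetical.

```python
# Hypothetical subject-wise k-fold split for RAVDESS-style data.
# Assumption: folds are built by dealing sorted actor IDs round-robin; the
# papers' actual fold assignment is not stated in the abstracts.


def actor_of(filename):
    """Read the actor ID from a RAVDESS file name such as '03-01-06-01-02-01-12.wav'.

    RAVDESS names files with seven dash-separated fields; the last one is the actor.
    """
    stem = filename.rsplit(".", 1)[0]
    return int(stem.split("-")[6])


def subject_wise_folds(actor_ids, k=5):
    """Split the unique actor IDs into k disjoint folds (round-robin over sorted IDs)."""
    actors = sorted(set(actor_ids))
    if k < 2 or k > len(actors):
        raise ValueError("k must be between 2 and the number of actors")
    return [actors[i::k] for i in range(k)]


def split_samples(samples, test_actors):
    """samples: list of (actor_id, item). Return (train, test) with no shared actors."""
    test_set = set(test_actors)
    train = [s for s in samples if s[0] not in test_set]
    test = [s for s in samples if s[0] in test_set]
    return train, test


if __name__ == "__main__":
    folds = subject_wise_folds(range(1, 25))  # RAVDESS has 24 actors
    for i, fold in enumerate(folds):
        print(f"fold {i}: test actors {fold}")
```

Splitting by actor rather than by clip matters because each RAVDESS actor records many clips; a clip-level split would let the same voice and face appear in both training and test data.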

Multimodal emotion recognition using deep learning techniques

10.5204/thesis.eprints.180753 ◽

2020 ◽

Author(s):

Tien Dung Nguyen

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Learning Techniques ◽

Multimodal Emotion Recognition

Deep Learning Technique-Based Steering of Autonomous Car

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026818500062 ◽

2018 ◽

Vol 17 (02) ◽

pp. 1850006 ◽

Cited By ~ 1

Author(s):

Yiqin Yang ◽

Zhe Wu ◽

Qingyang Xu ◽

Fabao Yan

Keyword(s):

Neural Network ◽

Deep Learning ◽

Autonomous Driving ◽

Fine Tuning ◽

Raspberry Pi ◽

Camera Module ◽

Learning Techniques ◽

Proposed Model ◽

Autonomous Mode ◽

Two Stages

Deep neural networks (DNNs) offer many advantages, and autonomous driving has become a popular research topic. In this paper, an improved stacked autoencoder based on deep learning techniques is proposed to learn the driving characteristics of an autonomous car. These techniques adjust the input data and mitigate the gradient diffusion (vanishing gradient) problem. A Raspberry Pi and a camera module are mounted on top of the car, and the camera module provides the images used to train the DNN. Training has two stages. In the pre-training stage, an improved autoencoder is trained with an unsupervised learning mechanism, and features characterizing the track are extracted. In the fine-tuning stage, the whole network is trained on labeled data, so that the model better learns the driving characteristics from the samples. In the experimental stage, the car, running in autonomous mode, predicts its actions with the trained model. The experiment demonstrates the effectiveness of the proposed model: compared with a traditional neural network, the improved stacked autoencoder shows better generalization ability and faster convergence.
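The two training stages can be illustrated with a deliberately small NumPy sketch: a single sigmoid autoencoder layer is first trained without labels to reconstruct its input, then its encoder is reused under a new softmax output layer and the whole network is fine-tuned on labeled data. This is a generic stand-in, not the authors' improved stacked autoencoder; the layer sizes, learning rates, epoch counts and function names are assumptions, and a real stacked autoencoder would repeat the unsupervised stage layer by layer before fine-tuning.

```python
import numpy as np

# Generic sketch of unsupervised pre-training followed by supervised fine-tuning.
# Assumptions: one hidden layer, sigmoid encoder, linear decoder, plain full-batch
# gradient descent. Not the paper's "improved" stacked autoencoder.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_layer(X, n_hidden, lr=0.5, epochs=200, seed=0):
    """Stage 1 (no labels): learn an encoder by reconstructing X with mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encoder activations
        E = (H @ W2 + b2) - X             # reconstruction error (linear decoder)
        losses.append(float(np.mean(E ** 2)))
        gR = 2.0 * E / (n * d)            # d(loss)/d(reconstruction)
        gW2 = H.T @ gR; gb2 = gR.sum(axis=0)
        gZ = (gR @ W2.T) * H * (1.0 - H)  # back through the sigmoid
        gW1 = X.T @ gZ; gb1 = gZ.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, losses                 # the decoder is discarded


def fine_tune(X, y, W1, b1, n_classes, lr=0.5, epochs=500, seed=0):
    """Stage 2 (labels): add a softmax layer and train the whole network with cross-entropy."""
    rng = np.random.default_rng(seed)
    W1, b1 = W1.copy(), b1.copy()         # start from the pre-trained encoder
    W3 = rng.normal(0.0, 0.1, (W1.shape[1], n_classes)); b3 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]              # one-hot targets
    n = X.shape[0]
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)
        logits = H @ W3 + b3
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        gL = (P - Y) / n                  # d(mean cross-entropy)/d(logits)
        gW3 = H.T @ gL; gb3 = gL.sum(axis=0)
        gZ = (gL @ W3.T) * H * (1.0 - H)
        gW1 = X.T @ gZ; gb1 = gZ.sum(axis=0)
        W3 -= lr * gW3; b3 -= lr * gb3; W1 -= lr * gW1; b1 -= lr * gb1
    return W1, b1, W3, b3


def predict(X, W1, b1, W3, b3):
    return np.argmax(sigmoid(X @ W1 + b1) @ W3 + b3, axis=1)
```

Usage, on made-up two-class data: call `pretrain_layer(X, n_hidden=3)`, pass the returned encoder weights to `fine_tune(X, y, W1, b1, n_classes=2)`, and classify with `predict(X, *params)`.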

Impaired Recognition of Emotional Facial Expressions in Female Psychiatric Patients with Alexithymia

Zeitschrift für Psychiatrie, Psychologie und Psychotherapie ◽

10.1024/1661-4747/a000135 ◽

2013 ◽

Vol 61 (1) ◽

pp. 7-15 ◽

Cited By ~ 1

Author(s):

Daniel Dittrich ◽

Gregor Domes ◽

Susi Loebel ◽

Christoph Berger ◽

Carsten Spitzer ◽

...

Keyword(s):

Emotion Recognition ◽

Emotional Expression ◽

Depression Scale ◽

Facial Emotion Recognition ◽

Facial Emotion ◽

Check List ◽

Psychosomatic Disorders ◽

Psychiatric Patients ◽

Symptom Check List ◽

TAS-20

The present study examines the hypothesis of an alexithymia-associated deficit in recognizing emotional facial expressions in a clinical population. In addition, hypotheses on the role of specific emotion qualities and on gender differences are tested. Sixty-eight psychiatric outpatients and inpatients (44 women and 24 men) were assessed with the Toronto Alexithymia Scale (TAS-20), the Montgomery-Åsberg Depression Scale (MADRS), the Symptom Checklist (SCL-90-R) and the Emotional Expression Multimorph Task (EEMT). The stimuli of the face recognition paradigm were facial expressions of basic emotions after Ekman and Friesen, arranged into sequences of gradually increasing expression intensity. Using multiple regression analysis, we examined the association between TAS-20 score and facial emotion recognition (FER). Whereas no significant association between TAS-20 score and FER emerged for the total sample or the male subsample, in the female subsample the TAS-20 score significantly predicted the total number of errors (β = .38, t = 2.055, p < 0.05) and the errors in recognizing anger and disgust (anger: β = .40, t = 2.240, p < 0.05; disgust: β = .41, t = 2.214, p < 0.05). The TAS-20 score explained 13.3% of the variance for angry faces and 19.7% for disgusted faces. There was no association between alexithymia and the time after which participants stopped the emotional sequences to give their rating (response latency). The results support the presence of an alexithymia-associated deficit in recognizing emotional facial expressions in female participants in a heterogeneous clinical sample.
This deficit may at least partly account for the difficulties that highly alexithymic individuals have in social interactions and could thus explain a predisposition to mental and psychosomatic disorders.
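For readers unfamiliar with the statistics quoted above, a minimal sketch of a one-predictor least-squares regression shows how a standardized coefficient (β) and the share of variance explained (R²) are computed. With a single predictor, β equals the Pearson correlation and R² = β². The study used multiple regression, where a predictor's share of explained variance generally differs from β², so this sketch does not reproduce the reported 13.3% and 19.7%. The function name and any data passed to it are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical one-predictor OLS sketch (the study itself used multiple regression).


def simple_regression(x, y):
    """Return (slope, intercept, standardized beta, R^2) for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    beta = slope * pstdev(x) / pstdev(y)     # coefficient in standard-deviation units
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot               # proportion of variance explained
    return slope, intercept, beta, r2


if __name__ == "__main__":
    # Made-up scores, e.g. an alexithymia score (x) and an error count (y).
    print(simple_regression([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```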

Facial Emotion Recognition Difficulties in Individuals With PTSD Symptoms

PsycEXTRA Dataset ◽

10.1037/e517292011-132 ◽

2009 ◽

Author(s):

Mark Gapen

Keyword(s):

Emotion Recognition ◽

Facial Emotion Recognition ◽

Facial Emotion ◽

PTSD Symptoms

Supplemental Material for Music to My Ears: Age-Related Decline in Musical and Facial Emotion Recognition

Psychology and Aging ◽

10.1037/pag0000203.supp ◽

2017 ◽

Keyword(s):

Emotion Recognition ◽

Facial Emotion Recognition ◽

Facial Emotion ◽

Age Related

Music to my ears: Age-related decline in musical and facial emotion recognition.

Psychology and Aging ◽

10.1037/pag0000203 ◽

2017 ◽

Vol 32 (8) ◽

pp. 698-709 ◽

Cited By ~ 6

Author(s):

Ryan Sutcliffe ◽

Peter G. Rendell ◽

Julie D. Henry ◽

Phoebe E. Bailey ◽

Ted Ruffman

Keyword(s):

Emotion Recognition ◽

Facial Emotion Recognition ◽

Facial Emotion ◽

Age Related

Task characteristics influence facial emotion recognition age-effects: A meta-analytic review.

Psychology and Aging ◽

10.1037/pag0000441 ◽

2020 ◽

Vol 35 (2) ◽

pp. 295-315 ◽

Cited By ~ 6

Author(s):

Grace S. Hayes ◽

Skye N. McLennan ◽

Julie D. Henry ◽

Louise H. Phillips ◽

Gill Terrett ◽

...

Keyword(s):

Emotion Recognition ◽

Age Effects ◽

Facial Emotion Recognition ◽

Facial Emotion ◽

Task Characteristics ◽

Analytic Review

Supplemental Material for Confidence in Facial Emotion Recognition in Borderline Personality Disorder

Personality Disorders: Theory, Research, and Treatment ◽

10.1037/per0000142.supp ◽

2015 ◽

Keyword(s):

Borderline Personality Disorder ◽

Personality Disorder ◽

Emotion Recognition ◽

Borderline Personality ◽

Facial Emotion Recognition ◽

Facial Emotion

Social Exclusion and Facial Emotion Recognition in Social Anxiety

The Korean Journal of Clinical Psychology ◽

10.15842/kjcp.2018.37.1.002 ◽

2018 ◽

Vol 37 (1) ◽

pp. 18-30 ◽

Cited By ~ 1

Author(s):

Jisu Choi ◽

Jae-Won Yang

Keyword(s):

Social Anxiety ◽

Emotion Recognition ◽

Social Exclusion ◽

Facial Emotion Recognition ◽

Facial Emotion