Learning an Arousal-Valence Speech Front-End Network using Media Data In-the-Wild for Emotion Recognition

Author(s):  
Chih-Chuan Lu ◽  
Jeng-Lin Li ◽  
Chi-Chun Lee
2021 ◽  
pp. 1-1
Author(s):  
Shao-Yen Tseng ◽  
Shrikanth Narayanan ◽  
Panayiotis Georgiou

2021 ◽  
Author(s):  
Yibo Huang ◽  
Hongqian Wen ◽  
Linbo Qing ◽  
Rulong Jin ◽  
Leiming Xiao

Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1087
Author(s):  
Muhammad Naveed Riaz ◽  
Yao Shen ◽  
Muhammad Sohail ◽  
Minyi Guo

Facial expression recognition has been well studied for its importance in human–computer interaction and the social sciences. With the evolution of deep learning, there have been significant advances in this area, some even surpassing human-level accuracy. Although these methods achieve good accuracy, they still suffer from two constraints, high computational cost and memory footprint, which are critical limitations for small hardware-constrained devices. To alleviate this issue, we propose a new Convolutional Neural Network (CNN) architecture, eXnet (Expression Net), based on parallel feature extraction, which surpasses current methods in accuracy while containing far fewer parameters (eXnet: 4.57 million vs. VGG19: 14.72 million), making it more efficient and lightweight for real-time systems. Several modern data augmentation techniques are applied to improve the generalization of eXnet; these techniques raise the accuracy of the network by mitigating overfitting while keeping the model the same size. We provide an extensive evaluation of our network against key methods on the Facial Expression Recognition 2013 (FER-2013), Extended Cohn-Kanade (CK+), and Real-world Affective Faces Database (RAF-DB) benchmark datasets. We also perform an ablation study to show the importance of the different components of our architecture. To evaluate the efficiency of eXnet on embedded systems, we deploy it on a Raspberry Pi 4B. All these evaluations show the superiority of eXnet for emotion recognition in the wild in terms of accuracy, number of parameters, and size on disk.
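The abstract does not give eXnet's exact topology, so the following is only an illustrative sketch of the parallel-feature-extraction idea it names: several branches with different receptive fields process the same input, and their feature maps are stacked along a channel axis. All kernel shapes and sizes here are assumptions, not the published architecture.

```python
import numpy as np

def conv2d(x, k):
    # Valid-mode cross-correlation (what deep-learning libraries call
    # "convolution"), written as explicit loops for clarity.
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def parallel_block(x, kernels):
    # Run each kernel branch independently, crop every map to the smallest
    # spatial size, then stack along a new channel axis: the "parallel
    # feature extraction" pattern the abstract describes.
    maps = [conv2d(x, k) for k in kernels]
    h = min(m.shape[0] for m in maps)
    w = min(m.shape[1] for m in maps)
    return np.stack([m[:h, :w] for m in maps], axis=0)

x = np.random.default_rng(0).standard_normal((8, 8))
# Two hypothetical branches: a 3x3 and a 5x5 averaging kernel.
feats = parallel_block(x, [np.ones((3, 3)) / 9.0, np.ones((5, 5)) / 25.0])
print(feats.shape)  # (2, 4, 4): 2 branches, cropped to the 5x5 branch's size
```

A real implementation would use learned kernels and padding so branch outputs align without cropping; the point here is only the branch-then-stack structure.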


Author(s):  
Meghna Pandharipande ◽  
Rupayan Chakraborty ◽  
Ashish Panda ◽  
Biswajit Das ◽  
Sunil Kumar Kopparapu

2021 ◽  
Vol 15 ◽  
Author(s):  
Emma Hughson ◽  
Roya Javadi ◽  
James Thompson ◽  
Angelica Lim

Even though culture has been found to play a role in negative emotion expression, affective computing research primarily takes a basic-emotion approach when analyzing social signals for automatic emotion recognition technologies. Furthermore, automatic negative emotion recognition systems are still trained on data that originates primarily from North America and contains a majority of Caucasian samples. The current study addresses this problem by analyzing differences in the underlying social signals, leveraging machine learning models to classify three negative emotions, contempt, anger, and disgust (CAD), across three cultures: North American, Persian, and Filipino. Using a curated data set compiled from YouTube videos, a support vector machine (SVM) was used to predict negative emotions across the cultures. In addition, a one-way ANOVA was used to analyze the differences between the culture groups in terms of the level of activation of each underlying social signal. Our results not only highlight significant differences in the social signals activated for each culture, but also indicate the specific underlying signals that differ across our cross-cultural data sets. Furthermore, the automatic classification methods recognized North American expressions of CAD well, while Filipino and Persian expressions were recognized at near-chance levels.
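The curated YouTube data set is not reproduced here, so the following is a minimal sketch of the analysis pipeline described above, run on synthetic stand-in features: an SVM classifying the three CAD emotions, and a one-way ANOVA testing whether one social-signal activation differs across culture groups. Feature dimensions, labels, and group sizes are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for social-signal features (e.g., activation levels);
# the study extracts these from curated YouTube videos, not shown here.
X = rng.standard_normal((300, 10))
y = rng.integers(0, 3, 300)        # 0=contempt, 1=anger, 2=disgust (CAD)
culture = rng.integers(0, 3, 300)  # 0=North American, 1=Persian, 2=Filipino

# SVM classification of the three negative emotions.
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(Xtr, ytr)
acc = clf.score(Xte, yte)

# One-way ANOVA: does the activation of one signal (column 0) differ
# across the three culture groups?
groups = [X[culture == c, 0] for c in range(3)]
F, p = f_oneway(*groups)
print(f"SVM accuracy: {acc:.2f}, ANOVA F={F:.2f}, p={p:.3f}")
```

On random features the SVM should score near the one-in-three chance level, mirroring how chance-level performance is detected for the Filipino and Persian subsets in the study.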


Author(s):  
Anderson Raymundo Avila ◽  
Zahid Akhtar Momin ◽  
Joao Felipe Santos ◽  
Douglas O'Shaughnessy ◽
Tiago Henrique Falk
