Learning an Arousal-Valence Speech Front-End Network using Media Data In-the-Wild for Emotion Recognition

Author(s):  
Chih-Chuan Lu ◽  
Jeng-Lin Li ◽  
Chi-Chun Lee
2021 ◽  
pp. 1-1
Author(s):  
Shao-Yen Tseng ◽  
Shrikanth Narayanan ◽  
Panayiotis Georgiou

2021 ◽  
Author(s):  
Yibo Huang ◽  
Hongqian Wen ◽  
Linbo Qing ◽  
Rulong Jin ◽  
Leiming Xiao

Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1087
Author(s):  
Muhammad Naveed Riaz ◽  
Yao Shen ◽  
Muhammad Sohail ◽  
Minyi Guo

Facial expression recognition has been well studied for its importance in human–computer interaction and the social sciences. With the evolution of deep learning, there have been significant advances in this area, some even surpassing human-level accuracy. Although these methods achieve good accuracy, they still suffer from two constraints, high computational cost and memory footprint, which are critical limitations for small hardware-constrained devices. To alleviate this issue, we propose a new Convolutional Neural Network (CNN) architecture, eXnet (Expression Net), based on parallel feature extraction, which surpasses current methods in accuracy while containing far fewer parameters (eXnet: 4.57 million vs. VGG19: 14.72 million), making it more efficient and lightweight for real-time systems. Several modern data augmentation techniques are applied to improve the generalization of eXnet; these techniques raise the accuracy of the network by mitigating overfitting while keeping the model the same size. We provide an extensive evaluation of our network against key methods on the Facial Expression Recognition 2013 (FER-2013), Extended Cohn-Kanade (CK+), and Real-world Affective Faces Database (RAF-DB) benchmark datasets. We also perform an ablation study to show the importance of the different components of our architecture. To evaluate the efficiency of eXnet on embedded systems, we deploy it on a Raspberry Pi 4B. All these evaluations show the superiority of eXnet for emotion recognition in the wild in terms of accuracy, number of parameters, and size on disk.
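The abstract does not give eXnet's exact topology, so the following is only an illustrative sketch of the parallel-feature-extraction idea it names: several branches with different receptive fields process the same input, and their feature maps are stacked along a channel axis. All kernel shapes and sizes here are assumptions, not the published architecture.

```python
import numpy as np

def conv2d(x, k):
    # Valid-mode cross-correlation (what deep-learning libraries call
    # "convolution"), written as explicit loops for clarity.
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def parallel_block(x, kernels):
    # Run each kernel branch independently, crop every map to the smallest
    # spatial size, then stack along a new channel axis: the "parallel
    # feature extraction" pattern the abstract describes.
    maps = [conv2d(x, k) for k in kernels]
    h = min(m.shape[0] for m in maps)
    w = min(m.shape[1] for m in maps)
    return np.stack([m[:h, :w] for m in maps], axis=0)

x = np.random.default_rng(0).standard_normal((8, 8))
# Two hypothetical branches: a 3x3 and a 5x5 averaging kernel.
feats = parallel_block(x, [np.ones((3, 3)) / 9.0, np.ones((5, 5)) / 25.0])
print(feats.shape)  # (2, 4, 4): 2 branches, cropped to the 5x5 branch's size
```

A real implementation would use learned kernels and padding so branch outputs align without cropping; the point here is only the branch-then-stack structure.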


Author(s):  
Meghna Pandharipande ◽  
Rupayan Chakraborty ◽  
Ashish Panda ◽  
Biswajit Das ◽  
Sunil Kumar Kopparapu

2021 ◽  
Vol 15 ◽  
Author(s):  
Emma Hughson ◽  
Roya Javadi ◽  
James Thompson ◽  
Angelica Lim

Even though culture has been found to play a role in negative emotion expression, affective computing research primarily takes a basic-emotion approach when analyzing social signals for automatic emotion recognition technologies. Furthermore, automatic negative emotion recognition systems are still trained on data that originates primarily from North America and contains a majority of Caucasian samples. The current study addresses this problem by analyzing differences in the underlying social signals, leveraging machine learning models to classify three negative emotions, contempt, anger, and disgust (CAD), across three cultures: North American, Persian, and Filipino. Using a curated data set compiled from YouTube videos, a support vector machine (SVM) was used to predict negative emotions across the cultures. In addition, a one-way ANOVA was used to analyze the differences between the culture groups in terms of the level of activation of each underlying social signal. Our results not only highlight significant differences in the social signals activated for each culture, but also indicate the specific underlying signals that differ across our cross-cultural data sets. Furthermore, the automatic classification methods recognized North American expressions of CAD well, while Filipino and Persian expressions were recognized at near-chance levels.
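The curated YouTube data set is not reproduced here, so the following is a minimal sketch of the analysis pipeline described above, run on synthetic stand-in features: an SVM classifying the three CAD emotions, and a one-way ANOVA testing whether one social-signal activation differs across culture groups. Feature dimensions, labels, and group sizes are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for social-signal features (e.g., activation levels);
# the study extracts these from curated YouTube videos, not shown here.
X = rng.standard_normal((300, 10))
y = rng.integers(0, 3, 300)        # 0=contempt, 1=anger, 2=disgust (CAD)
culture = rng.integers(0, 3, 300)  # 0=North American, 1=Persian, 2=Filipino

# SVM classification of the three negative emotions.
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(Xtr, ytr)
acc = clf.score(Xte, yte)

# One-way ANOVA: does the activation of one signal (column 0) differ
# across the three culture groups?
groups = [X[culture == c, 0] for c in range(3)]
F, p = f_oneway(*groups)
print(f"SVM accuracy: {acc:.2f}, ANOVA F={F:.2f}, p={p:.3f}")
```

On random features the SVM should score near the one-in-three chance level, mirroring how chance-level performance is detected for the Filipino and Persian subsets in the study.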


Author(s):  
Anderson Raymundo Avila ◽  
Zahid Akhtar Momin ◽  
Joao Felipe Santos ◽  
Douglas O'Shaughnessy ◽
Tiago Henrique Falk
