Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning

2020 ◽  
Vol 10 (11) ◽  
pp. 4002
Author(s):  
Sathya Bursic ◽  
Giuseppe Boccignone ◽  
Alfio Ferrara ◽  
Alessandro D’Amelio ◽  
Raffaella Lanzarotti

When automatic facial expression recognition is applied to video sequences of speaking subjects, recognition accuracy has been noted to be lower than with video sequences of still subjects. This effect, known as the speaking effect, arises during spontaneous conversations because the speech articulation process shapes facial configurations alongside the affective expressions. In this work we ask whether, aside from facial features, other cues related to the articulation process increase emotion recognition accuracy when added as input to a deep neural network model. We develop two neural networks that classify facial expressions in speaking subjects from the RAVDESS dataset: a spatio-temporal CNN and a GRU-cell RNN. They are first trained on facial features only, and then on both facial features and articulation-related cues extracted from a model trained for lip reading, while also varying the number of consecutive frames provided as input. We show that adding articulation-related features increases classification accuracy by up to 12%, with larger gains when more consecutive frames are provided to the model.
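
As an illustration of the second model family mentioned above, the sketch below shows a GRU-based RNN that classifies an emotion from a window of consecutive frames, with per-frame facial features optionally concatenated with articulation-related cues. It is a minimal sketch, not the authors' implementation; the feature dimensions, hidden size, and the eight RAVDESS emotion classes are assumptions.

```python
# Minimal sketch (not the published model): a GRU RNN over per-frame features.
import torch
import torch.nn as nn

class GRUEmotionClassifier(nn.Module):
    def __init__(self, facial_dim=136, articulation_dim=512,
                 hidden_dim=128, num_classes=8, use_articulation=True):
        super().__init__()
        input_dim = facial_dim + (articulation_dim if use_articulation else 0)
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, facial, articulation=None):
        # facial: (batch, frames, facial_dim); articulation: (batch, frames, articulation_dim)
        x = facial if articulation is None else torch.cat([facial, articulation], dim=-1)
        _, h_n = self.gru(x)               # last hidden state: (1, batch, hidden_dim)
        return self.head(h_n.squeeze(0))   # emotion logits: (batch, num_classes)

# Toy usage: 4 clips of 30 consecutive frames each.
model = GRUEmotionClassifier()
logits = model(torch.randn(4, 30, 136), torch.randn(4, 30, 512))
print(logits.shape)  # torch.Size([4, 8])
```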

Author(s):  
Yi Ji ◽  
Khalid Idrissi

This paper proposes an automatic facial expression recognition system that uses new methods for both face detection and feature extraction. Because facial expressions involve a small set of muscles with limited ranges of motion, the system recognizes expressions from these changes across video sequences. First, the differences between neutral and emotional states are detected, and faces are automatically located from the changing facial organs. Then, LBP features are applied and AdaBoost is used to find the most important features for each expression on the essential facial parts. Finally, an SVM with a polynomial kernel is used to classify expressions. The method is evaluated on the JAFFE and MMI databases, and its performance is better than that of other automatic or manually annotated systems.
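
A minimal sketch of the pipeline described above, not the authors' implementation: uniform LBP histograms as features, AdaBoost-based feature selection, and a polynomial-kernel SVM. The image size, LBP parameters, and the use of SelectFromModel for the AdaBoost selection step are assumptions; the random arrays merely stand in for JAFFE/MMI face crops.

```python
# Illustrative LBP + AdaBoost selection + polynomial SVM pipeline (a sketch).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def lbp_histogram(gray_face, P=8, R=1):
    """Uniform LBP histogram for a grayscale face crop."""
    codes = local_binary_pattern(gray_face, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Fake data standing in for face crops and their expression labels.
rng = np.random.default_rng(0)
faces = (rng.random((60, 64, 64)) * 255).astype(np.uint8)
labels = rng.integers(0, 6, size=60)          # six basic expressions
X = np.array([lbp_histogram(f) for f in faces])

clf = make_pipeline(
    SelectFromModel(AdaBoostClassifier(n_estimators=50, random_state=0)),
    SVC(kernel="poly", degree=3, C=1.0),
)
clf.fit(X, labels)
print(clf.predict(X[:5]))
```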


2020 ◽  
Vol 8 (2) ◽  
pp. 68-84
Author(s):  
Naoki Imamura ◽  
Hiroki Nomiya ◽  
Teruhisa Hochin

Facial expression intensity has been proposed to quantify the degree of facial expression in order to retrieve impressive scenes from lifelog videos. The intensity is calculated from the correlation of facial features with each facial expression. However, this correlation has not been determined objectively; it should be determined statistically from the contribution score of the facial features needed for expression recognition. Therefore, the proposed method recognizes facial expressions with a neural network and calculates the contribution score of each input toward the output. First, the authors refine some of the facial features. They then verify the score by comparing how accuracy changes when useful and useless features are removed, and they process the score statistically. As a result, they extract the facial features that are useful for recognition from the neural network.
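
One common way to realize a "contribution score" of each input feature toward the network output is the average absolute input gradient of the predicted class; the sketch below illustrates that idea only and is not necessarily the scoring used in the paper. The feature dimension and network shape are placeholders.

```python
# Gradient-based contribution scores of input features (an illustrative stand-in).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(68, 32), nn.ReLU(), nn.Linear(32, 7))  # 7 expressions

def contribution_scores(model, features):
    # features: (batch, n_features) facial-feature vectors
    features = features.clone().requires_grad_(True)
    logits = model(features)
    # Back-propagate each sample's predicted-class score to its input features.
    logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum().backward()
    return features.grad.abs().mean(dim=0)   # per-feature contribution score

scores = contribution_scores(net, torch.randn(100, 68))
print(scores.topk(5).indices)  # indices of the most useful input features
```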


2020 ◽  
Vol 15 (8) ◽  
pp. 803-813
Author(s):  
Sofia Volynets ◽  
Dmitry Smirnov ◽  
Heini Saarimäki ◽  
Lauri Nummenmaa

Human neuroimaging and behavioural studies suggest that somatomotor ‘mirroring’ of seen facial expressions may support their recognition. Here we show that viewing specific facial expressions triggers the representation corresponding to that expression in the observer’s brain. Twelve healthy female volunteers underwent two separate fMRI sessions: one where they observed and another where they displayed three types of facial expressions (joy, anger and disgust). A pattern classifier based on Bayesian logistic regression was trained to classify facial expressions (i) within modality (trained and tested with data recorded while observing or displaying expressions) and (ii) between modalities (trained with data recorded while displaying expressions and tested with data recorded while observing the expressions). Cross-modal classification was performed in two ways: with and without functional realignment of the data across the observing/displaying conditions. All expressions could be accurately classified within and also across modalities. Brain regions contributing most to cross-modal classification accuracy included the primary motor and somatosensory cortices. Functional realignment led to only minor increases in cross-modal classification accuracy for most of the examined ROIs, but a substantial improvement was observed in the occipito-ventral components of the core system for facial expression recognition. Altogether, these results support the embodied emotion recognition model and show that expression-specific somatomotor neural signatures could support facial expression recognition.
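
A simplified sketch of the cross-modal decoding scheme described above, with ordinary (rather than Bayesian) logistic regression standing in for the paper's classifier, random arrays standing in for ROI voxel patterns, and functional realignment omitted: the classifier is trained on "display" runs and tested on "observe" runs.

```python
# Cross-modal (display -> observe) decoding sketch with placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 90, 500
expressions = rng.integers(0, 3, size=n_trials)      # joy / anger / disgust
X_display = rng.standard_normal((n_trials, n_voxels))
X_observe = rng.standard_normal((n_trials, n_voxels))

clf = LogisticRegression(max_iter=1000)
clf.fit(X_display, expressions)                       # trained while displaying
pred = clf.predict(X_observe)                         # tested while observing
print("cross-modal accuracy:", accuracy_score(expressions, pred))
```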


2012 ◽  
Vol 110 (1) ◽  
pp. 338-350 ◽  
Author(s):  
Mariano Chóliz ◽  
Enrique G. Fernández-Abascal

Recognition of emotional facial expressions is a central area in the psychology of emotion. This study presents two experiments. The first experiment analyzed recognition accuracy for the basic emotions of happiness, anger, fear, sadness, surprise, and disgust: thirty pictures (five for each emotion) were displayed to 96 participants to assess recognition accuracy. The results showed that recognition accuracy varied significantly across emotions. The second experiment analyzed the effect of contextual information on recognition accuracy. Information either congruent or incongruent with a facial expression was displayed before the pictures of facial expressions were presented. The results of the second experiment showed that congruent information improved facial expression recognition, whereas incongruent information impaired it.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Qing Lin ◽  
Ruili He ◽  
Peihe Jiang

State-of-the-art facial expression recognition methods outperform human beings, largely thanks to the success of convolutional neural networks (CNNs). However, most existing works focus on analyzing adult faces and ignore two important problems: how can we recognize facial expressions from a baby's face image, and how difficult is it? In this paper, we first introduce a new face image database, named BabyExp, which contains 12,000 images of babies younger than two years old, each labelled with one of three facial expressions (happy, sad, or normal). To the best of our knowledge, the proposed dataset is the first baby face dataset for analyzing babies' face images; it is complementary to existing adult face datasets and can shed some light on baby face analysis. We also propose a feature-guided CNN method with a new loss function, called distance loss, to optimize the interclass distance. To facilitate further research, we provide an expression recognition benchmark on the BabyExp dataset. Experimental results show that the proposed network achieves a recognition accuracy of 87.90% on BabyExp.
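
The exact form of the paper's distance loss is not given in the abstract; the sketch below shows one plausible interpretation for illustration only: a margin penalty that pushes apart the mean feature vectors of different expression classes, added to the usual cross-entropy term. The margin, weighting, and feature dimension are assumptions.

```python
# One possible interclass-distance penalty (an assumption, not the published loss).
import torch
import torch.nn.functional as F

def interclass_distance_loss(features, labels, margin=10.0):
    # features: (batch, d) penultimate-layer embeddings; labels: (batch,)
    classes = labels.unique()
    centers = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    dists = torch.cdist(centers, centers)                 # pairwise center distances
    off_diag = dists[~torch.eye(len(classes), dtype=torch.bool)]
    return F.relu(margin - off_diag).mean()               # penalize centers closer than the margin

# Toy usage with three expression classes (happy, sad, normal).
feats = torch.randn(32, 64, requires_grad=True)
labels = torch.randint(0, 3, (32,))
fake_logits = torch.randn(32, 3, requires_grad=True)      # stand-in for network outputs
loss = F.cross_entropy(fake_logits, labels) + 0.1 * interclass_distance_loss(feats, labels)
loss.backward()
```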


2021 ◽  
Vol 11 (4) ◽  
pp. 1428
Author(s):  
Haopeng Wu ◽  
Zhiying Lu ◽  
Jianfeng Zhang ◽  
Xin Li ◽  
Mingyue Zhao ◽  
...  

This paper addresses the problem of Facial Expression Recognition (FER), focusing on subtle facial movements. Traditional methods often suffer from overfitting or incomplete information due to insufficient data and manual feature selection. Instead, our proposed network, called the Multi-features Cooperative Deep Convolutional Network (MC-DCN), maintains focus both on the overall features of the face and on the trends of its key parts. The first stage processes the video data: an ensemble of regression trees (ERT) is used to obtain the overall contour of the face, and an attention model then picks out the parts of the face that are most affected by expressions. The combination of these two methods yields what we call a local feature map. The video data are then fed to MC-DCN, which contains parallel sub-networks: while the overall spatio-temporal characteristics of facial expressions are obtained from the image sequence, the selection of key parts better captures the changes brought about by subtle facial movements. By combining local and global features, the proposed method acquires more information, leading to better performance. Experimental results show that MC-DCN achieves recognition rates of 95%, 78.6%, and 78.3% on the SAVEE, MMI, and edited GEMEP datasets, respectively.
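
A schematic sketch of the two-branch idea described above (the architectural details are assumptions, not the published MC-DCN): one sub-network sees whole-face frames, a parallel sub-network sees the attention-selected local feature maps, and their features are fused before classification. The temporal modelling of the image sequence is omitted for brevity.

```python
# Two-branch global/local fusion sketch (architecture sizes are placeholders).
import torch
import torch.nn as nn

def small_cnn(out_dim=64):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
        nn.Flatten(), nn.Linear(16 * 8 * 8, out_dim), nn.ReLU(),
    )

class TwoBranchFER(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.global_branch = small_cnn()   # whole-face frames
        self.local_branch = small_cnn()    # attention-selected local feature maps
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, global_frames, local_maps):
        fused = torch.cat([self.global_branch(global_frames),
                           self.local_branch(local_maps)], dim=1)
        return self.classifier(fused)      # expression logits

model = TwoBranchFER()
print(model(torch.randn(2, 3, 96, 96), torch.randn(2, 3, 48, 48)).shape)  # (2, 7)
```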


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2003 ◽  
Author(s):  
Xiaoliang Zhu ◽  
Shihao Ye ◽  
Liang Zhao ◽  
Zhicheng Dai

The AFEW (Acted Facial Expressions in the Wild) dataset, a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), is a popular benchmark for emotion recognition under various constraints, including uneven illumination, head deflection, and varying facial posture. In this paper, we propose a compact facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, faces are detected in each frame of a video sequence, the corresponding face ROI (region of interest) is extracted to obtain the face images, and the face images are aligned using the positions of the facial feature points. Second, the aligned face images are fed to a residual neural network to extract the spatial features of the facial expressions, and these spatial features are passed to the hybrid attention module to obtain fused expression features. Finally, the fused features are fed to a gated recurrent unit to extract the temporal features of the facial expressions, and the temporal features are passed to a fully connected layer to classify the expressions. Experiments on the CK+ (extended Cohn-Kanade), Oulu-CASIA, and AFEW datasets obtained recognition accuracy rates of 98.46%, 87.31%, and 53.44%, respectively. This shows that the proposed method not only achieves performance comparable to state-of-the-art methods but also improves performance on the AFEW dataset by more than 2%, demonstrating markedly better facial expression recognition in natural environments.
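
A condensed sketch of the cascade described above, with a ResNet-18 backbone standing in for the residual network, a simple frame-level soft attention standing in for the hybrid attention module, and a GRU for the temporal stage; the layer sizes and attention form are illustrative assumptions rather than the published configuration.

```python
# Spatial features -> attention -> temporal GRU -> classifier (a sketch).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CascadeFER(nn.Module):
    def __init__(self, num_classes=7, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()              # 512-d spatial feature per aligned face
        self.backbone = backbone
        self.attn = nn.Linear(512, 1)            # frame-level attention scores
        self.gru = nn.GRU(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):                    # clips: (batch, frames, 3, H, W) aligned faces
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, 512)
        weights = torch.softmax(self.attn(feats), dim=1)
        fused = feats * weights                  # attention-reweighted spatial features
        _, h_n = self.gru(fused)                 # temporal features from the last hidden state
        return self.fc(h_n.squeeze(0))           # expression logits

model = CascadeFER()
print(model(torch.randn(2, 8, 3, 112, 112)).shape)  # torch.Size([2, 7])
```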

