Generalisation and robustness investigation for facial and speech emotion recognition using bio-inspired spiking neural networks

2021 ◽  
Vol 25 (3) ◽  
pp. 1717-1730
Author(s):  
Esma Mansouri-Benssassi ◽  
Juan Ye

Emotion recognition through facial expression and non-verbal speech represents an important area in affective computing. Both have been extensively studied, from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness: in the face of degradation such as noise, can a model still make correct predictions? and (2) cross-dataset generalisation: when a model is trained on one dataset, can it be used to make inferences on another? To directly address these challenges, we first propose the application of a spiking neural network (SNN) for predicting emotional states from facial expression and speech data, then investigate and compare its accuracy when facing data degradation or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare it with state-of-the-art techniques. Our approach demonstrates robustness to noise: it achieves an accuracy of 56.2% for facial expression recognition (FER), compared with 22.64% for CNN and 14.10% for SVM, when input images are degraded with a noise intensity of 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER), compared with 21.95% for CNN and 14.75% for SVM, when audio white noise is applied. For generalisation, our approach achieves consistently high accuracy of 89% for FER and 70% for SER in cross-dataset evaluation, suggesting that it learns more effective feature representations, which lead to good generalisation of facial features and vocal characteristics across subjects.
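As an illustration of the robustness protocol this abstract describes (not the authors' code), a minimal sketch of degrading test images at a given noise intensity and re-measuring accuracy might look like the following; the salt-and-pepper noise model, the [0, 1] pixel range, and the scikit-learn-style `model.predict()` interface are all assumptions.

```python
# Hedged sketch: evaluate a fitted classifier on noise-degraded test images.
import numpy as np

def salt_and_pepper(images, intensity, rng=None):
    """Corrupt a fraction `intensity` of pixels, setting them to 0 or 1.
    Assumes float images scaled to [0, 1] (an assumption, not from the paper)."""
    rng = rng or np.random.default_rng(0)
    noisy = images.copy()
    mask = rng.random(images.shape) < intensity
    noisy[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return noisy

def accuracy_under_noise(model, X_test, y_test, intensity=0.5):
    """Re-run prediction on degraded inputs; `model` is any fitted
    classifier exposing a scikit-learn-style predict() on flat vectors."""
    X_noisy = salt_and_pepper(X_test, intensity)
    preds = model.predict(X_noisy.reshape(len(X_noisy), -1))
    return (preds == y_test).mean()
```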

Author(s):  
Vishal P. Tank ◽  
S. K. Hadia

In the last couple of years, emotion recognition has proven its significance in the areas of artificial intelligence and man-machine communication. Emotion recognition can be done using speech or images (facial expressions); this paper deals with speech emotion recognition (SER) only. An emotional speech database is essential for emotion recognition. In this paper we propose an emotional database developed in Gujarati, one of the official languages of India. The proposed speech corpus covers six emotional states: sadness, surprise, anger, disgust, fear, and happiness. To observe the effect of the different emotions, the proposed Gujarati speech database is analysed using standard speech parameters, namely pitch, energy, and MFCC, in MATLAB.
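The analysis above was done in MATLAB; as an illustration only, an equivalent extraction of the three named parameters (pitch, energy, MFCC) in Python with librosa might look like this. The file name, sample rate, and pitch range are placeholder assumptions.

```python
# Hedged sketch: extract pitch, energy, and MFCC from one utterance.
import numpy as np
import librosa

y, sr = librosa.load("gujarati_utterance.wav", sr=16000)  # hypothetical file

# Pitch contour via probabilistic YIN, restricted to a typical speech range.
f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=75, fmax=400, sr=sr)

# Short-time energy approximated by frame-wise RMS.
energy = librosa.feature.rms(y=y)[0]

# 13 Mel-frequency cepstral coefficients per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(f"mean F0: {np.nanmean(f0):.1f} Hz, mean energy: {energy.mean():.4f}")
print("MFCC matrix shape (coeffs x frames):", mfcc.shape)
```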


2016 ◽  
Vol 6 (1) ◽  
pp. 13-28 ◽  
Author(s):  
Sheng-Hsiung Su ◽  
Hao-Chiang Koong Lin ◽  
Cheng-Hung Wang ◽  
Zu-Ching Huang

In this paper, the authors use emotion recognition in two ways: facial expression recognition and emotion recognition from text. This dual-mode operation not only strengthens the recognition results but also broadens the types of emotion that can be recognised, so that the learning situation is handled smoothly. Facial expressions are identified through trained image processing, while emotion in text is identified from emotional keywords, syntax, semantics, and logical calculus. The system identifies learners' emotions and learning situations through this analysis, chooses appropriate instructional strategies and curriculum content, and communicates between user and system through agents, so that learners receive effective instruction. The study uses a triangulated evaluation method combining observation, questionnaires, and interviews. The experiment divides subjects by their level of art awareness (art vs. non-art backgrounds) and compares three course formats: traditional teaching, an affective tutoring system, and a course website without emotional factors; the resulting data are then analysed and evaluated.
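As a toy illustration of the keyword-matching side of the text channel described above (the actual system also uses syntax and semantic rules, which are omitted here), a minimal lookup might look like the following; the lexicon and emotion labels are illustrative assumptions, not the authors' resources.

```python
# Hedged sketch: emotion from text via emotional-keyword counting.
EMOTION_LEXICON = {
    "happy": "joy", "great": "joy",
    "sad": "sadness", "cry": "sadness",
    "angry": "anger", "hate": "anger",
    "confused": "confusion", "stuck": "confusion",
}

def emotion_from_text(sentence):
    """Return the emotion whose keywords occur most often, or None."""
    counts = {}
    for word in sentence.lower().split():
        label = EMOTION_LEXICON.get(word.strip(".,!?"))
        if label:
            counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get) if counts else None

print(emotion_from_text("I am stuck and confused by this exercise"))  # confusion
```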


Author(s):  
Lei Huang ◽  
Fei Xie ◽  
Jing Zhao ◽  
Shibin Shen ◽  
Weiran Guang ◽  
...  

Human emotion recognition based on facial expression has significant applications in intelligent man-machine interaction. However, face images vary greatly in real environments due to complex backgrounds and luminance. To solve this problem, this paper proposes a robust face detection method based on a skin color enhancement model, together with a facial expression recognition algorithm using block principal component analysis (PCA). First, the luminance range of the face image is broadened and the contrast of skin color is strengthened by a homomorphic filter. Second, the skin color enhancement model is established using YCbCr color space components to locate the face area. Third, features based on differential horizontal integral projection are extracted from the face. Finally, block PCA with a deep neural network is used to accomplish the facial expression recognition. The experimental results indicate that under weaker illumination and more complicated backgrounds, both face detection and facial expression recognition are achieved effectively by the proposed algorithm, and the mean recognition rate of the facial expression recognition method is improved by 2.7% compared with the traditional Local Binary Patterns (LBP) method.
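A minimal sketch of YCbCr skin segmentation in the spirit of the detection stage above follows; it does not reproduce the paper's homomorphic filtering or exact model, and the Cb/Cr thresholds are common textbook values, an assumption on our part.

```python
# Hedged sketch: locate a face candidate by thresholding skin chrominance.
import cv2
import numpy as np

img = cv2.imread("face.jpg")                       # hypothetical input image
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)     # OpenCV orders channels Y, Cr, Cb

# Keep pixels whose chrominance falls in a typical skin-tone range.
lower = np.array([0, 133, 77], dtype=np.uint8)     # (Y min, Cr min, Cb min)
upper = np.array([255, 173, 127], dtype=np.uint8)  # (Y max, Cr max, Cb max)
mask = cv2.inRange(ycrcb, lower, upper)

# Clean the mask and take the largest skin region as the face area.
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    face_roi = img[y:y + h, x:x + w]
```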


2012 ◽  
Vol 241-244 ◽  
pp. 1677-1681
Author(s):  
Yu Tai Wang ◽  
Jie Han ◽  
Xiao Qing Jiang ◽  
Jing Zou ◽  
Hui Zhao

This paper introduces the present status of speech emotion recognition. Emotional databases of Chinese speech and facial expressions were established, using noise stimuli and movies to evoke the subjects' emotions. For the different emotional states, we analysed single-mode emotion recognition based on prosodic speech features and on the geometric features of facial expressions. We then discuss bimodal emotion recognition using a Gaussian Mixture Model. The experimental results show that the bimodal recognition rate, which additionally incorporates facial expressions, is about 6% higher than the single-mode recognition rate using prosodic features alone.
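A compact sketch of GMM-based classification with late score fusion, in the spirit of the bimodal experiment above, is given below; it is not the authors' code, and the feature shapes, mixture sizes, and equal-weight fusion are assumptions.

```python
# Hedged sketch: one GMM per emotion class and per modality, fused at score level.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmms(features_by_class, n_components=4):
    """Fit one GMM per emotion class on its training feature vectors
    (each value in features_by_class is an (n_samples, n_dims) array)."""
    return {label: GaussianMixture(n_components, covariance_type="diag",
                                   random_state=0).fit(X)
            for label, X in features_by_class.items()}

def fused_predict(speech_gmms, face_gmms, x_speech, x_face, w=0.5):
    """Combine per-class log-likelihoods from both modalities and
    return the highest-scoring emotion label."""
    labels = list(speech_gmms)
    scores = [w * speech_gmms[l].score(x_speech[None, :]) +
              (1 - w) * face_gmms[l].score(x_face[None, :]) for l in labels]
    return labels[int(np.argmax(scores))]
```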


Author(s):  
Anju Yadav ◽  
Venkatesh Gauri Shankar ◽  
Vivek Kumar Verma

In this chapter, a machine learning application for facial expression recognition (FER) is studied for seven emotional states (disgust, joy, surprise, anger, sadness, contempt, and fear) based on FER describing coefficients. FER has practical importance in various areas such as social networks, robotics, and healthcare. A literature review of existing machine learning approaches for FER is discussed, a novel approach for FER on static and dynamic images is given, and the results are compared with existing approaches. The chapter also covers related applications, challenges, and opportunities for future FER, for example security-oriented face detection systems that can identify an individual under any facial expression, or systems that help doctors assess the intensity of illness or pain in patients who cannot speak. The proposed model is a machine learning application with three types of prototypes (a pre-trained model, a single-layer augmented model, and a multi-layered augmented model), having a combined accuracy of approximately 99%.


Facial expression is one of the most important cues for human emotion recognition: people use facial expressions to convey their emotional states. Nevertheless, recognising facial expressions remains a challenging and interesting problem in computer vision. The main objective of the proposed approach is recognising micro-facial expressions in video sequences. For efficient recognition, the proposed method uses an optimised convolutional neural network and takes the CK+ dataset as input. First, the input image is preprocessed with adaptive median filtering. From the preprocessed output, geometric features, Histogram of Oriented Gradients (HOG) features, and Local Binary Pattern (LBP) features are extracted. The novelty of the proposed method is that the optimal features are selected from the extracted features with a Modified Lion Optimization (MLO) algorithm, which converges rapidly and keeps the computational time short. Finally, recognition is performed by a convolutional neural network (CNN). The performance of the proposed MFEOCNN method is analysed in terms of error measures and recognition accuracy. This kind of emotion recognition is mainly used in medicine, marketing, e-learning, entertainment, law, and monitoring. The simulations show that the proposed approach achieves a maximum recognition accuracy of 99.2% with a minimum Mean Absolute Error (MAE). These results are compared with existing methods: Micro-Facial Expression Based Deep-Rooted Learning (MFEDRL), Convolutional Neural Network with Lion Optimization (CNN+LO), and a Convolutional Neural Network (CNN) without optimization. The proposed method is simulated in MATLAB.
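As an illustration of the hand-crafted feature stage named above (HOG and LBP on a preprocessed face crop), a minimal Python sketch follows; the MLO feature selection and the CNN classifier are not reproduced, and the crop size and histogram binning are assumptions.

```python
# Hedged sketch: HOG + uniform-LBP feature extraction from a grayscale face crop.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def extract_features(face_gray):
    """face_gray: 2-D uint8 array, e.g. a 64x64 face crop (assumed size)."""
    hog_vec = hog(face_gray, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
    # Uniform LBP with 8 neighbours yields codes 0..9, hence 10 histogram bins.
    lbp = local_binary_pattern(face_gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([hog_vec, lbp_hist])

features = extract_features(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
print("feature vector length:", features.shape[0])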


2016 ◽  
Vol 7 (2) ◽  
pp. 45-61 ◽  
Author(s):  
Ilaria Sergi ◽  
Chiara Fiorentini ◽  
Stéphanie Trznadel ◽  
Klaus R. Scherer

Facial expression research largely relies on forced-choice paradigms that ask observers to choose a label to describe the emotion expressed, assuming a categorical encoding and decoding process. In contrast, appraisal theories of emotion suggest that cognitive appraisal of a situation and the resulting action tendencies determine facial actions in a complex cumulative and sequential process. It is feasible to assume that, in consequence, the expression recognition process is driven by the inference of appraisal configurations that can then be interpreted as discrete emotions. To obtain first evidence with realistic but well-controlled stimuli, theory-guided systematic facial synthesis of action units in avatar faces was used, asking judges to rate 42 combinations of facial actions (action units) on 9 appraisal dimensions. The results support the view that emotion recognition from facial expression is largely mediated by appraisal-action tendency inferences rather than direct categorical judgment. Implications for affective computing are discussed.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3046
Author(s):  
Shervin Minaee ◽  
Mehdi Minaei ◽  
Amirali Abdolrashidi

Facial expression recognition has been an active area of research over the past few decades, and it is still challenging due to high intra-class variation. Traditional approaches to this problem rely on hand-crafted features such as SIFT, HOG, and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured under controlled conditions but fail to perform as well on more challenging datasets with more image variation and partial faces. In recent years, several works have proposed end-to-end frameworks for facial expression recognition using deep learning models. Despite their better performance, there is still much room for improvement. In this work, we propose a deep learning approach based on an attentional convolutional network that is able to focus on important parts of the face and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique to find the facial regions important for detecting different emotions, based on the classifier's output. Through experimental results, we show that different emotions are sensitive to different parts of the face.
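A schematic PyTorch sketch of a CNN with a learned spatial attention map follows, to illustrate the general idea of focusing on important parts of the face; it is not the authors' architecture, and the layer sizes and 48x48/7-class setup follow common FER-2013 conventions, which is an assumption.

```python
# Hedged sketch: a small CNN whose features are re-weighted by a spatial
# attention map before classification.
import torch
import torch.nn as nn

class AttentionalCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 1x1 conv producing a single-channel spatial attention map.
        self.attention = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (B, 1, 48, 48)
        f = self.features(x)                   # (B, 64, 12, 12)
        a = self.attention(f)                  # (B, 1, 12, 12), weights in (0, 1)
        f = f * a                              # re-weight features spatially
        f = f.mean(dim=(2, 3))                 # global average pooling
        return self.classifier(f)

logits = AttentionalCNN()(torch.randn(4, 1, 48, 48))
print(logits.shape)  # torch.Size([4, 7])
```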


2021 ◽  
Vol 11 (4) ◽  
pp. 1428
Author(s):  
Haopeng Wu ◽  
Zhiying Lu ◽  
Jianfeng Zhang ◽  
Xin Li ◽  
Mingyue Zhao ◽  
...  

This paper addresses the problem of Facial Expression Recognition (FER), focusing on unobvious facial movements. Traditional methods often suffer from overfitting or incomplete information due to insufficient data and manual feature selection. Instead, our proposed network, called the Multi-features Cooperative Deep Convolutional Network (MC-DCN), maintains focus on both the overall features of the face and the trend of its key parts. Video data are processed in the first stage: the ensemble of regression trees (ERT) method is used to obtain the overall contour of the face, and an attention model is then used to pick out the parts of the face that are most susceptible to expressions. Under the combined effect of these two methods, an image that can be called a local feature map is obtained. After that, the video data are sent to MC-DCN, which contains parallel sub-networks. While the overall spatiotemporal characteristics of facial expressions are obtained from the image sequence, the selection of key parts better captures the changes in facial expressions brought about by subtle facial movements. By combining local and global features, the proposed method acquires more information, leading to better performance. The experimental results show that MC-DCN achieves recognition rates of 95%, 78.6%, and 78.3% on the three datasets SAVEE, MMI, and edited GEMEP, respectively.
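dlib's shape predictor implements the ensemble-of-regression-trees (ERT) landmark method mentioned above; as an illustration only, extracting a face contour from one frame might look like the sketch below. The 68-point model file must be downloaded separately, the file names are placeholders, and the rest of MC-DCN is not reproduced.

```python
# Hedged sketch: ERT facial landmarks via dlib, keeping the jawline contour.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("frame.jpg")                  # hypothetical video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    # Points 0-16 of the 68-point model trace the jawline, i.e. the
    # overall face contour used as a starting point above.
    contour = [(shape.part(i).x, shape.part(i).y) for i in range(17)]
```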


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2003 ◽  
Author(s):  
Xiaoliang Zhu ◽  
Shihao Ye ◽  
Liang Zhao ◽  
Zhicheng Dai

Improving performance on the AFEW (Acted Facial Expressions in the Wild) dataset, a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), is a popular benchmark for emotion recognition under various constraints, including uneven illumination, head deflection, and facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, faces are detected in each frame of a video sequence, and the corresponding face ROI (region of interest) is extracted to obtain the face images; the face images are then aligned based on the positions of the facial feature points. Second, the aligned face images are input to a residual neural network to extract the spatial features of the facial expressions, and the spatial features are input to the hybrid attention module to obtain fused facial expression features. Finally, the fused features are input to a gated recurrent unit to extract the temporal features of the facial expressions, and the temporal features are input to a fully connected layer to classify and recognize the expressions. Experiments on the CK+ (Extended Cohn-Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences), and AFEW datasets yield recognition accuracies of 98.46%, 87.31%, and 53.44%, respectively. This demonstrates that the proposed method not only achieves performance competitive with state-of-the-art methods but also improves on the AFEW dataset by more than 2%, showing strong facial expression recognition in natural environments.
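A schematic sketch of the cascade described above (frame-wise ResNet features followed by a recurrent temporal model and a classifier) is given below; the paper's hybrid attention module is omitted, and all layer sizes, clip lengths, and the ResNet-18 backbone are assumptions rather than the authors' exact configuration.

```python
# Hedged sketch: spatial features per frame (ResNet) -> temporal model (GRU)
# -> fully connected classifier. Requires torchvision >= 0.13 for weights=None.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SpatialTemporalFER(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.backbone = resnet18(weights=None)
        self.backbone.fc = nn.Identity()        # expose 512-d spatial features
        self.gru = nn.GRU(512, 128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, clips):                   # clips: (B, T, 3, 224, 224)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)            # (B*T, 3, 224, 224)
        feats = self.backbone(frames).view(b, t, 512)
        _, h = self.gru(feats)                  # h: (1, B, 128), last hidden state
        return self.head(h[-1])

logits = SpatialTemporalFER()(torch.randn(2, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 7])
```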

