Audio-Visual Emotion Recognition System Using Multi-Modal Features

Author(s):  
Anand Handa ◽  
Rashi Agarwal ◽  
Narendra Kohli

Due to highly variant face geometry and appearance, Facial Expression Recognition (FER) remains a challenging problem. CNNs are well suited to characterizing 2-D signals. Therefore, for emotion recognition in video, the authors propose a feature selection model within the AlexNet architecture to extract and filter facial features automatically. Similarly, for emotion recognition in audio, the authors use a deep LSTM-RNN. Finally, they propose a probabilistic model for the fusion of the audio and visual models using a subject's facial features and speech. The model combines all the extracted features and uses them to train linear SVM (Support Vector Machine) classifiers. The proposed model outperforms the other existing models and achieves state-of-the-art performance for the audio, visual, and fusion models. It classifies the seven known facial expressions, namely anger, happiness, surprise, fear, disgust, sadness, and neutral, on the eNTERFACE'05 dataset with an overall accuracy of 76.61%.
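The fusion stage lends itself to a short illustration. Below is a minimal sketch of the late feature-fusion step, assuming the AlexNet-based visual features and LSTM-RNN audio features have already been extracted per sample; the array names, dimensions, and SVM settings are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the fusion stage: concatenate per-sample visual and
# audio feature vectors and train a linear SVM on the fused representation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def train_fusion_svm(visual_feats, audio_feats, labels):
    """Concatenate modality features and fit a linear SVM classifier."""
    fused = np.hstack([visual_feats, audio_feats])  # (n_samples, d_v + d_a)
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    clf.fit(fused, labels)
    return clf

# Placeholder data standing in for real extractor outputs (7 emotion classes).
rng = np.random.default_rng(0)
vis = rng.normal(size=(100, 4096))  # e.g. an AlexNet fc7-sized feature
aud = rng.normal(size=(100, 256))   # e.g. an LSTM hidden-state feature
model = train_fusion_svm(vis, aud, rng.integers(0, 7, size=100))
```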



2019 ◽  
Vol 8 (4) ◽  
pp. 3570-3574

The facial expression recognition system plays a vital role in many organizations, institutes, and shopping malls that want to understand their stakeholders' needs and mindset. It falls under the broad category of computer vision. A facial expression can reveal the true intention of a person without any conversation. The main objective of this work is to improve the performance of facial expression recognition on benchmark datasets such as CK+ and JAFFE. To achieve the required accuracy, a convolutional neural network was constructed to extract facial expression features automatically, and these were combined with handcrafted features extracted using Histogram of Oriented Gradients (HoG) and Local Binary Pattern (LBP) methods. A linear Support Vector Machine (SVM) is built to predict the emotions from the combined features. The proposed method produces promising results compared to the recent work in [1]. This is mainly needed in working environments, shopping malls, and other public places to effectively gauge stakeholders' sentiment at that moment.
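A minimal sketch of the feature-combination step follows, assuming same-size grayscale face crops and a precomputed matrix of CNN features; the HoG/LBP parameters and the `cnn_features` name are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch: handcrafted HoG + LBP descriptors concatenated with CNN features,
# then a linear SVM trained on the combined vector.
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import LinearSVC

def handcrafted_features(gray_image):
    """HoG descriptor plus a uniform-LBP histogram for one face image."""
    hog_vec = hog(gray_image, orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(gray_image, P=8, R=1, method="uniform")
    # Uniform LBP with P=8 yields codes 0..9, hence 10 histogram bins.
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([hog_vec, lbp_hist])

def combine_and_train(images, cnn_features, labels):
    """Fuse handcrafted and CNN features, then fit a linear SVM."""
    handcrafted = np.stack([handcrafted_features(img) for img in images])
    combined = np.hstack([handcrafted, cnn_features])
    return LinearSVC(max_iter=10000).fit(combined, labels)
```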


2021 ◽  
Author(s):  
Erkang Fu ◽  
Xi Li ◽  
Zhi Yao ◽  
Yuxin Ren ◽  
Yuanhao Wu ◽  
...  

Abstract In recent years, the Internet of Vehicles (IoV), with intelligent networked automobiles as its terminal nodes, has gradually become the development trend of the automotive industry and a research hotspot in related fields, owing to its intelligence, networking, low carbon footprint, and energy savings. Real-time emotion recognition for drivers and pedestrians in the community can be used to prevent fatigue driving and malicious collisions, and to support safety verification and pedestrian safety detection. This paper mainly studies a facial emotion recognition model that can be used in the IoV. Considering the fluctuation of image acquisition perspective and image quality in IoV application scenes, natural-scene video similar to the vehicle environment, together with its galvanic skin response (GSR), is used to build the testing set for emotion recognition. An expression recognition model combining a codec with a Support Vector Machine (SVM) classifier is then proposed. Finally, emotion recognition testing is completed on the basis of Algorithm 1. The matching accuracy between the emotion recognition model and the GSR is 82.01%: of the 189 effective videos involved in model testing, 155 are correctly identified.
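The codec-plus-SVM pairing can be sketched as follows: an autoencoder-style codec compresses face images to latent codes, and an SVM classifies the codes into emotions. The layer sizes, input resolution, and kernel choice below are illustrative assumptions, not the paper's actual codec architecture.

```python
# Sketch: an autoencoder-style codec learns a compact latent code for each
# face image; an SVM then classifies the codes into emotions.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class Codec(nn.Module):
    def __init__(self, in_dim=48 * 48, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)          # latent code used by the classifier
        return self.decoder(z), z    # reconstruction used for training

def fit_svm_on_codes(codec, images, labels):
    """Encode flattened face images and train an SVM on the latent codes."""
    codec.eval()
    with torch.no_grad():
        _, codes = codec(images.flatten(1))  # images: (n, 48, 48) tensor
    return SVC(kernel="rbf").fit(codes.numpy(), labels)
```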


Author(s):  
Sourabh Suke ◽  
Ganesh Regulwar ◽  
Nikesh Aote ◽  
Pratik Chaudhari ◽  
Rajat Ghatode ◽  
...  

This project describes "VoiEmo - A Speech Emotion Recognizer", a system for recognizing the emotional state of an individual from his/her speech. For example, one's speech becomes loud and fast, with a higher and wider pitch range, in a state of fear, anger, or joy, whereas the human voice is generally slow and low-pitched in sadness and tiredness. We have developed a classification model for speech emotion detection based on Convolutional Neural Networks (CNNs), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) classification, which makes predictions from acoustic features of the speech signal such as Mel-Frequency Cepstral Coefficients (MFCCs). Our models have been trained to recognize eight common emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). For training and testing the model, we have used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset and the Toronto Emotional Speech Set (TESS) dataset. The system is advantageous because it can provide a general idea of the individual's emotional state from the acoustic features of the speech, irrespective of the language the speaker speaks; moreover, it saves time and effort. Speech emotion recognition systems have applications in various fields, such as call centers and BPOs, criminal investigation, psychiatric therapy, and the automobile industry.
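A minimal sketch of an MFCC-based pipeline of this kind is shown below, using librosa for feature extraction and one of the three classifier families (an MLP); the MFCC count, sampling rate, and network sizes are illustrative assumptions.

```python
# Sketch: per-clip MFCC vectors from librosa feed an MLP emotion classifier.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_features(path, n_mfcc=40):
    """Load one audio clip and return its time-averaged MFCC vector."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over frames -> shape (n_mfcc,)

def train_emotion_mlp(wav_paths, labels):
    """Train an MLP on MFCC vectors from e.g. RAVDESS/TESS clips."""
    X = np.stack([mfcc_features(p) for p in wav_paths])
    clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500)
    return clf.fit(X, labels)
```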


Author(s):  
D.N.V.S.L.S. Indira et al.

The importance of integrating visual components into the speech recognition process to improve robustness has been identified by recent developments in audio-visual emotion recognition (AVER). Visual characteristics have strong potential to boost the accuracy of current speech recognition techniques and have become increasingly important when modelling speech recognizers. CNNs work very well with images, and an audio file can be converted into an image-like representation, such as a spectrogram, whose frequency content exposes hidden knowledge. This paper provides a method for emotional expression recognition using spectrograms and a 2-D CNN (CNN-2D). Spectrograms formed from the speech signals serve as the CNN-2D input. The proposed model, which consists of three kinds of layers (convolution layers, pooling layers, and fully connected layers), extracts discriminative characteristics from the spectrogram representations and estimates scores for the seven emotions. This article compares the output with an existing SER system using audio files and a CNN. The accuracy is improved by 6.5% when CNN-2D is used.
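The spectrogram-to-CNN-2D pipeline can be sketched as below: speech is converted to a log-mel spectrogram "image" and passed through convolution, pooling, and fully connected layers ending in seven emotion scores. The 64x64 input size and filter counts are illustrative assumptions, not the paper's reported architecture.

```python
# Sketch: audio -> log-mel spectrogram "image" -> small 2-D CNN with
# convolution, pooling, and fully connected layers -> 7 emotion scores.
import numpy as np
import librosa
import torch
import torch.nn as nn

def spectrogram_image(path, size=64):
    """Convert an audio file to a fixed-size log-mel spectrogram tensor."""
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=size)
    img = librosa.power_to_db(mel)[:, :size]          # crop time frames
    if img.shape[1] < size:                           # pad short clips
        img = np.pad(img, ((0, 0), (0, size - img.shape[1])))
    return torch.tensor(img, dtype=torch.float32).unsqueeze(0)  # (1, H, W)

class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, x):  # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x))
```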


2021 ◽  
pp. 7278-7290
Author(s):  
Divyanshu Sinha ◽  
Dr. J. P. Pandey ◽  
Dr. Bhavesh Chauhan

Face recognition is a state-of-the-art computer vision application within the artificial intelligence arena: the automated recognition of humans by their names/unique IDs. Age-invariant face recognition is a challenging task in the field of face recognition. In this work, we introduce a stacked support vector machine in which kernel activations of prototype examples are combined in nonlinear ways. The proposed work integrates a soft-computing-based support vector machine (SVM) with a deep SVM, using the implied relation between the variables described above to optimize their overall performance. Specifically, our method uses three different stages of complex convolutional neural networks that detect and analyze the locations of faces and their landmarks. This work uses the Cross-Age Celebrity Dataset (CACD) in both single-database and cross-database settings, enabling the transition of age. The proposed work has been implemented in the MATLAB simulation tool using the CACD dataset. Experimental results indicate that our technique significantly outperforms other strategies across a range of challenging metrics.
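One way to read "kernel activations of prototype examples combined in nonlinear ways" is sketched below: RBF kernel activations against a set of prototypes form a new representation, and a second nonlinear SVM is trained on top. The prototype selection and gamma values are illustrative assumptions; this is not the authors' exact stacking scheme (which the paper implements in MATLAB).

```python
# Sketch: kernel activations against prototype examples form a new feature
# space; a second nonlinear SVM is "stacked" on top of those activations.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def train_stacked_svm(X, y, n_prototypes=100, gamma=0.01, seed=0):
    """Pick prototypes, compute RBF activations, fit a second-stage SVM."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_prototypes, len(X)), replace=False)
    prototypes = X[idx]
    activations = rbf_kernel(X, prototypes, gamma=gamma)  # (n, n_prototypes)
    top = SVC(kernel="rbf").fit(activations, y)
    return prototypes, top

def predict_stacked(prototypes, top, X_new, gamma=0.01):
    """Classify new samples via their prototype kernel activations."""
    return top.predict(rbf_kernel(X_new, prototypes, gamma=gamma))
```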

