Speech-based human emotion recognition

2021 ◽  
Author(s):  
Talieh Seyed Tabtabae

Automatic Emotion Recognition (AER) is an emerging research area in the field of Human-Computer Interaction (HCI). As computers become more pervasive, the study of interaction between humans (users) and computers is attracting more attention. To make the interface between humans and computers more natural and friendly, it would be beneficial to give computers the ability to recognize situations the way a human does. Equipped with an emotion recognition system, a computer can recognize its users' emotional state and react appropriately. In today's HCI systems, machines can identify the speaker and the content of the speech using speaker identification and speech recognition techniques. If machines are also equipped with emotion recognition techniques, they can know "how it is said," react more appropriately, and make the interaction more natural. One of the most important human communication channels is the auditory channel, which carries speech and vocal intonation; people can perceive each other's emotional state by the way they talk. In this work, therefore, speech signals are analyzed in order to build an automatic system that recognizes the human emotional state. Six discrete emotional states are considered and categorized in this research: anger, happiness, fear, surprise, sadness, and disgust. A set of novel spectral features is proposed in this contribution. Two approaches are applied and their results compared. In the first approach, acoustic features are extracted from consecutive frames along the speech signals, and statistical values of these features constitute the feature vectors. A Support Vector Machine (SVM), a relatively recent approach in the field of machine learning, is used to classify the emotional states. In the second approach, spectral features are extracted from non-overlapping, logarithmically spaced frequency sub-bands, and sequence-discriminant SVMs are adopted in order to make use of all the extracted information. The empirical results show that the employed techniques are very promising.
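A minimal sketch of the first approach described above (frame-level spectral features summarized by per-utterance statistics, then an SVM) might look like the following; the specific feature set, window parameters, and kernel settings are illustrative assumptions, not the thesis's actual configuration.

```python
# Sketch: frame-level spectral features -> per-utterance statistics -> SVM.
# Assumes librosa and scikit-learn; features and SVM settings are illustrative.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

EMOTIONS = ["anger", "happiness", "fear", "surprise", "sadness", "disgust"]

def utterance_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    # Frame-level spectral descriptors (one value per frame per feature).
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    frames = np.vstack([centroid, rolloff, mfcc])
    # Summarize each feature trajectory with simple statistics.
    stats = [frames.mean(axis=1), frames.std(axis=1),
             frames.min(axis=1), frames.max(axis=1)]
    return np.concatenate(stats)

def train(paths, labels):
    X = np.array([utterance_features(p) for p in paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, labels)
    return clf
```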


Author(s):  
Sourabh Suke ◽  
Ganesh Regulwar ◽  
Nikesh Aote ◽  
Pratik Chaudhari ◽  
Rajat Ghatode ◽  
...  

This project describes "VoiEmo- A Speech Emotion Recognizer," a system for recognizing the emotional state of an individual from his/her speech. For example, one's speech becomes loud and fast, with a higher and wider pitch range, in a state of fear, anger, or joy, whereas the voice is generally slow and low-pitched in sadness and tiredness. We have developed a classification model for speech emotion detection based on Convolutional Neural Networks (CNNs), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) classifiers, which make predictions from acoustic features of the speech signal such as Mel-Frequency Cepstral Coefficients (MFCCs). Our models have been trained to recognize eight common emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). For training and testing the model, we used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS). The system is advantageous in that it can provide a general idea of the individual's emotional state from the acoustic features of the speech irrespective of the language spoken, and it also saves time and effort. Speech emotion recognition systems have applications in various fields such as call centers and BPOs, criminal investigation, psychiatric therapy, and the automobile industry.
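A hedged sketch of the MFCC-plus-MLP path mentioned above follows; the MFCC count, label handling, and network size are assumptions, not the project's documented settings.

```python
# Sketch: MFCC features -> MLP classifier, one of the three model types
# named above. Hyperparameters and file handling are assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def mfcc_vector(path, n_mfcc=40):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time -> fixed-length vector

def train_mlp(paths, labels):
    X = np.array([mfcc_vector(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2,
                                              stratify=labels, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf
```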


Author(s):  
Semiye Demircan ◽  
Humar Kahramanli

ER (Emotion Recognition) from speech signals has lately been among the most attractive subjects. As is well known, feature extraction and feature selection are the most important processing steps in ER from speech signals. The aim of the present study is to select the most relevant spectral feature subset. The proposed method is based on feature selection, using an optimization algorithm, among the features obtained from speech signals. First, MFCCs (Mel-Frequency Cepstrum Coefficients) were extracted from EmoDB. Several statistical values, namely maximum, minimum, mean, standard deviation, skewness, kurtosis, and median, were computed from the MFCCs. The next step of the study was feature selection, performed in two stages: in the first stage, ABM (Agent-Based Modelling), which has rarely been applied in this area, was applied to the actual features; in the second stage, the Opt-aiNET optimization algorithm was applied in order to choose the agent group giving the best classification success. The final step of the study is classification, for which an ANN (Artificial Neural Network) with 10-fold cross-validation was used for classification and evaluation. The application was restricted to a narrower scope of three emotions. As a result, the classification accuracy was seen to rise after applying the proposed method, which showed promising performance with spectral features.
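The statistical summary step described above is concrete enough to sketch; the following computes exactly the seven statistics the abstract lists per MFCC coefficient, while the MFCC count and file handling are assumptions.

```python
# Sketch: the statistical summary of MFCC trajectories described above
# (maximum, minimum, mean, standard deviation, skewness, kurtosis, median).
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def mfcc_statistics(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    stats = [m.max(axis=1), m.min(axis=1), m.mean(axis=1), m.std(axis=1),
             skew(m, axis=1), kurtosis(m, axis=1), np.median(m, axis=1)]
    return np.concatenate(stats)  # 7 statistics per coefficient
```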


Author(s):  
A. Pramod Reddy ◽  
Vijayarajan V

For emotion recognition, the features extracted here from speech samples of the widely used Berlin emotional database are pitch, intensity, log energy, formants, and mel-frequency cepstral coefficients (MFCC) as base features, with power spectral density as an added function of frequency. In this work, seven emotions, namely anger, neutral, happiness, boredom, disgust, fear, and sadness, are considered. Temporal and spectral features are used to build the AER (Automatic Emotion Recognition) model. The extracted features are analyzed using a Support Vector Machine (SVM) and a multilayer perceptron (MLP), a class of feed-forward ANN classifier, to classify the different emotional states. We observed 91% accuracy for the anger and boredom classes using SVM and more than 96% accuracy using the ANN, with an overall accuracy of 87.17% for SVM and 94% for the ANN.
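A sketch of this pipeline under stated assumptions appears below: pitch, intensity, log energy, and MFCC trajectories are summarized per utterance and fed to SVM and MLP classifiers for comparison. Formant and power-spectral-density extraction are omitted for brevity, and all parameter choices are illustrative.

```python
# Sketch: prosodic and spectral features (pitch, intensity, log energy, MFCCs)
# summarized per utterance, then SVM vs. MLP comparison. Parameters assumed.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def prosodic_spectral_features(path):
    y, sr = librosa.load(path, sr=None)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # pitch track (Hz)
    rms = librosa.feature.rms(y=y)[0]               # intensity proxy
    log_energy = np.log(rms + 1e-10)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = []
    for track in (f0, rms, log_energy, *mfcc):      # summarize each trajectory
        feats += [np.mean(track), np.std(track)]
    return np.array(feats)

def compare_classifiers(X, y):
    for clf in (SVC(kernel="rbf"),
                MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)):
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(type(clf).__name__, "mean CV accuracy:", acc)
```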


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Mehmet Akif Ozdemir ◽  
Murside Degirmenci ◽  
Elif Izci ◽  
Aydin Akan

Abstract: The emotional state of people plays a key role in physiological and behavioral human interaction. Emotional state analysis spans fields such as neuroscience, the cognitive sciences, and biomedical engineering, because the parameters of interest involve the complex neuronal activity of the brain. Electroencephalogram (EEG) signals are processed to connect brain signals with external systems and to make predictions about emotional states. This paper proposes a novel method for emotion recognition based on deep convolutional neural networks (CNNs), which are used to classify the Valence, Arousal, Dominance, and Liking emotional states. A novel approach is thus proposed for emotion recognition from time series of multi-channel EEG signals from the Database for Emotion Analysis using Physiological Signals (DEAP). We propose estimating emotional state via CNN-based classification of multi-spectral topology images obtained from EEG signals. In contrast to most EEG-based approaches, which discard the spatial information of EEG signals, converting the EEG signals into a sequence of multi-spectral topology images preserves their temporal, spectral, and spatial information. A deep recurrent convolutional network is trained to learn important representations from sequences of three-channel topographical images. We achieved test accuracies of 90.62% for negative versus positive Valence, 86.13% for high versus low Arousal, 88.48% for high versus low Dominance, and 86.23% for like versus unlike. Evaluation of this method on the emotion recognition problem revealed significant improvements in classification accuracy compared with other studies using deep neural networks (DNNs) and one-dimensional CNNs.
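The architecture family described above (a CNN applied per topographic image, with a recurrent layer over the sequence) could be sketched as follows; the image size, sequence length, and layer widths are assumptions, not the paper's reported values.

```python
# Sketch: a recurrent convolutional network over sequences of three-channel
# topographic images, in the spirit of the approach above. Dimensions assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, H, W = 10, 32, 32  # assumed: 10 images of 32x32, 3 spectral bands

def build_model(n_classes=2):
    inp = layers.Input(shape=(SEQ_LEN, H, W, 3))
    # Apply the same CNN to every topographic image in the sequence.
    x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(inp)
    x = layers.TimeDistributed(layers.MaxPooling2D())(x)
    x = layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu"))(x)
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    # Recurrent layer aggregates the per-image embeddings over time.
    x = layers.LSTM(64)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```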


2021 ◽  
pp. 1-17
Author(s):  
Naveen Masood ◽  
Humera Farooq

Most electroencephalography (EEG) based emotion recognition systems rely on a single stimulus to evoke emotions. EEG data are mostly recorded with a large number of electrodes, which can lead to data redundancy and longer experimental setup time. The question of whether a configuration with fewer electrodes is common across different stimulus presentation paradigms remains unanswered. Publicly available datasets exist for EEG-based recognition of human emotional states; however, since this work focuses on classifying emotions while subjects experience different stimuli, new experiments were required. With these issues in mind, this work presents a novel experimental study that records EEG data for three different human emotional states evoked with four different stimulus presentation paradigms. A methodology based on an iterative Genetic Algorithm combined with majority voting is used to reach a configuration with a reduced number of EEG electrodes while keeping the loss of classification accuracy to a minimum. The results obtained are comparable with recent studies. Stimulus-independent configurations with fewer electrodes lead to lower computational complexity as well as reduced setup time for future EEG-based smart systems for emotion recognition.
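A minimal sketch of genetic-algorithm electrode selection in the spirit of this study follows; the fitness function, genetic operators, and rates are assumptions, and the majority-voting step across runs is omitted for brevity. Here X is assumed to have shape (trials, electrodes, features) and y to hold emotion labels.

```python
# Sketch: genetic algorithm over electrode-subset bitmasks, with a
# cross-validated SVM accuracy as the (assumed) fitness function.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool), :].reshape(len(X), -1)
    return cross_val_score(SVC(), Xs, y, cv=5).mean()

def select_electrodes(X, y, n_elec, pop=20, gens=30, p_mut=0.05):
    population = rng.integers(0, 2, size=(pop, n_elec))
    for _ in range(gens):
        scores = np.array([fitness(ind, X, y) for ind in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]        # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_elec)
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            flip = rng.random(n_elec) < p_mut           # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        population = np.vstack([parents, children])
    best = max(population, key=lambda ind: fitness(ind, X, y))
    return np.flatnonzero(best)  # indices of retained electrodes
```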


2021 ◽  
Vol 57 (2) ◽  
pp. 313-321
Author(s):  
S Shuma ◽  
T. Christy Bobby ◽  
S. Malathi ◽  
...  

Emotion recognition is important in human communication and for achieving complete interaction between humans and machines. In medical applications, emotion recognition is used to assist children with Autism Spectrum Disorder (ASD) in improving their socio-emotional communication, to help doctors with the diagnosis of diseases such as depression and dementia, and to help the caretakers of older patients monitor their well-being. This paper discusses the application of feature-level fusion of speech and facial expressions for emotions such as neutral, happy, sad, angry, surprise, fearful, and disgust, and explores how best to build deep learning networks to classify the emotions independently and jointly from these two modalities. A VGG model is utilized to extract features from facial images, and spectral features are extracted from speech signals. A feature-level fusion technique is then adopted to fuse the features extracted from the two modalities, and Principal Component Analysis (PCA) is implemented to choose the significant features. The proposed method achieved a maximum score of 90% on the training set and 82% on the validation set. The recognition rate with multimodal data improved greatly compared with the unimodal systems; the multimodal system gave an improvement of 9% over the performance of the speech-based system. The results thus show that the proposed Multimodal Emotion Recognition (MER) outperforms the unimodal emotion recognition systems.
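A sketch of the fusion pipeline under stated assumptions: VGG16 (a plausible stand-in for the paper's VGG model) embeds face images, MFCC means stand in for the spectral speech features, the two vectors are concatenated (feature-level fusion), and PCA precedes a classifier head, which is an assumption rather than the paper's reported network.

```python
# Sketch: feature-level fusion of facial (VGG) and speech (spectral) features,
# followed by PCA and a classifier. Extractor and head are assumptions.
import numpy as np
import librosa
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

vgg = VGG16(weights="imagenet", include_top=False, pooling="avg")

def face_features(images):          # images: (n, 224, 224, 3) float array
    return vgg.predict(preprocess_input(images.copy()))

def speech_features(path, n_mfcc=40):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def fuse_and_classify(face_X, speech_X, labels, n_components=50):
    fused = np.hstack([face_X, speech_X])   # feature-level fusion
    clf = make_pipeline(PCA(n_components=n_components), SVC())
    clf.fit(fused, labels)
    return clf
```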

