Robust Speech Emotion Recognition System Through Novel ER-CNN and Spectral Features

Author(s):  
Muhammad Zeeshan ◽  
Huma Qayoom ◽  
Farman Hassan
2021 ◽  
Author(s):  
Talieh Seyed Tabtabae

Automatic Emotion Recognition (AER) is an emerging research area in the Human-Computer Interaction (HCI) field. As Computers are becoming more and more popular every day, the study of interaction between humans (users) and computers is catching more attention. In order to have a more natural and friendly interface between humans and computers, it would be beneficial to give computers the ability to recognize situations the same way a human does. Equipped with an emotion recognition system, computers will be able to recognize their users' emotional state and show the appropriate reaction to that. In today's HCI systems, machines can recognize the speaker and also content of the speech, using speech recognition and speaker identification techniques. If machines are equipped with emotion recognition techniques, they can also know "how it is said" to react more appropriately, and make the interaction more natural. One of the most important human communication channels is the auditory channel which carries speech and vocal intonation. In fact people can perceive each other's emotional state by the way they talk. Therefore in this work the speech signals are analyzed in order to set up an automatic system which recognizes the human emotional state. Six discrete emotional states have been considered and categorized in this research: anger, happiness, fear, surprise, sadness, and disgust. A set of novel spectral features are proposed in this contribution. Two approaches are applied and the results are compared. In the first approach, all the acoustic features are extracted from consequent frames along the speech signals. The statistical values of features are considered to constitute the features vectors. Suport Vector Machine (SVM), which is a relatively new approach in the field of machine learning is used to classify the emotional states. In the second approach, spectral features are extracted from non-overlapping logarithmically-spaced frequency sub-bands. In order to make use of all the extracted information, sequence discriminant SVMs are adopted. The empirical results show that the employed techniques are very promising.


Author(s):  
Jian Zhou ◽  
Guoyin Wang ◽  
Yong Yang

Speech emotion recognition is becoming more and more important in such computer application fields as health care, children education, etc. In order to improve the prediction performance or providing faster and more cost-effective recognition system, an attribute selection is often carried out beforehand to select the important attributes from the input attribute sets. However, it is time-consuming for traditional feature selection method used in speech emotion recognition to determine an optimum or suboptimum feature subset. Rough set theory offers an alternative, formal and methodology that can be employed to reduce the dimensionality of data. The purpose of this study is to investigate the effectiveness of Rough Set Theory in identifying important features in speech emotion recognition system. The experiments on CLDC emotion speech database clearly show this approach can reduce the calculation cost while retaining a suitable high recognition rate.


Sign in / Sign up

Export Citation Format

Share Document