Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network

Speech emotion recognition (SER) plays a significant role in human–machine interaction. Recognizing and precisely classifying emotion from speech is a challenging task because a machine cannot understand the context of an utterance. For accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotion classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotion datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS, for speaker-dependent SER experiments. Moreover, compared with existing handcrafted feature-based SER approaches, our method yields the best results for speaker-independent SER.
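The abstract above does not spell out how the correlation-based selection works, so the following is only a minimal sketch of one simple correlation filter, not the authors' method. It assumes features are ranked by absolute Pearson correlation with a numeric label, and that a feature is skipped when it is nearly collinear with one already selected. The names `pearson`, `select_features`, and `max_redundancy`, and the toy data, are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_features(X, y, k, max_redundancy=0.9):
    """Greedy filter (an assumption, not the paper's exact algorithm):
    visit columns in order of decreasing |corr(feature, label)| and keep a
    column only if its |corr| with every kept column is <= max_redundancy."""
    columns = [list(col) for col in zip(*X)]
    relevance = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    selected = []
    for j in order:
        if len(selected) == k:
            break
        if all(abs(pearson(columns[j], columns[s])) <= max_redundancy for s in selected):
            selected.append(j)
    return selected

# Toy data: six utterances, binary label, four hypothetical "deep" features.
y = [0, 0, 0, 1, 1, 1]
f0 = [0.1, 0.2, 0.15, 0.9, 0.8, 0.95]   # strongly related to the label
f1 = [2 * v for v in f0]                 # rescaled copy of f0 (redundant)
f2 = [0.3, 0.5, 0.2, 0.6, 0.4, 0.7]      # weaker but complementary
f3 = [0.5] * 6                           # constant, carries no information
X = [list(row) for row in zip(f0, f1, f2, f3)]

print(select_features(X, y, k=2))  # -> [0, 2]
```

In this toy case the redundant copy `f1` is skipped in favor of `f2`, which is the behavior a correlation filter is meant to provide before the selected columns are passed to a classifier.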

Optimal feature selection based speech emotion recognition using two‐stream deep convolutional neural network

International Journal of Intelligent Systems ◽ 10.1002/int.22505 ◽ 2021

Author(s): Mustaqeem ◽ Soonil Kwon

Keyword(s): Neural Network ◽ Feature Selection ◽ Convolutional Neural Network ◽ Emotion Recognition ◽ Deep Convolutional Neural Network ◽ Speech Emotion Recognition ◽ Optimal Feature Selection ◽ Optimal Feature

Robust Speech Emotion Recognition for Sindhi Language based on Deep Convolutional Neural Network

2021 International Conference on Communications, Information System and Computer Engineering (CISCE) ◽ 10.1109/cisce52179.2021.9445883 ◽ 2021

Author(s): Muddasar Laghari ◽ Muhammad Junaid Tahir ◽ Abdullah Azeem ◽ Waqar Riaz ◽ Yi Zhou

Keyword(s): Neural Network ◽ Convolutional Neural Network ◽ Emotion Recognition ◽ Deep Convolutional Neural Network ◽ Speech Emotion Recognition

A Light-Weight Deep Convolutional Neural Network for Speech Emotion Recognition using Mel-Spectrograms

2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP) ◽ 10.1109/isai-nlp48611.2019.9045511 ◽ 2019

Author(s): Kamin Atsavasirilert ◽ Thanaruk Theeramunkong ◽ Sasiporn Usanavasin ◽ Anocha Rugchatjaroen ◽ Surasak Boonkla ◽ ...

Keyword(s): Neural Network ◽ Convolutional Neural Network ◽ Emotion Recognition ◽ Deep Convolutional Neural Network ◽ Speech Emotion Recognition ◽ Light Weight

Effect on speech emotion classification of a feature selection approach using a convolutional neural network

PeerJ Computer Science ◽ 10.7717/peerj-cs.766 ◽ 2021 ◽ Vol 7 ◽ pp. e766

Author(s): Ammar Amjad ◽ Lal Khan ◽ Hsien-Tsung Chang

Keyword(s): Neural Network ◽ Feature Selection ◽ Convolutional Neural Network ◽ Feature Selection Method ◽ Classification Problem ◽ Speech Emotion Recognition ◽ Support Vector ◽ Emotion Classification ◽ K Nearest Neighbors ◽ Feature Selection Technique

Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotion-related features are extracted from speech signals for emotion classification. Handcrafted features are mainly used for emotion identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt a feature selection (FS) approach to find the most important and discriminative features for SER. Many algorithms are used for the emotion classification problem. We use random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN) classifiers to classify seven emotions. All experiments are performed on four publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For Emo-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.
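The difference between speaker-dependent and speaker-independent evaluation comes down to how utterances are split. The abstract does not state the protocol, so the sketch below assumes leave-one-speaker-out splitting, a common choice for speaker-independent SER, with a plain 1-nearest-neighbor classifier standing in for the classifiers named above. All function names and the toy data are hypothetical.

```python
def leave_one_speaker_out(speakers):
    """Yield (held_out, train_idx, test_idx), holding out each speaker once."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training vector closest to x (squared Euclidean)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def speaker_independent_accuracy(X, y, speakers):
    """Accuracy pooled over all folds; no test speaker is ever seen in training."""
    correct = 0
    for _, train, test in leave_one_speaker_out(speakers):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            correct += nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(y)

# Toy data: three speakers, two emotions, two-dimensional feature vectors.
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
y = ["angry", "sad", "angry", "sad", "angry", "sad"]
X = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9], [1.1, 0.0], [0.0, 1.1]]

acc = speaker_independent_accuracy(X, y, speakers)
print(f"speaker-independent accuracy: {acc:.2f}")
```

A speaker-dependent split would instead mix utterances from every speaker into both the training and test sets, which usually makes the task easier and helps explain why the two settings are reported separately.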

Deep and shallow features fusion based on deep convolutional neural network for speech emotion recognition

International Journal of Speech Technology ◽ 10.1007/s10772-018-9551-4 ◽ 2018 ◽ Vol 21 (4) ◽ pp. 931-940 ◽ Cited By ~ 7

Author(s): Linhui Sun ◽ Jia Chen ◽ Keli Xie ◽ Ting Gu

Keyword(s): Neural Network ◽ Convolutional Neural Network ◽ Emotion Recognition ◽ Deep Convolutional Neural Network ◽ Speech Emotion Recognition ◽ Features Fusion

Improving speech emotion recognition based on acoustic words emotion dictionary

Natural Language Engineering ◽ 10.1017/s1351324920000339 ◽ 2020 ◽ pp. 1-15

Author(s): Wang Wei ◽ Xinyi Cao ◽ He Li ◽ Lingjie Shen ◽ Yaqin Feng ◽ ...

Keyword(s): Neural Network ◽ Support Vector Machine ◽ Convolutional Neural Network ◽ Emotion Recognition ◽ Speech Emotion Recognition ◽ Support Vector ◽ Emotion Classification ◽ Acoustic Features ◽ Emotional Information ◽ Average Recall

To improve speech emotion recognition, a U-AWED feature model is proposed based on an acoustic words emotion dictionary (AWED). The method models emotional information at the acoustic-word level in different emotion classes. The top-listed words for each emotion are selected to generate the AWED vector. Then, the U-AWED model is constructed by combining utterance-level acoustic features with the AWED features. A support vector machine and a convolutional neural network are employed as the classifiers in our experiment. The results show that the proposed method provides a significant improvement in unweighted average recall in all four emotion classification tasks.
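Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so every emotion counts equally regardless of how many utterances it has. The following minimal sketch shows the computation on made-up labels; the function name and the data are illustrative only and are not taken from the paper.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy example: a classifier that always predicts "neutral".
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(accuracy, uar)  # -> 0.8 0.5
```

The trivial classifier reaches 80% accuracy but only 50% UAR, which is why UAR is often preferred over plain accuracy when emotion classes are imbalanced.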
