Speech Emotion Recognition using Manta Ray Foraging Optimization Based Feature Selection

Speech emotion recognition (SER) plays a significant role in human–machine interaction. Recognizing emotion from speech and classifying it precisely is challenging because a machine cannot understand the context of an utterance. For accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used to classify emotion from speech signals; however, they do not depict the emotional state of the speaker accurately enough. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotion datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). In speaker-dependent SER experiments, our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS. Moreover, our method yields the best results for speaker-independent SER when compared with existing handcrafted-feature-based SER approaches.
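The feature selection step described in this abstract can be illustrated with a minimal sketch. It is a simplified relevance/redundancy filter built on Pearson correlation, not the correlation-based selection procedure the authors actually used, and the abstract does not specify their criterion. The function names, the `max_redundancy` threshold, the binary labels, and the toy feature values (standing in for DCNN-extracted features) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    k: int,
    max_redundancy: float = 0.9,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Pick up to k feature indices that correlate with the label
    but not too strongly with features that were already picked."""
    columns = [list(col) for col in zip(*X)]
    relevance = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    selected: List[int] = []
    for j in ranked:
        if len(selected) == k:
            break
        # Skip a feature that nearly duplicates one already selected.
        if all(abs(pearson(columns[j], columns[s])) < max_redundancy for s in selected):
            selected.append(j)
    return selected


# Toy data (illustrative only): 6 utterances, 4 features, binary emotion label.
# Feature 1 is feature 0 scaled by 2, so it is redundant with feature 0.
X = [
    [0.1, 0.2, 0.5, 0.0],
    [0.2, 0.4, 0.1, 0.5],
    [0.0, 0.0, 0.9, 0.3],
    [0.9, 1.8, 0.4, 0.6],
    [1.0, 2.0, 0.6, 0.4],
    [0.8, 1.6, 0.2, 0.9],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_features(X, y, k=2)
print(selected)
```

In a pipeline like the one described, the selected columns would then be passed to a classifier such as a support vector machine or random forest. Using Pearson correlation against integer class codes is only reasonable for the binary toy case shown here; a multi-class setting would need a different relevance measure.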

Speaker independent feature selection for speech emotion recognition: A multi-task approach

Multimedia Tools and Applications ◽

10.1007/s11042-020-10119-w ◽

2020 ◽

Author(s):

Elham Kalhor ◽

Behzad Bakhtiari

Keyword(s):

Feature Selection ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Speaker Independent ◽

Selection For

Maximal Information Coefficient and Predominant Correlation-Based Feature Selection Toward A Three-Layer Model for Speech Emotion Recognition

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽

10.23919/apsipa.2018.8659695 ◽

2018 ◽

Author(s):

Xingfeng Li ◽

Masato Akagi

Keyword(s):

Feature Selection ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Layer Model ◽

Correlation Based Feature Selection ◽

Information Coefficient ◽

Maximal Information Coefficient

Survey on discriminative feature selection for speech emotion recognition

The 9th International Symposium on Chinese Spoken Language Processing ◽

10.1109/iscslp.2014.6936641 ◽

2014 ◽

Cited By ~ 7

Author(s):

Xin Xu ◽

Ya Li ◽

Xiaoying Xu ◽

Zhengqi Wen ◽

Hao Che ◽

...

Keyword(s):

Feature Selection ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Discriminative Feature ◽

Selection For

Speaker-independent speech emotion recognition based on random forest feature selection algorithm

2017 36th Chinese Control Conference (CCC) ◽

10.23919/chicc.2017.8029112 ◽

2017 ◽

Cited By ~ 2

Author(s):

Wei-Hua Cao ◽

Jian-Ping Xu ◽

Zhen-Tao Liu

Keyword(s):

Feature Selection ◽

Random Forest ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Speaker Independent

Harmony search for feature selection in speech emotion recognition

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) ◽

10.1109/acii.2015.7344596 ◽

2015 ◽

Cited By ~ 6

Author(s):

Yongsen Tao ◽

Kunxia Wang ◽

Jing Yang ◽

Ning An ◽

Lian Li

Keyword(s):

Feature Selection ◽

Emotion Recognition ◽

Harmony Search ◽

Speech Emotion Recognition

Combining feature selection and representation for speech emotion recognition

2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) ◽

10.1109/icmew.2016.7574773 ◽

2016 ◽

Cited By ~ 1

Author(s):

Wenjing Han ◽

Huabin Ruan ◽

Xiaojie Yu ◽

Xuan Zhu

Keyword(s):

Feature Selection ◽

Emotion Recognition ◽

Speech Emotion Recognition

Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection

Electrical Control and Communication Engineering ◽

10.1515/ecce-2016-0005 ◽

2016 ◽

Vol 10 (1) ◽

pp. 35-41 ◽

Cited By ~ 1

Author(s):

Tatjana Liogienė ◽

Gintautas Tamulevičius

Keyword(s):

Feature Selection ◽

Emotion Recognition ◽

Classification Scheme ◽

Recognition Rate ◽

Single Stage ◽

Speech Emotion Recognition ◽

Forward Selection ◽

Multi Stage ◽

Stage Scheme ◽

Stage Classification

Abstract: Intensive research on speech emotion recognition has introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Alongside the various feature selection and transformation techniques used for one-stage classification, multiple classifier systems have been proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial arrangements, the hierarchical arrangement of multi-stage classification is the most widely used for speech emotion recognition. In this paper, we present a multi-stage classification scheme based on sequential forward feature selection. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed at every stage of the multi-stage classification scheme. The proposed scheme was tested experimentally using German and Lithuanian emotional speech datasets. Multi-stage classification based on sequential feature selection outperformed the single-stage scheme by 12–42% for different emotion sets. The multi-stage scheme was also more robust to growth of the emotion set: its decrease in recognition rate as the emotion set grew was 10–20% lower than in the single-stage case. Differences between SFS and SFFS for feature selection were negligible.
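As a rough illustration of the Sequential Forward Selection technique named in this abstract, the sketch below greedily adds, one at a time, the feature that most improves a scoring function. It covers plain SFS for a single classification stage only; it does not implement the floating variant (SFFS) or the multi-stage scheme of the paper. The leave-one-out 1-nearest-neighbour scorer, the function names, and the toy data are assumptions chosen for illustration and do not come from the paper.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Greedy SFS: start empty, repeatedly add the feature whose
    inclusion gives the highest score, until k features are chosen."""
    chosen: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda j: score(chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def loo_1nn_accuracy(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    features: Sequence[int],
) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature subset (hypothetical scorer)."""
    correct = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


# Toy data (illustrative only): 6 utterances, 4 features, 2 emotion classes.
X = [
    [0.1, 0.2, 0.5, 0.0],
    [0.2, 0.4, 0.1, 0.5],
    [0.0, 0.0, 0.9, 0.3],
    [0.9, 1.8, 0.4, 0.6],
    [1.0, 2.0, 0.6, 0.4],
    [0.8, 1.6, 0.2, 0.9],
]
y = [0, 0, 0, 1, 1, 1]

chosen = sequential_forward_selection(
    n_features=4,
    score=lambda feats: loo_1nn_accuracy(X, y, feats),
    k=2,
)
print(chosen)
```

SFFS differs in that, after each addition, it also tries removing previously chosen features when doing so improves the score; the abstract reports that this made a negligible difference in the authors' experiments.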
