IMPROVED SPEAKER-INDEPENDENT EMOTION RECOGNITION FROM SPEECH USING TWO-STAGE FEATURE REDUCTION

2015 ◽

Author(s):

Hasrul Mohd Nazid ◽

Hariharan Muthusamy ◽

Vikneswaran Vijean ◽

Sazali Yaacob

Keyword(s):

Emotion Recognition ◽

Principal Component ◽

Feature Reduction ◽

Speech Emotion Recognition ◽

Emotional Speech ◽

Two Stage ◽

Linear Discriminant ◽

Speaker Independent ◽

Speech Features ◽

And Gender

In recent years, researchers have focused on improving the accuracy of speech emotion recognition. High recognition accuracies have generally been obtained for two-class emotion recognition, but multi-class emotion recognition remains a challenging task. The main aim of this work is to propose a two-stage feature reduction using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to improve the accuracy of the speech emotion recognition (ER) system. Short-term speech features were extracted from the emotional speech signals. Experiments were carried out using four different supervised classifiers with two different emotional speech databases. The experimental results indicate that the proposed method achieves accuracies of 87.48% for the speaker-dependent (SD) and gender-dependent (GD) ER experiment, 85.15% for the speaker-independent (SI) ER experiment, and 87.09% for the gender-independent (GI) experiment.
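A two-stage PCA-then-LDA reduction of the kind the abstract describes might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes NumPy is available, and the function names (`fit_pca`, `fit_lda`, `two_stage_reduce`), the component counts, and the synthetic data are all hypothetical. The abstract does not give the feature dimensions or the number of components kept at each stage.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return (mean, W) where W has shape (n_features, n_components)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_lda(Z, y, n_components):
    """Return a Fisher LDA projection of shape (n_features, n_components)."""
    overall_mean = Z.mean(axis=0)
    d = Z.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        s_w += (Zc - mc).T @ (Zc - mc)
        diff = (mc - overall_mean)[:, None]
        s_b += len(Zc) * (diff @ diff.T)
    # Eigenvectors of pinv(S_w) @ S_b; pinv guards against a singular S_w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_w) @ s_b)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real


def two_stage_reduce(X, y, n_pca, n_lda):
    """Stage 1: unsupervised PCA. Stage 2: supervised LDA on the PCA scores."""
    mean, w_pca = fit_pca(X, n_pca)
    Z = (X - mean) @ w_pca
    return Z @ fit_lda(Z, y, n_lda)


# Synthetic stand-in for short-term speech features (hypothetical sizes):
# 3 emotion classes, 40 utterances each, 20 features per utterance.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 40)
X = rng.normal(size=(120, 20))
X[:, :3] += 3.0 * np.eye(3)[y]  # shift each class along its own axis

reduced = two_stage_reduce(X, y, n_pca=10, n_lda=2)
print(reduced.shape)
```

LDA yields at most one fewer discriminant direction than the number of classes, so `n_lda` should not exceed that. For a speaker-independent experiment, both projections would need to be fitted on the training speakers only and then applied to the held-out speakers.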

A Two-Stage Spatiotemporal Attention Convolution Network for Continuous Dimensional Emotion Recognition from Facial Video

IEEE Signal Processing Letters ◽

10.1109/lsp.2021.3063609 ◽

2021 ◽

pp. 1-1

Author(s):

Min Hu ◽

Qian Chu ◽

Xiaohua Wang ◽

Lei He ◽

Fuji Ren

Keyword(s):

Emotion Recognition ◽

Two Stage

Two stage emotion recognition based on speaking rate

International Journal of Speech Technology ◽

10.1007/s10772-010-9085-x ◽

2010 ◽

Vol 14 (1) ◽

pp. 35-48 ◽

Cited By ~ 32

Author(s):

Shashidhar G. Koolagudi ◽

Rao Sreenivasa Krothapalli

Keyword(s):

Emotion Recognition ◽

Speaking Rate ◽

Two Stage

The Relevance of Voice Quality Features in Speaker Independent Emotion Recognition

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 ◽

10.1109/icassp.2007.367152 ◽

2007 ◽

Cited By ~ 43

Author(s):

Marko Lugger ◽

Bin Yang

Keyword(s):

Emotion Recognition ◽

Voice Quality ◽

Speaker Independent ◽

Quality Features

On the relevance of high-level features for speaker independent emotion recognition of spontaneous speech

10.21437/interspeech.2009-483 ◽

2009 ◽

Author(s):

Marko Lugger ◽

Bin Yang

Keyword(s):

Emotion Recognition ◽

Spontaneous Speech ◽

Speaker Independent ◽

High Level

Emotion Recognition From Speech Using Perceptual Filter and Neural Network

Advances in Computer and Electrical Engineering - Neural Networks for Natural Language Processing ◽

10.4018/978-1-7998-1159-6.ch004 ◽

2020 ◽

pp. 78-91 ◽

Cited By ~ 2

Author(s):

Revathi A. ◽

Sasikaladevi N.

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Vector Quantization ◽

Group Performance ◽

Back Propagation ◽

Critical Band ◽

Emotion Classification ◽

Back Propagation Algorithm ◽

Propagation Algorithm ◽

Speaker Independent

This chapter on multi-speaker-independent emotion recognition covers the use of perceptual features with filters spaced on the equivalent rectangular bandwidth (ERB) and Bark scales, a vector quantization (VQ) classifier for classifying groups, and an artificial neural network with the back propagation algorithm for emotion classification within a group. Performance can be improved by using a large amount of data for each emotion to train the system adequately. With a limited set of data, the proposed system has provided consistently better accuracy for the perceptual feature with critical band analysis done on the ERB scale.
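To make the ERB and Bark spacing concrete, the sketch below converts frequencies to both scales and places filter centre frequencies uniformly on the ERB-rate scale. The chapter abstract does not say which formulas it uses; this sketch assumes the widely used Glasberg and Moore ERB-rate approximation and the Zwicker and Terhardt Bark approximation. The function names, frequency range, and filter count are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate value for a frequency in Hz (Glasberg and Moore approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def hz_to_bark(f_hz):
    """Bark value for a frequency in Hz (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def erb_spaced_centres(f_low, f_high, n_filters):
    """Centre frequencies (Hz) equally spaced on the ERB-rate scale.

    Requires n_filters >= 2 so that both band edges are included.
    """
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]


centres = erb_spaced_centres(100.0, 4000.0, 8)
for c in centres:
    print(f"{c:8.1f} Hz  ERB-rate {hz_to_erb_rate(c):5.2f}  Bark {hz_to_bark(c):5.2f}")
```

Equal steps on the ERB-rate scale give centre frequencies that are dense at low frequencies and progressively wider apart at high frequencies, which is the perceptual motivation for such filterbanks.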

Feature Reduction for Dimensional Emotion Recognition in Human-Robot Interaction

2015 IEEE Symposium Series on Computational Intelligence ◽

10.1109/ssci.2015.119 ◽

2015 ◽

Cited By ~ 2

Author(s):

Ntombikayise Banda ◽

Andries Engelbrecht ◽

Peter Robinson

Keyword(s):

Emotion Recognition ◽

Human Robot Interaction ◽

Feature Reduction ◽

Robot Interaction

Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network

Sensors ◽

10.3390/s20216008 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6008 ◽

Cited By ~ 1

Author(s):

Misbah Farooq ◽

Fawad Hussain ◽

Naveed Khan Baloch ◽

Fawad Riasat Raja ◽

Heejung Yu ◽

...

Keyword(s):

Neural Network ◽

Feature Selection ◽

Convolutional Neural Network ◽

Emotion Recognition ◽

Deep Convolutional Neural Network ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotional Speech ◽

Human Machine Interaction ◽

Speaker Independent

Speech emotion recognition (SER) plays a significant role in human–machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For an accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotional classification from speech signals; however, they are not efficient enough to accurately depict the emotional states of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotional datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS, for speaker-dependent SER experiments. Moreover, for speaker-independent SER, our method yields the best results compared with existing approaches based on handcrafted features.
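The correlation-based feature selection step could look roughly like the greedy sketch below. The abstract does not state which variant the authors used. This version assumes Hall's CFS merit score with absolute Pearson correlations, which is crude for multi-class labels encoded as integers, and it assumes no feature is constant. The function name `cfs_forward_select` and the synthetic data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def cfs_forward_select(X, y, max_features):
    """Greedy forward search maximising the CFS merit of the chosen subset.

    merit(S) = k * mean|r_cf| / sqrt(k + k * (k - 1) * mean|r_ff|)
    where r_cf is feature-class correlation, r_ff is feature-feature
    correlation, and k is the subset size.
    """
    n_features = X.shape[1]
    r_cf = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    r_ff = np.abs(np.corrcoef(X, rowvar=False))

    def merit(subset):
        k = len(subset)
        mean_cf = r_cf[subset].mean()
        if k == 1:
            return mean_cf
        block = r_ff[np.ix_(subset, subset)]
        mean_ff = (block.sum() - np.trace(block)) / (k * (k - 1))
        return k * mean_cf / np.sqrt(k + k * (k - 1) * mean_ff)

    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [merit(selected + [j]) for j in candidates]
        top = int(np.argmax(scores))
        if scores[top] <= best:
            break  # no candidate improves the merit any further
        selected.append(candidates[top])
        best = scores[top]
    return selected


# Synthetic example: feature 0 tracks a binary label, the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 0] = y + 0.1 * rng.normal(size=200)

print(cfs_forward_select(X, y, max_features=3))
```

The merit score rewards features that correlate with the label and penalises features that correlate with each other, so redundant features tend to be left out. The selected columns would then be passed to a classifier such as a support vector machine.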
