Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model

We propose a speech emotion recognition (SER) model with an “attention-Long Short-Term Memory (LSTM)-attention” component that combines IS09, a feature set commonly used for SER, with the mel spectrogram, and we analyze the reliability problem of the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. The model's attention mechanism focuses on the emotion-related elements of the IS09 and mel spectrogram features and on the emotion-related time spans within those features, so the model extracts emotion information from a given speech signal. In the baseline study, the proposed model achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the proposed model of the main study nor its modified variants could exceed a WA of 68% on the improvised dataset. This is because of the limited reliability of the IEMOCAP dataset. A more reliable dataset is required to evaluate the model’s performance more accurately. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
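The abstract reports weighted accuracy (WA) without defining it. The sketch below assumes the convention common in the SER literature: WA is the fraction of all utterances classified correctly, while unweighted accuracy (UA) is the mean of the per-class recalls. The function names and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import Counter

def weighted_accuracy(y_true, y_pred):
    """Fraction of all utterances classified correctly (large classes weigh more)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy labels, made up for illustration (not IEMOCAP data).
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

wa = weighted_accuracy(y_true, y_pred)    # 7 of 10 utterances are correct
ua = unweighted_accuracy(y_true, y_pred)  # recalls are 1.0, 0.5 and 0.0
print(f"WA={wa:.2f} UA={ua:.2f}")
```

On imbalanced data the two numbers can diverge considerably, which is why it matters which one a paper reports.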

Speech Emotion Recognition based on DCNN BiGRU Self-attention Model

2020 International Conference on Information Science, Parallel and Distributed Systems (ISPDS) ◽ 10.1109/ispds51347.2020.00017 ◽ 2020

Author(s): Changjiang Jiang ◽ Junliang Liu ◽ Rong Mao ◽ Sifan Sun

Keyword(s): Emotion Recognition ◽ Speech Emotion Recognition ◽ Attention Model

Research on the Effect of Different Speech Segment Lengths on Speech Emotion Recognition Based on LSTM

Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering ◽ 10.18178/wcse.2019.06.073 ◽ 2019

Keyword(s): Emotion Recognition ◽ Speech Emotion Recognition ◽ Speech Segment

The Impact of Attention Mechanisms on Speech Emotion Recognition

Sensors ◽ 10.3390/s21227530 ◽ 2021 ◽ Vol 21 (22) ◽ pp. 7530

Author(s): Shouyan Chen ◽ Mingyan Zhang ◽ Xiaofen Yang ◽ Zhijia Zhao ◽ Tao Zou ◽ ...

Keyword(s): Emotion Recognition ◽ Attention Mechanism ◽ Speech Emotion Recognition ◽ Human Machine Interaction ◽ Attention Model ◽ Result Show ◽ Real Time Applications ◽ The Difference ◽ The Impact ◽ Machine Interaction

Speech emotion recognition (SER) plays an important role in real-time applications of human-machine interaction. The attention mechanism is widely used to improve the performance of SER. However, the rules governing when each attention mechanism applies have not been discussed in depth. This paper discusses the difference between Global-Attention and Self-Attention and explores the rules for applying them when constructing SER classifiers. The experimental results show that Global-Attention can improve the accuracy of the sequential model, while Self-Attention can improve the accuracy of the parallel model, when the model is constructed from a CNN and an LSTM. With this knowledge, a classifier for SER (the CNN-LSTM×2+Global-Attention model) is proposed. The experimental results show that it achieves an accuracy of 85.427% on the EMO-DB dataset.
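To make the distinction concrete, here is a minimal sketch of the two mechanisms in their generic textbook forms, assuming dot-product scoring. It is not the paper's layer definition: `global_attention`, `self_attention`, the `query` vector, and the toy values are all hypothetical. The first function pools a whole sequence into a single context vector; the second returns one re-weighted vector per time step.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    """Sum of the vectors, each scaled by its attention weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def global_attention(states, query):
    """One query scores every time step; returns a single context vector."""
    weights = softmax([dot(h, query) for h in states])
    return weighted_sum(weights, states), weights

def self_attention(states):
    """Every time step attends to all time steps; returns one vector per step."""
    scale = math.sqrt(len(states[0]))
    out = []
    for h in states:
        weights = softmax([dot(h, other) / scale for other in states])
        out.append(weighted_sum(weights, states))
    return out

# Toy sequence: 3 time steps of 2-dimensional features (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]  # hypothetical learned query vector

context, weights = global_attention(states, query)
attended = self_attention(states)
print(len(context), len(attended))
```

In a real model the states would come from an LSTM or CNN, and the query and projection matrices would be learned; both are omitted here to keep the contrast visible.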

Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽ 10.2478/aoa-2013-0055 ◽ 2013 ◽ Vol 38 (4) ◽ pp. 465-470 ◽ Cited By ~ 11

Author(s): Jingjie Yan ◽ Xiaolan Wang ◽ Weiyi Gu ◽ LiLi Ma

Keyword(s): Dimensionality Reduction ◽ Emotion Recognition ◽ Least Squares ◽ Partial Least Squares ◽ Partial Least Squares Regression ◽ Speech Emotion Recognition ◽ Least Squares Regression ◽ Computer Science Pedagogy ◽ Reduction Methods ◽ Analysis Computer

Speech emotion recognition is regarded as a meaningful yet intractable problem in a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use the SPLSR method to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With the SPLSR method, the components of redundant and meaningless speech emotion features are shrunk to zero, while the useful and informative features are retained and passed on to the subsequent classification step. A number of tests on the Berlin database show that the recognition rate of the SPLSR method reaches 79.23% and is superior to that of the other dimensionality reduction methods compared.
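The abstract does not say how SPLSR drives feature components to zero. One common way to obtain this kind of sparsity is soft-thresholding of a weight vector, and the sketch below illustrates only that operator. Treating it as the mechanism used in the paper is an assumption; `soft_threshold`, the loadings, and the threshold `lam` are hypothetical.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights within lam of zero become 0."""
    out = []
    for w in weights:
        magnitude = abs(w) - lam
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if w > 0 else -magnitude)
    return out

# Hypothetical per-feature loadings (made-up values, not from the paper).
weights = [0.9, -0.05, 0.3, 0.02, -0.6]
sparse = soft_threshold(weights, lam=0.1)

# Features with non-zero weight survive and go on to the classifier.
selected = [i for i, w in enumerate(sparse) if w != 0.0]
print(selected)
```

A larger `lam` zeroes out more features, so the threshold trades off how compact the feature set is against how much information is kept.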

Classifier fusion for speech emotion recognition based on improved queuing voting algorithm

Journal of Computer Applications ◽ 10.3724/sp.j.1087.2009.00381 ◽ 2009 ◽ Vol 29 (2) ◽ pp. 381-385 ◽ Cited By ~ 1

Author(s): Li-qin FU ◽ Xia MAO ◽ Li-jiang CHEN

Keyword(s): Emotion Recognition ◽ Classifier Fusion ◽ Speech Emotion Recognition

Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model

Speech Emotion Recognition Using Convolutional- Recurrent Neural Networks with Attention Model

Speech Emotion Recognition using Convolutional Neural Networks and Recurrent Neural Networks with Attention Model

3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition

Speech emotion recognition using derived features from speech segment and kernel principal component analysis

Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database
