Speech emotion recognition using cepstral features extracted with novel triangular filter banks based on bark and ERB frequency scales

Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how, and to what extent, basic characteristics, methods, and settings affect accuracy. This study aims to extend the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods: time characteristics (segmentation, window types, and the lengths and overlaps of classification regions), frequency ranges, frequency scales, processing of whole-speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, paired t-tests, and rank and Pearson correlations. The results revealed several settings in the 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0–8 kHz frequency range. Spectrograms carrying both vocal tract and excitation information also scored well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
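The idea of cepstral features computed from psychoacoustic triangular filter banks can be illustrated with a minimal sketch. It assumes the Traunmüller approximation for the Bark scale and the Glasberg and Moore formula for the ERB-rate scale, spaces the triangular filters equally on the warped scale over 0–8 kHz, and takes an unnormalized DCT-II of the log filter-bank energies. All function names, the 1e-12 energy floor, and the choice of scale formulas are illustrative assumptions, not the configuration used in the study.

```python
import math

def hz_to_bark(f_hz):
    # Traunmueller approximation of the Bark scale (assumed variant)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark
    return 1960.0 * (z + 0.53) / (26.28 - z)

def hz_to_erb_rate(f_hz):
    # Glasberg and Moore ERB-rate scale (assumed variant)
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    # Algebraic inverse of hz_to_erb_rate
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def triangular_filter_bank(n_filters, n_fft, sample_rate,
                           to_scale, from_scale,
                           f_min=0.0, f_max=8000.0):
    """Return n_filters rows of triangular weights over n_fft // 2 + 1 bins."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_filters + 1)
    # n_filters + 2 edge frequencies, equally spaced on the warped scale
    edges_hz = [from_scale(lo + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def cepstral_coefficients(power_spectrum, bank, n_ceps):
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)),
                          1e-12))  # floor avoids log(0)
             for row in bank]
    n = len(log_e)
    return [sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n)
                for m in range(n))
            for k in range(n_ceps)]
```

For example, `triangular_filter_bank(20, 512, 16000, hz_to_bark, bark_to_hz)` would give a 20-filter Bark bank for 16 kHz audio; swapping in `hz_to_erb_rate` and `erb_rate_to_hz` gives the ERB counterpart without changing the rest of the pipeline.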


Performance Comparison of Different Cepstral Features for Speech Emotion Recognition

2018 International CET Conference on Control, Communication, and Computing (IC4)

DOI: 10.1109/cetic4.2018.8531065

Year: 2018

Cited By ~ 4

Author(s): N. Sugan, N. S. Sai Srinivas, Niladri Kar, L. S. Kumar, Malaya Kumar Nath, ...

Keyword(s): Emotion Recognition, Performance Comparison, Speech Emotion Recognition, Cepstral Features


Improved Speech Emotion Recognition Using Modified Mean Cepstral Features

2020 IEEE 17th India Council International Conference (INDICON)

DOI: 10.1109/indicon49873.2020.9342495

Year: 2020

Author(s): Krishna Chauhan, Kamlesh Kumar Sharma, Tarun Varma

Keyword(s): Emotion Recognition, Speech Emotion Recognition, Cepstral Features


NMF-based Cepstral Features for Speech Emotion Recognition

2018 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS)

DOI: 10.1109/icspis.2018.8700539

Year: 2018

Cited By ~ 2

Author(s): Milad Lashkari, Sanaz Seyedin

Keyword(s): Emotion Recognition, Speech Emotion Recognition, Cepstral Features


Speech Emotion Recognition Using Auditory Spectrogram and Cepstral Features

DOI: 10.23919/eusipco54536.2021.9616144

Year: 2021

Author(s): Shujie Zhao, Yan Yang, Israel Cohen, Lijun Zhang

Keyword(s): Emotion Recognition, Speech Emotion Recognition, Cepstral Features


Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics, Vol 38 (4), pp. 465-470

DOI: 10.2478/aoa-2013-0055

Year: 2013

Cited By ~ 11

Author(s): Jingjie Yan, Xiaolan Wang, Weiyi Gu, LiLi Ma

Keyword(s): Dimensionality Reduction, Emotion Recognition, Least Squares, Partial Least Squares, Partial Least Squares Regression, Speech Emotion Recognition, Least Squares Regression, Computer Science Pedagogy, Reduction Methods, Analysis Computer

Abstract: Speech emotion recognition is regarded as a meaningful and challenging problem in a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use the SPLSR method to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With the SPLSR method, the coefficients of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed to the subsequent classification step. A number of tests on the Berlin database show that the recognition rate of the SPLSR method reaches up to 79.23% and is superior to that of the other dimensionality reduction methods compared.
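The feature-selection effect described in this abstract, where uninformative features receive exactly zero weight, can be sketched with a single sparse PLS direction. The sketch assumes one common way of inducing sparsity: soft-thresholding the covariance between each centered feature and the target. The function names and the `sparsity` parameter are hypothetical, and this is not necessarily the SPLSR formulation used in the paper.

```python
import math

def soft_threshold(value, lam):
    # Shrink toward zero by lam; values with |value| <= lam become zero
    return math.copysign(max(abs(value) - lam, 0.0), value)

def sparse_pls_direction(X, y, sparsity=0.5):
    """First sparse PLS weight vector for X (rows = samples) and targets y.

    sparsity in [0, 1) is a hypothetical tuning knob: the threshold is
    sparsity times the largest absolute feature-target covariance.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Unscaled covariance between each centered feature and the target
    w = [sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    lam = sparsity * max(abs(v) for v in w)
    w = [soft_threshold(v, lam) for v in w]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w

def selected_features(w):
    # Indices of features that survive the thresholding
    return [j for j, v in enumerate(w) if v != 0.0]
```

Features whose weight is zero would simply be dropped before classification; further components could be obtained by deflating `X` and repeating, which is omitted here.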
