RNN-based Dimensional Speech Emotion Recognition
◆ A speech emotion recognition system based on recurrent neural networks is developed using long short-term memory (LSTM) networks.
◆ Two acoustic feature sets are evaluated: a 31-feature set (3 time-domain features, 5 frequency-domain features, 13 MFCCs, 5 F0s, and 5 harmonics) and the eGeMAPS feature set (23 features).
◆ Performance is evaluated with four metrics: mean squared error (MSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and concordance correlation coefficient (CCC). CCC is the main focus because it is the metric used by other researchers.
◆ The developed system uses multi-task learning to maximize the CCC of arousal, valence, and dominance at the same time, with a CCC loss (1 - CCC). The results show that LSTM networks improve the CCC score compared to a baseline dense system. The best CCC score is obtained on arousal, followed by dominance and valence.
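
The CCC loss mentioned above can be illustrated with a minimal NumPy sketch. It uses the standard definition of the concordance correlation coefficient and takes 1 - CCC per emotion dimension. The function names `ccc` and `multitask_ccc_loss`, the column order (arousal, valence, dominance), and the unweighted average over the three dimensions are assumptions made for illustration; the source does not state how the three losses are combined, and a training setup would use a differentiable framework rather than NumPy.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variances
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Note: undefined (division by zero) if both inputs are the same constant.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def multitask_ccc_loss(y_true, y_pred):
    """Average of (1 - CCC) over the columns of two (n_samples, n_dims) arrays.

    Assumption: columns are arousal, valence, dominance, weighted equally.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    losses = [1.0 - ccc(y_true[:, k], y_pred[:, k])
              for k in range(y_true.shape[1])]
    return float(np.mean(losses))


if __name__ == "__main__":
    # Toy labels and predictions, invented purely to exercise the functions.
    labels = np.array([[1.0, 2.0, 3.0],
                       [2.0, 1.0, 4.0],
                       [3.0, 4.0, 1.0],
                       [4.0, 3.0, 2.0]])
    preds = labels + 0.5  # constant offset lowers CCC below 1
    print("loss:", multitask_ccc_loss(labels, preds))
```

Unlike Pearson correlation, CCC penalizes a shift in mean or scale between predictions and labels, so a prediction that is perfectly correlated but offset still incurs a nonzero loss.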