Auditory-Inspired End-to-End Speech Emotion Recognition Using 3D Convolutional Recurrent Neural Networks Based on Spectral-Temporal Representation

Author(s): Zhichao Peng, Zhi Zhu, Masashi Unoki, Jianwu Dang, Masato Akagi
2020, Vol 17 (8), pp. 3786-3789
Author(s): P. Gayathri, P. Gowri Priya, L. Sravani, Sandra Johnson, Visanth Sampath

Emotion recognition is an aspect of speech recognition that is gaining attention, and the need for it is growing rapidly. Although there are methods to identify emotion using machine learning techniques, we hypothesize in this paper that calculating deltas and delta-deltas for customized features not only preserves effective emotional information but also reduces the impact of emotion-irrelevant factors, leading to fewer misclassifications. Furthermore, Speech Emotion Recognition (SER) often suffers from silent frames and emotion-irrelevant frames. Meanwhile, attention mechanisms have demonstrated exceptional performance in learning task-relevant feature representations. Inspired by this, we propose an Attention-based Convolutional Recurrent Neural Network (ACRNN) to learn discriminative features for SER, where the Mel-spectrogram with deltas and delta-deltas is used as input. Experimental results demonstrate the feasibility of the proposed method, which attains state-of-the-art performance in terms of unweighted average recall.
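The abstract does not say how the deltas and delta-deltas are computed. As a minimal sketch, assuming the common regression-based delta formula with a window of two frames on each side and edge frames repeated at the boundaries, the three-channel input could be assembled as follows. The function names `delta` and `stack_with_deltas`, the `width` parameter, and the toy values are illustrative assumptions, not taken from the paper.

```python
def delta(frames, width=2):
    """Regression-based delta over time (assumed formula, not from the paper).

    frames: list of T frames, each a list of F log-Mel values.
    Returns a list of the same shape. Frames beyond either edge are
    replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for f in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][f]
                behind = frames[max(t - n, 0)][f]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(log_mel, width=2):
    """Return [static, delta, delta-delta] as three channels of shape T x F."""
    d1 = delta(log_mel, width)
    d2 = delta(d1, width)  # delta-delta is the delta of the delta
    return [log_mel, d1, d2]


# Toy "log-Mel" input: 6 frames, 2 Mel bands (values are made up).
toy_log_mel = [[float(t), 1.0] for t in range(6)]
channels = stack_with_deltas(toy_log_mel)
```

In this toy input the first band rises by one unit per frame and the second band is constant, so the interior delta of the first band is 1.0 and the delta of the second band is 0.0 everywhere. A real pipeline would compute the log-Mel spectrogram from audio first; that step is omitted here.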


2021, Vol 173, pp. 114683
Author(s): Dongdong Li, Jinlin Liu, Zhuo Yang, Linyu Sun, Zhe Wang
