Automatic speech emotion recognition is essential for
effective human-computer interaction. This
paper investigates the use of spectrograms as inputs to a hybrid
deep convolutional-LSTM network for speech emotion recognition. In this
study, we trained our proposed model, which comprises four convolutional
layers for high-level feature extraction from input spectrograms,
an LSTM layer for capturing long-term temporal dependencies, and finally
two dense layers. Experimental results on the SAVEE database
show promising performance: the proposed model
achieved an accuracy of 94.26%.
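The architecture described above (four convolutional layers, an LSTM layer, then two dense layers) can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: the channel counts, kernel sizes, hidden size, and spectrogram dimensions are assumptions, and `n_classes=7` reflects SAVEE's seven emotion categories.

```python
import torch
import torch.nn as nn

class ConvLSTMEmotionNet(nn.Module):
    """Hypothetical CNN-LSTM for spectrogram-based emotion recognition."""

    def __init__(self, n_mels=64, n_classes=7):
        super().__init__()
        # Four convolutional layers extract high-level features from the
        # input spectrogram, treated as a 1-channel image (assumed sizes).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        feat_dim = 64 * (n_mels // 8)  # channels x pooled frequency bins
        # LSTM accumulates long-term dependencies along the time axis.
        self.lstm = nn.LSTM(feat_dim, 128, batch_first=True)
        # Two dense layers map the final LSTM state to emotion classes.
        self.fc = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, spec):  # spec: (batch, 1, n_mels, time)
        x = self.conv(spec)                   # (batch, C, F', T')
        x = x.permute(0, 3, 1, 2).flatten(2)  # (batch, T', C*F')
        _, (h, _) = self.lstm(x)              # h: (1, batch, 128)
        return self.fc(h[-1])                 # (batch, n_classes)
```

Feeding a batch of spectrograms of shape `(batch, 1, 64, time)` yields per-class logits of shape `(batch, 7)`, which would be trained with a standard cross-entropy loss.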