Novel acoustic features for speech emotion recognition

2009 ◽  
Vol 52 (7) ◽  
pp. 1838-1848 ◽  
Author(s):  
Yong-Wan Roh ◽  
Dong-Ju Kim ◽  
Woo-Seok Lee ◽  
Kwang-Seok Hong


Author(s):  
Sourabh Suke ◽  
Ganesh Regulwar ◽  
Nikesh Aote ◽  
Pratik Chaudhari ◽  
Rajat Ghatode ◽  
...  

This project describes "VoiEmo - A Speech Emotion Recognizer", a system for recognizing the emotional state of an individual from his/her speech. For example, one's speech becomes loud and fast, with a higher and wider pitch range, in states of fear, anger, or joy, whereas the voice is generally slow and low-pitched in sadness and tiredness. We have developed classification models for speech emotion detection based on convolutional neural networks (CNNs), support vector machine (SVM), and multilayer perceptron (MLP) classification, which make predictions from acoustic features of the speech signal such as Mel-frequency cepstral coefficients (MFCCs). Our models have been trained to recognize eight common emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). For training and testing the models, we used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS). The system is advantageous because it can provide a general idea of an individual's emotional state from the acoustic features of the speech, irrespective of the language the speaker speaks in; moreover, it saves time and effort. Speech emotion recognition systems have applications in various fields such as call centers and BPOs, criminal investigation, psychiatric therapy, and the automobile industry.


2013 ◽  
Vol 385-386 ◽  
pp. 1385-1388
Author(s):  
Yong Qiang Bao ◽  
Li Zhao ◽  
Cheng Wei Hang

In this paper we introduce an application of fuzzy KDA to speech emotion recognition using elicited data. The emotional data were induced in a psychology experiment. Acted data are not well suited to developing real-world applications, and by using more naturalistic data we may build a more reliable system. An emotional feature set is then constructed for modeling and recognition. A total of 372 low-level acoustic features are used, and kernel discriminant analysis is applied for emotion recognition. The experimental results show a promising recognition rate.
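One common way to realize kernel discriminant analysis is to apply an explicit kernel feature map followed by ordinary linear discriminant analysis. The sketch below takes that route with scikit-learn's Nystroem approximation on synthetic data (the 372-dimensional feature count mirrors the abstract, but the data, class structure, and kernel settings are illustrative assumptions, not the paper's fuzzy KDA formulation):

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_classes, per_class, n_feats = 4, 60, 372  # 372 mirrors the paper's feature count

# Toy stand-in for low-level acoustic features with class-dependent structure.
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(n_classes * per_class, n_feats))
X[:, :10] += y[:, None] * 4.0  # emotion-dependent shift on a few features

# KDA approximated as: explicit RBF feature map (Nystroem) -> LDA.
kda = make_pipeline(Nystroem(kernel="rbf", n_components=100, random_state=1),
                    LinearDiscriminantAnalysis())
scores = cross_val_score(kda, X, y, cv=5)
print("mean cross-validated accuracy:", round(scores.mean(), 3))
```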


2021 ◽  
Author(s):  
Siddique Latif ◽  
Rajib Rana ◽  
Sara Khalifa ◽  
Raja Jurdak ◽  
Junaid Qadir ◽  
...  

<div>Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features produced through feature engineering. However, designing handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, where hierarchical representations are learned automatically in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important areas of future research. Our survey bridges a gap in the literature, since existing surveys focus either on SER with hand-engineered features or on representation learning in general without a focus on SER.</div>
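The contrast with feature engineering can be sketched as a toy autoencoder: a network trained only to reconstruct its input learns a bottleneck representation, which a simple probe can then classify. Everything below (data, dimensions, labels, the use of `MLPRegressor` as an autoencoder) is a synthetic illustration of the idea, not anything from the survey:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Toy spectrogram-frame stand-ins: 2 latent factors rendered into 64 bins.
Z = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 64))
X = np.tanh(Z @ W) + 0.1 * rng.normal(size=(500, 64))
y = (Z[:, 0] > 0).astype(int)  # a label recoverable from the latent factors

# Autoencoder: train the network to reconstruct its own input; the bottleneck
# activations then serve as a learned representation (no handcrafted features).
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                  max_iter=2000, random_state=2).fit(X, X)
H = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])  # bottleneck codes

# A linear probe on the learned codes recovers the label without any
# hand-engineered features ever being specified.
clf = LogisticRegression().fit(H[:400], y[:400])
print("probe accuracy on learned codes:", clf.score(H[400:], y[400:]))
```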


2020 ◽  
pp. 1-15
Author(s):  
Wang Wei ◽  
Xinyi Cao ◽  
He Li ◽  
Lingjie Shen ◽  
Yaqin Feng ◽  
...  

Abstract: To improve speech emotion recognition, a U-AWED features model is proposed based on an acoustic words emotion dictionary (AWED). The method models emotional information at the acoustic-word level across the emotion classes. The top-listed words in each emotion are selected to generate the AWED vector. The U-AWED model is then constructed by combining utterance-level acoustic features with the AWED features. A support vector machine and a convolutional neural network are employed as the classifiers in our experiment. The results show that our proposed method provides a significant improvement in unweighted average recall across all four emotion classification tasks.
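The dictionary-building step can be sketched in a few lines: select the top words per emotion class, then represent an utterance as counts over the resulting vocabulary. The words and utterances below are invented toy data, and the paper operates on acoustic words rather than plain text, so this only illustrates the bookkeeping:

```python
from collections import Counter

# Toy utterances grouped by emotion; illustrative data only, not the authors'.
utterances = {
    "happy": ["great wonderful day", "wonderful great news"],
    "sad":   ["terrible gloomy day", "gloomy terrible news"],
}

TOP_K = 2

# Step 1: pick the top-listed words in each emotion to form the dictionary.
awed = {}
for emotion, texts in utterances.items():
    counts = Counter(w for t in texts for w in t.split())
    awed[emotion] = [w for w, _ in counts.most_common(TOP_K)]

vocab = sorted({w for words in awed.values() for w in words})

# Step 2: represent an utterance as its AWED vector (counts over dictionary words).
def awed_vector(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

print(vocab)                                # ['gloomy', 'great', 'terrible', 'wonderful']
print(awed_vector("what a wonderful day"))  # [0, 0, 0, 1]
```

In the full model, this AWED vector would be concatenated with utterance-level acoustic features before classification.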


Algorithms ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 70 ◽  
Author(s):  
Kudakwashe Zvarevashe ◽  
Oludayo Olugbara

Automatic recognition of emotion is important for facilitating seamless interaction between a human being and an intelligent robot, toward the full realization of a smart society. Signal processing and machine learning methods are widely applied to recognize human emotions from features extracted from facial images, video files, or speech signals. However, these features have not been able to recognize the fear emotion with the same level of precision as other emotions. The authors propose the agglutination of prosodic and spectral features, drawn from a group of carefully selected features, to realize hybrid acoustic features for improving emotion recognition. Experiments were performed to test the effectiveness of the proposed features, extracted from the speech files of two public databases and used to train five popular ensemble learning algorithms. The results show that random decision forest ensemble learning on the proposed hybrid acoustic features is highly effective for speech emotion recognition.
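A minimal sketch of the agglutination idea: concatenate prosodic and spectral feature blocks into one hybrid vector and train a random forest on it. The feature values, dimensions, and the binary fear-vs-other framing below are synthetic assumptions for illustration, not the paper's databases or selected features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 400
y = rng.integers(0, 2, size=n)  # toy labels: 0 = fear, 1 = other

# Synthetic stand-ins: prosodic features (e.g. pitch/energy statistics) and
# spectral features (e.g. MFCC statistics); each carries partial information.
prosodic = rng.normal(size=(n, 6)) + y[:, None] * 0.8
spectral = rng.normal(size=(n, 20)) + y[:, None] * 0.3

# Agglutination: simple concatenation into one hybrid feature vector.
hybrid = np.hstack([prosodic, spectral])

X_tr, X_te, y_tr, y_te = train_test_split(hybrid, y, random_state=3, stratify=y)
forest = RandomForestClassifier(n_estimators=200, random_state=3).fit(X_tr, y_tr)
print("hybrid-feature accuracy:", forest.score(X_te, y_te))
```

Because the two blocks carry complementary information, the concatenated vector typically supports better decisions than either block alone.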


2021 ◽  
Vol 11 (4) ◽  
pp. 1890
Author(s):  
Sung-Woo Byun ◽  
Seok-Pil Lee

The goal of a human interface is to recognize the user's emotional state precisely. In speech emotion recognition research, the most important issue is the effective parallel use of proper speech feature extraction and an appropriate classification engine. Well-defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that improves emotion recognition performance using a recurrent neural network model. To investigate acoustic features that can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstrum coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect emotion in speech. We used a recurrent neural network model to classify emotions from speech. The results show the proposed system performs more accurately than those in previous studies.
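The reason a recurrent model suits momentary changes is that it consumes the features frame by frame and carries state across time. The sketch below is a bare Elman-style forward pass with random, untrained weights, just to show the data flow; the frame dimension, hidden size, and seven-way output are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal Elman RNN forward pass over frame-level acoustic features
# (e.g. per-frame F0 + MFCCs). Weights are random here: only the flow
# of information through time is illustrated, not a trained model.
def rnn_forward(frames, W_xh, W_hh, b_h):
    h = np.zeros(W_hh.shape[0])
    for x in frames:               # one recurrent step per acoustic frame
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
    return h                       # final state summarizes the utterance

n_feat, n_hidden, n_emotions = 14, 16, 7
W_xh = rng.normal(scale=0.1, size=(n_feat, n_hidden))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_hy = rng.normal(scale=0.1, size=(n_hidden, n_emotions))

frames = rng.normal(size=(120, n_feat))   # 120 frames of one toy utterance
h_final = rnn_forward(frames, W_xh, W_hh, b_h)
logits = h_final @ W_hy
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over emotion classes
print("predicted emotion index:", int(probs.argmax()))
```

In practice the weights would be learned end-to-end, and gated cells (LSTM/GRU) are usually preferred over this plain recurrence for long utterances.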



