Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

This paper discusses the impact of the classification method and feature selection on speech emotion recognition accuracy. Selecting the right features in combination with the classifier is an important part of reducing the computational complexity of the system. This step is especially necessary for systems that will be deployed in real-time applications. Speech emotion recognition systems are being developed and improved because of their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured for selections of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and feature groups for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system with regard to its accuracy and efficiency.
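The comparison of feature groups described in this abstract can be illustrated with a minimal sketch. It assumes a k-nearest-neighbour classifier scored by leave-one-out accuracy on each feature group. The sample values, the column grouping, and all function names are invented for illustration; they are not the Berlin database and not the authors' setup.

```python
import math
from collections import Counter

# Synthetic toy samples invented for illustration (NOT Berlin database values).
# Each row: [pitch_mean, energy, spectral_value], assumed to be pre-scaled.
SAMPLES = [
    ([1.0, 1.0, 5.0], "neutral"),
    ([1.2, 0.9, 1.0], "neutral"),
    ([0.9, 1.1, 3.0], "neutral"),
    ([3.0, 3.0, 5.1], "stress"),
    ([3.2, 2.9, 1.1], "stress"),
    ([2.9, 3.1, 3.1], "stress"),
]

# Hypothetical grouping of feature columns into named groups.
FEATURE_GROUPS = {"prosodic": [0, 1], "spectral": [2], "all": [0, 1, 2]}


def knn_predict(train, query, columns, k):
    """Return the majority label among the k training samples nearest to query."""
    def distance(features):
        return math.sqrt(sum((features[c] - query[c]) ** 2 for c in columns))

    nearest = sorted(train, key=lambda item: distance(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, columns, k=1):
    """Classify each sample using all the others and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if knn_predict(train, features, columns, k) == label:
            hits += 1
    return hits / len(samples)


# One accuracy figure per feature group, using only that group's columns.
ACCURACY_BY_GROUP = {
    name: leave_one_out_accuracy(SAMPLES, columns)
    for name, columns in FEATURE_GROUPS.items()
}
```

Comparing the entries of `ACCURACY_BY_GROUP` shows which group of columns separates the classes best for this classifier. The same loop could be repeated with a different classifier in place of `knn_predict` to compare methods as well as features.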


Practical Speech Emotion Recognition Based on Online Learning: From Acted Data to Elicited Data

Mathematical Problems in Engineering ◽

10.1155/2013/265819 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Chengwei Huang ◽

Ruiyu Liang ◽

Qingyun Wang ◽

Ji Xi ◽

Cheng Zha ◽

...

Keyword(s):

Online Learning ◽

Emotion Recognition ◽

Cognitive Task ◽

Gaussian Mixture ◽

Recognition System ◽

Speech Emotion Recognition ◽

Independent Data ◽

Data Set ◽

Speaker Independent ◽

Maximal Information Coefficient

We study cross-database speech emotion recognition based on online learning. How to apply a classifier trained on acted data to naturalistic data, such as elicited data, remains a major challenge in today’s speech emotion recognition systems. We introduce three different types of data sources: first, a basic speech emotion data set collected from acted speech by professional actors and actresses; second, a speaker-independent data set which contains a large number of speakers; third, an elicited speech data set collected from a cognitive task. Acoustic features are extracted from emotional utterances and evaluated using the maximal information coefficient (MIC). A baseline valence and arousal classifier is designed based on Gaussian mixture models. An online training module is implemented using AdaBoost. While the offline recognizer is trained on the acted data, the online testing data includes the speaker-independent data and the elicited data. Experimental results show that by introducing the online learning module, our speech emotion recognition system can be better adapted to new data, which is an important characteristic in real-world applications.


Speech Emotion Recognition System Using Gaussian Mixture Model and Improvement proposed via Boosted GMM

IRA-International Journal of Technology & Engineering (ISSN 2455-4480) ◽

10.21013/jte.icsesd201706 ◽

2017 ◽

Vol 7 (2 (S)) ◽

pp. 56

Author(s):

Pavitra Patel ◽

A. A. Chaudhari ◽

M. A. Pund ◽

D. H. Deshmukh

Keyword(s):

Emotion Recognition ◽

Speech Signal ◽

Gaussian Mixture ◽

Recognition System ◽

Training Data ◽

Speech Emotion Recognition ◽

Human Beings ◽

Human Machine Interaction ◽

Data Set ◽

Communication Partner

Speech emotion recognition is an important issue that affects human-machine interaction. Automatic recognition of human emotion in speech aims at recognizing the underlying emotional state of a speaker from the speech signal. Gaussian mixture models (GMMs) and the minimum error rate classifier (i.e. the Bayesian optimal classifier) are popular and effective tools for speech emotion recognition. Typically, GMMs are used to model the class-conditional distributions of acoustic features, and their parameters are estimated by the expectation maximization (EM) algorithm based on a training data set. In this paper, we introduce a boosting algorithm for reliably and accurately estimating the class-conditional GMMs. The resulting algorithm is named the Boosted-GMM algorithm. Our speech emotion recognition experiments show that the emotion recognition rates are effectively and significantly boosted by the Boosted-GMM algorithm as compared to the EM-GMM algorithm.

During this interaction, human beings have feelings that they want to convey to their communication partner, who may be a human or a machine. This work depends on recognizing the emotions of human beings from their speech signal.

Emotion recognition from a speaker’s speech is very difficult for the following reasons. Different sentences, speakers, speaking styles, and speaking rates introduce acoustic variability. The same utterance may show different emotions, so it is very difficult to differentiate these portions of the utterance. Another problem is that emotion expression depends on the speaker and his or her culture and environment. As the culture and environment change, the speaking style also changes, which is another challenge facing speech emotion recognition systems.
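The EM-GMM baseline mentioned in this abstract can be sketched as follows: one GMM per emotion class fitted with plain EM, then a minimum error rate decision that picks the class with the largest log prior plus log likelihood. This is a simplified one-dimensional illustration and not the Boosted-GMM algorithm. The training values, the class labels, and all function names are assumptions made for the example.

```python
import math


def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def gmm_log_density(x, gmm):
    """log p(x) under a GMM stored as a list of (weight, mean, var) triples."""
    return log_sum_exp([math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm])


def fit_gmm_em(data, n_components=2, n_iter=25, var_floor=1e-3):
    """Fit a 1-D GMM with EM; return the model and the log-likelihood trace."""
    n = len(data)
    lo, hi = min(data), max(data)
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    # Deterministic start: equal weights, means spread over the data range.
    gmm = [(1.0 / n_components, lo + (hi - lo) * (k + 0.5) / n_components, spread)
           for k in range(n_components)]
    trace = [sum(gmm_log_density(x, gmm) for x in data)]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
            norm = log_sum_exp(joint)
            resp.append([math.exp(j - norm) for j in joint])
        # M-step: re-estimate weight, mean, and variance of each component.
        gmm = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
            gmm.append((nk / n, mean, max(var, var_floor)))
        trace.append(sum(gmm_log_density(x, gmm) for x in data))
    return gmm, trace


def classify(x, class_models, log_priors):
    """Minimum error rate decision: argmax of log prior + log likelihood."""
    scores = {label: log_priors[label] + gmm_log_density(x, gmm)
              for label, gmm in class_models.items()}
    return max(scores, key=scores.get)


# Toy 1-D feature values (e.g. a mean pitch in Hz), invented for illustration.
TRAINING = {
    "neutral": [110.0, 115.0, 118.0, 120.0, 122.0, 125.0, 128.0, 130.0],
    "anger": [180.0, 185.0, 190.0, 200.0, 205.0, 210.0, 215.0, 225.0],
}
MODELS, TRACES = {}, {}
for label, values in TRAINING.items():
    MODELS[label], TRACES[label] = fit_gmm_em(values)
TOTAL = sum(len(values) for values in TRAINING.values())
LOG_PRIORS = {label: math.log(len(values) / TOTAL)
              for label, values in TRAINING.items()}
```

A new value is labelled with `classify(value, MODELS, LOG_PRIORS)`. Real systems work with multi-dimensional acoustic feature vectors and full or diagonal covariance matrices; the 1-D form is used here only to keep the E-step and M-step readable.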


Pattern recognition and features selection for speech emotion recognition model using deep learning

International Journal of Speech Technology ◽

10.1007/s10772-020-09690-2 ◽

2020 ◽

Vol 23 (4) ◽

pp. 799-806

Author(s):

Kittisak Jermsittiparsert ◽

Abdurrahman Abdurrahman ◽

Parinya Siriattakul ◽

Ludmila A. Sundeeva ◽

Wahidah Hashim ◽

...

Keyword(s):

Pattern Recognition ◽

Deep Learning ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Features Selection ◽

Recognition Model ◽

Selection For


Implementation and Comparison of Speech Emotion Recognition System Using Gaussian Mixture Model (GMM) and K- Nearest Neighbor (K-NN) Techniques

Procedia Computer Science ◽

10.1016/j.procs.2015.04.226 ◽

2015 ◽

Vol 49 ◽

pp. 50-57 ◽

Cited By ~ 33

Author(s):

Rahul B. Lanjewar ◽

Swarup Mathurkar ◽

Nilesh Patel

Keyword(s):

Emotion Recognition ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Nearest Neighbor ◽

Gaussian Mixture ◽

Recognition System ◽

Speech Emotion Recognition ◽

K Nearest Neighbor


Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch015 ◽

2020 ◽

pp. 283-293

Author(s):

Imen Trabelsi ◽

Med Salim Bouhlel

Keyword(s):

Emotion Recognition ◽

Linear Prediction ◽

Recognition Rate ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Recognition System ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotional States ◽

Perceptual Linear Prediction

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human-Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), perceptual linear prediction (PLP), and relative spectral perceptual linear prediction (RASTA-PLP) features are used to characterize the emotional utterances, using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Using the proposed features, a recognition rate of 84% has been achieved, which is close to the performance of humans on this database.
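To give a feel for how a Kullback-Leibler divergence can serve as a kernel between utterances, here is a minimal sketch under strong simplifying assumptions. Each utterance is summarized by a single diagonal-covariance Gaussian rather than a full GMM, and the kernel is taken to be the exponential of the negative symmetric KL divergence. The abstract does not specify the kernel's exact form, so this form, the `scale` parameter, the frame values, and the function names are all assumptions.

```python
import math


def fit_diag_gaussian(frames, var_floor=1e-6):
    """Per-dimension mean and variance of a list of equal-length feature vectors."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dims)
    ]
    return means, variances


def kl_diag_gaussians(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def kl_kernel(p, q, scale=0.1):
    """Similarity in (0, 1]: exp(-scale * symmetric KL). 'scale' is a free parameter."""
    symmetric = kl_diag_gaussians(p, q) + kl_diag_gaussians(q, p)
    return math.exp(-scale * symmetric)


# Toy 2-D "frames" for two utterances, invented for illustration only.
UTTERANCE_A = fit_diag_gaussian([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]])
UTTERANCE_B = fit_diag_gaussian([[4.0, 1.0], [4.5, 0.5], [5.0, 1.5]])
SIMILARITY_AB = kl_kernel(UTTERANCE_A, UTTERANCE_B)
```

A matrix of such pairwise similarities over all training utterances could then be handed to an SVM implementation that accepts precomputed kernels. In practice the frames would be feature vectors such as the 12-coefficient MFCCs named in the abstract.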
