Speech Emotion Recognition: Performance Analysis based on Fused Algorithms and GMM Modelling

Author(s):  
R. Subhashree ◽  
G. N. Rathna


Sensors ◽
2020 ◽  
Vol 20 (18) ◽  
pp. 5212 ◽  
Author(s):  
Tursunov Anvarjon ◽  
Mustaqeem ◽  
Soonil Kwon

Artificial intelligence (AI) and machine learning (ML) are employed to make systems smarter. Today, a speech emotion recognition (SER) system evaluates the emotional state of the speaker by analyzing his/her speech signal. Emotion recognition is a challenging task for a machine, and making the machine smart enough to recognize emotions efficiently is equally challenging. The speech signal is hard to examine with signal processing methods because it consists of different frequencies and features that vary according to emotions such as anger, fear, sadness, happiness, boredom, disgust, and surprise. Even though different algorithms have been developed for SER, success rates remain low and vary with the language, the emotions, and the database. In this paper, we propose a new lightweight and effective SER model with low computational complexity and high recognition accuracy. The suggested method uses a convolutional neural network (CNN) to learn deep frequency features by means of a plain rectangular filter and a modified pooling strategy that have more discriminative power for SER. The proposed CNN model was trained on the frequency features extracted from the speech data and was then tested to predict the emotions. The proposed SER model was evaluated on two benchmarks, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin Emotional Speech Database (EMO-DB) speech datasets, obtaining recognition accuracies of 77.01% and 92.02%, respectively. The experimental results demonstrate that the proposed CNN-based SER system achieves better recognition performance than state-of-the-art SER systems.
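
A minimal sketch of the idea of applying plain rectangular convolution filters and axis-wise pooling to a log-spectrogram input; the input shape, filter sizes, pooling setup, and the four emotion classes are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class RectFilterCNN(nn.Module):
    """Lightweight CNN using rectangular (frequency/time-oriented) filters."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(9, 1), padding=(4, 0)),  # frequency-oriented filter
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),                      # pool along frequency only
            nn.Conv2d(16, 32, kernel_size=(1, 7), padding=(0, 3)), # time-oriented filter
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),                      # pool along time only
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                                          # x: (batch, 1, freq, time)
        return self.classifier(self.features(x).flatten(1))

# Example forward pass on a dummy 128-bin, 200-frame log-spectrogram batch
logits = RectFilterCNN()(torch.randn(8, 1, 128, 200))
print(logits.shape)  # torch.Size([8, 4])
```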


2021 ◽  
Vol 11 (4) ◽  
pp. 1890
Author(s):  
Sung-Woo Byun ◽  
Seok-Pil Lee

The goal of a human interface is to recognize the user's emotional state precisely. In speech emotion recognition research, the most important issue is the effective combined use of proper speech-feature extraction and an appropriate classification engine. Well-defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that can improve emotion recognition performance using a recurrent neural network model. To investigate acoustic features that can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstral coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect emotion in speech. We used a recurrent neural network model to classify emotions from speech. The results show that the proposed system performs more accurately than previous studies.
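
A rough sketch of the kind of pipeline described above (frame-level acoustic features such as MFCCs and F0 fed to a recurrent classifier); the feature set, dimensions, and number of emotion classes are assumptions rather than the authors' exact configuration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Return a (time, 14) matrix of 13 MFCCs plus F0 per frame."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)                 # (13, T)
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)      # (T',), NaN when unvoiced
    f0 = np.nan_to_num(f0)[: mfcc.shape[1]]                            # align frame counts
    return np.vstack([mfcc[:, : len(f0)], f0]).T                       # (T, 14)

class EmotionGRU(nn.Module):
    """Utterance-level emotion classifier over frame-level feature sequences."""
    def __init__(self, n_features: int = 14, n_classes: int = 5):
        super().__init__()
        self.rnn = nn.GRU(n_features, 64, batch_first=True)
        self.out = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, time, n_features)
        _, h = self.rnn(x)                 # h: (1, batch, 64) final hidden state
        return self.out(h.squeeze(0))      # per-utterance emotion logits
```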


2015 ◽  
Vol 781 ◽  
pp. 551-554 ◽  
Author(s):  
Chaidiaw Thiangtham ◽  
Jakkree Srinonchat

Speech emotion recognition has been widely researched and applied to applications such as communication with robots, e-learning systems, and emergency calls. Speech emotion feature extraction is an important key to achieving speech emotion recognition, which can also be used for personal identification. Speech emotion features are extracted as several kinds of coefficients, such as Linear Predictive Coefficients (LPCs), Line Spectral Frequencies (LSFs), Zero-Crossing (ZC), and Mel-Frequency Cepstral Coefficients (MFCCs) [1-6]. Several research works have been done in speech emotion recognition. A study of zero-crossing with peak amplitudes in speech emotion classification is introduced in [4]. The results showed that it provides a technique to extract the emotion feature in the time domain, but it still suffers from problems with amplitude shifting. Emotion recognition from speech is described in [5], which used a Gaussian Mixture Model (GMM) as the speech feature extractor. The GMM gave good results in reducing background noise; however, random noise in the GMM recognition model still needs attention. Speech emotion recognition using a hidden Markov model and a support vector machine is explained in [6]. The results showed that the average performance of the recognition system, given the speech emotion features used, still contains errors. Thus, [1-6] report recognition performance that still requires more focus on speech features.
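
An illustrative sketch of two of the feature types surveyed above (zero-crossing rate and MFCCs) computed on a synthetic stand-in signal; the frame size, hop length, and the pooling into an utterance-level vector are assumptions.

```python
import librosa
import numpy as np

sr = 16000
# Synthetic stand-in for an emotional utterance (2 seconds of noise)
y = np.random.default_rng(0).standard_normal(2 * sr).astype(np.float32)

zcr = librosa.feature.zero_crossing_rate(y, frame_length=2048, hop_length=512)   # (1, T)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512)               # (13, T)

# Trim to a common frame count, then pool to a fixed-length utterance descriptor
T = min(zcr.shape[1], mfcc.shape[1])
frame_feats = np.vstack([zcr[:, :T], mfcc[:, :T]])                               # (14, T)
utterance_vec = np.hstack([frame_feats.mean(axis=1), frame_feats.std(axis=1)])   # (28,)
print(utterance_vec.shape)
```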


2017 ◽  
Vol 2017 ◽  
pp. 1-7 ◽  
Author(s):  
Xiaoqing Jiang ◽  
Kewen Xia ◽  
Lingyin Wang ◽  
Yongliang Lin

The selection of a feature subset is a crucial aspect of the speech emotion recognition problem. In this paper, a Reordering Features with Weights Fusion (RFWF) algorithm is proposed for selecting a more effective and compact feature subset. The RFWF algorithm comprehensively fuses weights reflecting the relevance, complementarity, and redundancy between features and classes, and reorders the features to construct a feature subset with excellent emotional recognizability. A binary-tree structured multiple-kernel SVM classifier is adopted for emotion recognition, and different feature subsets are selected at different nodes of the classifier. The highest recognition accuracy over the five emotions in the Berlin database is 90.549% with only 15 features selected by RFWF. The experimental results show the effectiveness of RFWF in building the feature subset, and that using different feature subsets for specific emotions can improve the overall recognition performance.
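
A simplified stand-in for the idea of ranking features by fused weights and training an SVM on the top-ranked subset; here relevance is approximated by mutual information and redundancy by mean absolute correlation. The exact RFWF weight definitions and the binary-tree multiple-kernel SVM are not reproduced.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

def rank_by_fused_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return feature indices ordered from most to least useful."""
    relevance = mutual_info_classif(X, y, random_state=0)        # feature-class relevance
    corr = np.abs(np.corrcoef(X, rowvar=False))                  # feature-feature correlation
    redundancy = (corr.sum(axis=0) - 1.0) / (X.shape[1] - 1)     # mean correlation to other features
    fused = relevance - redundancy                               # simple fusion of the two weights
    return np.argsort(fused)[::-1]

# Hypothetical usage with a precomputed feature matrix X (n_samples, n_features)
# and emotion labels y; keep only the 15 top-ranked features, as in the paper.
# order = rank_by_fused_weights(X, y)
# clf = SVC(kernel="rbf").fit(X[:, order[:15]], y)
```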


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2730 ◽  
Author(s):  
Wei Jiang ◽  
Zheng Wang ◽  
Jesse S. Jin ◽  
Xianfeng Han ◽  
Chunguang Li

Automatic speech emotion recognition is a challenging task due to the gap between acoustic features and human emotions, and its success relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that leads to low emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn a discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% compared to existing state-of-the-art approaches.
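
A schematic sketch of the "learn a joint representation, then classify with an SVM" idea described above; the network sizes, the two feature groups, and the concatenation-based fusion are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class FusionEncoder(nn.Module):
    """Encodes two heterogeneous acoustic feature groups into one fused vector."""
    def __init__(self, dim_a: int, dim_b: int, dim_out: int = 64):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, 128), nn.ReLU(), nn.Linear(128, dim_out))
        self.enc_b = nn.Sequential(nn.Linear(dim_b, 128), nn.ReLU(), nn.Linear(128, dim_out))

    def forward(self, xa, xb):
        # Concatenate the two learned representations into one fused feature vector
        return torch.cat([self.enc_a(xa), self.enc_b(xb)], dim=1)   # (batch, 2 * dim_out)

# Hypothetical usage: after training the encoder (e.g. with an auxiliary
# classification head), its fused outputs are fed to an SVM for final recognition.
# fused = FusionEncoder(dim_a=384, dim_b=988)(feats_a, feats_b).detach().numpy()
# svm = SVC(kernel="rbf").fit(fused, labels)
```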


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Hariharan Muthusamy ◽  
Kemal Polat ◽  
Sazali Yaacob

Recently, researchers have paid escalating attention to studying the emotional state of an individual from his/her speech signals, as speech is the fastest and most natural method of communication between individuals. In this work, new feature enhancement using a Gaussian mixture model (GMM) was proposed to enhance the discriminatory power of the features extracted from speech and glottal signals. Three different emotional speech databases were used to evaluate the proposed methods. Extreme learning machine (ELM) and k-nearest neighbor (kNN) classifiers were employed to classify the different types of emotions. Several experiments were conducted, and the results show that the proposed methods significantly improved speech emotion recognition performance compared to research works published in the literature.
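
One plausible reading of GMM-based feature enhancement, shown as a sketch: fit a GMM on the raw feature vectors and append the per-component posterior probabilities as extra features before kNN classification. The number of components and this exact enhancement scheme are assumptions, not the authors' method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

def gmm_enhance(X_train: np.ndarray, X_test: np.ndarray, n_components: int = 8):
    """Append GMM posterior responsibilities (soft cluster memberships) to the features."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X_train)
    return (np.hstack([X_train, gmm.predict_proba(X_train)]),
            np.hstack([X_test, gmm.predict_proba(X_test)]))

# Hypothetical usage with precomputed speech/glottal feature matrices and labels
# Xtr_e, Xte_e = gmm_enhance(Xtr, Xte)
# knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr_e, ytr)
# print(knn.score(Xte_e, yte))
```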


2013 ◽  
Vol 38 (4) ◽  
pp. 465-470 ◽  
Author(s):  
Jingjie Yan ◽  
Xiaolan Wang ◽  
Weiyi Gu ◽  
LiLi Ma

Speech emotion recognition is deemed to be a meaningful and intractable issue among a number of domains, including sentiment analysis, computer science, pedagogy, and so on. In this study, we investigate speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach in depth. We use sparse partial least squares regression to implement feature selection and dimensionality reduction on the whole set of acquired speech emotion features. By exploiting the SPLSR method, the coefficients of redundant and uninformative speech emotion features are shrunk to zero, while the useful and informative speech emotion features are retained and passed to the following classification step. A number of tests on the Berlin database reveal that the recognition rate of the SPLSR method can reach up to 79.23% and is superior to other compared dimensionality reduction methods.
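
A toy one-component sketch of the sparse-PLS idea: soft-thresholding the PLS direction vector so that the weights of uninformative features shrink exactly to zero. This is a simplification of SPLSR, not the authors' implementation; the threshold value is an assumption.

```python
import numpy as np

def sparse_pls_direction(X: np.ndarray, y: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Return a sparse PLS weight vector; zero entries mark discarded features."""
    Xc = X - X.mean(axis=0)                        # center features
    yc = y - y.mean()                              # center response (e.g. numeric class code)
    w = Xc.T @ yc                                  # ordinary PLS direction (feature-response covariance)
    # Soft-threshold relative to the largest weight, driving small weights to zero
    w = np.sign(w) * np.maximum(np.abs(w) - threshold * np.abs(w).max(), 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Hypothetical usage: features whose weight is exactly zero are removed before
# the classification step.
# w = sparse_pls_direction(X, y_encoded)
# selected = np.flatnonzero(w)
```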


2019 ◽  
Author(s):  
Alex Bertrams ◽  
Katja Schlegel

People high in autistic-like traits have been found to have difficulties with recognizing emotions from nonverbal expressions. However, findings on the autism-emotion recognition relationship are inconsistent. In the present study, we investigated whether speeded reasoning ability (reasoning performance under time pressure) moderates the inverse relationship between autistic-like traits and emotion recognition performance. We expected the negative correlation between autistic-like traits and emotion recognition to be less strong when speeded reasoning ability was high. MTurkers (N = 217) completed the ten-item version of the Autism Spectrum Quotient (AQ-10), two emotion recognition tests using videos with sound (Geneva Emotion Recognition Test, GERT-S) and pictures (Reading the Mind in the Eyes Test, RMET), and Baddeley's Grammatical Reasoning test to measure speeded reasoning. As expected, the higher the ability in speeded reasoning, the weaker the relationship between higher autistic-like traits and lower emotion recognition performance. These results suggest that a high ability to make quick mental inferences may (partly) compensate for difficulties with intuitive emotion recognition related to autistic-like traits.
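
An illustrative sketch of the moderation analysis described above: an interaction term between autistic-like traits and speeded reasoning predicting emotion recognition scores. The data here are synthetic stand-ins and the variable names are hypothetical; only the study's sample size (N = 217) is taken from the abstract.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 217                                               # sample size from the study
df = pd.DataFrame({
    "aq10": rng.normal(size=n),                       # synthetic standardized trait scores
    "speeded_reasoning": rng.normal(size=n),          # synthetic standardized reasoning scores
})
# Synthetic outcome with a built-in moderation effect, for illustration only
df["emotion_recog"] = (-0.3 * df["aq10"]
                       + 0.2 * df["aq10"] * df["speeded_reasoning"]
                       + rng.normal(scale=0.8, size=n))

# The aq10:speeded_reasoning coefficient captures the hypothesized moderation
model = smf.ols("emotion_recog ~ aq10 * speeded_reasoning", data=df).fit()
print(model.params)
```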

