An Audio Processing Approach using Ensemble Learning for Speech-Emotion Recognition for Children with ASD

AbstractReceiving an accurate emotional response from robots has been a challenging task for researchers for the past few years. With the advancements in technology, robots like service robots interact with users of different cultural and lingual backgrounds. The traditional approach towards speech emotion recognition cannot be utilized to enable the robot and give an efficient and emotional response. The conventional approach towards speech emotion recognition uses the same corpus for both training and testing of classifiers to detect accurate emotions, but this approach cannot be generalized for multi-lingual environments, which is a requirement for robots used by people all across the globe. In this paper, a series of experiments are conducted to highlight an ensemble learning effect using a majority voting technique for cross-corpus, multi-lingual speech emotion recognition system. A comparison of the performance of an ensemble learning approach against traditional machine learning algorithms is performed. This study tests a classifier’s performance trained on one corpus with data from another corpus to evaluate its efficiency for multi-lingual emotion detection. According to experimental analysis, different classifiers give the highest accuracy for different corpora. Using an ensemble learning approach gives the benefit of combining all classifiers’ effect instead of choosing one classifier and compromising certain language corpus’s accuracy. Experiments show an increased accuracy of 13% for Urdu corpus, 8% for German corpus, 11% for Italian corpus, and 5% for English corpus from with-in corpus testing. For cross-corpus experiments, an improvement of 2% when training on Urdu data and testing on German data and 15% when training on Urdu data and testing on Italian data is achieved. An increase of 7% in accuracy is obtained when testing on Urdu data and training on German data, 3% when testing on Urdu data and training on Italian data, and 5% when testing on Urdu data and training on English data. Experiments prove that the ensemble learning approach gives promising results against other state-of-the-art techniques.

Download Full-text

Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition

Algorithms ◽

10.3390/a13030070 ◽

2020 ◽

Vol 13 (3) ◽

pp. 70 ◽

Cited By ~ 5

Author(s):

Kudakwashe Zvarevashe ◽

Oludayo Olugbara

Keyword(s):

Emotion Recognition ◽

Ensemble Learning ◽

Automatic Recognition ◽

Speech Emotion Recognition ◽

Spectral Features ◽

Intelligent Robot ◽

Acoustic Features ◽

Random Decision Forest ◽

Facial Images ◽

Decision Forest

Automatic recognition of emotion is important for facilitating seamless interactivity between a human being and intelligent robot towards the full realization of a smart society. The methods of signal processing and machine learning are widely applied to recognize human emotions based on features extracted from facial images, video files or speech signals. However, these features were not able to recognize the fear emotion with the same level of precision as other emotions. The authors propose the agglutination of prosodic and spectral features from a group of carefully selected features to realize hybrid acoustic features for improving the task of emotion recognition. Experiments were performed to test the effectiveness of the proposed features extracted from speech files of two public databases and used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning of the proposed hybrid acoustic features is highly effective for speech emotion recognition.

Download Full-text

Ensemble Learning With Attention-Integrated Convolutional Recurrent Neural Network for Imbalanced Speech Emotion Recognition

IEEE Access ◽

10.1109/access.2020.3035910 ◽

2020 ◽

Vol 8 ◽

pp. 199909-199919

Author(s):

Xusheng Ai ◽

Victor S. Sheng ◽

Wei Fang ◽

Charles X. Ling ◽

Chunhua Li

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Ensemble Learning ◽

Recurrent Neural Network ◽

Speech Emotion Recognition

Download Full-text

An Ensemble Model for Multi-Level Speech Emotion Recognition

Applied Sciences ◽

10.3390/app10010205 ◽

2019 ◽

Vol 10 (1) ◽

pp. 205 ◽

Cited By ~ 5

Author(s):

Chunjun Zheng ◽

Chunli Wang ◽

Ning Jia

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Ensemble Learning ◽

Short Term Memory ◽

Learning Model ◽

Local Features ◽

Speech Emotion Recognition ◽

Model Design ◽

Local Data ◽

Global Features

Speech emotion recognition is a challenging and widely examined research topic in the field of speech processing. The accuracy of existing models in speech emotion recognition tasks is not high, and the generalization ability is not strong. Since the feature set and model design of effective speech directly affect the accuracy of speech emotion recognition, research on features and models is important. Because emotional expression is often correlated with the global features, local features, and model design of speech, it is often difficult to find a universal solution for effective speech emotion recognition. Based on this, the main research purpose of this paper is to generate general emotion features in speech signals from different angles, and use the ensemble learning model to perform emotion recognition tasks. It is divided into the following aspects: (1) Three expert roles of speech emotion recognition are designed. Expert 1 focuses on three-dimensional feature extraction of local signals; expert 2 focuses on extraction of comprehensive information in local data; and expert 3 emphasizes global features: acoustic feature descriptors (low-level descriptors (LLDs)), high-level statistics functionals (HSFs), and local features and their timing relationships. A single-/multiple-level deep learning model that meets expert characteristics is designed for each expert, including convolutional neural network (CNN), bi-directional long short-term memory (BLSTM), and gated recurrent unit (GRU). Convolutional recurrent neural network (CRNN), based on a combination of an attention mechanism, is used for internal training of experts. (2) By designing an ensemble learning model, each expert can play to its own advantages and evaluate speech emotions from different focuses. (3) Through experiments, the performance of various experts and ensemble learning models in emotion recognition is compared in the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and the validity of the proposed model is verified.

Download Full-text

Recognition of Cross-Language Acoustic Emotional Valence Using Stacked Ensemble Learning

Algorithms ◽

10.3390/a13100246 ◽

2020 ◽

Vol 13 (10) ◽

pp. 246

Author(s):

Kudakwashe Zvarevashe ◽

Oludayo O. Olugbara

Keyword(s):

Speech Recognition ◽

Emotion Recognition ◽

Ensemble Learning ◽

Learning Algorithm ◽

Recognition Task ◽

Emotional Valence ◽

Recursive Feature Elimination ◽

Speech Emotion Recognition ◽

Gradient Boosting ◽

Cross Language

Most of the studies on speech emotion recognition have used single-language corpora, but little research has been done in cross-language valence speech emotion recognition. Research has shown that the models developed for single-language speech recognition systems perform poorly when used in different environments. Cross-language speech recognition is a craving alternative, but it is highly challenging because the corpora used will have been recorded in different environments and under varying conditions. The differences in the quality of recording devices, elicitation techniques, languages, and accents of speakers make the recognition task even more arduous. In this paper, we propose a stacked ensemble learning algorithm to recognize valence emotion in a cross-language speech environment. The proposed ensemble algorithm was developed from random decision forest, AdaBoost, logistic regression, and gradient boosting machine and is therefore called RALOG. In addition, we propose feature scaling using random forest recursive feature elimination and a feature selection algorithm to boost the performance of RALOG. The algorithm has been evaluated against four widely used ensemble algorithms to appraise its performance. The amalgam of five benchmarked corpora has resulted in a cross-language corpus to validate the performance of RALOG trained with the selected acoustic features. The comparative analysis results have shown that RALOG gave better performance than the other ensemble learning algorithms investigated in this study.

Download Full-text

Remediation of Emotion Recognition Deficits in Children With ASD

PsycEXTRA Dataset ◽

10.1037/e624152010-001 ◽

2010 ◽

Author(s):

Paige Weinger ◽

Richard Depue

Keyword(s):

Emotion Recognition ◽

Children With Asd

Download Full-text

Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽

10.2478/aoa-2013-0055 ◽

2013 ◽

Vol 38 (4) ◽

pp. 465-470 ◽

Cited By ~ 11

Author(s):

Jingjie Yan ◽

Xiaolan Wang ◽

Weiyi Gu ◽

LiLi Ma

Keyword(s):

Dimensionality Reduction ◽

Emotion Recognition ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Speech Emotion Recognition ◽

Least Squares Regression ◽

Computer Science Pedagogy ◽

Reduction Methods ◽

Analysis Computer

Abstract Speech emotion recognition is deemed to be a meaningful and intractable issue among a number of do- mains comprising sentiment analysis, computer science, pedagogy, and so on. In this study, we investigate speech emotion recognition based on sparse partial least squares regression (SPLSR) approach in depth. We make use of the sparse partial least squares regression method to implement the feature selection and dimensionality reduction on the whole acquired speech emotion features. By the means of exploiting the SPLSR method, the component parts of those redundant and meaningless speech emotion features are lessened to zero while those serviceable and informative speech emotion features are maintained and selected to the following classification step. A number of tests on Berlin database reveal that the recogni- tion rate of the SPLSR method can reach up to 79.23% and is superior to other compared dimensionality reduction methods.

Download Full-text