Cross corpus multi-lingual speech emotion recognition using ensemble learning

Author(s):  
Wisha Zehra ◽  
Abdul Rehman Javed ◽  
Zunera Jalil ◽  
Habib Ullah Khan ◽  
Thippa Reddy Gadekallu

Abstract
Receiving an accurate emotional response from robots has been a challenging task for researchers in recent years. With advances in technology, robots such as service robots interact with users of diverse cultural and lingual backgrounds. The conventional approach to speech emotion recognition uses the same corpus for both training and testing of classifiers, but this approach does not generalize to multi-lingual environments, a requirement for robots used by people across the globe. In this paper, a series of experiments is conducted to highlight the effect of ensemble learning with a majority voting technique on a cross-corpus, multi-lingual speech emotion recognition system, and the performance of the ensemble approach is compared against traditional machine learning algorithms. The study tests a classifier trained on one corpus with data from another corpus to evaluate its efficiency for multi-lingual emotion detection. Experimental analysis shows that different classifiers achieve the highest accuracy on different corpora; an ensemble learning approach combines the strengths of all classifiers instead of choosing one classifier and compromising accuracy on certain language corpora. For within-corpus testing, experiments show accuracy increases of 13% for the Urdu corpus, 8% for the German corpus, 11% for the Italian corpus, and 5% for the English corpus. For cross-corpus experiments, improvements of 2% when training on Urdu data and testing on German data and 15% when training on Urdu data and testing on Italian data are achieved. Accuracy increases of 7% are obtained when testing on Urdu data and training on German data, 3% when testing on Urdu data and training on Italian data, and 5% when testing on Urdu data and training on English data. The experiments show that the ensemble learning approach gives promising results compared with other state-of-the-art techniques.
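The majority-voting scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the corpus generator, feature dimensionality, and choice of base classifiers are all placeholder assumptions standing in for the acoustic features and models actually used.

```python
# Sketch of hard (majority) voting across heterogeneous classifiers,
# evaluated cross-corpus: train on one "corpus", test on another.
import numpy as np
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_corpus(n=200, dim=20, n_classes=4, shift=0.0):
    """Stand-in for the acoustic features of one emotion corpus."""
    y = rng.integers(0, n_classes, n)
    X = rng.normal(shift, 1.0, (n, dim)) + y[:, None] * 0.5
    return X, y

X_train, y_train = make_corpus(shift=0.0)  # e.g. training corpus (Urdu)
X_test, y_test = make_corpus(shift=0.2)    # e.g. testing corpus (German)

# Each base classifier votes; the majority label wins ("hard" voting).
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
acc = ensemble.score(X_test, y_test)  # cross-corpus accuracy
```

The cross-corpus setting is captured by the distribution shift between training and test data; the ensemble aggregates classifiers that may each excel on different corpora.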

Author(s):  
Hanina Nuralifa Zahra ◽  
Muhammad Okky Ibrohim ◽  
Junaedi Fahmi ◽  
Rike Adelia ◽  
Fandy Akhmad Nur Febryanto ◽  
...  

2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Guihua Wen ◽  
Huihui Li ◽  
Jubing Huang ◽  
Danyang Li ◽  
Eryang Xun

Human emotions can now be recognized from speech signals using machine learning methods; however, recognition accuracy in real applications is limited by the lack of rich representations. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To exploit this advantage fully, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. The method first extracts low-level features from the input speech signal and uses them to construct many random subspaces. Each random subspace is then fed to a DBN, which yields higher-level features that a classifier maps to an emotion label. The output labels are fused by majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves better accuracy than the compared methods for speech emotion recognition.
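The random-subspace construction above can be illustrated with scikit-learn, which has no DBN implementation, so a small MLP stands in for each subspace learner; this is a structural sketch of the RDBN idea, not the paper's model. Note that `BaggingClassifier` aggregates by averaging predicted probabilities when available, a soft analogue of the paper's majority vote.

```python
# Sketch of the random-subspace ensemble: each base learner is trained on
# a random subset of the features, and predictions are aggregated.
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Placeholder for low-level speech features (e.g. MFCC statistics).
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=4, random_state=0)

rdbn_like = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    n_estimators=10,
    max_features=0.5,   # each learner sees a random half of the features
    bootstrap=False,    # subspaces differ in features, not in samples
    random_state=0,
)
rdbn_like.fit(X, y)
acc = rdbn_like.score(X, y)
```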


2020 ◽  
Vol 12 (20) ◽  
pp. 3292
Author(s):  
Sara Akodad ◽  
Lionel Bombrun ◽  
Junshi Xia ◽  
Yannick Berthoumieu ◽  
Christian Germain

Remote sensing image scene classification, which consists of labeling remote sensing images with a set of categories based on their content, has received remarkable attention for applications such as land-use mapping. Standard approaches are based on multi-layer representations of first-order convolutional neural network (CNN) features. However, second-order CNNs have recently been shown to outperform traditional first-order CNNs on many computer vision tasks. The aim of this paper is therefore to exploit second-order statistics of CNN features for remote sensing scene classification, in the form of covariance matrices computed locally or globally on the output of a CNN. These datapoints do not lie in a Euclidean space but on a Riemannian manifold, so Euclidean tools are not suited to manipulating them; other metrics, such as the log-Euclidean one, should be considered. This metric projects the set of covariance matrices onto a tangent space defined at a reference point. In this tangent plane, which is a vector space, conventional machine learning tools such as Fisher vector encoding or an SVM classifier can be applied. Based on this log-Euclidean framework, we propose a novel transfer learning approach composed of two hybrid architectures based on covariance pooling of CNN features, one local and one global. Both rely on features extracted from models pre-trained on the ImageNet dataset and processed with machine learning algorithms. The first hybrid architecture is an ensemble learning approach with log-Euclidean Fisher vector encoding of region covariance matrices computed locally on the first layers of a CNN. The second is an ensemble learning approach based on covariance pooling of CNN features extracted globally from the deepest layers. The two ensembles are then combined following the strategy of the most diverse ensembles.
For validation and comparison, the proposed approach is tested on several challenging remote sensing datasets. Experimental results exhibit a significant gain of approximately 2% in overall accuracy compared with a similar state-of-the-art method based on covariance pooling of CNN features (on the UC Merced dataset).
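The log-Euclidean projection at the heart of this framework can be sketched briefly: each symmetric positive-definite covariance matrix is mapped through the matrix logarithm, after which its upper triangle is a flat Euclidean vector that standard tools (Fisher vectors, SVMs) can consume. The simulated CNN features and the small regularizer are illustrative assumptions.

```python
# Sketch of the log-Euclidean mapping of covariance descriptors.
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)

def covariance_descriptor(features):
    """Covariance of local features (features: n_points x dim), regularized
    slightly to guarantee positive-definiteness."""
    return np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])

def log_euclidean_vector(cov):
    """Matrix log, then vectorize the upper triangle into a Euclidean vector."""
    log_cov = logm(cov).real
    i, j = np.triu_indices_from(log_cov)
    # off-diagonal terms weighted by sqrt(2) to preserve the Frobenius norm
    w = np.where(i == j, 1.0, np.sqrt(2.0))
    return w * log_cov[i, j]

feats = rng.normal(size=(500, 8))  # stand-in for one image's CNN features
vec = log_euclidean_vector(covariance_descriptor(feats))
# an 8x8 covariance yields an 8*9/2 = 36-dimensional vector
```

Distances between these vectors equal log-Euclidean distances between the original matrices, which is what lets conventional classifiers operate on the tangent space.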


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4495 ◽  
Author(s):  
Theekshana Dissanayake ◽  
Yasitha Rajapaksha ◽  
Roshan Ragel ◽  
Isuru Nawinne

Recently, researchers in biosensor-based human emotion recognition have used different types of machine learning models to recognize human emotions. However, most still lack the ability to recognize emotions with high classification accuracy using a limited number of biosensors. In machine learning, ensemble methods have been successfully applied to real-world problems that require improved classification accuracy. Building on that, this research proposes an ensemble learning approach for developing a model that recognizes four major human emotions, namely anger, sadness, joy, and pleasure, from electrocardiogram (ECG) signals. For feature extraction, the analysis combines four ECG-signal-based techniques: heart rate variability, empirical mode decomposition, within-beat analysis, and frequency spectrum analysis. The first three are well-known ECG feature extraction techniques from the literature, and the fourth is a novel method proposed in this study. The machine learning procedure evaluates the performance of a set of well-known ensemble learners for emotion classification and further improves the results by applying feature selection prior to ensemble model training. Compared with the best performing single-biosensor model in the literature, the developed ensemble learner achieves an accuracy gain of 10.77%. Furthermore, the model outperforms most multiple-biosensor emotion recognition models with a significantly higher classification accuracy.
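The pipeline shape described here, feature selection as a prior step to ensemble training, can be sketched as below. The simulated features, the `SelectKBest` selector, and the `ExtraTreesClassifier` ensemble are placeholder assumptions; the study's actual ECG features and chosen learners would slot into the same structure.

```python
# Sketch: feature selection before ensemble training, evaluated by CV.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Placeholder for HRV / EMD / within-beat / spectrum features per ECG segment.
X, y = make_classification(n_samples=400, n_features=60, n_informative=12,
                           n_classes=4, random_state=0)

pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),  # keep 20 strongest features
    ("ensemble", ExtraTreesClassifier(n_estimators=100, random_state=0)),
])
scores = cross_val_score(pipeline, X, y, cv=5)
mean_acc = scores.mean()
```

Wrapping the selector inside the pipeline ensures it is refit on each training fold, avoiding selection bias in the cross-validated estimate.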


Algorithms ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 70 ◽  
Author(s):  
Kudakwashe Zvarevashe ◽  
Oludayo Olugbara

Automatic recognition of emotion is important for seamless interaction between human beings and intelligent robots, towards the full realization of a smart society. Signal processing and machine learning methods are widely applied to recognize human emotions from features extracted from facial images, video files, or speech signals. However, these features have not recognized the fear emotion with the same precision as other emotions. The authors propose agglutinating prosodic and spectral features from a group of carefully selected features to form hybrid acoustic features that improve emotion recognition. Experiments were performed to test the effectiveness of the proposed features, extracted from speech files of two public databases and used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning with the proposed hybrid acoustic features is highly effective for speech emotion recognition.
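The idea of agglutinating prosodic and spectral features into one vector can be sketched with toy extractors. The two feature groups below (energy and zero-crossing rate as prosodic proxies; spectral centroid and roll-off as spectral proxies) and the synthetic two-class "corpus" are illustrative stand-ins for the paper's carefully selected feature set.

```python
# Sketch: concatenate prosodic and spectral features, then train a
# random decision forest on the hybrid vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def prosodic_features(signal):
    energy = float(np.mean(signal ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal)))) / 2)
    return np.array([energy, zcr])

def spectral_features(signal, sr=16000):
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    rolloff = float(freqs[np.searchsorted(np.cumsum(mag), 0.85 * np.sum(mag))])
    return np.array([centroid, rolloff])

def hybrid_features(signal):
    # "agglutination": one concatenated prosodic + spectral vector
    return np.concatenate([prosodic_features(signal), spectral_features(signal)])

rng = np.random.default_rng(0)
# Toy corpus: noisy sinusoids at two pitches act as two emotion classes.
signals = [np.sin(2 * np.pi * (200 + 200 * c) * np.arange(1600) / 16000)
           + 0.1 * rng.normal(size=1600) for c in (0, 1) for _ in range(30)]
labels = [c for c in (0, 1) for _ in range(30)]

X = np.stack([hybrid_features(s) for s in signals])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
acc = clf.score(X, labels)
```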

