Emotion classification from speech signal based on empirical mode decomposition and non-linear features

Emotion recognition from speech signals is a widely researched topic in the design of Human–Computer Interface (HCI) models, since it provides insights into the mental states of human beings. Often, the emotional condition of humans must be identified as cognitive feedback in the HCI. In this paper, an attempt to recognize seven emotional states from speech signals (sad, angry, disgust, happy, surprise, pleasant, and neutral) is investigated. The proposed method employs a non-linear signal quantifying method based on a randomness measure, known as the entropy feature, for the detection of emotions. Initially, the speech signals are decomposed into Intrinsic Mode Functions (IMFs), which are divided into dominant frequency bands: high frequency, mid-frequency, and base frequency. The entropy measures are computed directly from the high-frequency band in the IMF domain. For the mid- and base-band frequencies, however, the IMFs are averaged and their entropy measures are computed. A feature vector is formed from the computed entropy measures, incorporating the randomness feature for all the emotional signals. The feature vector is then used to train several state-of-the-art classifiers: Linear Discriminant Analysis (LDA), Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Gradient Boosting Machine. Tenfold cross-validation, performed on the publicly available Toronto Emotional Speech dataset, shows that the LDA classifier achieves a peak balanced accuracy of 93.3%, an F1 score of 87.9%, and an area under the curve of 0.995 in recognizing emotions from the speech signals of native English speakers.
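The band-wise entropy features described in this abstract can be illustrated with a minimal sketch. It assumes the IMFs have already been computed by some EMD implementation and are supplied as rows of an array. The abstract does not name the entropy measure, so a Shannon entropy of the amplitude histogram is used here as a stand-in. The function names, the `bins` parameter, and the band index lists are hypothetical.

```python
import numpy as np

def shannon_entropy(signal, bins=32):
    """Shannon entropy (in bits) of the amplitude histogram of a 1-D signal."""
    counts, _ = np.histogram(signal, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def entropy_features(imfs, high, mid, base, bins=32):
    """Entropy per high-band IMF, plus one entropy each for the
    averaged mid-band IMFs and the averaged base-band IMFs."""
    imfs = np.asarray(imfs, dtype=float)
    feats = [shannon_entropy(imfs[i], bins) for i in high]
    feats.append(shannon_entropy(imfs[list(mid)].mean(axis=0), bins))
    feats.append(shannon_entropy(imfs[list(base)].mean(axis=0), bins))
    return np.array(feats)

# Toy stand-in for IMFs (assumption): one noise row and five sinusoids.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000, endpoint=False)
toy_imfs = np.stack(
    [rng.normal(size=t.size)]
    + [np.sin(2 * np.pi * f * t) for f in (200, 80, 40, 10, 3)]
)

# Band assignment is illustrative only; the paper's grouping is not given here.
features = entropy_features(toy_imfs, high=[0, 1], mid=[2, 3], base=[4, 5])
```

The resulting vector would then be passed to a classifier such as LDA. That step is omitted to keep the sketch dependency-free.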


Gender-based speaker recognition from speech signals using GMM model

Modern Physics Letters B ◽

10.1142/s0217984919504384 ◽

2019 ◽

Vol 33 (35) ◽

pp. 1950438 ◽

Cited By ~ 1

Author(s):

Manish Gupta ◽

Shambhu Shankar Bharti ◽

Suneeta Agarwal

Keyword(s):

Speaker Recognition ◽

Speech Signal ◽

Speech Synthesis ◽

English Language ◽

Gaussian Mixture ◽

Support Vector ◽

Speech Signals ◽

Human Beings ◽

Mel Frequency Cepstral Coefficients ◽

Single Level

Speech is a convenient medium for communication among human beings. Speaker recognition is the process of automatically recognizing a speaker by processing the information contained in the speech signal. In this paper, a new two-level approach is proposed for speaker recognition from the speech signal. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized based on the gender identified at the first level. Once the gender is known, the search space for the second level is halved, because the speaker recognition system searches only the set of speech signals belonging to the identified gender. To identify gender, the gender-specific features Mel Frequency Cepstral Coefficients (MFCC) and pitch are used. The speaker is recognized using the speaker-specific features MFCC, pitch, and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: “IIT-Madras speech synthesis and recognition” (containing speech samples spoken in English by eight male and eight female speakers from eight different regions) and “ELSDSR” (containing speech samples spoken in English by five male and five female speakers). Experimentally, the two-level approach reduces the time taken for speaker recognition by 30–32% compared with recognizing the speaker without first identifying the gender (the single-level approach). The accuracy of speaker recognition also improves from 99.7% to 99.9% compared with the single-level approach. The experiments indicate that a speech signal of a minimum 1.12 duration (after neglecting silent parts) is sufficient for recognizing the speaker.


Impact of feature selection on system identification by means of NARX-SVM

MATEC Web of Conferences ◽

10.1051/matecconf/201925203012 ◽

2019 ◽

Vol 252 ◽

pp. 03012

Author(s):

Michał Awtoniuk ◽

Marcin Daniun ◽

Kinga Sałat ◽

Robert Sałat

Keyword(s):

Feature Selection ◽

System Identification ◽

Input Signal ◽

Output Signal ◽

Feature Vector ◽

Classical Method ◽

Support Vector ◽

Peltier Module ◽

Non Linear ◽

Selection Of

Support Vector Machines (SVM) are widely used in many fields of science, including system identification. The selection of the feature vector plays a crucial role in the SVM-based model-building process. In this paper, we investigate the influence of feature vector selection on the model’s quality. We built an SVM model with a non-linear ARX (NARX) structure. The modelled system had a SISO structure, i.e. one input signal and one output signal. The output signal was temperature, which was controlled by a Peltier module. The supply voltage of the Peltier module was the input signal. The system had a non-linear characteristic. We evaluated the model’s quality by the fit index. The classical feature selection for an SVM with NARX structure comes down to choosing the length of the regressor vector. For SISO models, this vector is determined by two parameters: nu and ny. These parameters determine the number of past samples of the system's input and output signals used to form the vector of regressors. In the present research we tested two methods of building the vector of regressors, one classical and one using custom regressors. The results show that the vector of regressors obtained by the classical method can be shortened while maintaining acceptable model quality. By using custom regressors, the feature vector of the SVM can be reduced, which also reduces the calculation time.
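As an illustration of the classical regressor vector mentioned in this abstract, the following sketch builds one row per time step from the last ny output samples and the last nu input samples. The function name and the ordering of the columns (past outputs first, most recent sample first) are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def narx_regressors(u, y, nu, ny):
    """Return (X, target) where each row of X is
    [y[k-1], ..., y[k-ny], u[k-1], ..., u[k-nu]] and target is y[k]."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max(nu, ny)  # first index with enough history available
    rows = []
    for k in range(start, len(y)):
        past_y = y[k - ny:k][::-1]  # most recent output first
        past_u = u[k - nu:k][::-1]  # most recent input first
        rows.append(np.concatenate([past_y, past_u]))
    return np.array(rows), y[start:]

# Toy signals (assumption): a ramp input and an offset ramp output.
u = np.arange(6.0)         # 0, 1, 2, 3, 4, 5
y = np.arange(10.0, 16.0)  # 10, 11, 12, 13, 14, 15
X, target = narx_regressors(u, y, nu=2, ny=1)
# First row is [y[1], u[1], u[0]] = [11, 1, 0] with target y[2] = 12.
```

Increasing nu or ny lengthens every row of X, which is why shortening the regressor vector reduces the size of the learning problem.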


Frequency Selection Based Separation of Speech Signals with Reduced Computational Time Using Sparse NMF

Archives of Acoustics ◽

10.1515/aoa-2017-0031 ◽

2017 ◽

Vol 42 (2) ◽

pp. 287-295 ◽

Cited By ~ 3

Author(s):

Yash Vardhan Varshney ◽

Zia Ahmad Abbasi ◽

Musiur Raza Abidi ◽

Omar Farooq

Keyword(s):

Speech Signal ◽

High Frequency ◽

Wavelet Decomposition ◽

Audio Signal ◽

Training Data ◽

Computational Time ◽

Speech Signals ◽

Proposed Model ◽

Frequency Components ◽

Matrix Factorisation

The application of wavelet decomposition to speed up mixed speech signal separation with non-negative matrix factorisation (NMF) is described. It is assumed that the basis vectors of the training data of the individual speakers have been recorded. In this paper, the spectrogram magnitude of a mixed signal is factorised with NMF, taking into account the sparseness of speech signals. The high-frequency components of the signal contain a very small amount of signal energy. By rejecting the high-frequency components, the size of the input signal is reduced, which reduces the computational time of the matrix factorisation. The signal of lower energy is separated using wavelet decomposition. The present work is done for wideband microphone speech signals and standard audio signals from digital video equipment. The proposed model shows an improvement in separation capability over an existing one in terms of the correlation between separated and original signals. The obtained signal to distortion ratio (SDR) and signal to interference ratio (SIR) are also larger than those of the existing model. The proposed model also shows a reduction in computational time, which results in faster operation.
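A rough sketch of the factorisation step follows, assuming the speaker bases are already known and only the activations are estimated. It uses the standard multiplicative update for an L1-penalised Euclidean cost, which is not necessarily the update used in the paper. Keeping only the lower half of the frequency rows is a simple stand-in for the wavelet-based rejection of high-frequency components. All names, sizes, and the `sparsity` weight are hypothetical.

```python
import numpy as np

def sparse_activations(V, W, sparsity=0.1, n_iter=200, eps=1e-9, seed=0):
    """Estimate H >= 0 such that V is approximately W @ H for fixed bases W,
    with an L1 penalty on H that encourages sparse activations."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(n_iter):
        # Multiplicative update keeps every entry of H non-negative.
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

# Synthetic magnitude "spectrogram" built from two sets of bases (assumption).
rng = np.random.default_rng(1)
n_freq, n_frames, k = 40, 30, 3
W1 = rng.random((n_freq, k))  # bases of speaker 1
W2 = rng.random((n_freq, k))  # bases of speaker 2
W = np.hstack([W1, W2])
H_true = rng.random((2 * k, n_frames)) * (rng.random((2 * k, n_frames)) < 0.5)
V = W @ H_true

keep = n_freq // 2  # keep low-frequency rows only: a smaller problem
H = sparse_activations(V[:keep], W[:keep])
speaker1 = W1[:keep] @ H[:k]  # estimate of speaker 1 in the kept band
speaker2 = W2[:keep] @ H[k:]  # estimate of speaker 2 in the kept band
rel_err = np.linalg.norm(V[:keep] - (speaker1 + speaker2)) / np.linalg.norm(V[:keep])
```

Halving the number of rows halves the size of `W.T @ V` and of each reconstruction, which is the source of the time saving the abstract refers to.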


Nonlinear Dynamic Feature Extraction Based on Phase Space Reconstruction for the Classification of Speech and Emotion

Mathematical Problems in Engineering ◽

10.1155/2020/9452976 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15

Author(s):

Ying Sun ◽

Xue-Ying Zhang ◽

Jiang-He Ma ◽

Chun-Xiao Song ◽

Hui-Fen Lv

Keyword(s):

Feature Extraction ◽

Phase Space ◽

Speech Signal ◽

Nonlinear Dynamic ◽

Phase Space Reconstruction ◽

Support Vector ◽

Speech Signals ◽

Emotional Speech ◽

Speech Database ◽

Emotional Speech Database

Due to the shortcomings of linear feature parameters in speech signals, and the limitations of existing time- and frequency-domain attribute features in characterizing the integrity of the speech information, in this paper, we propose a nonlinear method for feature extraction based on the phase space reconstruction (PSR) theory. First, the speech signal was analyzed using a nonlinear dynamic model. Then, the model was used to reconstruct a one-dimensional time speech signal. Finally, nonlinear dynamic (NLD) features based on the reconstruction of the phase space were extracted as the new characteristic parameters. Then, the performance of NLD features was verified by comparing their recognition rates with those of other features (NLD features, prosodic features, and MFCC features). Finally, the Korean isolated words database, the Berlin emotional speech database, and the CASIA emotional speech database were chosen for validation. The effectiveness of the NLD features was tested using the Support Vector Machine classifier. The results show that NLD features not only have high recognition rate and excellent antinoise performance for speech recognition tasks but also can fully characterize the different emotions contained in speech signals.
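Phase space reconstruction of a one-dimensional signal is commonly done by time-delay embedding, which the following sketch illustrates. The embedding dimension `dim` and delay `tau` are hypothetical parameters. The abstract does not state how they were chosen or which nonlinear dynamic features were then extracted from the reconstructed trajectory.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding: row j is [x[j], x[j+tau], ..., x[j+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau  # number of reconstructed points
    if n <= 0:
        raise ValueError("signal too short for this dim and tau")
    return np.stack([x[i * tau:i * tau + n] for i in range(dim)], axis=1)

# Toy signal (assumption): the integers 0..9.
trajectory = delay_embed(np.arange(10), dim=3, tau=2)
# trajectory has 6 rows; the first is [0, 2, 4] and the last is [5, 7, 9].
```

Each row is one point of the reconstructed trajectory. Features computed on that trajectory would then feed a classifier such as an SVM.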


Emotion Sound Classification with Support Vector Machine Algorithm

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v3i2.610 ◽

2018 ◽

pp. 181-190

Author(s):

Chabib Arifin ◽

Hartanto Junaedi

Keyword(s):

Support Vector Machine ◽

Speech Signal ◽

Implementation Process ◽

Recognition System ◽

Support Vector ◽

Human Beings ◽

Identity Recognition ◽

Biometric Characteristic ◽

The Face ◽

Filter Process

Speech is one of the biometric characteristics of human beings, like fingerprints, DNA, and the retina of the eye, and no two human beings have the same voice. Human emotion is usually predicted from a person's face or from changes in facial expression, but it can also be detected through the spoken voice. A person's emotions, such as happy, angry, neutral, sad, and surprise, can be detected through the speech signal. Voice recognition systems are still under development, so this research analyzes a person's emotion through the speech signal. Related research on sound addresses identity recognition, gender recognition, and emotion recognition based on conversation. In this research, the writer studies the emotional classification of speech into the classes happy, angry, neutral, sad, and surprise. The algorithm used is SVM (Support Vector Machine), with the MFCC (Mel-frequency cepstral coefficient) algorithm for feature extraction, which includes a filter process adapted to human hearing. The implementation of both algorithms gives accuracy levels of happy = 68.54%, angry = 75.24%, neutral = 78.50%, sad = 74.22%, and surprise = 68.23%.


Misfire Fault Diagnosis Method for Diesel Engine Based on MEMD and Dispersion Entropy

Shock and Vibration ◽

10.1155/2021/9213697 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Cheng Gu ◽

Xin-Yong Qiao ◽

Huaying Li ◽

Ying Jin

Keyword(s):

Fault Diagnosis ◽

Diesel Engine ◽

Feature Vector ◽

Single Channel ◽

Vibration Signal ◽

Fault Classification ◽

Support Vector ◽

Reconstructed Signal ◽

Diagnosis Method ◽

The Imf

As a main source of power, diesel engines are widely used in large mechanical systems. Misfire is a common fault condition that seriously affects the power and economy of the diesel engine. Previously, scholars mostly used a single-channel signal to diagnose the misfire fault of the diesel engine. However, a single-channel signal has limitations in reflecting the fault information. A novel fault diagnosis method based on MEMD and dispersion entropy is proposed in this paper. First, the multichannel vibration signal of the diesel engine cylinder head is decomposed by multivariate empirical mode decomposition (MEMD), which yields IMF component groups with the same frequency in the same order. Then, the IMF component in each group that has a large correlation coefficient with the original signal is selected to reconstruct a new signal, and the dispersion entropy (DE) of the reconstructed signal is calculated as a fault feature vector. Finally, the fault feature vector is input into a support vector machine (SVM) for misfire fault classification. Compared with three other methods, the results show that the proposed diagnosis method can effectively extract the fault features and accurately identify the fault type, and that it is superior to the comparison methods.
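The dispersion entropy feature can be sketched as follows, covering only the entropy calculation and not the MEMD decomposition or the correlation-based IMF selection. It follows the usual definition (normal-CDF mapping to `c` classes, then the Shannon entropy of length-`m` class patterns, normalised to the range 0 to 1). The parameter defaults and the handling of a constant signal are assumptions of this example.

```python
import math
from collections import Counter

import numpy as np

def dispersion_entropy(x, m=2, c=3, delay=1):
    """Normalised dispersion entropy of a 1-D signal (0 = regular, 1 = irregular)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return 0.0  # assumption: a constant signal yields a single pattern
    # Map samples to (0, 1) with the normal CDF, then to classes 1..c.
    y = np.array([0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2)))) for v in x])
    z = np.clip(np.round(c * y + 0.5), 1, c).astype(int)
    # Count dispersion patterns of length m.
    n = len(z) - (m - 1) * delay
    patterns = Counter(tuple(z[i + j * delay] for j in range(m)) for i in range(n))
    p = np.array(list(patterns.values()), dtype=float) / n
    return float(-np.sum(p * np.log(p)) / math.log(c ** m))

# Toy signals (assumption): white noise versus a strictly alternating signal.
rng = np.random.default_rng(0)
de_noise = dispersion_entropy(rng.normal(size=1000))
de_regular = dispersion_entropy(np.tile([-1.0, 1.0], 50))
```

A regular signal produces few distinct patterns and therefore a lower value than noise, which is the property that makes the measure usable as a fault feature.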


Investigating the use of random forest, gradient boosting machine, support vector machine and their ensemble applied to fault detection

10.26678/abcm.cobem2017.cob17-1600 ◽

2017 ◽

Author(s):

Luis Felipe Nogoseke ◽

Gabriel Herman Bernardim Andrade ◽

Marco Boaretto ◽

Leandro Coelho

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Fault Detection ◽

Gradient Boosting ◽

Support Vector ◽

Gradient Boosting Machine


Laboratory Measurements of Multi-Frequency and Broadband Acoustic Scattering from Turbulent and Double-Diffusive Microstructure. High-Frequency Broadband Acoustic Scattering from Non-Linear Internal Waves during SW06

10.21236/ada521009 ◽

2010 ◽

Author(s):

Andone C. Lavery

Keyword(s):

Internal Waves ◽

High Frequency ◽

Acoustic Scattering ◽

Laboratory Measurements ◽

Non Linear ◽

Double Diffusive


In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R2 = 0.84 and MSE = 0.55 for the training set and R2 = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.


QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches

Current Analytical Chemistry ◽

10.2174/1573411016999200518083359 ◽

2020 ◽

Vol 16 (8) ◽

pp. 1088-1105

Author(s):

Nafiseh Vahedi ◽

Majid Mohammadhosseini ◽

Mehdi Nekoei

Keyword(s):

Present Report ◽

Principal Component ◽

Parp Inhibitors ◽

Support Vector ◽

Ann Model ◽

Statistical Parameters ◽

Qsar Study ◽

Data Set ◽

Test Set ◽

Non Linear

Background: The poly(ADP-ribose) polymerases (PARPs) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, several efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set and for selection of the training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors that contribute most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set, and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that non-linear modeling approaches, including SVM and ANN, could provide much better predictive capability. Conclusion: Among the constructed models, and in terms of root mean square error of predictions (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), as well as R2 and F-statistical value for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were better using the GA-ANN model.
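The Y-randomization check mentioned in this abstract can be illustrated with a small sketch. It refits a model on shuffled responses and compares the resulting fit with that of the real data. An ordinary least-squares MLR on synthetic descriptors is used here, so the data, function names, and number of rounds are assumptions and do not reproduce the paper's GA-selected descriptors or its models.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_rounds=100, seed=0):
    """R^2 values obtained after randomly permuting the response vector."""
    rng = np.random.default_rng(seed)
    return np.array([r_squared(X, rng.permutation(y)) for _ in range(n_rounds)])

# Synthetic descriptors and activities (assumption): a true linear relationship.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + 0.1 * rng.normal(size=60)

r2_real = r_squared(X, y)
r2_shuffled = y_randomization(X, y)
```

If the shuffled fits came close to `r2_real`, the original model would be suspected of reflecting chance correlation rather than a real structure-activity relationship.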
