Acoustic Features Based Emotional Speech Signal Categorization by Advanced Linear Discriminator Analysis

2022 ◽  
Vol 3 (4) ◽  
pp. 295-307
Author(s):  
Subarna Shakya

Recent advances in digital signal processing have made personal computer-based data collection and analysis systems more robust. Speaker recognition is a signal processing approach that uses speaker-specific information in voice waveforms to identify the speaker automatically. This study examines single-source systems that can recognize a wide range of emotional states in speech. Because emotional speech offers insight into the speaker's mental state, it has become an active topic in the development of human-computer interfaces for speech processing, where recognizing the user's emotional state is often essential. The work attempts to discriminate emotional states such as anger, joy, neutral, fear, and sadness using classification methods. Entropy, an acoustic measure of signal unpredictability, is combined with a non-linear signal quantification approach to characterize emotions, and the entropy measurements computed for each emotional signal are assembled into a feature vector. The extracted acoustic features are then used to train the proposed neural network and are passed to a Linear Discriminant Analysis (LDA) stage for final classification. The article also compares the proposed approach with other modern classifiers, including k-nearest neighbor, support vector machine, and a plain linear discriminant classifier. A key advantage of the proposed algorithm is that it separates the features of negative and positive emotions, which yields good classification results. Cross-validation on the available Emotional Speech dataset shows that the single-source LDA classifier recognizes the different emotional states in speech signals with above 90 percent accuracy.
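
The classification stage described above can be sketched in a few lines, assuming a precomputed matrix of entropy-based acoustic features per utterance; the array shapes and random values below are placeholders, not the study's data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# X: one row per utterance with entropy-based acoustic features (placeholder values);
# y: the five emotion labels considered in the study.
X = np.random.rand(200, 12)
y = np.random.choice(["anger", "joy", "neutral", "fear", "sadness"], size=200)

lda = LinearDiscriminantAnalysis()          # discriminant-analysis classifier
scores = cross_val_score(lda, X, y, cv=5)   # cross-validation, as reported in the study
print("mean CV accuracy:", scores.mean())
```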

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Ying Sun ◽  
Xue-Ying Zhang ◽  
Jiang-He Ma ◽  
Chun-Xiao Song ◽  
Hui-Fen Lv

Because linear feature parameters fall short for speech signals, and because existing time- and frequency-domain features cannot fully characterize the information in speech, this paper proposes a nonlinear feature extraction method based on phase space reconstruction (PSR) theory. First, the speech signal is analyzed with a nonlinear dynamic model, which is then used to reconstruct the phase space of the one-dimensional speech time series. Nonlinear dynamic (NLD) features computed from the reconstructed phase space are extracted as the new characteristic parameters. The performance of the NLD features was verified by comparing the recognition rates obtained with NLD features, prosodic features, and MFCC features on the Korean isolated words database, the Berlin emotional speech database, and the CASIA emotional speech database, using a support vector machine classifier. The results show that NLD features not only achieve a high recognition rate and strong noise robustness in speech recognition tasks but also fully characterize the different emotions contained in speech signals.
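
Phase space reconstruction is usually done by time-delay embedding. The snippet below is a minimal sketch of that step only; the embedding dimension, delay, and toy signal are illustrative choices, not the parameters used in the paper:

```python
import numpy as np

def delay_embed(x, dim=3, tau=8):
    """Reconstruct the phase space of a 1-D signal by time-delay embedding.

    Each row is a point [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]. In practice
    dim and tau are chosen with methods such as false nearest neighbours and
    mutual information; the defaults here are placeholders.
    """
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Toy "speech" frame: a noisy sine, just to show the shapes involved.
t = np.arange(0, 1, 1 / 8000.0)
frame = np.sin(2 * np.pi * 200 * t) + 0.05 * np.random.randn(t.size)
points = delay_embed(frame, dim=3, tau=8)
print(points.shape)   # (N, 3) points in the reconstructed phase space
```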


2020 ◽  
Author(s):  
Hoda Heidari ◽  
Zahra Einalou ◽  
Mehrdad Dadgostar ◽  
Hamidreza Hosseinzadeh

Studies of electroencephalography-based Brain-Computer Interfaces (BCI) cover a wide range of applications, and extracting the Steady State Visual Evoked Potential (SSVEP) is regarded as one of the most useful tools in BCI systems. This study compares methods across the whole signal processing chain: feature extraction using statistical measures (Shannon entropy, skewness, kurtosis, mean, variance) computed from several spectral representations (a filter bank, narrow-band IIR filters, and the wavelet transform magnitude); feature selection performed by various methods (decision tree, principal component analysis (PCA), t-test, Wilcoxon test, and receiver operating characteristic (ROC)); and classification using k-nearest neighbor (k-NN), perceptron, support vector machine (SVM), Bayesian, and multilayer perceptron (MLP) classifiers. Combining these methods provides an effective overview of the accuracy attainable with classical techniques. In addition, the present study relies on a relatively new feature selection scheme, a decision tree followed by PCA, applied to BCI-SSVEP systems. The reported accuracies were calculated over the four recorded stimulation frequencies representing the four directions: right, left, up, and down.
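
One plausible way to chain the decision-tree/PCA selection step with a classical classifier is shown below as a hedged sketch; the feature matrix, labels, number of retained features, and PCA components are all assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# X: statistical features (entropy, skewness, kurtosis, mean, variance) per trial and
# spectral band; y: the attended direction. Both are random placeholders.
X = np.random.rand(160, 40)
y = np.random.choice(["right", "left", "up", "down"], size=160)

pipe = make_pipeline(
    SelectFromModel(DecisionTreeClassifier(random_state=0),
                    threshold=-np.inf, max_features=10),  # keep the 10 most important features
    PCA(n_components=5),                                  # decorrelate / reduce dimensionality
    KNeighborsClassifier(n_neighbors=5),                  # one of the classical classifiers compared
)
print("mean 5-fold accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```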


2018 ◽  
Vol 7 (2.16) ◽  
pp. 98 ◽  
Author(s):  
Mahesh K. Singh ◽  
A K. Singh ◽  
Narendra Singh

This paper presents an algorithm based on acoustic analysis of electronically disguised voice. The proposed work gives a comparative analysis of the acoustic features and their statistical coefficients. Acoustic features are computed with the Mel-frequency cepstral coefficients (MFCC) method and compared between normal voice and voice disguised by different semitone shifts. All acoustic features are passed through feature-based classifiers to determine the identification rate for each type of electronically disguised voice. Two classifiers, a support vector machine (SVM) and a decision tree (DT), are used for speaker identification and compared in terms of classification efficiency on voices disguised by different semitone shifts.
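
A minimal sketch of this pipeline, assuming librosa for MFCC extraction and synthetic stand-ins for normal and semitone-shifted utterances (the signals, speakers, and statistical summary below are illustrative, not the paper's data):

```python
import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def mfcc_stats(signal, sr=16000, n_mfcc=13):
    """Summarize an utterance by per-coefficient MFCC means and standard deviations."""
    m = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

# Synthetic stand-ins: tones whose frequency is shifted by a few semitones,
# mimicking normal vs. electronically disguised voices of two speakers.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
signals, labels = [], []
for spk, f0 in (("spk1", 120.0), ("spk2", 210.0)):
    for semitones in (0, 4, 8, -4, -8):
        f = f0 * 2 ** (semitones / 12.0)
        sig = np.sin(2 * np.pi * f * t).astype(np.float32)
        signals.append(sig + 0.01 * rng.standard_normal(sr).astype(np.float32))
        labels.append(spk)

X = np.array([mfcc_stats(s, sr) for s in signals])
y = np.array(labels)
for clf in (SVC(kernel="rbf"), DecisionTreeClassifier(random_state=0)):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```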


2016 ◽  
Vol 78 (9-3) ◽  
Author(s):  
Dini Handayani ◽  
Abdul Wahab ◽  
Hamwira Yaacob

The ability to identify a subject is indispensable in affective computing research due to its wide range of applications. A user profile is built from the strength of the subject's emotional patterns and can then be used for subject identification. The system is based on the emotional states of happiness and sadness, as indicated by electroencephalogram (EEG) data. In this paper, we examine several techniques used for subject profiling and identification, covering both feature extraction and classification. In the experimental study, we compare three feature extraction techniques: Power Spectral Density (PSD), Kernel Density Estimation (KDE), and Mel Frequency Cepstral Coefficients (MFCC). For classification, we compare three techniques: Multilayer Perceptron (MLP), Naive Bayes (NB), and Support Vector Machine (SVM). The best result, 59.66%, was achieved with MFCC features and the MLP classifier under 5-fold cross-validation. The experimental results indicate that MLP-based profiles identify subjects more accurately than NB and SVM, and the comparison demonstrates that profile-based methods offer a viable and simple approach to subject identification.
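
The classifier comparison under 5-fold cross-validation can be reproduced in outline as below; the EEG feature matrix, subject labels, and network size are placeholders standing in for the real recordings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# X: per-segment EEG features (e.g., PSD, KDE, or MFCC values); y: subject identity.
X = np.random.rand(150, 20)
y = np.random.choice(["subj_a", "subj_b", "subj_c"], size=150)

for name, clf in [("MLP", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)),
                  ("NB", GaussianNB()),
                  ("SVM", SVC(kernel="rbf"))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```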


Signals ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 188-208
Author(s):  
Mert Sevil ◽  
Mudassir Rashid ◽  
Mohammad Reza Askari ◽  
Zacharie Maloney ◽  
Iman Hajizadeh ◽  
...  

Wearable devices continuously measure multiple physiological variables to inform users of health and behavior indicators. The computed health indicators must rely on informative signals obtained by processing the raw physiological variables with powerful noise- and artifact-filtering algorithms. In this study, we aimed to elucidate the effects of signal processing techniques on the accuracy of detecting and discriminating physical activity (PA) and acute psychological stress (APS) using physiological measurements (blood volume pulse, heart rate, skin temperature, galvanic skin response, and accelerometer) collected from a wristband. Data from 207 experiments involving 24 subjects were used to develop signal processing, feature extraction, and machine learning (ML) algorithms that can detect and discriminate PA and APS when they occur individually or concurrently, classify different types of PA and APS, and estimate energy expenditure (EE). Training data were used to generate feature variables from the physiological variables and develop ML models (naïve Bayes, decision tree, k-nearest neighbor, linear discriminant, ensemble learning, and support vector machine). Results from an independent labeled testing data set demonstrate that PA was detected and classified with an accuracy of 99.3%, APS was detected and classified with an accuracy of 92.7%, simultaneous occurrences of PA and APS were detected and classified with an accuracy of 89.9% (relative to actual class labels), and EE was estimated with a low mean absolute error of 0.02 metabolic equivalent of task (MET). The data filtering and adaptive noise cancellation techniques used to mitigate the effects of noise and artifacts on the classification results increased the detection and discrimination accuracy by 0.7% and 3.0% for PA and APS, respectively, and by 18% for EE estimation. The results demonstrate that physiological measurements from wristband devices are susceptible to noise and artifacts, and they elucidate the effects of signal processing and feature extraction on the accuracy of detection, classification, and estimation of PA and APS.
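
Adaptive noise cancellation of motion artifacts is commonly implemented with an LMS filter driven by an accelerometer reference. The sketch below shows that general idea on toy signals; the filter order, step size, and signal model are assumptions, not the study's configuration:

```python
import numpy as np

def lms_cancel(primary, reference, order=8, mu=0.01):
    """Least-mean-squares (LMS) adaptive noise cancellation.

    primary   : measured signal = desired physiological signal + artifact
    reference : signal correlated with the artifact only (e.g., accelerometer)
    Returns the error signal, i.e., the cleaned estimate of the desired signal.
    """
    w = np.zeros(order)
    cleaned = np.zeros_like(primary)
    for n in range(order - 1, len(primary)):
        x = reference[n - order + 1 : n + 1][::-1]  # most recent reference samples
        noise_est = w @ x                           # estimate of the artifact in 'primary'
        e = primary[n] - noise_est                  # cleaned sample
        w += 2 * mu * e * x                         # LMS weight update
        cleaned[n] = e
    return cleaned

# Toy demo: a slow physiological rhythm corrupted by motion noise that the
# reference channel also observes.
t = np.arange(0, 10, 0.01)
motion = np.random.randn(t.size)
primary = np.sin(2 * np.pi * 1.2 * t) + 0.8 * motion
cleaned = lms_cancel(primary, motion)
print(np.mean((cleaned[200:] - np.sin(2 * np.pi * 1.2 * t)[200:]) ** 2))
```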


Author(s):  
Sourabh Suke ◽  
Ganesh Regulwar ◽  
Nikesh Aote ◽  
Pratik Chaudhari ◽  
Rajat Ghatode ◽  
...  

This project describes "VoiEmo - A Speech Emotion Recognizer", a system for recognizing the emotional state of an individual from his or her speech. For example, speech becomes loud and fast, with a higher and wider pitch range, in states of fear, anger, or joy, whereas the voice is generally slow and low-pitched in sadness and tiredness. We developed speech emotion classification models based on convolutional neural networks (CNNs), support vector machines (SVM), and multilayer perceptron (MLP) classifiers, which make predictions from acoustic features of the speech signal such as Mel Frequency Cepstral Coefficients (MFCC). Our models were trained to recognize the common emotions neutral, calm, happy, sad, angry, fearful, disgust, and surprise. For training and testing, we used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS). The system is advantageous because it can give a general idea of an individual's emotional state from the acoustic features of speech, irrespective of the language spoken, and it saves time and effort. Speech emotion recognition systems have applications in various fields such as call centers and BPOs, criminal investigation, psychiatric therapy, and the automobile industry.
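
As a loose illustration of one of the three model families mentioned (a small 1D CNN over MFCC frames), the sketch below uses random placeholder tensors; the 100×13 input shape, the eight output classes, and the training settings are assumptions, not the project's actual configuration:

```python
import numpy as np
import tensorflow as tf

# Placeholder tensors standing in for MFCC sequences extracted from RAVDESS/TESS clips:
# 320 clips, 100 frames each, 13 MFCCs per frame, and 8 emotion classes.
X = np.random.rand(320, 100, 13).astype("float32")
y = np.random.randint(0, 8, size=320)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 13)),
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),  # local spectral patterns
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(8, activation="softmax"),                # one unit per emotion
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
```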


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yan Wang ◽  
Xi Wu ◽  
Xiaohua Li ◽  
Jiliu Zhou

Vehicle type recognition is a demanding application of wireless sensor networks (WSN). In many cases, sensor nodes detect and recognize vehicles from their acoustic or seismic signals using wavelet-based or spectral feature extraction methods. Such methods, while providing convincing results, demand considerable computational power and energy and are difficult to implement on low-cost sensor nodes with limited resources. In this paper, we investigate the use of the time encoded signal processing (TESP) algorithm for vehicle type recognition. The conventional TESP algorithm, although effective for speech feature extraction, is not suitable for the more complex vehicle sound signal. To solve this problem, an improved time encoded signal processing (ITESP) scheme is proposed as the feature extraction method, tailored to the characteristics of the vehicle sound signal. Recognition is performed using support vector machine (SVM) and k-nearest neighbor (KNN) classifiers. The experimental results indicate that the vehicle type recognition system with ITESP features performs much better than the one based on conventional TESP features.
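
Time encoded signal processing codes a waveform by the epochs between its zero crossings. The snippet below is a very rough, hedged sketch of that style of coding; the duration/shape binning, the function name, and the toy signal are illustrative assumptions, not the TESP/ITESP alphabet used in the paper:

```python
import numpy as np

def tespar_like_features(signal, n_duration_bins=10, max_shape=5):
    """Split the signal into epochs between zero crossings, describe each epoch by
    its duration and 'shape' (number of local minima of |signal| inside it),
    quantize, and return a normalized histogram of the resulting symbols."""
    crossings = np.where(np.diff(np.signbit(signal)))[0]
    durations, shapes = [], []
    for a, b in zip(crossings[:-1], crossings[1:]):
        seg = np.abs(signal[a:b])
        n_minima = 0
        if len(seg) >= 3:
            interior = (seg[1:-1] < seg[:-2]) & (seg[1:-1] < seg[2:])
            n_minima = int(interior.sum())
        durations.append(b - a)
        shapes.append(min(n_minima, max_shape))
    d_bins = np.digitize(durations, np.linspace(1, max(durations) + 1, n_duration_bins))
    symbols = d_bins * (max_shape + 1) + np.array(shapes)
    hist = np.bincount(symbols, minlength=(n_duration_bins + 1) * (max_shape + 1))
    return hist / max(hist.sum(), 1)   # normalized symbol histogram as feature vector

# Toy vehicle-like sound (low fundamental plus a harmonic); real feature vectors
# would then feed SVM or KNN classifiers as in the sketches above.
t = np.arange(0, 1, 1 / 4000.0)
engine_like = np.sin(2 * np.pi * 60 * t) + 0.3 * np.sin(2 * np.pi * 180 * t)
print(tespar_like_features(engine_like).shape)
```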


2019 ◽  
Vol 9 (12) ◽  
pp. 2470 ◽  
Author(s):  
Anvarjon Tursunov ◽  
Soonil Kwon ◽  
Hee-Suk Pang

The most widely used acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot sufficiently characterize emotions when the task is to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions along the valence dimension (positive and negative). The main reason is that some discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high versus low) but differ in the valence dimension. Timbre is the sound quality that distinguishes two sounds even when they have the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance for discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant features among the timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant classification performance improvements were achieved using a combination of the baseline features and the most relevant timbre acoustic features, found by applying SFS to emotion classification on the Berlin Emotional Speech Database. Extensive experiments showed that timbre acoustic features can sufficiently characterize emotions in speech along the valence dimension.
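
Sequential forward selection over a pool of candidate features can be sketched with scikit-learn's SequentialFeatureSelector; the feature matrix, labels, and the target number of selected features below are placeholders, not the paper's setup:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

# X: candidate timbre features per utterance (placeholder values); y: emotion labels.
X = np.random.rand(120, 30)
y = np.random.choice(["anger", "happiness", "sadness", "neutral"], size=120)

sfs = SequentialFeatureSelector(SVC(kernel="linear"),
                                n_features_to_select=8,   # illustrative target size
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", np.flatnonzero(sfs.get_support()))
```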


Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 525 ◽  
Author(s):  
SON ◽  
KWON ◽  
PARK

Automatic gender classification from speech is a challenging research field with a wide range of applications in human-computer interaction (HCI). A couple of decades of research have shown promising results, but there is still room for improvement. Until now, gender classification has been performed using differences in the spectral characteristics of male and female voices. We assumed that a neutral margin exists between the male and female spectral ranges and that this margin causes gender misclassification. To address this limitation, we studied three non-lexical speech features (fillers, overlapping, and lengthening). Statistical analysis showed that overlapping and lengthening are effective for gender classification. We then performed gender classification using overlapping, lengthening, and the baseline acoustic feature, the Mel Frequency Cepstral Coefficient (MFCC), trying various combinations of features used simultaneously or sequentially to achieve the best results. Two types of machine-learning methods, support vector machines (SVM) and recurrent neural networks (RNN), were used to classify gender. We achieved 89.61% accuracy with an RNN using a feature set that included MFCC, overlapping, and lengthening at the same time. We also reclassified, using the non-lexical features, only the data falling within the neutral margin, which was empirically selected from the result of gender classification with MFCC alone. As a result, classification with an RNN using lengthening was 1.83% more accurate than with MFCC alone. We conclude that new speech features can improve gender classification through a behavioral approach, notably for emergency calls.
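
The two-stage "neutral margin" idea can be sketched as follows, with SVMs standing in for the RNN used in the paper; all arrays, the separability of the toy data, and the 0.1 margin width are placeholder assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)                        # 0 = female, 1 = male (illustrative)
X_mfcc = rng.normal(size=(300, 13)) + 0.8 * y[:, None]  # weakly separable MFCC stand-in
X_nonlex = rng.normal(size=(300, 2)) + 0.5 * y[:, None] # lengthening / overlapping stand-in

# Stage 1: classify everything with the MFCC-based model.
stage1 = SVC(kernel="rbf", probability=True).fit(X_mfcc, y)
proba = stage1.predict_proba(X_mfcc)[:, 1]
pred = (proba >= 0.5).astype(int)

# Stage 2: re-classify the low-confidence "neutral margin" with non-lexical features.
margin = np.abs(proba - 0.5) < 0.1
if margin.any():
    stage2 = SVC(kernel="rbf").fit(X_nonlex[~margin], y[~margin])
    pred[margin] = stage2.predict(X_nonlex[margin])
print("accuracy:", (pred == y).mean(), "| re-classified:", int(margin.sum()))
```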

