Emotional Interactive Simulation System of English Speech Recognition in Virtual Context

With the development of virtual scenes, the degree of simulation and functions of virtual reality have been very complete, providing a new platform and perspective for teaching design. Firstly, the hidden Markov chain model is used to perform emotion recognition on English speech signals. English speech emotion recognition and speech semantic recognition are essentially the same. Hidden Markov style has been widely used in English speech semantic recognition. The experiments of feature extraction and pattern recognition of speech samples prove that Hidden Markovian has higher recognition rate and better recognition effect in speech emotion recognition. Secondly, combining the human pronunciation model and the hearing model, by analyzing the impact of the glottis feature on the human ear hearing-model feature, the research application of the English speech recognition emotion interactive simulation system uses the glottis feature to compensate the human ear, hearing feature is proposed by compensated English speech recognition, and emotion interaction simulation system is used in the English speech emotion experiment, which has obtained a high recognition rate and showed excellent performance.

Download Full-text

A Mathematical Morphological Processing of Spectrograms for the Tone of Chinese Vowels Recognition

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.571-572.665 ◽

2014 ◽

Vol 571-572 ◽

pp. 665-671 ◽

Cited By ~ 1

Author(s):

Sen Xu ◽

Xu Zhao ◽

Cheng Hua Duan ◽

Xiao Lin Cao ◽

Hui Yan Li ◽

...

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Emotion Recognition ◽

Recognition Rate ◽

Morphological Processing ◽

Speech Emotion Recognition ◽

Normal Tone ◽

Tone Recognition ◽

Tone Signal ◽

The Neural Networks

As One of Features from other Languages, the Chinese Tone Changes of Chinese are Mainly Decided by its Vowels, so the Vowel Variation of Chinese Tone Becomes Important in Speech Recognition Research. the Normal Tone Recognition Ways are Always Based on Fundamental Frequency of Signal, which can Not Keep Integrity of Tone Signal. we Bring Forward to a Mathematical Morphological Processing of Spectrograms for the Tone of Chinese Vowels. Firstly, we will have Pretreatment to Recording Good Tone Signal by Using Cooledit Pro Software, and Converted into Spectrograms; Secondly, we will do Smooth and the Normalized Pretreatment to Spectrograms by Mathematical Morphological Processing; Finally, we get Whole Direction Angle Statistics of Tone Signal by Skeletonization way. the Neural Networks Stimulation Shows that the Speech Emotion Recognition Rate can Reach 92.50%.

Download Full-text

Speech Emotion Recognition Based on Mixed MFCC

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.249-250.1252 ◽

2012 ◽

Vol 249-250 ◽

pp. 1252-1258 ◽

Cited By ~ 2

Author(s):

Ping Zhou ◽

Xiao Pan Li ◽

Jie Li ◽

Xin Xing Jing

Keyword(s):

Speech Recognition ◽

Emotion Recognition ◽

High Frequency ◽

Recognition Rate ◽

Characteristic Parameter ◽

Identification Accuracy ◽

Speech Emotion Recognition ◽

Characteristic Parameters ◽

Frequency Signal ◽

High Frequency Signal

Due to MFCC characteristic parameter in speech recognition has low identification accuracy when signal is intermediate, high frequency signal, this paper put forward a improved algorithm of combining MFCC, Mid-MFCC and IMFCC, using increase or decrease component method to calculate the contribution that MFCC, Mid-MFCC and IMFCC each order cepstrum component was used in speech emotion recognition, extracting several order cepstrum component with highest contribution from three characteristic parameters and forming a new characteristic parameter. The experiment results show that under the same environment new characteristic parameter has higher recognition rate than classic MFCC characteristic parameter in speech emotion recognition.

Download Full-text

A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

Mathematical Problems in Engineering ◽

10.1155/2014/749604 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 21

Author(s):

Chenchen Huang ◽

Wei Gong ◽

Wenlong Fu ◽

Dongyu Feng

Keyword(s):

Feature Extraction ◽

Emotion Recognition ◽

Recognition Rate ◽

Original Method ◽

Speech Emotion Recognition ◽

High Dimensional ◽

Svm Classifier ◽

Multiple Classifier System ◽

Classifier System ◽

Multiple Classifier

Feature extraction is a very important part in speech emotion recognition, and in allusion to feature extraction in speech emotion recognition problems, this paper proposed a new method of feature extraction, using DBNs in DNN to extract emotional features in speech signal automatically. By training a 5 layers depth DBNs, to extract speech emotion feature and incorporate multiple consecutive frames to form a high dimensional feature. The features after training in DBNs were the input of nonlinear SVM classifier, and finally speech emotion recognition multiple classifier system was achieved. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.

Download Full-text

Important Attributes Selection Based on Rough Set for Speech Emotion Recognition

Transdisciplinary Advancements in Cognitive Mechanisms and Human Information Processing ◽

10.4018/978-1-60960-553-7.ch016 ◽

2011 ◽

pp. 262-271

Author(s):

Jian Zhou ◽

Guoyin Wang ◽

Yong Yang

Keyword(s):

Emotion Recognition ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Recognition Rate ◽

Feature Selection Method ◽

Recognition System ◽

Attribute Selection ◽

Computer Application ◽

Speech Emotion Recognition

Speech emotion recognition is becoming more and more important in such computer application fields as health care, children education, etc. In order to improve the prediction performance or providing faster and more cost-effective recognition system, an attribute selection is often carried out beforehand to select the important attributes from the input attribute sets. However, it is time-consuming for traditional feature selection method used in speech emotion recognition to determine an optimum or suboptimum feature subset. Rough set theory offers an alternative, formal and methodology that can be employed to reduce the dimensionality of data. The purpose of this study is to investigate the effectiveness of Rough Set Theory in identifying important features in speech emotion recognition system. The experiments on CLDC emotion speech database clearly show this approach can reduce the calculation cost while retaining a suitable high recognition rate.

Download Full-text

Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

International Journal of Synthetic Emotions ◽

10.4018/ijse.2016010105 ◽

2016 ◽

Vol 7 (1) ◽

pp. 58-68 ◽

Cited By ~ 4

Author(s):

Imen Trabelsi ◽

Med Salim Bouhlel

Keyword(s):

Emotion Recognition ◽

Linear Prediction ◽

Recognition Rate ◽

Gaussian Mixture ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotional States ◽

Wide Range ◽

Leibler Divergence ◽

Perceptual Linear Prediction

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of speech emotion recognition system is to automatically classify speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel Frequency cepstrum coefficients (MFCC), Linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), Perceptual Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination between Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler Divergence Kernel. In this study, the effect of feature type and its dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Utilizing the proposed features a recognition rate of 84% has been achieved which is close to the performance of humans on this database.

Download Full-text

Speech Emotion Recognition Based on Selective Interpolation Synthetic Minority Over-Sampling Technique in Small Sample Environment

Sensors ◽

10.3390/s20082297 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2297

Author(s):

Zhen-Tao Liu ◽

Bao-Han Wu ◽

Dan-Yun Li ◽

Peng Xiao ◽

Jun-Wei Mao

Keyword(s):

Emotion Recognition ◽

Feature Selection Method ◽

Sampling Technique ◽

Small Sample ◽

Speech Emotion Recognition ◽

Gradient Boosting ◽

Data Imbalance ◽

The Arts ◽

The Impact ◽

Sample Environment

Speech emotion recognition often encounters the problems of data imbalance and redundant features in different application scenarios. Researchers usually design different recognition models for different sample conditions. In this study, a speech emotion recognition model for a small sample environment is proposed. A data imbalance processing method based on selective interpolation synthetic minority over-sampling technique (SISMOTE) is proposed to reduce the impact of sample imbalance on emotion recognition results. In addition, feature selection method based on variance analysis and gradient boosting decision tree (GBDT) is introduced, which can exclude the redundant features that possess poor emotional representation. Results of experiments of speech emotion recognition on three databases (i.e., CASIA, Emo-DB, SAVEE) show that our method obtains average recognition accuracy of 90.28% (CASIA), 75.00% (SAVEE) and 85.82% (Emo-DB) for speaker-dependent speech emotion recognition which is superior to some state-of-the-arts works.

Download Full-text

Feature extraction algorithms to improve the speech emotion recognition rate

International Journal of Speech Technology ◽

10.1007/s10772-020-09672-4 ◽

2020 ◽

Vol 23 (1) ◽

pp. 45-55 ◽

Cited By ~ 7

Author(s):

Anusha Koduru ◽

Hima Bindu Valiveti ◽

Anil Kumar Budati

Keyword(s):

Feature Extraction ◽

Emotion Recognition ◽

Recognition Rate ◽

Speech Emotion Recognition

Download Full-text

Convolutional Recurrent Neural Networks Based Speech Emotion Recognition

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9321 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3786-3789

Author(s):

P. Gayathri ◽

P. Gowri Priya ◽

L. Sravani ◽

Sandra Johnson ◽

Visanth Sampath

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Recurrent Neural Networks ◽

Machine Learning Techniques ◽

Speech Emotion Recognition ◽

Emotional Information ◽

Feature Representations ◽

Emotional Factors ◽

Learning Techniques ◽

The Impact

Recognition of emotions is the aspect of speech recognition that is gaining more attention and the need for it is growing enormously. Although there are methods to identify emotion using machine learning techniques, we assume in this paper that calculating deltas and delta-deltas for customized features not only preserves effective emotional information, but also that the impact of irrelevant emotional factors, leading to a reduction in misclassification. Furthermore, Speech Emotion Recognition (SER) often suffers from the silent frames and irrelevant emotional frames. Meanwhile, the process of attention has demonstrated exceptional performance in learning related feature representations for specific tasks. Inspired by this, propose a Convolutionary Recurrent Neural Networks (ACRNN) based on Attention to learn discriminative features for SER, where the Mel-spectrogram with deltas and delta-deltas is used as input. Finally, experimental results show the feasibility of the proposed method and attain state-of-the-art performance in terms of unweighted average recall.

Download Full-text

Hidden Markov model-based speech emotion recognition

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). ◽

10.1109/icassp.2003.1202279 ◽

2003 ◽

Cited By ~ 65

Author(s):

B. Schuller ◽

G. Rigoll ◽

M. Lang

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Emotion Recognition ◽

Hidden Markov ◽

Speech Emotion Recognition ◽

Model Based

Download Full-text

An Analysis of the Impact of Spectral Contrast Feature in Speech Emotion Recognition

International Journal of Recent Contributions from Engineering Science & IT (iJES) ◽

10.3991/ijes.v9i2.22983 ◽

2021 ◽

Vol 9 (2) ◽

pp. 87

Author(s):

Shreya Kumar ◽

Swarnalaxmi Thiruvenkadam

Keyword(s):

Feature Extraction ◽

Emotion Recognition ◽

Prediction Accuracy ◽

Speech Emotion Recognition ◽

German Language ◽

Spectral Contrast ◽

Recognition Systems ◽

The Impact ◽

Contrast Feature

Feature extraction is an integral part in speech emotion recognition. Some emotions become indistinguishable from others due to high resemblance in their features, which results in low prediction accuracy. This paper analyses the impact of spectral contrast feature in increasing the accuracy for such emotions. The RAVDESS dataset has been chosen for this study. The SAVEE dataset, CREMA-D dataset and JL corpus dataset were also used to test its performance over different English accents. In addition to that, EmoDB dataset has been used to study its performance in the German language. The use of spectral contrast feature has increased the prediction accuracy in speech emotion recognition systems to a good degree as it performs well in distinguishing emotions with significant differences in arousal levels, and it has been discussed in detail.<div> </div>

Download Full-text