A speech emotion recognition method in cross-languages corpus based on feature adaptation

Author(s): Xinran Zhang ◽ Cheng Zha ◽ Gang Xiao ◽ Li Zhao
2014 ◽ Vol 668-669 ◽ pp. 1126-1129
Author(s): Wan Li Zhang ◽ Guo Xin Li ◽ Wei Gao

This paper proposes a speech emotion recognition method based on the Gaussian mixture model (GMM). To improve the effectiveness of feature extraction and the accuracy of emotion recognition, Mel-frequency cepstral coefficients (MFCCs) are extracted and combined with GMMs to recognize emotion in speech. The feature parameters are extracted according to the principles of vocalization theory; GMM-based emotion models are then generated, and the similarity to their templates is obtained. A series of experiments on recorded speech indicates that the GMM-based system achieves high performance and better robustness.
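The scheme described above, GMM-based emotion models scored against an utterance, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes one diagonal-covariance GMM per emotion trained with a plain EM loop, and it uses short synthetic two-dimensional vectors in place of real MFCC frames. The names `fit_gmm`, `avg_loglik`, `classify`, and `make_frames` are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_gmm(frames, k=2, iters=20, seed=0, floor=1e-3):
    """Fit a k-component diagonal GMM to a list of feature vectors with EM."""
    rng = random.Random(seed)
    dim = len(frames[0])
    means = [list(f) for f in rng.sample(frames, k)]  # init from random frames
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logp = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / len(frames)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [max(floor, sum(r[j] * (x[d] - means[j][d]) ** 2
                                           for r, x in zip(resp, frames)) / nj)
                            for d in range(dim)]
    return weights, means, variances

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        total += logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
    return total / len(frames)

def classify(frames, models):
    """Return the emotion label whose GMM scores the utterance highest."""
    return max(models, key=lambda label: avg_loglik(frames, models[label]))

def make_frames(center, n, rng):
    """Synthetic stand-in for MFCC frames: Gaussian noise around a center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]

rng = random.Random(1)
models = {
    "angry": fit_gmm(make_frames((3.0, 3.0), 60, rng)),
    "neutral": fit_gmm(make_frames((-3.0, -3.0), 60, rng)),
}
test_utterance = make_frames((3.0, 3.0), 20, rng)
print(classify(test_utterance, models))
```

In a real system the synthetic frames would be replaced by MFCC vectors extracted from labeled recordings, and the number of mixture components would be tuned on held-out data.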


2015 ◽ Vol 51 (1) ◽ pp. 112-114
Author(s): Peng Song ◽ Yun Jin ◽ Cheng Zha ◽ Li Zhao

2022 ◽ Vol 2022 ◽ pp. 1-11
Author(s): Yu Wang

This paper uses machine learning algorithms to study the construction of human-computer interaction systems and proposes a simple and effective method for extracting salient features based on contextual information. The method retains both the dynamic and the static information of gestures, which yields a richer and more robust feature representation. The paper also proposes a dynamic programming algorithm based on feature matching: the consistency and accuracy of feature matching measure the similarity of two frames, and dynamic programming then finds the optimal matching distance between two gesture sequences. The algorithm ensures the continuity and accuracy of the gesture description and makes full use of the spatiotemporal location information of the features. The characteristics and limitations of common moving-target detection methods for gesture detection and of common machine learning tracking methods for gesture tracking are analyzed first; the kernel correlation filter method is then improved by designing a confidence model and introducing a scale filter; finally, comparison experiments on a self-built gesture dataset verify the effectiveness of the improved method. During training and validation of the model on the corpus, ablation experiments are run on the complementary feature extraction methods, and the results are compared with three baseline methods. GMMs, however, are not suitable when the time structure must be modeled. The support vector machine has been widely used in classification tasks; by using a kernel function, it can transform the original input set into a high-dimensional feature space.
In the experiments, the speech emotion recognition method proposed in this paper outperforms the baseline methods, which supports the effectiveness of complementary feature extraction and the advantage of the deep learning model. Speech is used as the system input, emotion recognition is performed on it, and the recognized emotion is applied to a human-computer dialogue system in combination with an online speech recognition method, which shows that speech emotion recognition has practical research value for human-computer dialogue systems.
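The idea of finding an optimal matching distance between two gesture sequences by dynamic programming can be illustrated with a classic dynamic time warping recurrence. This is only a sketch under stated assumptions: the abstract scores frame similarity by feature matching, whereas the code below substitutes a plain Euclidean distance between per-frame feature vectors, and the names `frame_distance` and `dtw_distance` are hypothetical.

```python
import math

def frame_distance(a, b):
    """Assumed frame dissimilarity: Euclidean distance between feature vectors."""
    return math.dist(a, b)

def dtw_distance(seq_a, seq_b, dist=frame_distance):
    """Minimum cumulative frame distance over all monotonic alignments."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]

# The same gesture performed at two speeds aligns with zero cost.
slow = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (2.0, 2.0)]
fast = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(dtw_distance(slow, fast))  # 0.0
```

Because the recurrence only allows monotonic steps, the alignment preserves the temporal order of frames, which is what lets a sequence-level distance tolerate differences in gesture speed.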

