An Appraisal on Speech and Emotion Recognition Technologies based on Machine Learning

2020, Vol 8 (5), pp. 2266-2276

Speech has long served people as a means of communication, conveying meaning to a listener through voice and expression. Interacting with machines by voice, however, requires machine learning and related methods for speech recognition. With the growing use and significance of the voice as a biometric, speech has become an important area of technological development. In this article, we explain a variety of speech and emotion recognition techniques and compare several methods based on existing algorithms, most of them speech-based. We list and distinguish speech technologies in terms of specifications, databases, classification, feature extraction, enhancement, segmentation, and the process of speech emotion recognition.

2022, Vol 2022, pp. 1-11
Author(s): Yu Wang

In this paper, we use machine learning algorithms to conduct in-depth research and analysis on the construction of human-computer interaction systems. First, we propose a simple and effective method for extracting salient features based on contextual information. The method retains both the dynamic and the static information of gestures, which yields a richer and more robust feature representation. Second, we propose a dynamic programming algorithm based on feature matching: the consistency and accuracy of feature matching measure the similarity of two frames, and dynamic programming then finds the optimal matching distance between two gesture sequences. The algorithm ensures the continuity and accuracy of the gesture description and makes full use of the spatiotemporal location information of the features. We first analyze the features and limitations of common moving-target detection methods for gesture detection and of common machine learning tracking methods for gesture tracking. We then improve the kernel correlation filter method by designing a confidence model and introducing a scale filter, and finally run comparison experiments on a self-built gesture dataset to verify the effectiveness of the improved method. During training and validation of the model on the corpus, the complementary feature extraction methods are studied through ablation, and the results are compared with three baseline methods. But due to this feature, Gaussian mixture models (GMMs) are not suitable when the temporal structure must be modeled. It has been widely used in classification tasks. By using a kernel function, the support vector machine can transform the original input set into a high-dimensional feature space.
In experiments, the speech emotion recognition method proposed in this paper outperforms the baseline methods, demonstrating the effectiveness of complementary feature extraction and the advantage of the deep learning model. Speech is used as the system input, emotion recognition is performed on it, and the recognized emotion is successfully applied to a human-computer dialogue system in combination with an online speech recognition method, which shows that speech emotion recognition applied to human-computer dialogue systems has applied research value.
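The frame-by-frame matching of two gesture sequences described above can be illustrated with classic dynamic time warping. This is only a sketch under stated assumptions, not the paper's algorithm: each gesture is assumed to be a sequence of per-frame feature vectors, Euclidean distance stands in for the paper's feature-matching similarity (which the abstract does not specify), and `sequence_matching_distance` is a hypothetical name.

```python
import numpy as np

def sequence_matching_distance(seq_a, seq_b):
    """Optimal alignment cost between two sequences of frame features (classic DTW)."""
    a = np.asarray(seq_a, dtype=float)  # shape (n_frames_a, n_dims)
    b = np.asarray(seq_b, dtype=float)  # shape (n_frames_b, n_dims)
    n, m = len(a), len(b)
    # Pairwise frame dissimilarity; Euclidean distance is an assumption.
    frame_cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] = best cost of aligning the first i frames of a with the first j frames of b.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = frame_cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in a only
                acc[i, j - 1],      # advance in b only
                acc[i - 1, j - 1],  # advance in both sequences
            )
    return float(acc[n, m])
```

Because the recurrence lets one frame match several frames of the other sequence, a gesture performed more slowly can still align with its faster counterpart at low cost.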


2014, Vol 2014, pp. 1-7
Author(s): Chenchen Huang, Wei Gong, Wenlong Fu, Dongyu Feng

Feature extraction is a very important part of speech emotion recognition. To address the feature extraction problem, this paper proposes a new method that uses deep belief networks (DBNs), within a deep neural network (DNN) framework, to extract emotional features from the speech signal automatically. A five-layer DBN is trained to extract speech emotion features, and multiple consecutive frames are combined to form a high-dimensional feature. The features produced by the trained DBN serve as the input to a nonlinear SVM classifier, yielding a multiple-classifier speech emotion recognition system. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.
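Combining consecutive frames into one high-dimensional feature can be sketched as follows. The abstract gives neither the window size nor the edge handling, so the `context` parameter, the edge-frame padding, and the name `stack_consecutive_frames` are all assumptions made for illustration.

```python
import numpy as np

def stack_consecutive_frames(features, context=2):
    """Concatenate each frame with its `context` neighbours on both sides."""
    features = np.asarray(features, dtype=float)  # shape (n_frames, n_dims)
    n_frames, n_dims = features.shape
    # Repeat the first and last frames so every frame has a full window (an assumption).
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    window = 2 * context + 1
    # Slice i holds, for every frame t, the neighbour at offset (i - context).
    stacked = np.stack([padded[i:i + n_frames] for i in range(window)], axis=1)
    return stacked.reshape(n_frames, window * n_dims)
```

With `context=2`, a 13-dimensional per-frame feature becomes a 65-dimensional vector per frame, which could then be passed to a classifier such as an SVM.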


Author(s): Shreya Kumar, Swarnalaxmi Thiruvenkadam

Feature extraction is an integral part of speech emotion recognition. Some emotions become indistinguishable from others because their features closely resemble one another, which results in low prediction accuracy. This paper analyses the impact of the spectral contrast feature on the accuracy for such emotions. The RAVDESS dataset was chosen for this study. The SAVEE, CREMA-D, and JL corpus datasets were also used to test its performance across different English accents. In addition, the EmoDB dataset was used to study its performance in the German language. The spectral contrast feature increased the prediction accuracy of speech emotion recognition systems to a good degree, as it performs well in distinguishing emotions with significant differences in arousal levels; this is discussed in detail.
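For readers unfamiliar with the feature, spectral contrast is commonly defined per frequency band as the difference between the log energy of the spectral peaks and that of the spectral valleys. The sketch below follows that common definition for a single magnitude spectrum; it is not the paper's implementation. Equal-width bands (rather than the usual octave-spaced bands), the `alpha` fraction, and the function name are simplifying assumptions.

```python
import numpy as np

def spectral_contrast(magnitudes, n_bands=4, alpha=0.2):
    """Per-band log peak minus log valley for one magnitude spectrum."""
    mags = np.asarray(magnitudes, dtype=float)
    # Equal-width bands are a simplification; octave-spaced bands are the usual choice.
    bands = np.array_split(mags, n_bands)
    contrast = []
    for band in bands:
        ordered = np.sort(band)
        # Number of bins averaged at each end of the sorted band (at least one).
        k = max(1, int(round(alpha * len(ordered))))
        valley = np.log(np.mean(ordered[:k]) + 1e-10)   # weakest bins
        peak = np.log(np.mean(ordered[-k:]) + 1e-10)    # strongest bins
        contrast.append(peak - valley)
    return np.array(contrast)
```

A flat, noise-like band gives a contrast near zero, while a band with a pronounced harmonic peak gives a large value, which is consistent with the feature separating emotions that differ in arousal.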

