speech emotion recognition
Recently Published Documents
2022 ◽  
Vol 2022 ◽  
pp. 1-11
Yu Wang

In this paper, we use machine learning algorithms to conduct in-depth research on the construction of human-computer interaction systems and propose a simple, effective method for extracting salient features based on contextual information. The method retains both the dynamic and the static information of gestures, yielding a richer and more robust feature representation. Second, this paper proposes a dynamic programming algorithm based on feature matching, which uses the consistency and accuracy of feature matching to measure the similarity of two frames and then applies dynamic programming to find the optimal matching distance between two gesture sequences. The algorithm ensures the continuity and accuracy of the gesture description and makes full use of the spatiotemporal location information of the features. The characteristics and limitations of common moving-target detection methods in gesture detection, and of common machine learning tracking methods in gesture tracking, are first analysed; the kernel correlation filter method is then improved by designing a confidence model and introducing a scale filter, and comparison experiments on a self-built gesture dataset verify the effectiveness of the improved method. During training and validation of the model on the corpus, the complementary feature extraction methods are ablated, and the results are compared with three baseline methods. Gaussian mixture models (GMMs), however, are not suitable when users want to model temporal structure. The support vector machine, which has been widely used in classification tasks, can transform the original input set into a high-dimensional feature space by means of a kernel function.
After experiments, the speech emotion recognition method proposed in this paper outperforms the baseline methods, demonstrating the effectiveness of complementary feature extraction and the superiority of the deep learning model. Speech serves as the input to the system; emotion recognition is performed on the input speech, and the recognised emotion is successfully applied to a human-computer dialogue system in combination with an online speech recognition method, showing that speech emotion recognition applied to human-computer dialogue has genuine application value.
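The kernel trick mentioned above can be illustrated with a minimal sketch: a kernel SVM classifying fixed-length speech feature vectors (e.g. averaged MFCCs) into emotion labels. The feature values, class names, and dataset sizes below are illustrative assumptions, not the paper's data or model.

```python
# Hypothetical sketch: kernel-SVM emotion classification on toy
# fixed-length speech feature vectors. Labels and features are invented.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy dataset: 40 utterances x 13 features, two well-separated classes.
X = np.vstack([rng.normal(0.0, 1.0, (20, 13)),
               rng.normal(2.0, 1.0, (20, 13))])
y = np.array(["neutral"] * 20 + ["happy"] * 20)

# The RBF kernel implicitly maps inputs into a high-dimensional feature
# space, so a linear separator there is a nonlinear boundary here.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
accuracy = float((clf.predict(X) == y).mean())
```

Because an RBF SVM only compares samples through the kernel function, it never materialises the high-dimensional space, which is what keeps this tractable.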

Vaibhav K. P.

Abstract: Speech emotion recognition is a trending research topic these days, with the main motive of improving human-machine interaction. At present, most work in this area extracts discriminatory features for the purpose of classifying emotions into various categories. Much of the present work relies on the utterance of words, which is used for lexical analysis in emotion recognition. In our project, a technique is utilised for classifying emotions into 'Angry', 'Calm', 'Fearful', 'Happy', and 'Sad' categories.
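As a hedged sketch of what "discriminatory features" can mean in practice (not this project's actual pipeline), two classic frame-level features used in SER are short-time energy and zero-crossing rate, computed here with NumPy on a synthetic waveform:

```python
# Illustrative only: frame energy and zero-crossing rate (ZCR), two simple
# discriminatory features often fed to an emotion classifier.
import numpy as np

def frame_features(signal, frame_len=400, hop=200):
    """Return a list of (energy, zcr) pairs, one per analysis frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.sum(frame ** 2) / frame_len)
        # Fraction of adjacent sample pairs whose sign changes.
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return feats

sr = 16000
t = np.arange(sr) / sr                    # one second of "audio"
tone = 0.1 * np.sin(2 * np.pi * 120 * t)  # low-pitch, low-energy tone
feats = frame_features(tone)
```

Angry or fearful speech typically shows higher energy and faster spectral change than calm speech, which is why such low-level statistics carry discriminative information.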

2021 ◽  
Vol 38 (6) ◽  
pp. 1861-1873
Kogila Raghu ◽  
Manchala Sadanandam

Automatic Speech Recognition (ASR) is a popular research area with many variations in human behavioural functionality and interaction. Human beings rely on speech for communication and conversation. As a conversation proceeds, the information or message of the speech utterances is transferred; the message also carries the speaker's traits, such as emotion, physiological characteristics, and environmental statistics. A tremendous number of these signals are complex and encoded, yet humans decode them quickly thanks to human intelligence. Many academics in the domain of Human-Computer Interaction (HCI) are working to automate speech generation and the extraction of speech attributes and meaning. For example, ASR can regulate the use of voice commands and maintain dictation discipline while also recognising and verifying the speech of the speaker. Owing to accent and nativity traits, the speaker's emotional state can be discerned from the speech. In this paper, we discuss the human speech production system, research problems in speech processing, and the motivation, challenges, and objectives of Speech Emotion Recognition (SER), and we thoroughly review the work done so far on Telugu speech emotion databases and their role. We also describe our own database, DETL (Database for Emotions in Telugu Language), and the Audacity software used to create it.

2021 ◽  
Vol 23 (12) ◽  
pp. 212-223
P Jothi Thilaga ◽  
S Kavipriya ◽  
K Vijayalakshmi

Emotions are elementary for humans, impacting perception and everyday activities such as communication, learning, and decision-making. Speech Emotion Recognition (SER) systems aim to facilitate natural interaction with machines through direct voice interaction, rather than conventional input devices, in order to understand verbal content and make it straightforward for human listeners to react. This SER system is primarily composed of two sections, a feature extraction phase and a feature classification phase. SER lets bots communicate with humans in a non-lexical manner. The speech emotion recognition algorithm here is based on a Convolutional Neural Network (CNN) model, which uses various modules for emotion recognition and classifiers to differentiate emotions such as happiness, calm, anger, neutrality, sadness, and fear. The quality of classification depends on the extracted features. Finally, the emotion of a speech signal can be determined.
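The convolutional idea behind such a CNN front end can be sketched without any deep learning framework: a single 1-D convolution plus ReLU sliding over a spectrogram-like feature matrix. All layer sizes below are illustrative assumptions, not the paper's architecture.

```python
# Minimal NumPy sketch of one CNN layer over a (time x mel-bands) input,
# the kind of operation a CNN-based SER model stacks many times.
import numpy as np

def conv1d_relu(x, kernels):
    """x: (time, n_mels); kernels: (n_filters, width, n_mels)."""
    n_filters, width, _ = kernels.shape
    out_len = x.shape[0] - width + 1
    out = np.zeros((out_len, n_filters))
    for f in range(n_filters):
        for t in range(out_len):
            # Dot product of the filter with a local time window.
            out[t, f] = np.sum(x[t:t + width] * kernels[f])
    return np.maximum(out, 0.0)  # ReLU non-linearity

rng = np.random.default_rng(1)
spec = rng.normal(size=(100, 40))   # 100 frames x 40 mel bands (toy input)
k = rng.normal(size=(8, 5, 40))     # 8 filters spanning 5 frames each
feature_map = conv1d_relu(spec, k)  # shape: (96, 8)
```

Stacking such layers, then pooling over time and ending in a softmax over the emotion classes, is the usual shape of a CNN-based SER classifier; the extracted feature maps are what the final classification stage operates on.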

Antonio Guerrieri ◽  
Eleonora Braccili ◽  
Federica Sgrò ◽  
Giulio Meldolesi

The real challenge in Human-Robot Interaction (HRI) is to build machines capable of perceiving human emotions so that robots can interact with humans appropriately. It is well known from the literature that emotion varies according to many factors. Among these, gender is one of the most influential, so a gender-dependent emotion recognition system is recommended. In this paper, a two-level hierarchical Speech Emotion Recognition (SER) system is proposed: the first level is a Gender Recognition (GR) module that identifies the speaker's gender; the second is a gender-specific SER block. In this work, attention was focused on optimising the first level of the proposed architecture. The system is designed to be installed on social robots for monitoring hospitalised and at-home elderly patients. Hence the importance of reducing the computational effort of the architecture while also minimising hardware bulk, so that the system is suitable for social robots. The algorithm was executed on Raspberry Pi hardware, and the Italian emotional database EMOVO was used for training. Results show a GR accuracy of 97.8%, comparable with values found in the literature.
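The two-level routing described above can be sketched as plain control flow: a gender recognition stage selects which gender-specific SER model handles the utterance. The stub classifiers and the single "mean pitch" feature below are invented stand-ins, not the paper's models.

```python
# Hedged sketch of hierarchical GR -> gender-specific SER dispatch.
from typing import Callable, Dict, List

def make_hierarchical_ser(
    gender_model: Callable[[List[float]], str],
    ser_models: Dict[str, Callable[[List[float]], str]],
) -> Callable[[List[float]], str]:
    def predict(features: List[float]) -> str:
        gender = gender_model(features)      # level 1: GR module
        return ser_models[gender](features)  # level 2: gender-specific SER
    return predict

# Stub models keyed on illustrative features: [mean pitch (Hz), arousal].
gender_stub = lambda f: "female" if f[0] > 165.0 else "male"
ser_stubs = {
    "male":   lambda f: "angry" if f[1] > 0.5 else "calm",
    "female": lambda f: "happy" if f[1] > 0.5 else "sad",
}
pipeline = make_hierarchical_ser(gender_stub, ser_stubs)
```

The design pays off because each second-level model only ever sees one gender's acoustic range, which is the motivation the abstract gives for the gender-dependent architecture.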

Xiaoli Qiu ◽  
Wei Li ◽  
Yang Li ◽  
Hongmei Gu ◽  
Fei Song

The identification of speech emotions is amongst the most strenuous and fascinating fields of machine learning science. In this article, Chinese speech emotions are classified into four major emotional categories: pleasure, sorrow, resentment, and neutrality. A machine learning in human emotion detection (ML-HED) framework is proposed. The suggested technique extracts prosodic and spectral elements of the audio wave, such as pitch, power, amplitude, Mel-frequency cepstral coefficients, and linear predictive cepstral coefficients, and performs identification against a template. Accuracy of 87.75% was obtained on male actors' utterances and 93% on female actors' utterances. The research findings show that the technique achieves greater precision by accurately interpreting the emotions, in contrast with current speech emotion recognition approaches. In addition, the derived features were compared against various classification techniques in this study for a comprehensive picture.
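One of the prosodic elements listed above, pitch, can be estimated with a short autocorrelation sketch. The sampling rate, frame length, and search band below are assumptions for illustration, not parameters from the article.

```python
# Illustrative pitch estimation by autocorrelation on a synthetic
# voiced frame; real SER pipelines apply this per frame of speech.
import numpy as np

def autocorr_pitch(frame, sr, fmin=60, fmax=400):
    """Return the fundamental frequency (Hz) maximising autocorrelation
    within the plausible voice range [fmin, fmax]."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr     # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 220 * t)    # synthetic 220 Hz "voiced" tone
pitch = autocorr_pitch(frame, sr)      # close to 220 Hz
```

Pitch contours derived this way, together with energy and cepstral features, are the kind of feature vector a template-matching or classifier stage would then consume.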
