A concise and effectual method for neutral pitch identification in stuttered speech

Author(s):  
Yashaswi Alva M ◽  
Nachamai M ◽  
Joy Paulose

Abstract Researchers have shown that human-computer interactions (HCIs) become more effective when machines understand the emotions conveyed in speech. Speech emotion recognition has attracted growing research interest because of its usefulness across applications. Building a neutral speech model is an important and challenging task, since it helps in identifying different emotions in stuttered speech. This paper proposes two approaches for identifying neutral pitch in stuttered speech; their accuracies indicate which model is best suited for neutral speech pitch identification.
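The abstract does not detail the two approaches; as a rough illustration of the underlying operation only, the sketch below (assuming librosa's pYIN pitch tracker and hypothetical function names) extracts a frame-level F0 contour and summarizes the voiced frames into the kind of pitch statistics a neutral-speech model could build on.

```python
# Illustrative sketch, not the paper's method: extract a frame-level F0
# contour with librosa's pYIN tracker and summarize the voiced frames.
import numpy as np
import librosa

def neutral_pitch_summary(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=None)             # keep native sample rate
    f0, voiced_flag, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),                  # ~65 Hz lower bound
        fmax=librosa.note_to_hz("C7"),                  # ~2093 Hz upper bound
        sr=sr,
    )
    voiced_f0 = f0[voiced_flag & ~np.isnan(f0)]         # keep voiced, valid frames only
    return {
        "median_f0_hz": float(np.median(voiced_f0)),    # robust central (neutral) pitch
        "f0_std_hz": float(np.std(voiced_f0)),          # spread around that pitch
    }
```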

2021 ◽  
Vol 25 (4) ◽  
pp. 1031-1045
Author(s):  
Helang Lai ◽  
Keke Wu ◽  
Lingli Li

Emotion recognition in conversations is crucial, as there is an urgent need to improve the overall experience of human-computer interactions. A promising direction in this field is to develop a model that can effectively extract adequate context for a test utterance. We introduce a novel model, termed hierarchical memory networks (HMN), to address the issue of recognizing utterance-level emotions. HMN divides the contexts into different aspects and employs different step lengths to represent the weights of these aspects. To model self-dependencies, HMN uses independent local memory networks for these aspects. To capture interpersonal dependencies, HMN employs global memory networks that integrate the local outputs into global storages. These storages generate contextual summaries and help to find the emotionally dependent utterance most relevant to the test utterance. With an attention-based multi-hop scheme, the storages are then merged with the test utterance via an addition operation across iterations. Experiments on the IEMOCAP dataset show that our model outperforms the compared methods in accuracy.
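As a simplified sketch of the attention-based multi-hop read described above (shapes, hop count, and module names are assumptions, not the authors' code), the module below attends over a memory bank of context-utterance vectors and merges each summary with the test-utterance query by addition at every hop.

```python
# Simplified multi-hop memory read: the query (test utterance) attends over
# the memory (context utterances); the attended summary is added back into
# the query at each hop, as in the addition-based merge described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHopMemoryRead(nn.Module):
    def __init__(self, dim: int, hops: int = 3):
        super().__init__()
        self.hops = hops
        self.proj = nn.Linear(dim, dim)  # query transform between hops

    def forward(self, query: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim) test utterance; memory: (batch, T, dim) context storages
        for _ in range(self.hops):
            scores = torch.bmm(memory, query.unsqueeze(-1)).squeeze(-1)   # (batch, T)
            attn = F.softmax(scores, dim=-1)
            summary = torch.bmm(attn.unsqueeze(1), memory).squeeze(1)     # (batch, dim)
            query = self.proj(query + summary)   # merge by addition, then re-project
        return query

# usage: logits = nn.Linear(dim, num_emotions)(MultiHopMemoryRead(dim)(q, mem))
```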


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1249
Author(s):  
Babak Joze Abbaschian ◽  
Daniel Sierra-Sosa ◽  
Adel Elmaghraby

Advancements in neural networks and the demand for accurate, near real-time Speech Emotion Recognition (SER) in human–computer interactions make it essential to compare the available SER methods and databases in order to reach feasible solutions and a firmer understanding of this open-ended problem. This study reviews deep learning approaches for SER together with the available datasets, followed by conventional machine learning techniques for speech emotion recognition. Finally, we present a multi-aspect comparison of practical neural network approaches to speech emotion recognition. The goal of this study is to provide a survey of the field of discrete speech emotion recognition.


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6688
Author(s):  
Sanghyun Lee ◽  
David K. Han ◽  
Hanseok Ko

Speech emotion recognition predicts the emotional state of a speaker from the person's speech and adds an element for creating more natural human–computer interactions. Earlier studies on emotion recognition were primarily based on handcrafted features and manual labels. With the advent of deep learning, there have been efforts to apply deep-network-based approaches to emotion recognition. Because deep learning automatically extracts salient features correlated with speaker emotion, it offers advantages over handcrafted-feature-based methods. There are, however, challenges in applying it to emotion recognition, because the data required to properly train deep networks are often lacking. There is therefore a need for a deep-learning-based approach that exploits the information available in a given speech signal to the maximum extent possible. Our proposed method, called "Fusion-ConvBERT", is a parallel fusion model consisting of bidirectional encoder representations from transformers and convolutional neural networks. Extensive experiments on the EMO-DB and Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that the proposed method outperforms state-of-the-art techniques in most test configurations.
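A minimal sketch of such a parallel fusion follows, assuming log-mel spectrogram inputs, a generic transformer encoder standing in for the BERT-style component, and fusion by concatenation (layer sizes and pooling are assumptions, not the published architecture).

```python
# Parallel-fusion sketch: a CNN branch and a transformer-encoder branch both
# read the log-mel spectrogram; their pooled embeddings are concatenated and
# classified into emotion categories.
import torch
import torch.nn as nn

class ParallelFusionSER(nn.Module):
    def __init__(self, n_mels: int = 64, d_model: int = 128, n_classes: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(                       # local spectro-temporal patterns
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(16 * 4 * 4, d_model),
        )
        self.frame_proj = nn.Linear(n_mels, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)  # long-range context
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, time, n_mels) log-mel spectrogram
        cnn_emb = self.cnn(mel.unsqueeze(1).transpose(2, 3))    # (batch, d_model)
        frames = self.frame_proj(mel)                           # (batch, time, d_model)
        trf_emb = self.transformer(frames).mean(dim=1)          # pooled (batch, d_model)
        return self.classifier(torch.cat([cnn_emb, trf_emb], dim=-1))
```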


2013 ◽  
Vol 38 (4) ◽  
pp. 465-470 ◽  
Author(s):  
Jingjie Yan ◽  
Xiaolan Wang ◽  
Weiyi Gu ◽  
LiLi Ma

Abstract Speech emotion recognition is a meaningful and intractable issue across a number of domains, including sentiment analysis, computer science, pedagogy, and so on. In this study, we investigate speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach in depth. We use sparse partial least squares regression to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. By exploiting the SPLSR method, the components of redundant and uninformative speech emotion features are shrunk to zero, while the serviceable and informative features are retained and passed to the subsequent classification step. Experiments on the Berlin database show that the recognition rate of the SPLSR method reaches 79.23% and is superior to the other compared dimensionality reduction methods.
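scikit-learn does not ship a sparse PLS implementation, so the sketch below approximates the select-then-classify pipeline with ordinary PLSRegression: features with small loadings are dropped (standing in for the zeroed components), and the reduced scores feed an SVM. It illustrates the pipeline shape only, not the SPLSR algorithm itself; all parameter values are assumptions.

```python
# Approximate stand-in for the SPLSR pipeline: rank features by PLS loading
# magnitude, drop the weakly loaded ones, refit PLS on the kept features,
# then classify the PLS scores with an SVM.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

def pls_select_and_classify(X_train, y_train, X_test, n_components=10, keep_ratio=0.5):
    Y = LabelBinarizer().fit_transform(y_train)               # one-hot targets for PLS
    pls = PLSRegression(n_components=n_components).fit(X_train, Y)
    importance = np.abs(pls.x_loadings_).sum(axis=1)          # per-feature loading magnitude
    keep = importance >= np.quantile(importance, 1 - keep_ratio)   # drop weak features
    pls_red = PLSRegression(n_components=n_components).fit(X_train[:, keep], Y)
    clf = SVC(kernel="rbf").fit(pls_red.transform(X_train[:, keep]), y_train)
    return clf.predict(pls_red.transform(X_test[:, keep]))
```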


2019 ◽  
Vol 7 (3) ◽  
pp. 919-925
Author(s):  
Kelvin Wambani Siovi ◽  
Cheruiyot Willison Kipruto ◽  
Agnes Mindila
