Performance Improvement of Speech Emotion Recognition Model Using Generative Adversarial Networks

2019 ◽  
Vol 17 (11) ◽  
pp. 77-85
Author(s):  
You-Jung Ko ◽  
Yoon-Joong Kim

Author(s):
Arash Shilandari ◽  
Hossein Marvi ◽  
Hossein Khosravi

With the increasing mechanization of daily life, speech processing has become crucial for interaction between humans and machines. Deep neural networks require a database with enough data for training, and the more features are extracted from the speech signal, the more samples are needed to train these networks. Adequate training can be ensured only when sufficient and varied data are available in each class; when they are not, data augmentation methods can be used to obtain a database with enough samples. One of the obstacles to developing speech emotion recognition systems is this data sparsity within each class. The present study focuses on a cycle generative adversarial network for data augmentation in a speech emotion recognition system. For each of the five emotions employed, a generative adversarial network is designed to produce data very similar to the real data of that class while remaining distinguishable from the other classes. These networks are trained adversarially to produce feature vectors resembling each class in the original feature space, and the generated vectors are then added to the existing training sets to train the classifier network. Instead of the common cross-entropy loss, Wasserstein divergence is used to train the generative adversarial networks, avoiding the vanishing gradient problem and producing high-quality artificial samples. The proposed network is evaluated on speech emotion recognition using the EMO-DB corpus for training, testing, and evaluation, with the quality of the artificial data assessed by two classifiers: a Support Vector Machine (SVM) and a Deep Neural Network (DNN). The results show that, by extracting and reproducing high-level representations from acoustic features, the five primary emotions can be recognized with acceptable accuracy.
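The training objective described in the abstract, a GAN over acoustic feature vectors with a Wasserstein-divergence (WGAN-div) loss in place of cross-entropy, can be sketched as follows. This is a minimal illustration, not the authors' code: the feature dimension, network sizes, penalty coefficients k and p, and optimizer settings are all illustrative assumptions.

import torch
import torch.nn as nn

FEAT_DIM = 128   # assumed dimension of the per-utterance acoustic feature vector
NOISE_DIM = 64   # assumed latent (noise) dimension
K, P = 2.0, 6.0  # WGAN-div penalty coefficients; common defaults, assumed here

# One generator/critic pair per emotion class (only one pair shown).
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, FEAT_DIM),
)
critic = nn.Sequential(
    nn.Linear(FEAT_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # unbounded score, not a probability
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)

def critic_step(real_feats):
    """One critic update with the Wasserstein-divergence gradient penalty."""
    noise = torch.randn(real_feats.size(0), NOISE_DIM)
    fake_feats = generator(noise).detach()

    # Score real and generated feature vectors.
    real_score = critic(real_feats).mean()
    fake_score = critic(fake_feats).mean()

    # Gradient-norm penalty on interpolates between real and fake samples;
    # this term replaces weight clipping and keeps gradients informative.
    eps = torch.rand(real_feats.size(0), 1)
    interp = (eps * real_feats + (1 - eps) * fake_feats).requires_grad_(True)
    grad = torch.autograd.grad(critic(interp).sum(), interp,
                               create_graph=True)[0]
    penalty = K * grad.norm(2, dim=1).pow(P).mean()

    loss = fake_score - real_score + penalty
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()
    return loss.item()

def generator_step(batch_size):
    """One generator update: raise the critic's score on generated features."""
    noise = torch.randn(batch_size, NOISE_DIM)
    loss = -critic(generator(noise)).mean()
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()

# Example: one training iteration on a random batch, standing in for real
# EMO-DB feature vectors of a single emotion class.
real_batch = torch.randn(32, FEAT_DIM)
critic_step(real_batch)
generator_step(32)

Per the abstract, one such generator would be trained for each of the five emotion classes, and its generated feature vectors appended to that class's training set before fitting the SVM or DNN classifier.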


2020 ◽  
Author(s):  
Siddique Latif ◽  
Muhammad Asim ◽  
Rajib Rana ◽  
Sara Khalifa ◽  
Raja Jurdak ◽  
...  

2020 ◽  
Vol 140 ◽  
pp. 358-365
Author(s):  
Zijiang Zhu ◽  
Weihuang Dai ◽  
Yi Hu ◽  
Junshan Li
