On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks

Author(s):  
Egor Lakomkin ◽  
Mohammad Ali Zamani ◽  
Cornelius Weber ◽  
Sven Magg ◽  
Stefan Wermter

Author(s):  
Syed Asif Ahmad Qadri ◽  
Teddy Surya Gunawan ◽  
Taiba Majid Wani ◽  
Eliathamby Ambikairajah ◽  
Mira Kartiwi ◽  
...  

Author(s):  
Biqiao Zhang ◽  
Yuqing Kong ◽  
Georg Essl ◽  
Emily Mower Provost

In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, data for which it may not be possible to identify a single hard label. We propose a family of loss functions, f-Similarity Preservation Loss (f-SPL), based on the dual form of f-divergence for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. Our approach, which combines f-SPL and classification loss, significantly outperforms a baseline SER system with the same structure but trained with only classification loss in most experiments. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.
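The abstract defines f-SPL via the dual form of f-divergence; that exact formulation is not reproduced here. The core idea, though, of making pairwise similarity in the embedding space match pairwise similarity between soft labels, can be sketched with a simple squared-error surrogate. The function names and the squared-error term below are illustrative, not the paper's actual loss:

```python
import numpy as np

def pairwise_cosine(x):
    # Cosine similarity between every pair of rows of x.
    n = x / np.linalg.norm(x, axis=1, keepdims=True)
    return n @ n.T

def similarity_preservation_loss(embeddings, soft_labels):
    """Penalize mismatch between embedding-space similarity and
    soft-label similarity, computed over all example pairs.

    embeddings:  (N, d) learned feature vectors
    soft_labels: (N, k) per-example emotion label distributions
    """
    emb_sim = pairwise_cosine(embeddings)
    lab_sim = pairwise_cosine(soft_labels)
    # Squared error stands in for the paper's f-divergence-based term.
    return float(np.mean((emb_sim - lab_sim) ** 2))
```

A minimizer of this surrogate drives the embedding similarity matrix toward the label similarity matrix, which is the property the abstract claims for the true f-SPL minimizer; swapping the squared error for a term derived from an f-divergence dual would recover the loss family's structure.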


2020 ◽  
Author(s):  
Ronnypetson Da Silva ◽  
Valter M. Filho ◽  
Mario Souza

Many works that apply Deep Neural Networks (DNNs) to Speech Emotion Recognition (SER) use single datasets, or train and evaluate models separately when multiple datasets are available. Each dataset is constructed under its own guidelines, and the subjective nature of SER labels makes it difficult to obtain robust and general models. We investigate how DNNs learn shared representations for different datasets in both multi-task and unified setups. We also analyse how each dataset benefits from the others across different combinations of datasets and popular neural network architectures. We show that the longstanding belief that more data yields more general models does not always hold for SER, as a different combination of datasets and meta-parameters produces the best result for each of the analysed datasets.
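The multi-task setup described above amounts to a shared encoder with one classification head per corpus, so each dataset keeps its own label set while the trunk is trained on all of them (a unified setup would instead use a single head over a merged label set). A minimal numpy sketch, with made-up corpus names and class counts rather than the paper's actual datasets:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class SharedEncoderModel:
    """Shared trunk plus one softmax head per emotion corpus
    (multi-task setup). Corpus names and class counts are
    illustrative placeholders."""

    def __init__(self, n_features, n_hidden, head_sizes, seed=0):
        rng = np.random.default_rng(seed)
        # Trunk weights are shared by every corpus.
        self.W_shared = rng.normal(size=(n_features, n_hidden)) * 0.1
        # One output head per corpus, each with its own class count.
        self.heads = {name: rng.normal(size=(n_hidden, n_cls)) * 0.1
                      for name, n_cls in head_sizes.items()}

    def forward(self, x, corpus):
        h = relu(x @ self.W_shared)       # shared representation
        logits = h @ self.heads[corpus]   # corpus-specific head
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
```

Because only the trunk is shared, gradients from every corpus shape the common representation, which is where the cross-dataset benefit (or interference) the abstract analyses would show up.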


2020 ◽  
Vol 509 ◽  
pp. 150-163 ◽  
Author(s):  
Luefeng Chen ◽  
Wanjuan Su ◽  
Yu Feng ◽  
Min Wu ◽  
Jinhua She ◽  
...  
