f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015725 ◽

2019 ◽

Vol 33 ◽

pp. 5725-5732

Author(s):

Biqiao Zhang ◽

Yuqing Kong ◽

Georg Essl ◽

Emily Mower Provost

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Loss Function ◽

Deep Neural Networks ◽

Metric Learning ◽

Loss Functions ◽

Speech Emotion Recognition ◽

Subjective Data ◽

Dual Form ◽

Deep Metric Learning

In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, for which it may not be possible to identify a single hard label. We propose a family of loss functions, f-Similarity Preservation Loss (f-SPL), based on the dual form of f-divergence for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. Our approach, which combines f-SPL and classification loss, significantly outperforms a baseline SER system with the same structure but trained with only classification loss in most experiments. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.
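
For orientation, the "dual form of f-divergence" mentioned in the abstract is commonly written as the variational representation below. This is the standard textbook identity, given here as background only; the abstract does not state how f-SPL instantiates it, so the symbols P, Q, T, and f^* are assumptions of this note rather than the authors' notation.

```latex
% Standard variational (dual) representation of an f-divergence,
% for convex f with f(1) = 0. T ranges over measurable functions.
D_f(P \,\|\, Q) \;=\; \sup_{T} \Big( \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q}\big[f^{*}(T(x))\big] \Big),
\qquad
f^{*}(t) \;=\; \sup_{u} \big( u\,t - f(u) \big).
```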

Speech Emotion Recognition Using Deep Neural Networks on Multilingual Databases

Advances in Robotics, Automation and Data Analytics - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-70917-4_3 ◽

2021 ◽

pp. 21-30

Author(s):

Syed Asif Ahmad Qadri ◽

Teddy Surya Gunawan ◽

Taiba Majid Wani ◽

Eliathamby Ambikairajah ◽

Mira Kartiwi ◽

...

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Deep Neural Networks ◽

Speech Emotion Recognition

Towards real-time Speech Emotion Recognition using deep neural networks

2015 9th International Conference on Signal Processing and Communication Systems (ICSPCS) ◽

10.1109/icspcs.2015.7391796 ◽

2015 ◽

Cited By ~ 18

Author(s):

H.M. Fayek ◽

M. Lech ◽

L. Cavedon

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Real Time ◽

Deep Neural Networks ◽

Speech Emotion Recognition

Speech emotion recognition on mobile devices based on modulation spectral feature pooling and deep neural networks

2017 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) ◽

10.1109/isspit.2017.8388669 ◽

2017 ◽

Cited By ~ 2

Author(s):

Anderson R. Avila ◽

Joao Monteiro ◽

Douglas O'Shaughnessy ◽

Tiago H. Falk

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Mobile Devices ◽

Deep Neural Networks ◽

Spectral Feature ◽

Speech Emotion Recognition ◽

Feature Pooling

Stratified neural networks in a time-to-event setting

10.1101/2021.02.01.429169 ◽

2021 ◽

Author(s):

Fabrizio Kuruc ◽

Harald Binder ◽

Moritz Hess

Keyword(s):

Neural Networks ◽

Loss Function ◽

Deep Neural Networks ◽

Proportional Hazards ◽

Proportional Hazards Model ◽

Cox Proportional Hazards ◽

Cox Proportional Hazards Model ◽

Loss Functions ◽

Partial Likelihood ◽

Hazards Model

Deep neural networks are now frequently employed to predict survival conditional on omics-type biomarkers, e.g. by employing the partial likelihood of the Cox proportional hazards model as the loss function. Due to the generally limited number of observations in clinical studies, combining different datasets has been proposed to improve learning of network parameters. However, if baseline hazards differ between the studies, the assumptions of the Cox proportional hazards model are violated. Based on high-dimensional transcriptome profiles from different tumor entities, we demonstrate how using a stratified partial likelihood as the loss function accounts for the different baseline hazards in a deep learning framework. Additionally, we compare the partial likelihood with the ranking loss, which is frequently employed as a loss function in machine learning approaches due to its seeming simplicity. Using RNA-seq data from the Cancer Genome Atlas (TCGA), we show that the use of stratified loss functions leads to overall better discriminatory power and lower prediction error compared to their nonstratified counterparts. We investigate which genes are identified as having the greatest marginal impact on prediction of survival when using different loss functions. We find that while similar genes are identified, known prognostic genes in particular receive higher importance from stratified loss functions. Taken together, pooling data from different sources for improved parameter learning of deep neural networks benefits largely from employing stratified loss functions that consider potentially varying baseline hazards. For easy application, we provide PyTorch code for stratified loss functions and an explanatory Jupyter notebook in a GitHub repository.
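
To make the idea of a stratified partial likelihood concrete, here is a minimal pure-Python sketch. It is not the authors' PyTorch code: the function names, the argument layout, and the Breslow-style handling of tied event times are all assumptions made for illustration. The only point it shows is that risk sets are formed within each stratum (e.g. each study), so every stratum keeps its own baseline hazard.

```python
import math
from collections import defaultdict


def neg_log_partial_likelihood(risk, time, event):
    """Negative Cox log partial likelihood for one group of subjects.

    risk  : predicted log-risk scores (e.g. network outputs)
    time  : observed follow-up times
    event : 1/True if the event was observed, 0/False if censored
    Ties are handled Breslow-style (assumption of this sketch).
    """
    total = 0.0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects only appear in risk sets
        # Risk set: everyone still under observation at time[i].
        denom = sum(math.exp(risk[j]) for j in range(n) if time[j] >= time[i])
        total -= risk[i] - math.log(denom)
    return total


def stratified_neg_log_partial_likelihood(risk, time, event, stratum):
    """Sum of per-stratum losses; risk sets never cross strata."""
    groups = defaultdict(list)
    for idx, s in enumerate(stratum):
        groups[s].append(idx)
    total = 0.0
    for idxs in groups.values():
        total += neg_log_partial_likelihood(
            [risk[i] for i in idxs],
            [time[i] for i in idxs],
            [event[i] for i in idxs],
        )
    return total
```

With identical scores, pooling two studies enlarges every risk set and therefore changes the loss, whereas the stratified version compares each subject only with others from the same study.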

Speech Emotion Recognition using Deep Neural Networks

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2020.6395 ◽

2020 ◽

Vol 8 (6) ◽

pp. 2460-2465

Author(s):

Balaji Dharamsoth

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Deep Neural Networks ◽

Speech Emotion Recognition

Interaffection of Multiple Datasets with Neural Networks in Speech Emotion Recognition

10.5753/eniac.2020.12141 ◽

2020 ◽

Author(s):

Ronnypetson Da Silva ◽

Valter M. Filho ◽

Mario Souza

Keyword(s):

Neural Network ◽

Neural Networks ◽

Emotion Recognition ◽

Deep Neural Networks ◽

Speech Emotion Recognition ◽

Network Architectures ◽

Shared Representations ◽

Multiple Datasets ◽

Neural Network Architectures

Many works that apply Deep Neural Networks (DNNs) to Speech Emotion Recognition (SER) use single datasets or train and evaluate the models separately when using multiple datasets. Those datasets are constructed with specific guidelines and the subjective nature of the labels for SER makes it difficult to obtain robust and general models. We investigate how DNNs learn shared representations for different datasets in both multi-task and unified setups. We also analyse how each dataset benefits from others in different combinations of datasets and popular neural network architectures. We show that the longstanding belief of more data resulting in more general models doesn’t always hold for SER, as different dataset and meta-parameter combinations hold the best result for each of the analysed datasets.

Stochastic Loss Function

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5925 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4884-4891

Author(s):

Qingliang Liu ◽

Jinmei Lai

Keyword(s):

Neural Networks ◽

Loss Function ◽

Real World ◽

Optimization Problem ◽

Deep Neural Networks ◽

Back Propagation ◽

Loss Functions ◽

Joint Optimization ◽

Neural Machine Translation ◽

Deep Networks

Training deep neural networks is inherently subject to the predefined and fixed loss functions used during optimization. To improve learning efficiency, we develop the Stochastic Loss Function (SLF) to dynamically and automatically generate appropriate gradients to train deep networks in the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem of network weights and loss parameters. In order to guarantee the requisite efficiency, gradients with respect to the generic differentiable loss are leveraged for selecting the loss function and optimizing network weights. Extensive experiments on a variety of popular datasets strongly demonstrate that SLF is capable of obtaining appropriate gradients at different stages during training, and can significantly improve the performance of various deep models on real-world tasks including classification, clustering, regression, neural machine translation, and object detection.

Rare bioparticle detection via deep metric learning

RSC Advances ◽

10.1039/d1ra02869c ◽

2021 ◽

Vol 11 (29) ◽

pp. 17603-17610

Author(s):

Shaobo Luo ◽

Yuzhi Shi ◽

Lip Ket Chin ◽

Yi Zhang ◽

Bihan Wen ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Model ◽

Neural Network Model ◽

Deep Neural Networks ◽

Metric Learning ◽

Practical Applications ◽

Deep Metric Learning

Conventional deep neural networks use simple classifiers to obtain highly accurate results. However, they have limitations in practical applications. This study demonstrates a robust deep metric neural network model for rare bioparticle detection.

Speech emotion recognition based on Gaussian Mixture Models and Deep Neural Networks

2017 Information Theory and Applications Workshop (ITA) ◽

10.1109/ita.2017.8023477 ◽

2017 ◽

Cited By ~ 5

Author(s):

Ivan J. Tashev ◽

Zhong-Qiu Wang ◽

Keith Godin

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Mixture Models ◽

Deep Neural Networks ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Speech Emotion Recognition

On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks

2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros.2018.8593571 ◽

2018 ◽

Cited By ~ 5

Author(s):

Egor Lakomkin ◽

Mohammad Ali Zamani ◽

Cornelius Weber ◽

Sven Magg ◽

Stefan Wermter

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

Deep Neural Networks ◽

Human Robot Interaction ◽

Speech Emotion Recognition ◽

Robot Interaction
