Speech emotion recognition on mobile devices based on modulation spectral feature pooling and deep neural networks

f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition

In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, data for which it may not be possible to identify a single hard label. We propose a family of loss functions, f-Similarity Preservation Loss (f-SPL), based on the dual form of f-divergence for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. In most experiments, our approach, which combines f-SPL with a classification loss, significantly outperforms a baseline SER system that has the same structure but is trained with the classification loss alone. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.
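The abstract states that f-SPL is built on the dual (variational) form of an f-divergence, D_f(P || Q) = sup_T E_P[T(x)] - E_Q[f*(T(x))], where f* is the convex conjugate of the generator f. As a hedged illustration of that representation only, and not of the f-SPL loss itself, the sketch below uses the KL case (f(u) = u log u, f*(t) = exp(t - 1)) on small discrete distributions. The function names and the example distributions are assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Closed-form KL(P || Q) for discrete distributions (assumes q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_conjugate(t: float) -> float:
    """Convex conjugate of the KL generator f(u) = u * log(u), i.e. f*(t) = exp(t - 1)."""
    return math.exp(t - 1.0)


def dual_bound(
    p: Sequence[float],
    q: Sequence[float],
    critic: Callable[[int], float],
    conjugate: Callable[[float], float],
) -> float:
    """Dual-form estimate E_P[T] - E_Q[f*(T)] of D_f(P || Q).

    Any critic T gives a lower bound on the divergence; the optimal critic
    T(x) = f'(p(x) / q(x)) makes the bound tight.
    """
    expected_under_p = sum(pi * critic(i) for i, pi in enumerate(p))
    expected_under_q = sum(qi * conjugate(critic(i)) for i, qi in enumerate(q))
    return expected_under_p - expected_under_q


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.5, 0.3]
    # For KL, f'(u) = 1 + log(u), so the optimal critic is 1 + log(p / q).
    optimal = dual_bound(p, q, lambda i: 1.0 + math.log(p[i] / q[i]), kl_conjugate)
    # A constant zero critic is valid but loose: it yields -exp(-1).
    loose = dual_bound(p, q, lambda i: 0.0, kl_conjugate)
    print(kl_divergence(p, q), optimal, loose)
```

With the optimal critic the dual estimate matches the closed-form KL value, while any other critic stays below it; in a learning setting the critic would be a trainable network rather than a lambda.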

Speech Emotion Recognition using Deep Neural Networks

International Journal for Research in Applied Science and Engineering Technology, Vol 8 (6), pp. 2460-2465, 2020. DOI: 10.22214/ijraset.2020.6395

Author(s): Balaji Dharamsoth

Keyword(s): Neural Networks, Emotion Recognition, Deep Neural Networks, Speech Emotion Recognition

Interaffection of Multiple Datasets with Neural Networks in Speech Emotion Recognition

2020. DOI: 10.5753/eniac.2020.12141

Author(s): Ronnypetson Da Silva, Valter M. Filho, Mario Souza

Keyword(s): Neural Network, Neural Networks, Emotion Recognition, Deep Neural Networks, Speech Emotion Recognition, Network Architectures, Shared Representations, Multiple Datasets, Neural Network Architectures

Many works that apply Deep Neural Networks (DNNs) to Speech Emotion Recognition (SER) use a single dataset or, when they use multiple datasets, train and evaluate the models on each one separately. Those datasets are constructed with specific guidelines, and the subjective nature of SER labels makes it difficult to obtain robust and general models. We investigate how DNNs learn shared representations for different datasets in both multi-task and unified setups. We also analyse how each dataset benefits from the others across different combinations of datasets and popular neural network architectures. We show that the longstanding belief that more data yields more general models does not always hold for SER, as a different combination of datasets and meta-parameters gives the best result for each of the analysed datasets.
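The abstract contrasts multi-task and unified setups for learning shared representations across datasets but does not define them. As a hedged illustration, assuming "multi-task" means one shared encoder with a separate output head per dataset and "unified" means a single head over a merged label set, the toy forward pass below shows the structural difference. The dataset names (`corpus_a`, `corpus_b`), layer sizes and constant weights are hypothetical; a real model would learn its weights.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def constant_matrix(rows: int, cols: int, value: float) -> Matrix:
    """Toy weights so the example is deterministic (a trained model would learn these)."""
    return [[value] * cols for _ in range(rows)]


class SharedEncoderModel:
    """One encoder shared by every dataset, plus named output heads.

    Multi-task setup (assumed): one head per dataset, sized to that dataset's label set.
    Unified setup (assumed): a single head over a merged label set for all datasets.
    """

    def __init__(self, encoder: Matrix, heads: Dict[str, Matrix]):
        self.encoder = encoder
        self.heads = heads

    def forward(self, features: Vector, head: str) -> Vector:
        shared = relu(linear(self.encoder, features))  # representation shared across datasets
        return linear(self.heads[head], shared)  # logits for the chosen head


# Hypothetical sizes: 3 input features, 2 hidden units,
# "corpus_a" with 4 emotion classes, "corpus_b" with 3, and a merged set of 5.
encoder = constant_matrix(2, 3, 0.5)
multi_task = SharedEncoderModel(
    encoder, {"corpus_a": constant_matrix(4, 2, 0.1), "corpus_b": constant_matrix(3, 2, 0.1)}
)
unified = SharedEncoderModel(encoder, {"unified": constant_matrix(5, 2, 0.1)})

x = [1.0, 2.0, 3.0]
print(len(multi_task.forward(x, "corpus_a")), len(multi_task.forward(x, "corpus_b")))
print(len(unified.forward(x, "unified")))
```

In both setups every dataset updates the same encoder during training; what differs is whether each corpus keeps its own label space (multi-task) or all corpora are mapped onto one (unified).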

Speech emotion recognition based on Gaussian Mixture Models and Deep Neural Networks

2017 Information Theory and Applications Workshop (ITA), 2017. DOI: 10.1109/ita.2017.8023477

Cited By ~ 5

Author(s): Ivan J. Tashev, Zhong-Qiu Wang, Keith Godin

Keyword(s): Neural Networks, Emotion Recognition, Mixture Models, Deep Neural Networks, Gaussian Mixture Models, Gaussian Mixture, Speech Emotion Recognition

On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks

2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018. DOI: 10.1109/iros.2018.8593571

Cited By ~ 5

Author(s): Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

Keyword(s): Neural Networks, Emotion Recognition, Deep Neural Networks, Human Robot Interaction, Speech Emotion Recognition, Robot Interaction

End-to-End Speech Emotion Recognition Using Deep Neural Networks

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. DOI: 10.1109/icassp.2018.8462677

Cited By ~ 31

Author(s): Panagiotis Tzirakis, Jiehao Zhang, Bjorn W. Schuller

Keyword(s): Neural Networks, Emotion Recognition, Deep Neural Networks, Speech Emotion Recognition, End To End

Mobility-Included DNN Partition Offloading from Mobile Devices to Edge Clouds

Sensors, Vol 21 (1), pp. 229, 2021. DOI: 10.3390/s21010229

Author(s): Xianzhong Tian, Juan Zhu, Ting Xu, Yanjun Li

Keyword(s): Neural Networks, Energy Consumption, Mobile Devices, Wireless Network, Deep Neural Networks, Mobile User, Computation Offloading, Long Latency, Total Latency, And Performance

The latest results in Deep Neural Networks (DNNs) have greatly improved the accuracy and performance of a variety of intelligent applications. However, running such computation-intensive DNN-based applications on resource-constrained mobile devices leads to long latency and high energy consumption. The traditional approach is to execute DNNs in a central cloud, but this requires transferring significant amounts of data to the cloud over the wireless network and also results in long latency. To solve this problem, offloading part of the DNN computation to edge clouds has been proposed, so that mobile devices and edge clouds execute the DNN collaboratively. In addition, the mobility of mobile devices can easily cause computation offloading to fail. In this paper, we develop a mobility-included DNN partition offloading algorithm (MDPO) that adapts to the user's mobility. The objective of MDPO is to minimize the total latency of completing a DNN job while the mobile user is moving. MDPO is suitable for DNNs with both chain and graph topologies. We compare MDPO with local-only execution and edge-only execution; experiments show that MDPO significantly reduces total latency, improves DNN performance, and adapts well to different network conditions.
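The abstract describes splitting a DNN between the mobile device and an edge cloud so as to minimize total latency. The sketch below shows only the basic static form of that idea for a chain-topology DNN under a fixed bandwidth: try every cut point, add local compute, upload time for the activation at the cut, and edge compute, then keep the cheapest. It is not MDPO; it ignores mobility, graph topologies and the time to return results, and the `Layer` timings, sizes and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    local_ms: float   # hypothetical execution time on the mobile device
    edge_ms: float    # hypothetical execution time on the edge server
    output_kb: float  # size of the activation this layer produces


def partition_latency(layers: List[Layer], input_kb: float,
                      bandwidth_kb_per_ms: float, split: int) -> float:
    """Total latency when layers[:split] run locally and layers[split:] on the edge."""
    local = sum(layer.local_ms for layer in layers[:split])
    edge = sum(layer.edge_ms for layer in layers[split:])
    if split == len(layers):
        upload_kb = 0.0  # local-only: nothing is offloaded
    elif split == 0:
        upload_kb = input_kb  # edge-only: the raw input is uploaded
    else:
        upload_kb = layers[split - 1].output_kb  # the activation at the cut is uploaded
    return local + upload_kb / bandwidth_kb_per_ms + edge


def best_split(layers: List[Layer], input_kb: float,
               bandwidth_kb_per_ms: float) -> Tuple[int, float]:
    """Try every cut point of the chain; return (split, latency) with the lowest latency."""
    candidates = [
        (partition_latency(layers, input_kb, bandwidth_kb_per_ms, k), k)
        for k in range(len(layers) + 1)
    ]
    latency, split = min(candidates)
    return split, latency


layers = [Layer(10.0, 1.0, 200.0), Layer(20.0, 2.0, 50.0), Layer(30.0, 3.0, 5.0)]
for bandwidth in (100.0, 10.0, 1.0):
    print(bandwidth, best_split(layers, input_kb=500.0, bandwidth_kb_per_ms=bandwidth))
```

With these made-up numbers the best choice shifts from edge-only (split 0, 11 ms) at 100 KB/ms, to cutting after the first layer (35 ms) at 10 KB/ms, to local-only (split 3, 60 ms) at 1 KB/ms. This is why bandwidth changes caused by a moving user matter for the partition decision that MDPO is described as adapting.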

Speech Emotion Recognition Using Convolution Neural Networks

2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), 2021. DOI: 10.1109/icais50930.2021.9395844

Author(s): Krishna Chauhan, Kamalesh Kumar Sharma, Tarun Varma

Keyword(s): Neural Networks, Emotion Recognition, Speech Emotion Recognition, Convolution Neural Networks

Speech Emotion Recognition Using Deep Neural Networks on Multilingual Databases

Towards real-time Speech Emotion Recognition using deep neural networks

f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition