Speech emotion recognition on mobile devices based on modulation spectral feature pooling and deep neural networks

Author(s):  
Anderson R. Avila ◽  
Joao Monteiro ◽  
Douglas O'Shaughneussy ◽  
Tiago H. Falk
Author(s):  
Syed Asif Ahmad Qadri ◽  
Teddy Surya Gunawan ◽  
Taiba Majid Wani ◽  
Eliathamby Ambikairajah ◽  
Mira Kartiwi ◽  
...  

Author(s):  
Biqiao Zhang ◽  
Yuqing Kong ◽  
Georg Essl ◽  
Emily Mower Provost

In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, data for which it may not be possible to identify a single hard label. We propose a family of loss functions, fSimilarity Preservation Loss (f-SPL), based on the dual form of f-divergence for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. Our approach, which combines f-SPL and classification loss, significantly outperforms a baseline SER system with the same structure but trained with only classification loss in most experiments. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.


2020 ◽  
Author(s):  
Ronnypetson Da Silva ◽  
Valter M. Filho ◽  
Mario Souza

Many works that apply Deep Neural Networks (DNNs) to Speech Emotion Recognition (SER) use single datasets or train and evaluate the models separately when using multiple datasets. Those datasets are constructed with specific guidelines and the subjective nature of the labels for SER makes it difficult to obtain robust and general models. We investigate how DNNs learn shared representations for different datasets in both multi-task and unified setups. We also analyse how each dataset benefits from others in different combinations of datasets and popular neural network architectures. We show that the longstanding belief of more data resulting in more general models doesn’t always hold for SER, as different dataset and meta-parameter combinations hold the best result for each of the analysed datasets.


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 229
Author(s):  
Xianzhong Tian ◽  
Juan Zhu ◽  
Ting Xu ◽  
Yanjun Li

The latest results in Deep Neural Networks (DNNs) have greatly improved the accuracy and performance of a variety of intelligent applications. However, running such computation-intensive DNN-based applications on resource-constrained mobile devices definitely leads to long latency and huge energy consumption. The traditional way is performing DNNs in the central cloud, but it requires significant amounts of data to be transferred to the cloud over the wireless network and also results in long latency. To solve this problem, offloading partial DNN computation to edge clouds has been proposed, to realize the collaborative execution between mobile devices and edge clouds. In addition, the mobility of mobile devices is easily to cause the computation offloading failure. In this paper, we develop a mobility-included DNN partition offloading algorithm (MDPO) to adapt to user’s mobility. The objective of MDPO is minimizing the total latency of completing a DNN job when the mobile user is moving. The MDPO algorithm is suitable for both DNNs with chain topology and graphic topology. We evaluate the performance of our proposed MDPO compared to local-only execution and edge-only execution, experiments show that MDPO significantly reduces the total latency and improves the performance of DNN, and MDPO can adjust well to different network conditions.


Sign in / Sign up

Export Citation Format

Share Document