Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives

Survey of Deep Representation Learning for Speech Emotion Recognition

10.36227/techrxiv.16689484 ◽

2021 ◽

Author(s):

Siddique Latif ◽

Rajib Rana ◽

Sara Khalifa ◽

Raja Jurdak ◽

Junaid Qadir ◽

...

Keyword(s):

Emotion Recognition ◽

General Setting ◽

Representation Learning ◽

Data Driven ◽

Speech Emotion Recognition ◽

Feature Engineering ◽

Acoustic Features ◽

Learning Techniques ◽

Comprehensive Survey ◽

Hierarchical Representations

Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques, related challenges and identify important future areas of research. Our survey bridges the gap in the literature since existing surveys either focus on SER with hand-engineered features or representation learning in the general setting without focusing on SER.

Affective Model Based Speech Emotion Recognition Using Deep Learning Techniques

Indian Journal of Computer Science ◽

10.17010/ijcs/2020/v5/i4-5/154783 ◽

2020 ◽

Vol 5 (4&5) ◽

pp. 9

Author(s):

D. Karthika Renuka ◽

C. Akalya Devi ◽

R. Kiruba Tharani ◽

G. Pooventhiran

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Model Based ◽

Learning Techniques

Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models

Sensors ◽

10.3390/s21041249 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1249

Author(s):

Babak Joze Abbaschian ◽

Daniel Sierra-Sosa ◽

Adel Elmaghraby

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Machine Learning Techniques ◽

Speech Emotion Recognition ◽

Learning Approaches ◽

Human Computer Interactions ◽

Learning Techniques ◽

Conventional Machine ◽

Feasible Solutions ◽

Network Approaches

The advancements in neural networks and the on-demand need for accurate and near real-time Speech Emotion Recognition (SER) in human–computer interactions make it mandatory to compare available methods and databases in SER to achieve feasible solutions and a firmer understanding of this open-ended problem. The current study reviews deep learning approaches for SER with available datasets, followed by conventional machine learning techniques for speech emotion recognition. Ultimately, we present a multi-aspect comparison between practical neural network approaches in speech emotion recognition. The goal of this study is to provide a survey of the field of discrete speech emotion recognition.

Speech Emotion Recognition Using Deep Learning Techniques

ABC Journal of Advanced Research ◽

10.18034/abcjar.v5i2.550 ◽

2016 ◽

Vol 5 (2) ◽

pp. 113-122 ◽

Cited By ~ 1

Author(s):

Apoorva Ganapathy ◽

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Neural Systems ◽

Emotion Detection ◽

Learning Methods ◽

Computer Interfaces ◽

Human Computer Interfaces ◽

Learning Techniques ◽

Actual Speech

Developments in neural systems and the strong demand for accurate, near-real-time Speech Emotion Recognition in human-computer interfaces make it necessary to compare existing methods and datasets in speech emotion detection, in order to reach practicable solutions and a firmer understanding of this open-ended problem. The present investigation assesses deep learning methods for speech emotion detection with accessible datasets, followed by conventional machine learning methods for SER. Finally, we present a multi-aspect assessment of practical neural network methods in SER. The objective of this investigation is to deliver a review of the area of discrete SER.

Deep Emotion Recognition in Dynamic Data using Facial, Speech and Textual Cues: A Survey

10.36227/techrxiv.15184302.v1 ◽

2021 ◽

Author(s):

Tao Zhang ◽

Zhenhua Tan

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

State Of The Art ◽

The State ◽

Research Progress ◽

Speech Emotion Recognition ◽

Expression Recognition ◽

Learning Techniques ◽

Textual Cues ◽

Definition Of

With the development of social media and human-computer interaction, video has become one of the most common data formats. As a research hotspot, emotion recognition systems are essential for serving people by perceiving their emotional state in videos. In recent years, a large number of studies have focused on emotion recognition based on the three most common modalities in videos: face, speech and text. This paper sorts out the relevant studies of emotion recognition using facial, speech and textual cues, given the lack of review papers concentrating on these three modalities. Because deep learning techniques are effective at learning latent representations for emotion recognition, the paper focuses on deep-learning-based methods. We first introduce widely accepted emotion models in order to interpret the definition of emotion. We then introduce the state of the art in unimodal emotion recognition, including facial expression recognition, speech emotion recognition and textual emotion recognition. For multimodal emotion recognition, we summarize feature-level and decision-level fusion methods in detail. In addition, we describe relevant benchmark datasets, define the metrics, and outline the performance of recent state-of-the-art methods so that readers can follow current research progress. Finally, we explore some potential research challenges and opportunities as a reference for researchers working on emotion recognition.

Speech Emotion Recognition Using Deep Learning Techniques: A Review

IEEE Access ◽

10.1109/access.2019.2936124 ◽

2019 ◽

Vol 7 ◽

pp. 117327-117345 ◽

Cited By ~ 36

Author(s):

Ruhul Amin Khalil ◽

Edward Jones ◽

Mohammad Inayatullah Babar ◽

Tariqullah Jan ◽

Mohammad Haseeb Zafar ◽

...

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Learning Techniques

Language dialect based speech emotion recognition through deep learning techniques

International Journal of Speech Technology ◽

10.1007/s10772-021-09838-8 ◽

2021 ◽

Author(s):

Sukumar Rajendran ◽

Sandeep Kumar Mathivanan ◽

Prabhu Jayagopal ◽

Maheshwari Venkatasen ◽

Thanapal Pandi ◽

...

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Learning Techniques

Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder

Electronics ◽

10.3390/electronics10172086 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2086

Author(s):

Yangwei Ying ◽

Yuanwu Tu ◽

Hong Zhou

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Feature Learning ◽

Human Potential ◽

Speech Emotion Recognition ◽

Unsupervised Feature Learning ◽

Learning Techniques ◽

Speech Data ◽

Data Division ◽

Speech Features

Speech signals contain abundant information on personal emotions, which plays an important part in the representation of human potential characteristics and expressions. However, the deficiency of emotion speech data affects the development of speech emotion recognition (SER), which also limits the promotion of recognition accuracy. Currently, the most effective approach is to make use of unsupervised feature learning techniques to extract speech features from available speech data and generate emotion classifiers with these features. In this paper, we proposed to implement autoencoders such as a denoising autoencoder (DAE) and an adversarial autoencoder (AAE) to extract the features from LibriSpeech for model pre-training, and then conducted experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets for classification. Considering the imbalance of data distribution in IEMOCAP, we developed a novel data augmentation approach to optimize the overlap shift between consecutive segments and redesigned the data division. The best classification accuracy reached 78.67% (weighted accuracy, WA) and 76.89% (unweighted accuracy, UA) with AAE. Compared with state-of-the-art results to our knowledge (76.18% of WA and 76.36% of UA with the supervised learning method), we achieved a slight advantage. This suggests that using unsupervised learning benefits the development of SER and provides a new approach to eliminate the problem of data scarcity.
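
The abstract above reports both weighted accuracy (WA) and unweighted accuracy (UA) without defining them. The sketch below uses the definitions conventional in SER work, which is an assumption about this paper: WA is the overall fraction of correctly classified utterances, and UA is the mean of per-class recall, so every emotion class counts equally regardless of its size. The function names and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by total utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, deliberately imbalanced toy labels (not real data).
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

wa = weighted_accuracy(y_true, y_pred)    # 7 of 10 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0, 0.5, 0.0
print(f"WA={wa:.2f} UA={ua:.2f}")
```

On imbalanced data the two metrics can diverge sharply, as in this toy case, which is why SER papers commonly report both.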

Deep Learning Techniques for Speech Emotion Recognition: A Review

2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA) ◽

10.1109/radioelek.2019.8733432 ◽

2019 ◽

Cited By ~ 4

Author(s):

Sandeep Kumar Pandey ◽

H. S. Shekhawat ◽

S. R. M. Prasanna

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

Speech Emotion Recognition ◽

Learning Techniques
