Deep Emotion Recognition in Dynamic Data using Facial, Speech and Textual Cues: A Survey

Author(s):  
Tao Zhang ◽  
Zhenhua Tan

With the development of social media and human-computer interaction, video has become one of the most common data formats. As a research hotspot, emotion recognition systems are essential for serving people by perceiving their emotional state in videos. In recent years, a large number of studies have focused on emotion recognition based on the three most common modalities in videos: face, speech and text. Owing to the lack of review papers concentrating on these three modalities, this paper sorts out the relevant studies of emotion recognition using facial, speech and textual cues. Moreover, because deep learning techniques effectively learn latent representations for emotion recognition, this paper focuses on emotion recognition methods based on deep learning. We first introduce widely accepted emotion models in order to interpret the definition of emotion. We then survey the state of the art in unimodal emotion recognition, including facial expression recognition, speech emotion recognition and textual emotion recognition. For multimodal emotion recognition, we summarize feature-level and decision-level fusion methods in detail. In addition, we outline the relevant benchmark datasets, the definitions of the evaluation metrics and the performance of recent state-of-the-art methods, so that readers can follow current research progress. Finally, we explore potential research challenges and opportunities as a reference for enriching emotion recognition research.
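The feature-level and decision-level fusion strategies mentioned in this abstract can be illustrated with a minimal sketch. All names and values below are ours for illustration: feature-level fusion concatenates per-modality feature vectors before a single classifier, while decision-level fusion combines per-modality class probabilities after separate classifiers.

```python
import numpy as np

# Hypothetical per-modality feature vectors (dimensions are illustrative).
face_feat = np.array([0.2, 0.7])
speech_feat = np.array([0.5, 0.1, 0.9])
text_feat = np.array([0.3])

def feature_level_fusion(*features):
    """Concatenate modality features into one joint vector
    that a single downstream classifier would consume."""
    return np.concatenate(features)

def decision_level_fusion(probs, weights=None):
    """Combine per-modality class-probability outputs by a
    (optionally weighted) average, then renormalise."""
    probs = np.stack(probs)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)
    fused = np.average(probs, axis=0, weights=weights)
    return fused / fused.sum()

joint = feature_level_fusion(face_feat, speech_feat, text_feat)  # shape (6,)

# Hypothetical softmax outputs of three unimodal classifiers
# over three emotion classes (e.g. happy / sad / neutral).
p_face = np.array([0.7, 0.2, 0.1])
p_speech = np.array([0.4, 0.4, 0.2])
p_text = np.array([0.5, 0.3, 0.2])
fused = decision_level_fusion([p_face, p_speech, p_text])
```

Feature-level fusion lets one model learn cross-modal interactions, at the cost of needing aligned inputs; decision-level fusion tolerates missing or asynchronous modalities, since each classifier runs independently.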

2021 ◽


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4486
Author(s):  
Niall O’Mahony ◽  
Sean Campbell ◽  
Lenka Krpalkova ◽  
Anderson Carvalho ◽  
Joseph Walsh ◽  
...  

Fine-grained change detection in sensor data is very challenging for artificial intelligence, though it is critically important in practice. It is the process of identifying differences in the state of an object or phenomenon where the differences are class-specific and difficult to generalise. As a result, many recent technologies that leverage big data and deep learning struggle with this task. This review focuses on the state-of-the-art methods, applications and challenges of representation learning for fine-grained change detection. Our research focuses on harnessing the latent metric space of representation learning techniques as an interim output for hybrid human-machine intelligence. We review methods for transforming and projecting the embedding space so that significant changes can be communicated more effectively and a more comprehensive interpretation of the underlying relationships in sensor data is facilitated. This research forms part of our work towards a method for aligning the axes of the latent embedding space with meaningful real-world metrics, so that the reasoning behind the detection of a change relative to past observations can be revealed and adjusted. This is an important topic in many fields concerned with producing more meaningful and explainable outputs from deep learning, and in providing means for knowledge injection and model calibration in order to maintain user confidence.
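A minimal sketch of the kind of embedding-space projection this abstract describes, using synthetic data and a plain SVD in place of any specific representation learner: high-dimensional latent vectors are projected onto their principal axes so that a change between two observations can be read off as movement along a small number of directions. Everything here (dimensions, data, the choice of SVD) is an illustrative assumption, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))  # synthetic latent vectors

# Centre the embeddings and take the top-2 principal directions via SVD.
mean = embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
axes = vt[:2]  # (2, 16): two candidate "interpretable" directions

def project(x):
    """Project one embedding onto the principal axes."""
    return (x - mean) @ axes.T

# Change between two observations, expressed in the projected space.
before, after = embeddings[0], embeddings[1]
delta = project(after) - project(before)
change_magnitude = float(np.linalg.norm(delta))
```

Aligning such axes with real-world metrics (as the authors propose) would replace the purely variance-driven SVD directions with directions chosen for their correlation with meaningful quantities.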


Computers ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 37 ◽  
Author(s):  
Luca Cappelletti ◽  
Tommaso Fontana ◽  
Guido Walter Di Donato ◽  
Lorenzo Di Tucci ◽  
Elena Casiraghi ◽  
...  

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have proposed novel, interesting solutions applied in a variety of fields. In the same period, the successful results achieved by deep learning techniques have opened the way to their application to difficult problems where human skill cannot provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques were not designed to tackle “complex data”, that is, high-dimensional data belonging to datasets with huge cardinality that describe complex problems. In particular, they often require critical parameters to be set manually, or rely on complex architectures and/or training phases that make their computational load impractical. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.
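For orientation, here is a deliberately simplified sketch of the KNN-imputation baseline the abstract compares against: each missing entry is filled with the mean of that feature over the k rows nearest in the columns the incomplete row does observe. This is our own toy version (real implementations, e.g. scikit-learn's KNNImputer, handle partially observed neighbours and weighting), not the method evaluated in the paper.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in each incomplete row with the per-column mean of
    the k nearest fully observed rows (distance measured only over
    the columns observed in the incomplete row)."""
    X = X.astype(float).copy()
    missing = np.isnan(X)
    for i in np.argwhere(missing.any(axis=1)).ravel():
        obs = ~missing[i]
        dists = []
        for j in range(len(X)):
            if j == i or missing[j][obs].any():
                dists.append(np.inf)  # skip self and unusable rows
            else:
                dists.append(np.linalg.norm(X[i, obs] - X[j, obs]))
        nearest = np.argsort(dists)[:k]
        for c in np.where(missing[i])[0]:
            X[i, c] = X[nearest, c].mean()
    return X

X = np.array([[1.0, 2.0, np.nan],
              [1.1, 2.1, 3.0],
              [0.9, 1.9, 3.2],
              [5.0, 6.0, 7.0]])
filled = knn_impute(X, k=2)  # the NaN is filled from the two closest rows
```

The quadratic neighbour search here is exactly what makes KNN imputation strain on datasets with huge cardinality, which motivates the encoder-decoder alternatives the paper proposes.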


2019 ◽  
Vol 4 (4) ◽  
pp. 828-849 ◽  
Author(s):  
Daniel C. Elton ◽  
Zois Boukouvalas ◽  
Mark D. Fuge ◽  
Peter W. Chung

We review a recent groundswell of work which uses deep learning techniques to generate and optimize molecules.


2020 ◽  
Vol 5 (4&5) ◽  
pp. 9
Author(s):  
D. Karthika Renuka ◽  
C. Akalya Devi ◽  
R. Kiruba Tharani ◽  
G. Pooventhiran

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1249
Author(s):  
Babak Joze Abbaschian ◽  
Daniel Sierra-Sosa ◽  
Adel Elmaghraby

The advancements in neural networks and the on-demand need for accurate and near real-time Speech Emotion Recognition (SER) in human–computer interactions make it mandatory to compare available methods and databases in SER to achieve feasible solutions and a firmer understanding of this open-ended problem. The current study reviews deep learning approaches for SER with available datasets, followed by conventional machine learning techniques for speech emotion recognition. Ultimately, we present a multi-aspect comparison between practical neural network approaches in speech emotion recognition. The goal of this study is to provide a survey of the field of discrete speech emotion recognition.
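As a concrete anchor for the "conventional machine learning techniques" this survey covers, here is a sketch of two classic frame-level low-level descriptors often fed to such SER classifiers: short-time log-energy and zero-crossing rate. The frame length, hop size and synthetic signal are illustrative assumptions, not details from the survey.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a waveform into overlapping frames
    (400 samples / 160-sample hop ~ 25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def log_energy(frames, eps=1e-10):
    """Log of the per-frame signal energy."""
    return np.log((frames ** 2).sum(axis=1) + eps)

def zero_crossing_rate(frames):
    """Fraction of adjacent samples whose sign differs in each frame."""
    signs = np.sign(frames)
    return (np.abs(np.diff(signs, axis=1)) > 0).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=16000)  # one second of synthetic "audio" at 16 kHz
frames = frame_signal(x)
feats = np.column_stack([log_energy(frames), zero_crossing_rate(frames)])
```

Deep SER models replace such hand-crafted descriptors with learned representations, which is the comparison the survey draws.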


2016 ◽  
Vol 5 (2) ◽  
pp. 113-122 ◽  
Author(s):  
Apoorva Ganapathy

The advances in neural networks and the strong demand for accurate, near real-time speech emotion recognition (SER) in human-computer interfaces make it necessary to compare existing methods and datasets in speech emotion detection, so as to reach practicable solutions and a firmer understanding of this open-ended problem. The present investigation assessed deep learning methods for speech emotion detection together with the accessible datasets, followed by conventional machine learning methods for SER. Finally, we present a multi-aspect comparison between practical neural network methods in SER. The objective of this investigation is to deliver a review of the area of discrete SER.


2021 ◽  
Vol 15 ◽  
Author(s):  
Shiqing Zhang ◽  
Ruixin Liu ◽  
Xin Tao ◽  
Xiaoming Zhao

Automatic speech emotion recognition (SER) is a challenging component of human-computer interaction (HCI). The existing literature mainly focuses on evaluating SER performance by training and testing on a single corpus with a single language setting. However, in many practical applications there are great differences between the training corpus and the testing corpus. Due to the diversity of speech emotion corpora and languages, most previous SER methods do not perform well when applied in real-world cross-corpus or cross-language scenarios. Inspired by the powerful feature learning ability of recently emerged deep learning techniques, various advanced deep learning models have increasingly been adopted for cross-corpus SER. This paper aims to provide an up-to-date and comprehensive survey of cross-corpus SER, especially of the various deep learning techniques associated with supervised, unsupervised and semi-supervised learning in this area. In addition, this paper highlights different challenges and opportunities in cross-corpus SER tasks and points out future trends.
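One simple unsupervised feature-alignment baseline used in cross-corpus settings of this kind is correlation alignment (CORAL): whiten source-corpus features with the source covariance, then re-colour them with the target covariance so their second-order statistics match. The sketch below is our own illustration on synthetic features, not a method from this survey.

```python
import numpy as np

def coral(source, target, eps=1e-3):
    """Align source features to the target domain by matching
    second-order statistics (whiten with source covariance,
    re-colour with target covariance, shift to the target mean)."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])

    def sqrtm(m, inv=False):
        # Matrix square root (or inverse square root) of a
        # symmetric positive-definite matrix via eigendecomposition.
        w, v = np.linalg.eigh(m)
        w = np.clip(w, eps, None)
        w = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
        return (v * w) @ v.T

    centred = source - source.mean(axis=0)
    return centred @ sqrtm(cs, inv=True) @ sqrtm(ct) + target.mean(axis=0)

rng = np.random.default_rng(1)
src = rng.normal(0.0, 1.0, size=(200, 4))  # e.g. features from corpus A
tgt = rng.normal(2.0, 3.0, size=(200, 4))  # e.g. features from corpus B
aligned = coral(src, tgt)  # src features re-expressed in B's statistics
```

Matching only means and covariances is a crude proxy for true domain adaptation, which is why the deep supervised, unsupervised and semi-supervised approaches this survey covers go further.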

