A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission

2021
Author(s): Lasse Hansen, Yan-Ping Zhang, Detlef Wolf, Konstantinos Sechidis, Nicolai Ladegaard, ...

Objective: Affective disorders have long been associated with atypical voice patterns; however, current work on automated voice analysis often suffers from small sample sizes and untested generalizability. This study investigated a generalizable approach to aid clinical evaluation of depression and remission from voice. Methods: A Mixture-of-Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora. We examined the model's ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the same patients in remission (N = 25), based on recorded clinical interviews. The model was evaluated on raw data, data cleaned of background noise, and speaker-diarized data. Results: The model reliably separated healthy controls from depressed patients at the first visit, obtaining an AUC of 0.71. Further, we observed a reliable treatment effect in the depression group: speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that as little as 20-30 seconds of speech is enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions, suggesting that a controlled environment and a consistent preprocessing pipeline are crucial for correct characterization. Conclusion: A generalizable speech emotion recognition model can effectively reveal changes in speakers' depressive states before and after treatment in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.
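To make the modeling approach concrete, below is a minimal Mixture-of-Experts sketch in PyTorch for binary happy/sad classification. The input dimensionality (88 eGeMAPS-style acoustic functionals), expert count, and gating design are illustrative assumptions, not the authors' implementation.

```python
# A minimal Mixture-of-Experts sketch for happy/sad classification, assuming
# pre-extracted acoustic feature vectors. Layer sizes are assumptions.
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    def __init__(self, n_features: int, n_experts: int = 3, hidden: int = 64):
        super().__init__()
        # One expert per training corpus is a common design choice.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_experts)
        ])
        # Gating network produces soft weights over experts per input.
        self.gate = nn.Sequential(nn.Linear(n_features, n_experts),
                                  nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_w = self.gate(x)                                # (batch, n_experts)
        expert_out = torch.cat([e(x) for e in self.experts], dim=-1)
        return torch.sigmoid((gate_w * expert_out).sum(-1))  # P(happy) per clip

model = MixtureOfExperts(n_features=88)
probs = model(torch.randn(4, 88))  # toy batch of four feature vectors
```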

2020, Vol 46 (Supplement_1), pp. S93-S93
Author(s): Irina Falkenberg, Huai-Hsuan Tseng, Gemma Modinos, Barbara Wild, Philip McGuire, ...

Abstract Background Studies indicate that people with schizophrenia and first-episode psychosis experience deficits in their ability to accurately detect and display emotions through facial expressions, and that these deficits are associated with functioning and symptoms. This study aims to examine how emotion recognition and facial emotion expression are related to functioning and symptoms in a sample of individuals at ultra-high risk of psychosis, individuals with first-episode psychosis, and healthy controls. Methods During fMRI, we combined the presentation of emotional faces with the instruction to react with predetermined, assigned facial movements. 18 patients with first-episode psychosis (FEP), 18 individuals at ultra-high risk of psychosis (UHR) and 22 healthy controls (HCs) were examined while viewing happy, sad, or neutral faces and were instructed to simultaneously move the corners of their mouths either (a) upwards or (b) downwards, or (c) to refrain from movement. The subjects' facial movements were recorded with an MR-compatible video camera. Results Neurofunctional and behavioral responses to emotional faces were measured. Analyses have only recently commenced and are ongoing. Full results of the clinical and functional impact of the behavioral and neuroimaging findings will be presented at the meeting. Discussion Increased knowledge about abnormalities in emotion recognition and expression behaviour, their neural correlates, and their impact on clinical measures and functional outcome can inform the development of novel treatment approaches to improve social skills early in the course of schizophrenia and psychotic disorders.
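As a sketch of the factorial task structure (face emotion crossed with instructed mouth movement), the fragment below generates a randomized trial list; the repetition count and condition labels are illustrative assumptions, not the study's protocol.

```python
# A sketch of the 3x3 factorial paradigm (face emotion x instructed movement).
import itertools
import random

emotions = ["happy", "sad", "neutral"]
movements = ["corners_up", "corners_down", "no_movement"]

trials = [{"face": e, "instruction": m}
          for e, m in itertools.product(emotions, movements)
          for _ in range(4)]  # four repetitions per condition cell (assumed)
random.shuffle(trials)       # randomize presentation order within the run
```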


Sensors, 2020, Vol 20 (8), pp. 2297
Author(s): Zhen-Tao Liu, Bao-Han Wu, Dan-Yun Li, Peng Xiao, Jun-Wei Mao

Speech emotion recognition often encounters the problems of data imbalance and redundant features in different application scenarios, and researchers usually design different recognition models for different sample conditions. In this study, a speech emotion recognition model for small-sample environments is proposed. A data imbalance processing method based on a selective interpolation synthetic minority over-sampling technique (SISMOTE) is proposed to reduce the impact of sample imbalance on emotion recognition results. In addition, a feature selection method based on variance analysis and gradient boosting decision trees (GBDT) is introduced to exclude redundant features with poor emotional representation. Experiments on speech emotion recognition across three databases (CASIA, Emo-DB, SAVEE) show that our method obtains average recognition accuracies of 90.28% (CASIA), 75.00% (SAVEE) and 85.82% (Emo-DB) for speaker-dependent speech emotion recognition, which is superior to some state-of-the-art works.
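A rough sketch of this two-stage pipeline is given below, substituting standard SMOTE from imbalanced-learn for the authors' SISMOTE variant and an SVM as a stand-in final classifier; the thresholds and parameters are illustrative.

```python
# Variance filter -> oversampling -> GBDT-based feature selection -> classifier.
# Standard SMOTE stands in for SISMOTE; all thresholds are assumptions.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.svm import SVC

def train_ser(X: np.ndarray, y: np.ndarray):
    # Stage 1: drop near-constant features (variance analysis).
    vt = VarianceThreshold(threshold=1e-3)
    X = vt.fit_transform(X)
    # Stage 2: rebalance minority emotion classes by interpolation.
    X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
    # Stage 3: keep features with above-median GBDT importance
    # (a simple stand-in for the paper's redundancy filter).
    gbdt = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
    keep = gbdt.feature_importances_ > np.median(gbdt.feature_importances_)
    # Stage 4: final emotion classifier on the reduced feature set.
    clf = SVC(kernel="rbf", probability=True).fit(X_res[:, keep], y_res)
    return vt, keep, clf
```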


2020, Vol 10 (1)
Author(s): Mi Yang, Shan Gao, Xiangyang Zhang

Abstract Cognitive impairment is viewed as a core symptom of schizophrenia (SCZ), but its pathophysiological mechanism remains unclear. White matter (WM) disruption is considered to be a central abnormality that may contribute to cognitive impairment in SCZ patients. However, few studies have addressed the association between cognition and WM integrity in never-treated first-episode (NTFE) patients with SCZ. In this study, we used the MATRICS Consensus Cognitive Battery (MCCB) to evaluate cognitive function in NTFE patients (n = 39) and healthy controls (n = 30), and related performance to whole-brain fractional anisotropy (FA) values obtained via voxel-based diffusion tensor imaging. We found that FA was lower in five brain areas of SCZ patients: the cingulate gyrus, internal capsule, corpus callosum, cerebellum, and brainstem. Compared with the healthy control group, the MCCB's total score and 8 out of 10 subscores were significantly lower in NTFE patients (all p < 0.001). Moreover, in patients but not healthy controls, performance on the Trail Making Test was negatively correlated with the FA value in the left cingulate gyrus. Our findings provide evidence that WM disconnection is involved in cognitive impairment early in the course of SCZ.
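The reported brain-behavior association amounts to a within-group correlation; a minimal sketch with placeholder data follows (variable names and values are illustrative, not the study's data).

```python
# Pearson's r between left-cingulate FA and Trail Making Test performance
# within the patient group. Arrays are random placeholders, not study data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
fa_left_cingulate = rng.normal(0.45, 0.03, size=39)  # placeholder FA values
tmt_scores = rng.normal(40.0, 10.0, size=39)         # placeholder TMT scores

r, p = pearsonr(fa_left_cingulate, tmt_scores)
print(f"r = {r:.2f}, p = {p:.3f}")  # the study reports a negative correlation
```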


2020, Vol 17 (9), pp. 4244-4247
Author(s): Vybhav Jain, S. B. Rajeshwari, Jagadish S. Kallimani

Emotion analysis is a dynamic field of research that aims to recognize a person's emotions from their voice alone; it is more widely known as the Speech Emotion Recognition (SER) problem. The problem has been studied for more than a decade, with results coming from either voice analysis or text analysis. Individually, both methods have shown good accuracy, but using them in unison has produced much better results than either method alone. When people of different age groups talk, understanding the emotions behind what they say helps us react better. To achieve this, the paper implements a model that performs emotion analysis based on both tone and text analysis. The prosodic features of the tone are analyzed, and the speech is then converted to text. Once the text has been extracted from the speech, sentiment analysis is performed on it to further improve the accuracy of the emotion recognition.
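A minimal sketch of this tone-plus-text late fusion is shown below; the transcription and sentiment components are passed in as assumed callables, and the tone heuristic and fusion weights are illustrative, not the paper's model.

```python
# Prosodic features via librosa, plus a sentiment score on the transcript,
# combined by simple late fusion. `transcribe` and `sentiment` are assumed
# callables (e.g., any ASR and any sentiment model returning [-1, 1]).
import librosa
import numpy as np

def prosodic_features(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)  # pitch contour
    rms = librosa.feature.rms(y=y)[0]                     # energy contour
    return np.array([np.nanmean(f0), np.nanstd(f0), rms.mean(), rms.std()])

def fused_emotion_score(wav_path: str, transcribe, sentiment) -> float:
    tone = prosodic_features(wav_path)
    # Hypothetical tone heuristic: more pitch variability ~ higher arousal.
    tone_score = float(np.tanh(tone[1] / 50.0))
    text_score = sentiment(transcribe(wav_path))  # valence in [-1, 1]
    return 0.5 * tone_score + 0.5 * text_score    # illustrative late fusion
```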


2021, Vol 20 (1)
Author(s): Soojin Ahn, Youngjae Choi, Woohyeok Choi, Young Tak Jo, Harin Kim, ...

Abstract Background Alcohol use disorder (AUD) is a common psychiatric comorbidity in schizophrenia, associated with poor clinical outcomes and medication noncompliance. Most previous studies on the effect of alcohol use in patients with schizophrenia were limited by small sample sizes or cross-sectional designs. Therefore, we used a nationwide population database to investigate the impact of AUD on clinical outcomes of schizophrenia. Methods Data from the Health Insurance Review Agency database in South Korea from January 1, 2007 to December 31, 2016 were used. Among 64,442 patients with first-episode schizophrenia, 1598 patients with comorbid AUD were selected based on the diagnostic code F10. We performed between- and within-group analyses to compare the rates of psychiatric admissions and emergency room (ER) visits, and the medication possession ratio (MPR), between patients with comorbid AUD and control patients matched for onset age, sex, and observation period. Results The rates of psychiatric admissions and ER visits in both groups decreased after the time point of AUD diagnosis; however, the decrease was significantly greater in the patients with comorbid AUD than in the control patients. While the comorbid AUD group showed an increase in MPR after the diagnosis of AUD, the MPR decreased in the control group. Psychiatric admissions, ER visits, and MPR were all worse in the comorbid AUD group both before and after the diagnosis of AUD. Conclusions The results emphasize the importance of psychiatric comorbidities, especially AUD, in first-episode schizophrenia and the need for further research to confirm the association of AUD with clinical outcomes of schizophrenia.
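For reference, the medication possession ratio used as the adherence outcome is conventionally computed as days of medication supplied divided by days in the observation window; a minimal sketch follows (field names and the cap at 1.0 follow common convention, not necessarily this study's exact definition).

```python
# Medication possession ratio: supplied days / observation-window days.
from datetime import date

def mpr(fills: list[tuple[date, int]], start: date, end: date) -> float:
    """fills: (dispense_date, days_supplied) records within the window."""
    window_days = (end - start).days + 1
    supplied = sum(days for fill_date, days in fills
                   if start <= fill_date <= end)
    return min(supplied / window_days, 1.0)  # cap at 1.0 by convention

# Example: three 30-day fills over a 180-day window -> MPR = 0.5
fills = [(date(2016, 1, 5), 30), (date(2016, 2, 10), 30), (date(2016, 4, 1), 30)]
print(mpr(fills, date(2016, 1, 1), date(2016, 6, 28)))
```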


2020, Vol 140, pp. 358-365
Author(s): Zijiang Zhu, Weihuang Dai, Yi Hu, Junshan Li

Author(s): Cunwei Sun, Luping Ji, Hailing Zhong

Speech emotion recognition with deep networks on small samples is often a very challenging problem in natural language processing: the massive parameters of a deep network are difficult to train reliably on small quantities of speech samples. To address this problem, we propose a new method based on the systematic cooperation of a Generative Adversarial Network (GAN) and Long Short-Term Memory (LSTM). The method uses adversarial training of the GAN's generator and discriminator on speech spectrogram images to perform sufficient sample augmentation. A six-layer convolutional neural network (CNN), followed in series by a two-layer LSTM, is designed to extract features from speech spectrograms. To accelerate network training, the discriminator's parameters are transferred to our feature extractor. With this sample augmentation, a well-trained feature extraction network and an efficient classifier can be achieved. Tests and comparisons on two publicly available datasets, EMO-DB and IEMOCAP, show that our new method is effective and often superior to some state-of-the-art methods.
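A condensed sketch of this pipeline in PyTorch is shown below; the layer sizes are assumptions, the six-layer CNN is abbreviated to two layers for brevity, and the final line shows the discriminator-to-extractor weight transfer the abstract describes.

```python
# GAN discriminator over spectrograms whose convolutional weights seed a
# CNN+LSTM emotion classifier. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConvTrunk(nn.Module):
    """Shared convolutional stack (stands in for the paper's six-layer CNN)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):  # x: (batch, 1, freq, time)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = ConvTrunk()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, 1))  # real/fake logit
    def forward(self, x):
        return self.head(self.trunk(x))

class EmotionNet(nn.Module):
    def __init__(self, n_classes: int = 7):  # e.g., 7 EMO-DB emotion classes
        super().__init__()
        self.trunk = ConvTrunk()
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2,
                            batch_first=True)
        self.out = nn.Linear(64, n_classes)
    def forward(self, x):
        h = self.trunk(x)                  # (batch, 32, freq', time')
        h = h.mean(dim=2).transpose(1, 2)  # pool freq -> (batch, time', 32)
        seq, _ = self.lstm(h)
        return self.out(seq[:, -1])        # classify from the last timestep

disc, net = Discriminator(), EmotionNet()
# After GAN training: transfer discriminator conv weights to the extractor.
net.trunk.load_state_dict(disc.trunk.state_dict())
```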

