A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission

2021
Author(s): Lasse Hansen, Yan-Ping Zhang, Detlef Wolf, Konstantinos Sechidis, Nicolai Ladegaard, ...

Objective: Affective disorders have long been associated with atypical voice patterns; however, current work on automated voice analysis often suffers from small sample sizes and untested generalizability. This study investigated a generalizable approach to aid clinical evaluation of depression and remission from voice. Methods: A Mixture-of-Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora. We examined the model's ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the same patients in remission (N = 25), based on recorded clinical interviews. The model was evaluated on raw data, data cleaned of background noise, and speaker-diarized data. Results: The model reliably separated healthy controls from depressed patients at the first visit, obtaining an AUC of 0.71. Further, we observed a reliable treatment effect in the depression group: speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that as little as 20-30 seconds of speech is enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions, suggesting that a controlled environment and a consistent preprocessing pipeline are crucial for correct characterization. Conclusion: A generalizable speech emotion recognition model can effectively reveal changes in speakers' depressive states before and after treatment in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.
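To make the modeling approach concrete, below is a minimal Mixture-of-Experts sketch in PyTorch for binary happy/sad classification. The input dimensionality (88 eGeMAPS-style acoustic functionals), expert count, and gating design are illustrative assumptions, not the authors' implementation.

```python
# A minimal Mixture-of-Experts sketch for happy/sad classification, assuming
# pre-extracted acoustic feature vectors. Layer sizes are assumptions.
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    def __init__(self, n_features: int, n_experts: int = 3, hidden: int = 64):
        super().__init__()
        # One expert per training corpus is a common design choice.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_experts)
        ])
        # Gating network produces soft weights over experts per input.
        self.gate = nn.Sequential(nn.Linear(n_features, n_experts),
                                  nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_w = self.gate(x)                                # (batch, n_experts)
        expert_out = torch.cat([e(x) for e in self.experts], dim=-1)
        return torch.sigmoid((gate_w * expert_out).sum(-1))  # P(happy) per clip

model = MixtureOfExperts(n_features=88)
probs = model(torch.randn(4, 88))  # toy batch of four feature vectors
```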

2020, Vol 46 (Supplement_1), pp. S93-S93
Author(s): Irina Falkenberg, Huai-Hsuan Tseng, Gemma Modinos, Barbara Wild, Philip McGuire, ...

Abstract Background Studies indicate that people with schizophrenia and first-episode psychosis experience deficits in their ability to accurately detect and display emotions through facial expressions, and that these deficits are associated with functioning and symptoms. This study aims to examine how emotion recognition and facial emotion expression are related to functioning and symptoms in a sample of individuals at ultra-high risk of psychosis, individuals with first-episode psychosis, and healthy controls. Methods During fMRI, we combined the presentation of emotional faces with the instruction to react with predetermined, assigned facial movements. 18 patients with first-episode psychosis (FEP), 18 individuals at ultra-high risk of psychosis (UHR) and 22 healthy controls (HCs) were examined while viewing happy, sad, or neutral faces and were instructed to simultaneously move the corners of their mouths either (a) upwards or (b) downwards, or (c) to refrain from movement. The subjects' facial movements were recorded with an MR-compatible video camera. Results Neurofunctional and behavioral responses to emotional faces were measured. Analyses have only recently commenced and are ongoing. Full results of the clinical and functional impact of the behavioral and neuroimaging findings will be presented at the meeting. Discussion Increased knowledge about abnormalities in emotion recognition and expression behaviour, their neural correlates, and their impact on clinical measures and functional outcome can inform the development of novel treatment approaches to improve social skills early in the course of schizophrenia and psychotic disorders.
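As a sketch of the factorial task structure (face emotion crossed with instructed mouth movement), the fragment below generates a randomized trial list; the repetition count and condition labels are illustrative assumptions, not the study's protocol.

```python
# A sketch of the 3x3 factorial paradigm (face emotion x instructed movement).
import itertools
import random

emotions = ["happy", "sad", "neutral"]
movements = ["corners_up", "corners_down", "no_movement"]

trials = [{"face": e, "instruction": m}
          for e, m in itertools.product(emotions, movements)
          for _ in range(4)]  # four repetitions per condition cell (assumed)
random.shuffle(trials)       # randomize presentation order within the run
```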


Sensors, 2020, Vol 20 (8), pp. 2297
Author(s): Zhen-Tao Liu, Bao-Han Wu, Dan-Yun Li, Peng Xiao, Jun-Wei Mao

Speech emotion recognition often encounters the problems of data imbalance and redundant features in different application scenarios, and researchers usually design different recognition models for different sample conditions. In this study, a speech emotion recognition model for small-sample environments is proposed. A data imbalance processing method based on a selective interpolation synthetic minority over-sampling technique (SISMOTE) is proposed to reduce the impact of sample imbalance on emotion recognition results. In addition, a feature selection method based on variance analysis and gradient boosting decision trees (GBDT) is introduced to exclude redundant features with poor emotional representation. Experiments on speech emotion recognition across three databases (CASIA, Emo-DB, SAVEE) show that our method obtains average recognition accuracies of 90.28% (CASIA), 75.00% (SAVEE) and 85.82% (Emo-DB) for speaker-dependent speech emotion recognition, which is superior to some state-of-the-art works.
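A rough sketch of this two-stage pipeline is given below, substituting standard SMOTE from imbalanced-learn for the authors' SISMOTE variant and an SVM as a stand-in final classifier; the thresholds and parameters are illustrative.

```python
# Variance filter -> oversampling -> GBDT-based feature selection -> classifier.
# Standard SMOTE stands in for SISMOTE; all thresholds are assumptions.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.svm import SVC

def train_ser(X: np.ndarray, y: np.ndarray):
    # Stage 1: drop near-constant features (variance analysis).
    vt = VarianceThreshold(threshold=1e-3)
    X = vt.fit_transform(X)
    # Stage 2: rebalance minority emotion classes by interpolation.
    X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
    # Stage 3: keep features with above-median GBDT importance
    # (a simple stand-in for the paper's redundancy filter).
    gbdt = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
    keep = gbdt.feature_importances_ > np.median(gbdt.feature_importances_)
    # Stage 4: final emotion classifier on the reduced feature set.
    clf = SVC(kernel="rbf", probability=True).fit(X_res[:, keep], y_res)
    return vt, keep, clf
```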


2020, Vol 10 (1)
Author(s): Mi Yang, Shan Gao, Xiangyang Zhang

Abstract Cognitive impairment is viewed as a core symptom of schizophrenia (SCZ), but its pathophysiological mechanism remains unclear. White matter (WM) disruption is considered to be a central abnormality that may contribute to cognitive impairment in SCZ patients. However, few studies have addressed the association between cognition and WM integrity in never-treated first-episode (NTFE) patients with SCZ. In this study, we used the MATRICS Consensus Cognitive Battery (MCCB) to evaluate cognitive function in NTFE patients (n = 39) and healthy controls (n = 30), and related performance to whole-brain fractional anisotropy (FA) values obtained via voxel-based diffusion tensor imaging. We found that FA was lower in five brain areas of SCZ patients: the cingulate gyrus, internal capsule, corpus callosum, cerebellum, and brainstem. Compared with the healthy control group, the MCCB's total score and 8 out of 10 subscores were significantly lower in NTFE patients (all p < 0.001). Moreover, in patients but not healthy controls, performance on the Trail Making Test was negatively correlated with the FA value in the left cingulate gyrus. Our findings provide evidence that WM disconnection is involved in cognitive impairment early in the course of SCZ.
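The reported brain-behavior association amounts to a within-group correlation; a minimal sketch with placeholder data follows (variable names and values are illustrative, not the study's data).

```python
# Pearson's r between left-cingulate FA and Trail Making Test performance
# within the patient group. Arrays are random placeholders, not study data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
fa_left_cingulate = rng.normal(0.45, 0.03, size=39)  # placeholder FA values
tmt_scores = rng.normal(40.0, 10.0, size=39)         # placeholder TMT scores

r, p = pearsonr(fa_left_cingulate, tmt_scores)
print(f"r = {r:.2f}, p = {p:.3f}")  # the study reports a negative correlation
```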


2020, Vol 17 (9), pp. 4244-4247
Author(s): Vybhav Jain, S. B. Rajeshwari, Jagadish S. Kallimani

Emotion analysis is a dynamic field of research that aims to recognize a person's emotions from their voice alone; it is more widely known as the Speech Emotion Recognition (SER) problem. The problem has been studied for more than a decade, with results coming from either voice analysis or text analysis. Individually, both methods have shown good accuracy, but using them in unison has produced much better results than either method alone. When people of different age groups talk, understanding the emotions behind what they say helps us react better. To achieve this, the paper implements a model that performs emotion analysis based on both tone and text analysis. The prosodic features of the tone are analyzed, and the speech is then converted to text. Once the text has been extracted from the speech, sentiment analysis is performed on it to further improve the accuracy of the emotion recognition.
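A minimal sketch of this tone-plus-text late fusion is shown below; the transcription and sentiment components are passed in as assumed callables, and the tone heuristic and fusion weights are illustrative, not the paper's model.

```python
# Prosodic features via librosa, plus a sentiment score on the transcript,
# combined by simple late fusion. `transcribe` and `sentiment` are assumed
# callables (e.g., any ASR and any sentiment model returning [-1, 1]).
import librosa
import numpy as np

def prosodic_features(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)  # pitch contour
    rms = librosa.feature.rms(y=y)[0]                     # energy contour
    return np.array([np.nanmean(f0), np.nanstd(f0), rms.mean(), rms.std()])

def fused_emotion_score(wav_path: str, transcribe, sentiment) -> float:
    tone = prosodic_features(wav_path)
    # Hypothetical tone heuristic: more pitch variability ~ higher arousal.
    tone_score = float(np.tanh(tone[1] / 50.0))
    text_score = sentiment(transcribe(wav_path))  # valence in [-1, 1]
    return 0.5 * tone_score + 0.5 * text_score    # illustrative late fusion
```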


2021, Vol 20 (1)
Author(s): Soojin Ahn, Youngjae Choi, Woohyeok Choi, Young Tak Jo, Harin Kim, ...

Abstract Background Alcohol use disorder (AUD) is a common psychiatric comorbidity in schizophrenia, associated with poor clinical outcomes and medication noncompliance. Most previous studies on the effect of alcohol use in patients with schizophrenia were limited by small sample sizes or cross-sectional designs. Therefore, we used a nationwide population database to investigate the impact of AUD on clinical outcomes of schizophrenia. Methods Data from the Health Insurance Review Agency database in South Korea from January 1, 2007 to December 31, 2016 were used. Among 64,442 patients with first-episode schizophrenia, 1598 patients with comorbid AUD were selected based on the diagnostic code F10. We performed between- and within-group analyses to compare the rates of psychiatric admissions and emergency room (ER) visits, and the medication possession ratio (MPR), between patients with comorbid AUD and control patients matched for onset age, sex, and observation period. Results The rates of psychiatric admissions and ER visits in both groups decreased after the time point of AUD diagnosis; however, the decrease was significantly greater in the patients with comorbid AUD than in the control patients. While the comorbid AUD group showed an increase in MPR after the diagnosis of AUD, the MPR decreased in the control group. Psychiatric admissions, ER visits, and MPR were all worse in the comorbid AUD group both before and after the diagnosis of AUD. Conclusions The results emphasize the importance of psychiatric comorbidities, especially AUD, in first-episode schizophrenia and the need for further research to confirm the association of AUD with clinical outcomes of schizophrenia.
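For reference, the medication possession ratio used as the adherence outcome is conventionally computed as days of medication supplied divided by days in the observation window; a minimal sketch follows (field names and the cap at 1.0 follow common convention, not necessarily this study's exact definition).

```python
# Medication possession ratio: supplied days / observation-window days.
from datetime import date

def mpr(fills: list[tuple[date, int]], start: date, end: date) -> float:
    """fills: (dispense_date, days_supplied) records within the window."""
    window_days = (end - start).days + 1
    supplied = sum(days for fill_date, days in fills
                   if start <= fill_date <= end)
    return min(supplied / window_days, 1.0)  # cap at 1.0 by convention

# Example: three 30-day fills over a 180-day window -> MPR = 0.5
fills = [(date(2016, 1, 5), 30), (date(2016, 2, 10), 30), (date(2016, 4, 1), 30)]
print(mpr(fills, date(2016, 1, 1), date(2016, 6, 28)))
```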


2020, Vol 140, pp. 358-365
Author(s): Zijiang Zhu, Weihuang Dai, Yi Hu, Junshan Li

Author(s): Cunwei Sun, Luping Ji, Hailing Zhong

Speech emotion recognition with deep networks on small samples is often a very challenging problem in natural language processing: the massive parameters of a deep network are difficult to train reliably on small quantities of speech samples. To address this problem, we propose a new method based on the systematic cooperation of a Generative Adversarial Network (GAN) and Long Short-Term Memory (LSTM). The method uses adversarial training of the GAN's generator and discriminator on speech spectrogram images to perform sufficient sample augmentation. A six-layer convolutional neural network (CNN), followed in series by a two-layer LSTM, is designed to extract features from speech spectrograms. To accelerate network training, the discriminator's parameters are transferred to our feature extractor. With this sample augmentation, a well-trained feature extraction network and an efficient classifier can be achieved. Tests and comparisons on two publicly available datasets, EMO-DB and IEMOCAP, show that our new method is effective and often superior to some state-of-the-art methods.
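A condensed sketch of this pipeline in PyTorch is shown below; the layer sizes are assumptions, the six-layer CNN is abbreviated to two layers for brevity, and the final line shows the discriminator-to-extractor weight transfer the abstract describes.

```python
# GAN discriminator over spectrograms whose convolutional weights seed a
# CNN+LSTM emotion classifier. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConvTrunk(nn.Module):
    """Shared convolutional stack (stands in for the paper's six-layer CNN)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):  # x: (batch, 1, freq, time)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = ConvTrunk()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, 1))  # real/fake logit
    def forward(self, x):
        return self.head(self.trunk(x))

class EmotionNet(nn.Module):
    def __init__(self, n_classes: int = 7):  # e.g., 7 EMO-DB emotion classes
        super().__init__()
        self.trunk = ConvTrunk()
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2,
                            batch_first=True)
        self.out = nn.Linear(64, n_classes)
    def forward(self, x):
        h = self.trunk(x)                  # (batch, 32, freq', time')
        h = h.mean(dim=2).transpose(1, 2)  # pool freq -> (batch, time', 32)
        seq, _ = self.lstm(h)
        return self.out(seq[:, -1])        # classify from the last timestep

disc, net = Discriminator(), EmotionNet()
# After GAN training: transfer discriminator conv weights to the extractor.
net.trunk.load_state_dict(disc.trunk.state_dict())
```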

