A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission

Author(s):  
Lasse Hansen ◽  
Yan-Ping Zhang ◽  
Detlef Wolf ◽  
Konstantinos Sechidis ◽  
Nicolai Ladegaard ◽  
...  
2020 ◽  
Vol 140 ◽  
pp. 358-365

Objective: Affective disorders have long been associated with atypical voice patterns; however, current work on automated voice analysis often suffers from small sample sizes and untested generalizability. This study investigated a generalizable approach to aid clinical evaluation of depression and remission from voice.

Methods: A Mixture-of-Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora. We examined the model's predictive ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the same patients in remission (N = 25), based on recorded clinical interviews. The model was evaluated on raw data, on data cleaned for background noise, and on speaker-diarized data.

Results: The model showed reliable separation between healthy controls and depressed patients at the first visit, obtaining an AUC of 0.71. Further, we observed a reliable treatment effect in the depression group, with speech from patients in remission being indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that as little as 20-30 seconds of speech is enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions, suggesting that a controlled environment and consistent preprocessing pipelines are crucial for correct characterizations.

Conclusion: A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after treatment in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.
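To make the reported evaluation concrete, the sketch below shows (under stated assumptions, not the authors' actual pipeline) how per-segment happy/sad emotion scores could be pooled into one score per speaker and how the group-separation AUC is computed. The rank-based `auc` function is the standard Mann-Whitney formulation of the metric the abstract reports (AUC of 0.71); the speaker data here are made up for illustration.

```python
# Illustrative sketch only: pooling segment-level emotion scores and
# computing a rank-based AUC between two groups. All data are invented.

def speaker_score(segment_scores):
    """Average per-segment 'sad' probabilities into one score per speaker."""
    return sum(segment_scores) / len(segment_scores)

def auc(neg_scores, pos_scores):
    """Rank-based AUC: probability that a randomly chosen positive
    (patient) outscores a randomly chosen negative (control); ties
    count half. Equivalent to the normalized Mann-Whitney U statistic."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical per-segment sad-probabilities for two controls and two patients
controls = [speaker_score(s) for s in ([0.2, 0.3, 0.25], [0.4, 0.35, 0.3])]
patients = [speaker_score(s) for s in ([0.6, 0.7, 0.65], [0.5, 0.45, 0.55])]
print(auc(controls, patients))  # prints 1.0: full separation in this toy data
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, so the paper's 0.71 indicates moderate but reliable discrimination between controls and depressed patients.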
