Predictive Modeling of Psychiatric Illness using Electronic Health Records and a Novel Machine Learning Approach with Artificial Intelligence
Background: Generalized anxiety disorder (GAD) and major depressive disorder (MDD) are highly prevalent and impairing conditions that frequently go undetected, leading to substantial treatment delays. Electronic medical records (EMRs) capture a large number of biometric markers and patient characteristics that could support the detection of GAD and MDD in primary care settings. Methods: We approach the prediction of MDD and GAD with a novel machine learning pipeline. The pipeline is an ensemble of algorithmically distinct machine learning methods, including deep learning. A sample of 4,184 undergraduate students underwent a general health screening and completed a psychiatric assessment for MDD and GAD. Using 59 biomedical and demographic features from the general health survey, together with an additional set of engineered features, we trained the model to predict GAD and MDD. Results: On a held-out test set, the model achieved an AUC of 0.72 for GAD and 0.66 for MDD. Additionally, we used Shapley values to identify which features had the greatest impact on prediction for each disorder. The top predictive features for MDD were difficulty memorizing lessons, financial difficulties, and alcohol consumption. The top predictive features for GAD were the necessity for a control examination, being overweight or obese, and irregular meal consumption. Conclusions: Our results indicate a successful application of machine learning methods to the detection of GAD and MDD from EMR-like data. By identifying biomarkers of GAD and MDD, these results may be used in future research to aid the early detection of both disorders.
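To make the evaluation terms in the abstract concrete, the following is a minimal sketch in plain Python of three ideas it mentions: averaging scores from several models, computing AUC on held-out labels, and attributing a prediction to features with Shapley values. It is not the authors' pipeline. The function names (`ensemble_average`, `roc_auc`, `exact_shapley`), the toy scores, and the linear toy model are all assumptions made for illustration, and the exact Shapley computation shown here enumerates every feature subset, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def ensemble_average(score_lists):
    """Average per-sample scores from several models (simple soft voting)."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n in the number of features n.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Toy data (made up): held-out labels and scores from two hypothetical models.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
model_b = [0.7, 0.2, 0.8, 0.6, 0.5, 0.1]
ensemble_scores = ensemble_average([model_a, model_b])
ensemble_auc = roc_auc(labels, ensemble_scores)


# Toy linear model (made up) with three features.
def toy_model(z):
    return 0.5 * z[0] + 2.0 * z[1] - 1.0 * z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])

print("ensemble AUC:", round(ensemble_auc, 3))
print("Shapley values:", [round(v, 3) for v in phi])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of 0.72 and 0.66. For the toy linear model with an all-zero baseline, each Shapley value equals the corresponding coefficient times the feature value, and the values sum to the difference between the prediction for the sample and the prediction for the baseline. Practical work with dozens of features would rely on an approximation rather than full enumeration.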