Machine Learning to Classify Suicidal Thoughts and Behaviors: Implementation Within the Common Data Elements Used by the Military Suicide Research Consortium

2021 ◽  
pp. 216770262096106
Author(s):  
Andrew K. Littlefield ◽  
Jeffrey T. Cooke ◽  
Courtney L. Bagge ◽  
Catherine R. Glenn ◽  
Evan M. Kleiman ◽  
...  

Suicide rates among military-connected populations have increased over the past 15 years. Meta-analytic studies indicate that prediction of suicide outcomes is lacking. Machine-learning approaches have been promoted to enhance classification models for suicide-related outcomes. In the present study, we compared the performance of three primary machine-learning approaches (i.e., elastic net, random forests, stacked ensembles) and a traditional statistical approach, generalized linear modeling (i.e., logistic regression), for classifying suicidal thoughts and behaviors using data from the Military Suicide Research Consortium’s Common Data Elements (CDE; n = 5,977–6,058 across outcomes). Models were informed by (a) selected items from the CDE or (b) factor scores based on exploratory and confirmatory factor analyses of the selected CDE items. Results indicated similar classification performance across models and feature sets. These findings suggest the need for robust evidence before adopting more complex classification models, and they identify measures that are particularly relevant to classifying suicide-related outcomes.
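As a rough illustration of the comparison described above, the sketch below fits an elastic net, a random forest, a stacked ensemble, and plain logistic regression with scikit-learn and scores them by cross-validated AUC. The feature matrix, outcome, sample size, and hyperparameters are placeholder assumptions, not the study's CDE items or evaluation protocol.

```python
# Sketch: compare elastic net, random forest, a stacked ensemble, and plain
# logistic regression on a binary outcome with cross-validated AUC.
# X and y are random placeholders standing in for CDE features and a
# suicide-related outcome; they are not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))          # placeholder features
y = rng.integers(0, 2, size=500)        # placeholder binary outcome

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "elastic_net": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000),
    ),
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
# Stacked ensemble built on top of the three base learners above.
models["stacked"] = StackingClassifier(
    estimators=[(name, est) for name, est in models.items()],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} ± {auc.std():.3f}")
```

Swapping in the actual CDE items or factor scores would only change how X and y are constructed; the comparison loop stays the same.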

Assessment ◽  
2018 ◽  
Vol 26 (6) ◽  
pp. 963-975 ◽  
Author(s):  
Ian H. Stanley ◽  
Jennifer M. Buchman-Schmitt ◽  
Carol Chu ◽  
Megan L. Rogers ◽  
Anna R. Gai ◽  
...  

Suicide rates within the U.S. military are elevated, necessitating greater efforts to identify those at increased risk. This study used a multigroup confirmatory factor analysis to examine measurement invariance of the Military Suicide Research Consortium Common Data Elements (CDEs) across current service members (n = 2,015), younger veterans (<35 years; n = 377), and older veterans (≥35 years; n = 1,001). Strong factorial invariance was supported, with adequate model fit observed for current service members, younger veterans, and older veterans. The structures of all models were generally comparable, with few exceptions. The Military Suicide Research Consortium CDEs demonstrate at least adequate model fit for current military service members and veterans, regardless of age. Thus, the CDEs can be validly used across military and veteran populations. Given similar latent structures, research findings in one group may inform clinical and policy decision making for the other.


2018 ◽  
Vol 30 (6) ◽  
pp. 767-778 ◽  
Author(s):  
Fallon B. Ringer ◽  
Kelly A. Soberay ◽  
Megan L. Rogers ◽  
Christopher R. Hagan ◽  
Carol Chu ◽  
...  

2020 ◽  
pp. 1-10
Author(s):  
Anna R. Gai ◽  
Fallon Ringer ◽  
Katherine Schafer ◽  
Sean Dougherty ◽  
Matthew Schneider ◽  
...  

2020 ◽  
Author(s):  
Namik Kirlic ◽  
Elisabeth Akeman ◽  
Danielle DeVille ◽  
Henry Yeh ◽  
Kelly T. Cosgrove ◽  
...  

Background: An estimated 1,100 college students die by suicide each year. Our ability to predict who is at risk for suicide, as well as our knowledge of resilience factors protecting against it, remains limited. We used a machine learning (ML) framework in conjunction with a large battery of self-report and demographic measures to select the features contributing most to observed variability in suicidal thoughts and behaviors (STBs) in college. Method: First-year university students completed demographic and clinically relevant self-report measures at the beginning of the first semester of college (baseline; n = 356) and at end of year (n = 228). The Suicide Behaviors Questionnaire-Revised (SBQ-R) assessed STBs. An ML pipeline using stacking and nested cross-validation to avoid overfitting was applied to 55 and 57 variables to examine predictors of baseline and end-of-year STBs, respectively. Results: For baseline SBQ-R score, the identified ML algorithm explained 28.3% of the variance (95% CI: 28.0-28.5%), with depression severity, meaning and purpose in life, and social isolation among the most important predictors. For end-of-year SBQ-R score, the identified algorithm explained 5.6% of the variance (95% CI: 5.1-6.1%), with baseline SBQ-R score, emotional suppression, and positive emotional experiences among the most important predictors. Limitations: External validation of the model in an independent sample is needed to further demonstrate its replicability. Conclusions: ML analyses replicated known factors contributing to STBs and identified novel, potentially modifiable risk and resilience factors. Intervention programming on college campuses aiming to reduce depressive symptomatology, promote positive affect and social connectedness, and foster a sense of meaning and purpose may be effective in reducing STBs.
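A minimal sketch of a stacked pipeline evaluated with nested cross-validation appears below; the specific learners, the placeholder 55-variable feature block, and R² scoring are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch: stacked regression with nested cross-validation.
# The outer loop estimates generalization (variance explained, R^2);
# the inner loops fit the stack's out-of-fold predictions and tune the
# ridge meta-learner. X and y stand in for self-report features and SBQ-R scores.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, RidgeCV
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(356, 55))               # placeholder: 55 baseline variables
y = X[:, 0] * 0.5 + rng.normal(size=356)     # placeholder outcome

stack = StackingRegressor(
    estimators=[
        ("elastic_net", ElasticNet(alpha=0.1, max_iter=5000)),
        ("forest", RandomForestRegressor(n_estimators=300, random_state=1)),
    ],
    final_estimator=RidgeCV(alphas=np.logspace(-3, 3, 13)),  # inner tuning of the meta-learner
    cv=KFold(n_splits=5, shuffle=True, random_state=1),      # inner CV for stacking
)

outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)   # outer CV for evaluation only
r2 = cross_val_score(stack, X, y, cv=outer_cv, scoring="r2")
print(f"Variance explained (outer CV): {r2.mean():.3f} ± {r2.std():.3f}")
```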


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Julie Chih-yu Chen ◽  
Andrea D. Tyler

Abstract Background: The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine-learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets provided by the 2019 CAMDA challenge, we evaluated the influence of varying technical, analytical, and machine-learning approaches on result interpretation and novel source prediction. Results: Comparison between 16S rRNA amplicon and shotgun sequencing approaches, as well as between metagenomic analytical tools, showed differences in normalized microbial abundance, especially for organisms present at low abundance. Shotgun sequence data analyzed using Kraken2 and Bracken for taxonomic annotation had higher detection sensitivity. Because classification models are limited to labeling pre-trained origins, we took an alternative approach, using Lasso-regularized multivariate regression to predict geographic coordinates for comparison. In both models, prediction errors were much higher under leave-one-city-out cross-validation than under 10-fold cross-validation, with the former more realistically reflecting the difficulty of accurately predicting samples from new origins. This challenge was further confirmed when applying the model to a set of samples obtained from new origins. Overall, the prediction performance of the regression and classification models, as measured by mean squared error, was comparable on mystery samples. Because of higher prediction error for samples from new origins, we provided an additional strategy based on prediction ambiguity to infer whether a sample is from a new origin. Lastly, we report increased prediction error when data from different sequencing protocols were included as training data. Conclusions: Herein, we highlight the capacity to predict sample origin accurately for pre-trained origins and the challenge of predicting new origins, through both regression and classification models. Overall, this work summarizes the impact of sequencing technique, protocol, taxonomic analytical approach, and machine-learning approach on the use of metagenomics for prediction of sample origin.
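The regression arm of that comparison might look like the sketch below, which contrasts 10-fold cross-validation with leave-one-city-out cross-validation for a Lasso-regularized multivariate (latitude/longitude) regression; the abundance matrix, city labels, coordinates, and the fixed regularization strength are simulated placeholders rather than the CAMDA data or the authors' settings.

```python
# Sketch: Lasso-regularized regression of geographic coordinates on microbial
# abundances, comparing 10-fold CV with leave-one-city-out CV.
import numpy as np
from sklearn.linear_model import MultiTaskLasso
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(7)
n_samples, n_taxa, n_cities = 300, 200, 15
abundances = rng.exponential(size=(n_samples, n_taxa))     # placeholder taxa abundances
cities = rng.integers(0, n_cities, size=n_samples)         # placeholder city labels
city_coords = rng.uniform(-60, 60, size=(n_cities, 2))     # latitude/longitude per city
coords = city_coords[cities] + rng.normal(scale=0.5, size=(n_samples, 2))

model = MultiTaskLasso(alpha=0.05, max_iter=10000)

# 10-fold CV: every held-out city also appears in the training folds.
mse_kfold = -cross_val_score(model, abundances, coords,
                             cv=KFold(n_splits=10, shuffle=True, random_state=7),
                             scoring="neg_mean_squared_error")

# Leave-one-city-out: each fold holds out all samples from one city,
# mimicking prediction for an origin never seen in training.
mse_loco = -cross_val_score(model, abundances, coords,
                            groups=cities, cv=LeaveOneGroupOut(),
                            scoring="neg_mean_squared_error")

print(f"10-fold MSE:            {mse_kfold.mean():.2f}")
print(f"Leave-one-city-out MSE: {mse_loco.mean():.2f}")
```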


Author(s):  
Brian Carnahan ◽  
Gérard Meyer ◽  
Lois-Ann Kuntz

Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches - genetic programming and decision tree induction - were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
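A minimal sketch of that kind of comparison is shown below, pitting decision-tree induction against linear discriminant analysis and logistic regression and scoring each with leave-one-out cross-validation (a reasonable choice for roughly 37 cases); the curriculum scores and pass/fail labels are simulated, and the genetic-programming learner is noted only in a comment.

```python
# Sketch: compare decision-tree induction with discriminant analysis and
# logistic regression on a small pass/fail task, scored with leave-one-out CV.
# Curriculum scores and CDL pass/fail labels are simulated placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
scores = rng.uniform(60, 100, size=(37, 6))   # placeholder curriculum scores
passed = (scores.mean(axis=1) + rng.normal(scale=5, size=37) > 80).astype(int)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=3),
    "lda": LinearDiscriminantAnalysis(),
    "logistic": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    acc = cross_val_score(model, scores, passed, cv=LeaveOneOut())
    print(f"{name}: leave-one-out accuracy = {acc.mean():.2f}")

# A genetic-programming learner (e.g., gplearn's SymbolicClassifier) could be
# added to the same loop if that dependency is available.
```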


RSC Advances ◽  
2016 ◽  
Vol 6 (12) ◽  
pp. 9857-9871 ◽  
Author(s):  
Jiansong Fang ◽  
Xiaocong Pang ◽  
Rong Yan ◽  
Wenwen Lian ◽  
Chao Li ◽  
...  

Classification models were constructed through machine-learning approaches to discover neuroprotective compounds against glutamate- or H2O2-induced neurotoxicity.
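A fingerprint-based classifier of this general kind might look like the sketch below; the SMILES strings, activity labels, and the Morgan-fingerprint-plus-random-forest pipeline are illustrative assumptions, not the descriptors or algorithms reported in the paper.

```python
# Sketch: fingerprint-based classification of compounds. The SMILES strings
# and activity labels are toy placeholders, not the study's neurotoxicity data.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O",
          "CCN(CC)CC", "C1CCCCC1", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"]
labels = np.array([0, 1, 1, 0, 0, 1])          # placeholder: 1 = neuroprotective

def featurize(smi, n_bits=2048):
    """Morgan (ECFP-like) fingerprint as a dense numpy vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    arr = np.zeros((1,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

X = np.vstack([featurize(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X, labels, cv=3)
print(f"CV accuracy: {acc.mean():.2f}")
```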


2019 ◽  
Vol 1 (Supplement_1) ◽  
pp. i15-i15
Author(s):  
Michael Wells ◽  
Adam Robin ◽  
Laila Poisson ◽  
Houtan Noushmehr ◽  
James Snyder

Abstract INTRODUCTION: Brain metastatic disease (BM) is ripe for discovery using computational tools like machine learning (ML) due to disease complexity and multidimensional critical data (imaging, genomics, primary disease, drug exposures) [1]. Leveraging real-world evidence (RWE) from routine health data to inform clinical management is hindered by fragmented unstructured data and semantic heterogeneity [2]. Clinical data in EHRs and institutional registries are typically free-text narratives absent common data elements (CDE). Curating existing data into CDE with ML may inform contemporary approaches (RWE, N-of-1 trials, and precision medicine) that are dependent on large, high-quality datasets. Harvesting existing institutional registries may expand demographic representation, confirm benchmarks of established treatments, and provide a test environment for prospective ML applications. METHOD: An R-based deep convolutional neural network (DNN), built with keras and the TensorFlow Python API, was trained on physician narratives of 2,000 BM cases and 8,000 other CNS conditions labeled by diagnosis and spanning 17 years [3,4]. The ML model was tested with 405 non-labeled narratives to: A) identify BM from other CNS conditions (i.e., glioma, meningioma, non-tumor); B) evaluate word embedding using GloVe [5] to standardize abbreviations and misspellings by assigning terms to CDE, training the model to plot "mets", "metastases", and "spine" with the 20 most similar contextual words. RESULTS: The DNN architecture achieved 97% accuracy in distinguishing BM (n = 178) from others (n = 227). "Mets" and "metastasis" have a connected contextual network suggesting shared meaning, whereas "spine" did not share a network. CONCLUSIONS: ML can identify BM cases in free-text registries, which can serve as a quality-control measure and aid data aggregation. Standardizing shorthand terminology to CDE with a DNN trained in word embedding can possibly address semantic heterogeneity and facilitate data automation. Solutions are needed to compile and automate quality BM data across institutions to achieve the volume and complexity required for contemporary analysis using ML.
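The narrative classifier described above was built in R; the sketch below shows a comparable, much smaller text-classification network written with Python tf.keras, using invented example narratives and labels and a learned embedding in place of pre-trained GloVe vectors.

```python
# Sketch: a small free-text classification network in the spirit of the
# R/keras model described above. The two example narratives and labels are
# invented placeholders, not registry data.
import numpy as np
import tensorflow as tf

texts = np.array([
    "multiple enhancing lesions consistent with mets from lung primary",
    "left frontal glioma with surrounding edema, no metastatic disease",
])
labels = np.array([1, 0])          # 1 = brain metastasis, 0 = other CNS condition

# Learn a vocabulary from the narratives and map each one to token ids.
vectorize = tf.keras.layers.TextVectorization(max_tokens=5000,
                                              output_sequence_length=64)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(input_dim=5000, output_dim=64),  # learned word embeddings
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),             # P(brain metastasis)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(texts, labels, epochs=5, verbose=0)

# Pre-trained GloVe vectors could be substituted for the learned Embedding
# weights to obtain the shared-context behavior described in the abstract.
```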

