A Machine Learning Approach to Identify Predictors of Frequent Vaping and Vulnerable Californian Youth Subgroups

Abstract. Introduction: Machine learning presents a unique opportunity to improve monitoring of electronic cigarette use (vaping) among youth. Here we built a random forest model to predict frequent vaping status among Californian youth and to identify contributing factors and vulnerable populations. Methods: In this prospective cohort study, 1,281 ever-vaping twelfth-grade students from metropolitan Los Angeles were surveyed in Fall and again at a 6-month follow-up in Spring. Frequent vaping was measured at the 6-month follow-up as nicotine-containing vaping on 20 or more of the past 30 days. Predictors (n=131) encompassed sociodemographic characteristics, substance use and perceptions, health status, and characteristics of the household, school and neighborhood. A random forest was developed to identify the top ten predictors of frequent vaping and interactions by sociodemographic variables. Results: Forty participants (3.1%) reported frequent vaping at follow-up. The random forest outperformed a logistic regression model in prediction (C-index = 0.87 vs. 0.77). Higher past-month nicotine concentration in the vape, more daily vaping sessions, and greater nicotine dependence were the top three of the ten most important predictors of frequent vaping. Interactions were found between age and perceived discrimination, and between age and race/ethnicity: participants who were younger than their classmates and who either reported experiencing discrimination frequently or identified as Asian or Native American/Pacific Islander were at increased risk of becoming frequent vapers. Conclusions: Machine learning can produce models that accurately predict progression of vaping behaviours among youth. The potential association between frequent vaping and perceived discrimination warrants more in-depth analysis to confirm whether discrimination is a cause of increased vaping.
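The C-index quoted in this abstract measures how well a model ranks frequent vapers above the other participants. For a binary outcome it is the fraction of (case, non-case) pairs in which the case receives the higher predicted risk, with ties counted as half. The sketch below is a minimal illustration of that definition only; the function name and the toy scores are hypothetical and are not taken from the study.

```python
from itertools import product


def c_index_binary(y_true, scores):
    """Concordance index for a binary outcome (equivalent to the AUROC)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0   # case ranked above non-case
        elif p == n:
            concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: 1 = frequent vaper at follow-up, 0 = not.
y = [0, 0, 0, 0, 1, 0, 1, 0]
risk = [0.05, 0.10, 0.20, 0.40, 0.35, 0.02, 0.90, 0.35]
print(round(c_index_binary(y, risk), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.87 and 0.77 are compared.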


Uncovering Los Angeles Tourists' Patterns Using Geospatial Analysis and Supervised Machine Learning with Random Forest Predictors

2019 International Conference on Computational Science and Computational Intelligence (CSCI) ◽

10.1109/csci49370.2019.00239 ◽

2019 ◽

Author(s):

Yuan-Yuan Lee ◽

Yi Ling Chang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Los Angeles ◽

Geospatial Analysis ◽

Supervised Machine Learning


S94. PREDICTION OF CANNABIS RELAPSE IN CLINICAL HIGH-RISK INDIVIDUALS AND RECENT ONSET PSYCHOSIS - PRELIMINARY RESULTS FROM THE PRONIA STUDY

Schizophrenia Bulletin ◽

10.1093/schbul/sbaa031.160 ◽

2020 ◽

Vol 46 (Supplement_1) ◽

pp. S69-S70

Author(s):

Nora Penzel ◽

Rachele Sanfelici ◽

Linda Betz ◽

Linda Antonucci ◽

Peter Falkai ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Random Forest ◽

Cannabis Use ◽

Environmental Data ◽

Clinical High Risk ◽

Recent Onset ◽

Cannabis Consumption ◽

General Functioning

Abstract. Background: Evidence exists that cannabis consumption is associated with the development of psychosis. Further, continued cannabis use in individuals with recent onset psychosis (ROP) increases the risk of rehospitalization, high symptom severity and low general functioning. Clear inter-individual differences in vulnerability to the harmful effects of the drug have been pointed out. These findings emphasize the importance of investigating the inter-individual variability in the role of cannabis use in ROP and of understanding how cannabis use relates to subclinical conditions that predate the full-blown disease in clinical high-risk (CHR) individuals. Specific symptoms have been linked with continued cannabis use, yet research is lacking on how different factors jointly contribute to an elevated risk of cannabis relapse. Multivariate techniques have the capacity to extract complex patterns from high-dimensional data and apply generalized rules to unseen cases. The aim of the study is therefore to assess the predictability of cannabis relapse in ROP and CHR by applying machine learning to clinical and environmental data. Methods: All participants were recruited within the multi-site, longitudinal PRONIA study (www.pronia.eu). 112 individuals (58 ROP and 54 CHR) from 8 different European research centres reported lifetime cannabis use at baseline and had been abstinent for at least 4 weeks. We defined cannabis relapse as any self-reported cannabis use between baseline and the 9-month follow-up. To predict cannabis relapse, we trained a random forest algorithm, implemented in the mlr package (R version 3.5.2), on 183 baseline variables including clinical symptoms, general functioning, demographics and consumption patterns within a repeated-nested cross-validation framework. The data were pre-processed by pruning non-informative variables and median-imputing missing values. The number of trees was set to 500, while the number of nodes, sample fraction and mtry were optimized. All hyperparameters were tuned with the model-based optimization implemented in the mlrMBO R package. Results: After 9 months, 50 individuals (48% ROP, 52% CHR) had relapsed into cannabis use. Across all timepoints, relapse was associated with more severe psychotic symptoms as measured by PANSS positive and PANSS general scores (p<0.05), and there was a significant interaction between positive symptoms and time of measurement (p<0.05). Our random forest classifier predicted cannabis relapse with a balanced accuracy, sensitivity, and specificity of 66.5%, 66.0% and 67.0%, respectively. The most predictive variables were a higher cumulative frequency of cannabis consumption in the last 3 months, worse general functioning in the last month, higher population density of the place of living, younger age and a shorter interval since the last consumption. Discussion: Our results using a state-of-the-art machine learning approach suggest that the multivariate signature of baseline demographic and clinical data can predict cannabis relapse at follow-up above chance level in CHR and ROP. Our finding that cannabis relapse is associated with more severe symptoms is in line with previous literature and emphasizes the need for treatment targeted at abstinence from cannabis. Information on demographic and clinical patterns might be useful for tailoring therapeutic strategies to individuals at higher risk of relapse. This might include special programs for younger patients and taking into account the place of living, such as urban areas. Further research is needed to validate our model in an independent sample.
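The three performance figures in the Results are linked by a simple identity: balanced accuracy is the mean of sensitivity and specificity, so a classifier at chance level scores 50% regardless of class balance. The sketch below checks that arithmetic on the reported percentages. The confusion-matrix counts in the second half are hypothetical and only illustrate the definitions; they are not the study's counts.

```python
def sensitivity(tp, fn):
    """Share of true relapsers that the classifier flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-relapsers that the classifier clears."""
    return tn / (tn + fp)


def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity; 0.5 is chance level."""
    return (sens + spec) / 2.0


# Reported values from the abstract: sensitivity 66.0%, specificity 67.0%.
reported = balanced_accuracy(0.660, 0.670)
print(f"balanced accuracy from reported values: {reported:.3f}")

# Hypothetical confusion matrix, for illustration only.
tp, fn, tn, fp = 33, 17, 42, 20
toy = balanced_accuracy(sensitivity(tp, fn), specificity(tn, fp))
print(f"balanced accuracy from toy counts: {toy:.3f}")
```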


Women With Diabetes Are at Increased Relative Risk of Heart Failure Compared to Men: Insights From UK Biobank

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.658726 ◽

2021 ◽

Vol 8 ◽

Author(s):

Sucharitha Chadalavada ◽

Magnus T. Jensen ◽

Nay Aung ◽

Jackie Cooper ◽

Karim Lekadir ◽

...

Keyword(s):

Heart Failure ◽

Cox Regression ◽

Cardiac Risk ◽

UK Biobank ◽

Contributing Factors ◽

Increased Risk ◽

Study Population ◽

The UK

Aims: To investigate the effect of diabetes on mortality and incident heart failure (HF) according to sex in the low-risk population of UK Biobank, and to evaluate potential contributing factors for any differences seen in the HF end-point. Methods: The entire UK Biobank study population was included. Participants who withdrew consent or were diagnosed with diabetes after enrolment were excluded from the study. Univariate and multivariate Cox regression models were used to assess the endpoints of mortality and incident HF, with median follow-up periods of 9 years and 8 years respectively. Results: A total of 493,167 participants were included, of whom 22,685 had diabetes (4.6%). Two thousand four hundred fifty-four died and 1,223 were diagnosed or admitted with HF during the follow-up periods of 9 and 8 years respectively. Overall, the mortality and HF risk were almost doubled in those with diabetes compared to those without diabetes (hazard ratio (HR) of 1.9 for both mortality and heart failure) in the UK Biobank population. Women with diabetes (both types) experienced a 22% greater increase in HF risk than men (HR of 2.2 (95% CI: 1.9–2.5) vs. 1.8 (1.7–2.0) respectively). Type 1 diabetes (T1DM) was associated with an 88% greater increase in HF risk in women than in men (HR 4.7 (3.6–6.2) vs. 2.5 (2.0–3.0) respectively), while for type 2 diabetes (T2DM) the risk of HF was 17% higher in women than in men (2.0 (1.7–2.3) vs. 1.7 (1.6–1.9) respectively). The increased risk of HF in women was independent of confounding factors. The findings were similar in a model with all-cause mortality as a competing risk. This interaction between sex, diabetes and the outcome of HF is much more prominent for T1DM (p = 0.0001) than for T2DM (p = 0.1). Conclusion: Women with diabetes, particularly those with T1DM, experience a greater increase in risk of heart failure than men with diabetes, which cannot be explained by the increased prevalence of cardiac risk factors in this cohort.
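The sex differences quoted in the Results (22%, 88%, 17%) appear consistent with the ratio of the women's HR to the men's HR. The sketch below reproduces that arithmetic from the rounded HRs in the abstract. It assumes, without confirmation from the source, that the percentages were derived this way; with rounded inputs the T2DM figure comes out near 18% rather than 17%, which may simply reflect unrounded estimates in the original analysis.

```python
def relative_excess(hr_women, hr_men):
    """Percent by which the women's hazard ratio exceeds the men's."""
    return (hr_women / hr_men - 1.0) * 100.0


# Rounded hazard ratios as printed in the abstract: (women, men).
reported_hrs = {
    "all diabetes": (2.2, 1.8),
    "T1DM": (4.7, 2.5),
    "T2DM": (2.0, 1.7),
}

for label, (hr_w, hr_m) in reported_hrs.items():
    excess = relative_excess(hr_w, hr_m)
    print(f"{label}: women's HR is {excess:.1f}% higher than men's")
```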


Machine Learning Approach Using Routine Immediate Postoperative Laboratory Values for Predicting Postoperative Mortality

Journal of Personalized Medicine ◽

10.3390/jpm11121271 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1271

Author(s):

Jaehyeong Cho ◽

Jimyung Park ◽

Eugene Jeong ◽

Jihye Shin ◽

Sangjeong Ahn ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

External Validation ◽

Model Development ◽

Postoperative Mortality ◽

Random Forest Model ◽

Forest Model ◽

Laboratory Values ◽

Increased Risk

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.
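The gap between an AUROC of 0.82 and an AUPRC of 0.13 is easier to read with the AUPRC baseline in mind: a classifier that ranks at random scores roughly the event prevalence, so a small AUPRC can still be well above chance when deaths are rare. The sketch below computes average precision (a common estimate of AUPRC) on hypothetical toy data; the abstract does not state the mortality rate, so no claim is made about the study's baseline.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive case."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    # Rank cases from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 2 deaths among 10 patients (prevalence 0.2).
y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
risk = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]

ap = average_precision(y, risk)
prevalence = sum(y) / len(y)
print(f"average precision = {ap:.2f}, chance baseline = {prevalence:.2f}")
```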


A Predictive Model for Kidney Transplant Graft Survival using Machine Learning

10.5121/csit.2020.101609 ◽

2020 ◽

Author(s):

Eric S. Pahl ◽

W. Nick Street ◽

Hans J. Johnson ◽

Alan I. Reed

Keyword(s):

Machine Learning ◽

Random Forest ◽

Cox Regression ◽

Risk Index ◽

Error Rates ◽

Machine Learning Method ◽

Learning Method ◽

Kaplan Meier ◽

End Stage

Kidney transplantation is the best treatment for patients with end-stage renal failure. The predominant method used for kidney quality assessment is the Cox regression-based kidney donor risk index. A machine learning method may provide improved prediction of transplant outcomes and help decision-making. A popular tree-based machine learning method, random forest, was trained and evaluated with the same data originally used to develop the risk index (70,242 observations from 1995-2005). The random forest successfully predicted 2,148 more transplants than the risk index at an equal type II error rate of 10%. Predicted results were analyzed against follow-up survival outcomes up to 240 months after transplant using Kaplan-Meier analysis, which confirmed that the random forest performed significantly better than the risk index (p<0.05). The random forest predicted significantly more successful and longer-surviving transplants than the risk index. Random forests and other machine learning models may improve transplant decisions.
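The Kaplan-Meier analysis mentioned above estimates the probability that a graft is still functioning at each follow-up time while accounting for censored cases. The sketch below implements the product-limit estimator in plain Python as an illustration; the function name and the toy follow-up data are hypothetical and unrelated to the 70,242 observations in the study.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time for each subject.
    events: 1 if failure was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # failures and censored both leave the risk set
    return curve


# Hypothetical toy data: months to graft failure (1) or censoring (0).
months = [6, 12, 12, 30, 48, 60]
failed = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(months, failed):
    print(f"month {t}: estimated survival {s:.3f}")
```

Comparing two such curves (for example, grafts accepted by each method) is typically done with a log-rank test, which is not shown here.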


Predictors of venous thromboembolism in cancer patients using machine learning.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e24042 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e24042-e24042

Author(s):

Ayse Ece Cali Daylan ◽

Danai Khemasuwan ◽

Hyun S. Kim ◽

Parvathy Geetha ◽

Sylvia Vania Alarcon Velasco ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Lung Cancer ◽

Venous Thromboembolism ◽

Cancer Patients ◽

Hematological Malignancies ◽

Cancer Subtypes ◽

Kaplan Meier ◽

Increased Risk

e24042 Background: The increased risk of venous thromboembolism (VTE) in cancer patients is clearly documented. However, given the heterogeneity and increased risk of bleeding in the cancer population, patient selection for thromboprophylaxis is still challenging. Methods: To identify risk factors for VTE in cancer patients, we performed a retrospective study of 706 patients who were diagnosed with either solid or hematological malignancies between 2015 and 2019. Demographics, body mass index, complete blood count with differential, kidney function tests, electrolytes, liver function tests, lipid profile and cancer staging were recorded. Random forest analysis with bagging was used to rank these variables, and Kaplan-Meier survival analysis was used to stratify cancer subtypes by risk of VTE occurrence. Results: The mean follow-up time was 19 months. 8.2% of the patients developed VTE. Based on the random forest analysis, the five most important factors in predicting VTE in cancer patients were cancer subtype, white blood cell count, platelet count, neutrophil count and hemoglobin. At the one-year mark, the risk of VTE in lung cancer and hematological malignancies was significantly higher than in breast, colorectal and endometrial cancer (p<0.05). Conclusions: Machine learning approaches are infrequently used to identify risk factors for VTE in cancer patients. The risk factors identified by the machine learning algorithm in our study are consistent with prior studies and show a clear difference in VTE risk across cancer subtypes. Moreover, based on the Kaplan-Meier analysis, patients with hematological malignancies and lung cancer may develop VTE earlier than patients with other cancer subtypes. Further prospective studies with longer follow-up are needed to better risk-stratify cancer patients and explore the temporal associations of VTE risk factors. [Table: see text]
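Bagging, as used by the random forest in the Methods, trains each tree on a bootstrap resample of the patients (drawn with replacement), which leaves roughly a third of the patients "out of bag" for each tree. The sketch below illustrates only that resampling step for a cohort of 706 rows; the helper name, the seed and the number of repetitions are arbitrary assumptions, and no tree fitting or variable ranking is shown.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_patients = 706         # cohort size stated in the abstract
n_trees = 100            # arbitrary number of resamples for illustration

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_sample(n_patients, rng)
    oob_fractions.append(len(oob) / n_patients)

mean_oob = sum(oob_fractions) / len(oob_fractions)
print(f"mean out-of-bag fraction over {n_trees} resamples: {mean_oob:.3f}")
```

The out-of-bag rows are what random forest implementations commonly use to estimate error and permutation-based variable importance without a separate test set.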


Spatial Models or Random Forest? Evaluating the Use of Spatially Explicit Machine Learning Methods to Predict Employment Density around New Transit Stations in Los Angeles

Geographical Analysis ◽

10.1111/gean.12273 ◽

2021 ◽

Author(s):

Kevin Credit

Keyword(s):

Machine Learning ◽

Random Forest ◽

Los Angeles ◽

Spatial Models ◽

Spatially Explicit ◽

Learning Methods ◽

Machine Learning Methods ◽

Employment Density


Abstract 17348: Cardiovascular Risk Prediction in Diabetes From Machine Learning: The ACCORD Study

Circulation ◽

10.1161/circ.142.suppl_3.17348 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Wenjun Fan ◽

David N Wong ◽

Xiaowei Li ◽

Nathan Wong

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Cardiovascular Risk ◽

Random Forest ◽

Systolic Blood Pressure ◽

Total Cholesterol ◽

Risk Prediction ◽

Risk Calculator ◽

ASCVD Risk

Background: Atherosclerotic cardiovascular disease (ASCVD) risk prediction in persons with type 2 diabetes (T2DM) using existing calculators is imprecise. We aimed to develop a machine-learning (ML) model for prediction of ASCVD events in adults with T2DM. Methods: We used subjects with T2DM and without known CVD from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial and calculated their 10-year ASCVD risk using the ACC/AHA pooled cohort risk calculator (PCRC), which predicts the composite outcome of myocardial infarction, non-fatal stroke and cardiovascular death from age, gender, race, systolic blood pressure (SBP), antihypertensive medication use, total cholesterol, high-density lipoprotein cholesterol, current smoking status and diabetes mellitus status. We developed an ASCVD risk calculator based on Random Forest (RF) ML algorithms using follow-up data from ACCORD with the same 9 predictors. A stratified 5-fold random split was used for cross-validation. Results: A total of 6581 T2DM participants without baseline ASCVD were included in our final sample, with a median follow-up of 9.1 years. The performance of the PCRC was modest, with an AUC = 0.604. In contrast, the ML model performed much better, with an RF AUC = 0.866. The figure shows the rank of feature importance (%) from random forest modeling (from high to low): age, systolic blood pressure, total cholesterol, HDL-C, female gender, White ethnicity, current smoker, hypertension treatment. Conclusion: The ML ASCVD Risk Calculator outperforms the AHA/ACC PCRC in predicting ASCVD outcomes among those with T2DM from the ACCORD trial. Age, SBP, total cholesterol and HDL-C were the most important features in ASCVD prediction among those with T2DM. Future studies need to validate these and other ML algorithms and to explore their applicability in guidelines.
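The stratified 5-fold split described in the Methods keeps the proportion of events roughly equal in every fold, which matters when events are much rarer than non-events. The sketch below shows one way to build such folds in plain Python. It is an illustration under assumed names and toy labels, not the procedure used in the ACCORD analysis.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with classes balanced per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# Hypothetical toy labels: 10 events among 50 participants.
labels = [1] * 10 + [0] * 40
splits = list(stratified_kfold(labels, k=5))

for fold_no, (train, test) in enumerate(splits, start=1):
    events = sum(labels[i] for i in test)
    print(f"fold {fold_no}: {len(train)} train, {len(test)} test, {events} events")
```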


P166 MICROBIOME AND EPIGENETIC PREDICTORS OF RESPONSE TO BIOLOGIC THERAPY IN PATIENTS WITH ULCERATIVE COLITIS

Inflammatory Bowel Diseases ◽

10.1093/ibd/zaa010.108 ◽

2020 ◽

Vol 26 (Supplement_1) ◽

pp. S41-S42

Author(s):

Rajesh Shah ◽

Lanlan Shen

Keyword(s):

Machine Learning ◽

Ulcerative Colitis ◽

DNA Methylation ◽

Random Forest ◽

Beta Diversity ◽

Response To Therapy ◽

Tissue Samples ◽

Predictors Of Response ◽

Machine Learning Model

Abstract. Introduction: Current ulcerative colitis (UC) treatments have variable efficacy, and it may take several weeks to assess improvement. Emerging data suggest the intestinal microbiota may serve as a biomarker and may mechanistically influence immune system activity through epigenetic regulation of host gene expression. The aim of this pilot project was to examine whether the colonic mucosal microbiota and mucosal DNA methylation patterns are associated with a response to treatment in UC. Methods: We conducted a retrospective cross-sectional study of patients with active UC. Fresh frozen colon biopsy samples were obtained from the Texas Medical Center IBD Tissue Bank. Patients were included if they had a colonoscopy performed to assess disease activity and follow-up through 14 weeks to assess response. Disease activity was defined using the Partial Mayo Score, and response was defined as a decrease of 2 points or more in the score after 14 weeks of follow-up. 16S rRNA gene sequencing and DNA methylation pyrosequencing were used to define the microbiome and quantify DNA methylation. Alpha and beta diversity, taxonomy and DNA methylation were compared between responders and non-responders at 14 weeks. Additionally, a random forest machine learning model was developed to identify predictors of response. Results: We identified 16 patients with tissue samples from the cecum/ascending, transverse and rectum/sigmoid colon available for analysis. After excluding patients with limited rectal disease, recent antibiotic use or inadequate follow-up, 4 patients (2 per group) were included, providing 12 tissue samples for analysis. The mean age was 38 for responders and 24 for non-responders. 50% of patients were male and all were Caucasian. Responders had a numerically lower alpha diversity, though this did not reach statistical significance (p=0.18). Responders and non-responders separated in unweighted (p=0.001) and weighted (p=0.001) beta diversity analysis. Responders had a greater abundance of Firmicutes and Bacteroidetes and a lower abundance of Proteobacteria compared to non-responders. This corresponded to responders having a greater abundance of the genera Bifidobacterium, Faecalibacterium and Roseburia. Additionally, we saw increased methylation in the P16, HOXC5 and B4GALNT1 genes of responders compared to non-responders. Finally, in a random forest machine learning model, predictors of response included left-sided disease extent and OTUs for the genus Roseburia. Discussion: Patients with UC who responded to therapy had a significantly different pre-treatment microbiome and methylation of genes related to intestinal barrier function, including B4GALNT1. Larger studies will be needed to validate these findings, but these results suggest the microbiome and DNA methylation changes may be effective biomarkers of response to therapy and warrant further study.
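Alpha diversity summarises the richness and evenness within one sample, while beta diversity measures how different two samples are. The abstract does not name the metrics used, so the sketch below substitutes three common stand-ins as an assumption: the Shannon index for alpha diversity, Jaccard distance on presence/absence as an unweighted beta measure, and Bray-Curtis dissimilarity on abundances as a weighted one. The OTU counts are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index (natural log) of one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def jaccard_distance(a, b):
    """Unweighted distance: 1 - shared taxa / taxa present in either sample."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Weighted dissimilarity based on absolute abundance differences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


# Hypothetical OTU counts for two biopsy samples (same taxon order).
sample_a = [50, 30, 20, 0]
sample_b = [10, 10, 0, 80]

print(f"Shannon A = {shannon(sample_a):.3f}, B = {shannon(sample_b):.3f}")
print(f"Jaccard distance = {jaccard_distance(sample_a, sample_b):.2f}")
print(f"Bray-Curtis = {bray_curtis(sample_a, sample_b):.2f}")
```

Studies of this kind often use phylogeny-aware metrics such as UniFrac instead, which require a tree and are not sketched here.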


Machine Learning Prediction of Progression in FEV1 in the COPDGene Study

10.1101/2022.01.10.22268804 ◽

2022 ◽

Author(s):

Adel Boueiz ◽

Zhonghui Xu ◽

Yale Chang ◽

Aria Masoomi ◽

Andrew Gregory ◽

...

Keyword(s):

Machine Learning ◽

Disease Progression ◽

Model Performance ◽

ROC Curves ◽

Imaging Features ◽

Predictive Variables ◽

Increased Risk ◽

Random Forest Models ◽

Advanced Stages

Background: The heterogeneous nature of COPD complicates the identification of predictors of disease progression and, consequently, the development of effective therapies. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features. Methods: We included 4,496 smokers with available data from their enrollment and 5-year follow-up visits in the Genetic Epidemiology of COPD (COPDGene) study. We constructed supervised random forest models to predict 5-year progression in FEV1 from 46 baseline demographic, clinical, physiologic, and imaging features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit. Results: Predicting the change in FEV1 over time is more challenging than simply predicting the future absolute FEV1 level. Nevertheless, the area under the ROC curve for predicting subjects in the top quartile of observed disease progression was 0.70 in the 10-year follow-up data. Model accuracy was best for GOLD 1-2 subjects, and accurate prediction was harder to achieve in advanced stages of the disease. The relative importance of the predictive variables differed by GOLD grade. Conclusion: This state-of-the-art approach, along with deep phenotyping, predicts FEV1 progression with reasonable accuracy. There is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk of rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.
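The ROC analysis in the Results treats progression as a binary target: whether a subject falls in the top quartile of observed FEV1 decline. The sketch below shows one way such a label could be derived from baseline and follow-up values. The FEV1 values are hypothetical, and the quartile convention (Python's default "exclusive" method) is an assumption, since the abstract does not specify how the cut-point was computed.

```python
import statistics


def label_top_quartile(decline):
    """Return 1 for subjects whose decline exceeds the 75th percentile."""
    q3 = statistics.quantiles(decline, n=4)[2]  # third quartile cut-point
    return [1 if d > q3 else 0 for d in decline]


# Hypothetical FEV1 values in litres at enrollment and at follow-up.
baseline = [3.2, 2.8, 2.5, 3.0, 1.9, 2.2, 3.5, 2.6]
followup = [3.0, 2.3, 2.4, 2.9, 1.2, 2.0, 3.4, 2.5]

# Positive decline means FEV1 fell between visits.
decline = [round(b - f, 2) for b, f in zip(baseline, followup)]
labels = label_top_quartile(decline)

for d, y in zip(decline, labels):
    print(f"decline {d:.2f} L -> rapid progressor: {bool(y)}")
```

These binary labels are what a classifier's predicted risks would be scored against when computing the area under the ROC curve.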
