A proof-of-concept study applying machine learning methods to putative risk factors for eating disorders: results from the multi-centre European project on healthy eating

2021 ◽  
pp. 1-10
Author(s):  
I. Krug ◽  
J. Linardon ◽  
C. Greenwood ◽  
G. Youssef ◽  
J. Treasure ◽  
...  

Abstract Background: Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors. Method: Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used. Results: All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN). Conclusions: Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.

2020 ◽  
Author(s):  
Rongyu Wei ◽  
Shuqun Li ◽  
Liying Ren ◽  
Junxiong Yu ◽  
Weijia Liao

Abstract Background: There are limitations in judging the occurrence of lymph node metastasis (LNM) in hepatocellular carcinoma (HCC) before surgery. The purpose of this study was to establish a preoperative nomogram for predicting the risk of LNM in HCC and to explore its clinical utility. Methods: A total of 195 HCC patients undergoing radical hepatectomy were retrospectively analyzed. According to the presence or absence of LNM, the patients were divided into two groups, and the clinical characteristics of the two groups were compared. Risk factors for LNM were assessed based on logistic regression, and a nomogram was established. The receiver operating characteristic (ROC) curve was used to calculate the area under the curve (AUC) of the logistic regression model, and the predictive accuracy of the nomogram was evaluated by the concordance index (C-index). The clinical efficacy of the nomogram was assessed by decision curve analysis (DCA). Results: Logistic analysis revealed that hepatitis B surface antigen (HBsAg) (HR = 3.50, 95% CI, 1.30-9.42, P = 0.013), globulin (HR = 2.46, 95% CI, 1.05-5.75, P = 0.039), neutrophil to lymphocyte ratio (NLR) (HR = 7.64, 95% CI, 3.22-18.11, P < 0.001) and tumor size (HR = 3.86, 95% CI, 1.26-11.88, P = 0.018) were independent risk factors for lymph node metastasis in HCC. The nomogram was established based on the above 4 variables, and the AUC was 0.835 (95% CI, 0.780-0.890). The calibration curve showed that the model has good predictive ability, and DCA indicated a good predictive effect. Conclusions: The nomogram established by analyzing the preoperative clinical characteristics is a simple tool that can predict the risk of lymph node metastasis in HCC patients and guide clinicians to make better clinical decisions.


2009 ◽  
Vol 3 (1) ◽  
pp. 81-95 ◽  
Author(s):  
Francesco Macrina ◽  
Paolo Emilio Puddu ◽  
Alfonso Sciangula ◽  
Fausto Trigilia ◽  
Marco Totaro ◽  
...  

Background: There are few comparative reports on the overall accuracy of neural networks (NN), assessed only versus multiple logistic regression (LR), to predict events in cardiovascular surgery studies and none has been performed among acute aortic dissection (AAD) Type A patients. Objectives: We aimed at investigating the predictive potential of 30-day mortality by a large series of risk factors in AAD Type A patients comparing the overall performance of NN versus LR. Methods: We investigated 121 plus 87 AAD Type A patients consecutively operated during 7 years in two Centres. Forced and stepwise NN and LR solutions were obtained and compared, using receiver operating characteristic area under the curve (AUC) and their 95% confidence intervals (CI) and Gini’s coefficients. Both NN and LR models were re-applied to data from the second Centre to adhere to a methodological imperative with NN. Results: Forced LR solutions provided AUC 87.9±4.1% (CI: 80.7 to 93.2%) and 85.7±5.2% (CI: 78.5 to 91.1%) in the first and second Centre, respectively. Stepwise NN solution of the first Centre had AUC 90.5±3.7% (CI: 83.8 to 95.1%). The Gini’s coefficients for LR and NN stepwise solutions of the first Centre were 0.712 and 0.816, respectively. When the LR and NN stepwise solutions were re-applied to the second Centre data, Gini’s coefficients were, respectively, 0.761 and 0.850. Few predictors were selected in common by LR and NN models: the presence of pre-operative shock, intubation and neurological symptoms, immediate post-operative presence of dialysis in continuous and the quantity of post-operative bleeding in the first 24 h. The length of extracorporeal circulation, post-operative chronic renal failure and the year of surgery were specifically detected by NN. Conclusions: Different from the International Registry of AAD, operative and immediate post-operative factors were seen as potential predictors of short-term mortality. We report a higher overall predictive accuracy with NN than with LR. However, the list of potential risk factors to predict 30-day mortality after AAD Type A by NN model is not enlarged significantly.
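
The AUC and Gini's coefficient reported in the abstract above are closely related measures of ranking quality. The sketch below assumes the common definition Gini = 2 × AUC − 1; the abstract does not say which definition the authors used, and the labels, scores and function names here are hypothetical, made up purely for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient under the assumed definition Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical toy data: 1 = event (e.g. 30-day death), 0 = no event.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

auc = auc_from_scores(labels, scores)
gini = gini_from_auc(auc)
print(f"AUC = {auc:.4f}, Gini = {gini:.4f}")
```

Under this assumed definition, an AUC of 90.5% corresponds to a Gini coefficient of about 0.81, which is close to the 0.816 reported for the stepwise NN solution of the first Centre.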


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 15-16
Author(s):  
Pablo A S Fonseca ◽  
Massimo Tornatore ◽  
Angela Cánovas

Abstract Reduced fertility is one of the main causes of economic losses in dairy farms. The cost of a stillbirth is estimated at US$ 938 per case in Holstein herds. Machine learning (ML) is gaining popularity in the livestock sector as a means to identify hidden patterns and due to its potential to address dimensionality problems. Here we investigate the application of ML algorithms for the prediction of cows with higher stillbirth susceptibility in two scenarios: cows with >25% and >33.33% of stillbirths among birth records. These thresholds correspond to percentiles 75 (still_75) and 90 (still_90), respectively. A total of 10,570 cows and 50,541 birth records were collected to perform a haplotype-based genome-wide association study. Five hundred significant pseudo single nucleotide polymorphisms (pseudo-SNPs) (False-Discovery Rate < 0.05) were used as input features of ML-based predictions to determine if the cow is in the top-75 and top-90 percentiles. Table 1 shows the classification performance of the investigated ML and linear models. The ML models outperformed linear models for both thresholds. In general, still_75 showed higher F1 values compared to still_90, suggesting a lower misclassification ratio when a less stringent threshold is used. We observe that the accuracy of the models in our study is higher when compared to ML-based prediction accuracies in other breeds, e.g. compared to the accuracies of 0.46 and 0.67 that were achieved using SNPs for body weight in Brahman and fertility traits in Nellore, respectively. The xgboost algorithm shows the highest balanced accuracy (BA; 0.625), F1-score (0.588) and area under the curve (AUC; 0.688), suggesting that xgboost can achieve the highest predictive performance and the lowest difference in misclassification ratio between classes. ML applied over haplotype libraries is an interesting approach for the detection of animals with higher susceptibility to stillbirths due to its highest predictive accuracy and relatively lower misclassification ratio.
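
The balanced accuracy and F1-score named in the abstract above can both be derived from a confusion matrix. The following is a minimal sketch using the standard definitions; the counts are hypothetical and are not taken from the study's Table 1, and the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    if min(tp + fn, tn + fp, tp + fp) == 0:
        raise ValueError("counts leave a metric undefined (division by zero)")
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Mean of the per-class recalls, so the majority class cannot dominate.
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Harmonic mean of precision and recall.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical imbalanced example: 50 high-risk cows, 150 others.
metrics = classification_metrics(tp=30, fp=30, tn=120, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example plain accuracy (0.75) is higher than balanced accuracy (0.70) because the larger negative class is classified better than the positive class, which is one reason imbalanced problems such as a top-percentile threshold are often reported with balanced accuracy and F1.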


2020 ◽  
Author(s):  
John Booth ◽  
Ben Margetts ◽  
William Bryant ◽  
Richard Issitt ◽  
John Ciaran Hutchinson ◽  
...  

Introduction: Sudden unexpected death in infancy (SUDI) represents the commonest presentation of postneonatal death, yet despite full postmortem examination (autopsy), the cause of death is only determined in around 45% of cases, the majority remaining unexplained. In order to aid counselling and understand how to improve the investigation, we explored whether machine learning could be used to derive data-driven insights for prediction of infant autopsy outcome. Methods: A paediatric autopsy database containing >7,000 cases in total with >300 variables per case was analysed, with cases categorised both by stage of examination (external, internal and internal with histology), and autopsy outcome classified as explained (medical cause of death identified) or unexplained. For the purposes of this study only cases from infant and child deaths aged ≤ 2 years were included (N=3100). Following this, decision tree, random forest, and gradient boosting models were iteratively trained and evaluated for each stage of the post-mortem examination and compared using predictive accuracy metrics. Results: Data from 3,100 infant and young child autopsies were included. The naive decision tree model using initial external examination data had a predictive performance of 68% for determining whether a medical cause of death could be identified. Model performance increased when internal examination data was included and a core set of data items were identified using model feature importance as key variables for determining autopsy outcome. The most effective model was the XG Boost, with overall predictive performance of 80%, demonstrating age at death, and cardiovascular or respiratory histological findings as the most important variables associated with determining cause of death. Conclusion: This study demonstrates the feasibility of using machine learning models to objectively determine component importance of complex medical procedures, in this case infant autopsy, to inform clinical practice. It further highlights the value of collecting routine clinical procedural data according to defined standards. This approach can be applied to a wide range of clinical and operational healthcare scenarios providing objective, evidence-based information for uses such as counselling, decision making and policy development.


2021 ◽  
Author(s):  
Yipeng Cheng ◽  
Danni A Gadd ◽  
Christian Gieger ◽  
Karla Monterrubio-Gómez ◽  
Yufei Zhang ◽  
...  

Type 2 diabetes mellitus (T2D) is one of the most prevalent diseases in the world and presents a major health and economic burden, a notable proportion of which could be alleviated with improved early prediction and intervention. While standard risk factors including age, obesity, and hypertension have shown good predictive performance, we show that the use of CpG DNA methylation information leads to a significant improvement in the prediction of 10-year T2D incidence risk. Whilst previous studies have been largely constrained by linear assumptions and the use of CpGs one at a time, we have adopted a more flexible approach based on a range of linear and tree-ensemble models for classification and time-to-event prediction. Using the Generation Scotland cohort (n=9,537), our best performing model (Area Under the Curve (AUC)=0.880, Precision Recall AUC (PRAUC)=0.539, McFadden's R2=0.316) used a LASSO Cox proportional-hazards predictor and showed notable improvement in onset prediction, above and beyond standard risk factors (AUC=0.860, PRAUC=0.444, R2=0.261). Replication of the main finding was observed in an external test dataset (the German-based KORA study, p=3.7×10^-4). Tree-ensemble methods provided comparable performance and future improvements to these models are discussed. Finally, we introduce MethylPipeR, an R package with accompanying user interface, for systematic and reproducible development of complex trait and incident disease predictors. While MethylPipeR was applied to incident T2D prediction with DNA methylation in our experiments, the package is designed for generalised development of predictive models and is applicable to a wide range of omics data and target traits.


2021 ◽  
Author(s):  
Adrian G. Zucco ◽  
Rudi Agius ◽  
Rebecka Svanberg ◽  
Kasper S. Moestrup ◽  
Ramtin Z. Marandi ◽  
...  

Interpretable risk assessment of SARS-CoV-2 positive patients can aid clinicians to implement precision medicine. Here we trained a machine learning model to predict mortality within 12 weeks of a first positive SARS-CoV-2 test. By leveraging data on 33,928 confirmed SARS-CoV-2 cases in eastern Denmark, we considered 2,723 variables extracted from electronic health records (EHR) including demographics, diagnoses, medications, laboratory test results and vital parameters. A discrete-time framework for survival modelling enabled us to predict personalized survival curves and explain individual risk factors. Performances of weighted concordance index 0.95 and precision-recall area under the curve 0.71 were measured on the test set. Age, sex, number of medications, previous hospitalizations and lymphocyte counts were identified as top mortality risk factors. Our explainable survival model developed on EHR data also revealed temporal dynamics of the 22 selected risk factors. Upon further validation, this model may allow direct reporting of personalized survival probabilities in routine care.


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We tried to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort had eight centers in Japan from June 2015 to March 2018, and the test cohort had 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. Main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of the previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.


2010 ◽  
Vol 11 (3) ◽  
pp. 199-208 ◽  
Author(s):  
F B S Briggs ◽  
P P Ramsay ◽  
E Madden ◽  
J M Norris ◽  
V M Holers ◽  
...  

Hypertension ◽  
2021 ◽  
Vol 78 (5) ◽  
pp. 1595-1604
Author(s):  
Fabrizio Buffolo ◽  
Jacopo Burrello ◽  
Alessio Burrello ◽  
Daniel Heinrich ◽  
Christian Adolf ◽  
...  

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral and surgically curable forms. Current guidelines recommend screening for PA ≈50% of patients with hypertension on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to the individual risk of each patient, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, we obtained high predictive performance with our score (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, with optimized sensitivity of 96.6% for PA, and 100.0% for UPA, at validation. The application of the predicting tools allowed the identification of a subgroup of patients with very low risk of PA (0.6% for both models) and null probability of having UPA. In conclusion, this score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and circumvent screening in up to 32.7% of patients using a machine learning-based model, without omitting patients with surgically curable UPA.


2017 ◽  
Vol 79 (02) ◽  
pp. 123-130 ◽  
Author(s):  
Whitney Muhlestein ◽  
Dallin Akagi ◽  
Justiss Kallos ◽  
Peter Morone ◽  
Kyle Weaver ◽  
...  

Objective: Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods: A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results: Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion: Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.

