scholarly journals Machine learning identifies SNPs predictive of advanced coronary artery calcium in ClinSeq® and Framingham Heart Study cohorts

2017 ◽  
Author(s):  
Cihan Oguz ◽  
Shurjo K Sen ◽  
Adam R Davis ◽  
Yi-Ping Fu ◽  
Christopher J O’Donnell ◽  
...  

ABSTRACTOne goal of personalized medicine is leveraging the emerging tools of data science to guide medical decision-making. Achieving this using disparate data sources is most daunting for polygenic traits and requires systems level approaches. To this end, we employed random forests (RF) and neural networks (NN) for predictive modeling of coronary artery calcification (CAC), which is an intermediate end-phenotype of coronary artery disease (CAD). Model inputs were derived from advanced cases in the ClinSeq® discovery cohort (n=16) and the FHS replication cohort (n=36) from 89th−99th CAC score percentile range, and age-matching controls (ClinSeq® n=16, FHS n=36) with no detectable CAC (all subjects were Caucasian males). These inputs included clinical variables (CLIN), genotypes of 57 SNPs associated with CAC in past GWAS (SNP Set-1), and an alternative set of 56 SNPs (SNP Set-2) ranked highest in terms of their nominal correlation with advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the areas under receiver operating characteristics curves (AUC). Within the discovery cohort, RF models generated AUC values of 0.69 with CLIN, 0.72 with SNP Set-1, and 0.77 with their combination. In the replication cohort, SNP Set-1 was again more predictive (AUC=0.78) than CLIN (AUC=0.61), but also more predictive than the combination (AUC=0.75). In contrast, in both cohorts, SNP Set-2 generated enhanced predictive performance with or without CLIN (AUC> 0.8). Using the 21 SNPs of SNP Set-2 that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq® data and tested with FHS data and replicated the high predictive accuracy (AUC>0.8) with several topologies, thereby identifying several potential susceptibility loci for advanced CAD. Several CAD-related biological processes were found to be enriched in the network of genes constructed from these loci. In both cohorts, SNP Set-1 derived from past CAC GWAS yielded lower performance than SNP Set-2 derived from “extreme” CAC cases within the discovery cohort. Machine learning tools hold promise for surpassing the capacity of conventional GWAS-based approaches for creating predictive models utilizing the complex interactions between disease predictors intrinsic to the pathogenesis of polygenic disorders.

2021 ◽  
pp. 1-10
Author(s):  
I. Krug ◽  
J. Linardon ◽  
C. Greenwood ◽  
G. Youssef ◽  
J. Treasure ◽  
...  

Abstract Background Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors. Method Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used. Results All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN). Conclusions Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.


2019 ◽  
Author(s):  
Donald Salami ◽  
Carla Alexandra Sousa ◽  
Maria do Rosário Oliveira Martins ◽  
César Capinha

ABSTRACTThe geographical spread of dengue is a global public health concern. This is largely mediated by the importation of dengue from endemic to non-endemic areas via the increasing connectivity of the global air transport network. The dynamic nature and intrinsic heterogeneity of the air transport network make it challenging to predict dengue importation.Here, we explore the capabilities of state-of-the-art machine learning algorithms to predict dengue importation. We trained four machine learning classifiers algorithms, using a 6-year historical dengue importation data for 21 countries in Europe and connectivity indices mediating importation and air transport network centrality measures. Predictive performance for the classifiers was evaluated using the area under the receiving operating characteristic curve, sensitivity, and specificity measures. Finally, we applied practical model-agnostic methods, to provide an in-depth explanation of our optimal model’s predictions on a global and local scale.Our best performing model achieved high predictive accuracy, with an area under the receiver operating characteristic score of 0.94 and a maximized sensitivity score of 0.88. The predictor variables identified as most important were the source country’s dengue incidence rate, population size, and volume of air passengers. Network centrality measures, describing the positioning of European countries within the air travel network, were also influential to the predictions.We demonstrated the high predictive performance of a machine learning model in predicting dengue importation and the utility of the model-agnostic methods to offer a comprehensive understanding of the reasons behind the predictions. Similar approaches can be utilized in the development of an operational early warning surveillance system for dengue importation.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi143-vi144
Author(s):  
Omaditya Khanna ◽  
Anahita Fathi Kazerooni ◽  
Jose A Garcia ◽  
Chiharu Sako ◽  
Sherjeel Arif ◽  
...  

Abstract PURPOSE Although WHO grade I meningiomas are considered ‘benign’ tumors, an elevated Ki-67 is one crucial factor that has been shown to influence clinical outcomes. In this study, we use standard pre-operative MRI and develop a machine learning (ML) model to predict the Ki-67 in WHO grade I meningiomas. METHODS A retrospective analysis was performed of 306 patients that underwent surgical resection. The mean and median Ki-67 of tumor specimens were 4.84 ± 4.03% (range: 0.3–33.6) and 3.7% (Q1:2.3%, Q3:6%), respectively. Pre-operative MRI was used to perform radiomic feature extraction (N=2,520) followed by ML modeling using least absolute shrinkage and selection operator (LASSO) wrapped with support vector machine (SVM) through nested cross-validation on a discovery cohort (N=230), to stratify tumors based on Ki-67 < 5% and ≥ 5%. A replication cohort (N=76) was kept ‘unseen’ in order to provide insights regarding the generalizability of our predictive model. RESULTS A total of 60 radiomic features extracted from seven different MRI sequences were used in the final model. With this model, an AUC of 0.84 (95% CI: 0.78-0.90), with associated sensitivity and specificity of 84.1% and 73.3%, respectively, were achieved in the discovery cohort. The selected features in the trained predictive model were then applied to the subjects of the replication cohort and the model was applied independently in this cohort. An AUC of 0.83 (95% CI: 0.73-0.94), with a sensitivity of 82.6% and specificity of 85.5% was obtained for this independent testing. Furthermore, the model performed commendably when applied to all skull base and non-skull base tumors in our patient cohort, evidenced by comparable AUC values of 0.86 and 0.83, respectively. CONCLUSION The results of this study may provide enhanced diagnostics to the surgeon pre-operatively such that it can guide surgical strategy and individual patient treatment paradigms.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1446
Author(s):  
Tanyaporn Pattarabanjird ◽  
Corban Cress ◽  
Anh Nguyen ◽  
Angela Taylor ◽  
Stefan Bekiranov ◽  
...  

Background: Machine learning (ML) has emerged as a powerful approach for predicting outcomes based on patterns and inferences. Improving prediction of severe coronary artery disease (CAD) has the potential for personalizing prevention and treatment strategies and for identifying individuals that may benefit from cardiac catheterization. We developed a novel ML approach combining traditional cardiac risk factors (CRF) with a single nucleotide polymorphism (SNP) in a gene associated with human CAD (ID3 rs11574) to enhance prediction of CAD severity; Methods: ML models incorporating CRF along with ID3 genotype at rs11574 were evaluated. The most predictive model, a deep neural network, was used to classify patients into high (>32) and low level (≤32) Gensini severity score. This model was trained on 325 and validated on 82 patients. Prediction performance of the model was summarized by a confusion matrix and area under the receiver operating characteristics curve (ROC-AUC); and Results: Our neural network predicted severity score with 81% and 87% accuracy for the low and the high groups respectively with an ROC-AUC of 0.84 for 82 patients in the test group. The addition of ID3 rs11574 to CRF significantly enhanced prediction accuracy from 65% to 81% in the low group, and 72% to 84% in the high group. Age, high-density lipoprotein (HDL), and systolic blood pressure were the top 3 contributors in predicting severity score; Conclusions: Our neural network including ID3 rs11574 improved prediction of CAD severity over use of Framingham score, which may potentially be helpful for clinical decision making in patients at increased risk of complications from coronary angiography.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Qiangrong Zhai ◽  
Zi Lin ◽  
Hongxia Ge ◽  
Yang Liang ◽  
Nan Li ◽  
...  

AbstractThe number of critically ill patients has increased globally along with the rise in emergency visits. Mortality prediction for critical patients is vital for emergency care, which affects the distribution of emergency resources. Traditional scoring systems are designed for all emergency patients using a classic mathematical method, but risk factors in critically ill patients have complex interactions, so traditional scoring cannot as readily apply to them. As an accurate model for predicting the mortality of emergency department critically ill patients is lacking, this study’s objective was to develop a scoring system using machine learning optimized for the unique case of critical patients in emergency departments. We conducted a retrospective cohort study in a tertiary medical center in Beijing, China. Patients over 16 years old were included if they were alive when they entered the emergency department intensive care unit system from February 2015 and December 2015. Mortality up to 7 days after admission into the emergency department was considered as the primary outcome, and 1624 cases were included to derive the models. Prospective factors included previous diseases, physiologic parameters, and laboratory results. Several machine learning tools were built for 7-day mortality using these factors, for which their predictive accuracy (sensitivity and specificity) was evaluated by area under the curve (AUC). The AUCs were 0.794, 0.840, 0.849 and 0.822 respectively, for the SVM, GBDT, XGBoost and logistic regression model. In comparison with the SAPS 3 model (AUC = 0.826), the discriminatory capability of the newer machine learning methods, XGBoost in particular, is demonstrated to be more reliable for predicting outcomes for emergency department intensive care unit patients.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 2581-2581 ◽  
Author(s):  
Paul Johannet ◽  
Nicolas Coudray ◽  
George Jour ◽  
Douglas MacArthur Donnelly ◽  
Shirin Bajaj ◽  
...  

2581 Background: There is growing interest in optimizing patient selection for treatment with immune checkpoint inhibitors (ICIs). We postulate that phenotypic features present in metastatic melanoma tissue reflect the biology of tumor cells, immune cells, and stromal tissue, and hence can provide predictive information about tumor behavior. Here, we test the hypothesis that machine learning algorithms can be trained to predict the likelihood of response and/or toxicity to ICIs. Methods: We examined 124 stage III/IV melanoma patients who received anti-CTLA-4 (n = 81), anti-PD-1 (n = 25), or combination (n = 18) therapy as first line. The tissue analyzed was resected before treatment with ICIs. In total, 340 H&E slides were digitized and annotated for three regions of interest: tumor, lymphocytes, and stroma. The slides were then partitioned into training (n = 285), validation (n = 26), and test (n = 29) sets. Slides were tiled (299x299 pixels) at 20X magnification. We trained a deep convolutional neural network (DCNN) to automatically segment the images into each of the three regions and then deconstruct images into their component features to detect non-obvious patterns with objectivity and reproducibility. We then trained the DCNN for two classifications: 1) complete/partial response versus progression of disease (POD), and 2) severe versus no immune-related adverse events (irAEs). Predictive accuracy was estimated by area under the curve (AUC) of receiver operating characteristics (ROC). Results: The DCNN identified tumor within LN with AUC 0.987 and within ST with AUC 0.943. Prediction of POD based on ST-only always performed better than prediction based on LN-only (AUC 0.84 compared to 0.61, respectively). The DCNN had an average AUC 0.69 when analyzing only tumor regions from both LN and ST data sets and AUC 0.68 when analyzing tumor and lymphocyte regions. Severe irAEs were predicted with limited accuracy (AUC 0.53). Conclusions: Our results support the potential application of machine learning on pre-treatment histologic slides to predict response to ICIs. It also revealed their limited value in predicting toxicity. We are currently investigating whether the predictive capability of the algorithm can be further improved by incorporating additional immunologic biomarkers.


2016 ◽  
Vol 16 (01) ◽  
pp. 1640010 ◽  
Author(s):  
YING-TSANG LO ◽  
HAMIDO FUJITA ◽  
TUN-WEN PAI

Background: Coronary artery disease (CAD) is one of the most representative cardiovascular diseases. Early and accurate prediction of CAD based on physiological measurements can reduce the risk of heart attack through medicine therapy, healthy diet, and regular physical activity. Methods:Four heart disease datasets from the UC Irvine Machine Learning Repository were combined and re-examined to remove incomplete entries, and a total of 822 cases were utilized in this study. Seven machine learning methods, including Naïve Bayes, artificial neural networks (ANNs), sequential minimal optimization (SMO), k-nearest neighbor (KNN), AdaBoost, J48, and random forest, were adopted to analyze the collected datasets for CAD prediction. By combining co-expressed observations and an ensemble voting mechanism, we designed and evaluated a new medical decision classifier for CAD prediction. The TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) algorithm was applied to determine the best prediction method for CAD diagnosis. Results: Features of systolic blood pressure, cholesterol, heart rate, and ST depression are considered to be the most significant differences between patients with and without CADs. We show that the prediction capability of seven machine learning classifiers can be enhanced by integrating combinations of observed co-expressed features. Finally, compared to the use of any single classifier, the proposed voting mechanism achieved optimal performance according to TOPSIS.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Hooman Zabeti ◽  
Nick Dexter ◽  
Amir Hosein Safari ◽  
Nafiseh Sedaghat ◽  
Maxwell Libbrecht ◽  
...  

Abstract Motivation Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via The Python Package Index (Pypi) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Donald Salami ◽  
Carla Alexandra Sousa ◽  
Maria do Rosário Oliveira Martins ◽  
César Capinha

Abstract The geographical spread of dengue is a global public health concern. This is largely mediated by the importation of dengue from endemic to non-endemic areas via the increasing connectivity of the global air transport network. The dynamic nature and intrinsic heterogeneity of the air transport network make it challenging to predict dengue importation. Here, we explore the capabilities of state-of-the-art machine learning algorithms to predict dengue importation. We trained four machine learning classifiers algorithms, using a 6-year historical dengue importation data for 21 countries in Europe and connectivity indices mediating importation and air transport network centrality measures. Predictive performance for the classifiers was evaluated using the area under the receiving operating characteristic curve, sensitivity, and specificity measures. Finally, we applied practical model-agnostic methods, to provide an in-depth explanation of our optimal model’s predictions on a global and local scale. Our best performing model achieved high predictive accuracy, with an area under the receiver operating characteristic score of 0.94 and a maximized sensitivity score of 0.88. The predictor variables identified as most important were the source country’s dengue incidence rate, population size, and volume of air passengers. Network centrality measures, describing the positioning of European countries within the air travel network, were also influential to the predictions. We demonstrated the high predictive performance of a machine learning model in predicting dengue importation and the utility of the model-agnostic methods to offer a comprehensive understanding of the reasons behind the predictions. Similar approaches can be utilized in the development of an operational early warning surveillance system for dengue importation.


Author(s):  
Arti Saxena ◽  
Vijay Kumar

In the healthcare industry, sources look after different customers with diverse diseases and complications. Thus, at the source, a great amount of data in all aspects like status of the patients, behaviour of the diseases, etc. are collected, and now it becomes the job of the practitioner at source to use the available data for diagnosing the diseases accurately and then prescribe the relevant treatment. Machine learning techniques are useful to deal with large datasets, with an aim to produce meaningful information from the raw information for the purpose of decision making. The inharmonious behavior of the data is the motivation behind the development of new tools and demonstrates the available information to some meaningful information for decision making. As per the literature, healthcare of patients can be analyzed through machine learning tools, and henceforth, in the article, a Bayesian kernel method for medical decision-making problems has been discussed, which suits the purpose of researchers in the enhancement of their research in the domain of medical decision making.


Sign in / Sign up

Export Citation Format

Share Document