Prediction of Genotype Positivity in Patients with Hypertrophic Cardiomyopathy Using Machine Learning

Author(s):  
Lusha W. Liang ◽  
Michael A. Fifer ◽  
Kohei Hasegawa ◽  
Mathew S. Maurer ◽  
Muredach P. Reilly ◽  
...  

Background - Genetic testing can determine family screening strategies and has prognostic and diagnostic value in hypertrophic cardiomyopathy (HCM). However, it can also pose a significant psychosocial burden. Conventional scoring systems offer modest ability to predict genotype positivity. The aim of our study was to develop a novel prediction model for genotype positivity in patients with HCM by applying machine learning (ML) algorithms. Methods - We constructed three ML models using readily available clinical and cardiac imaging data of 102 patients from Columbia University with HCM who had undergone genetic testing (the training set). We validated model performance on 76 patients with HCM from Massachusetts General Hospital (the test set). Within the test set, we compared the area under the receiver operating characteristic curves (AUCs) for the ML models against the AUCs generated by the Toronto HCM Genotype Score ("the Toronto score") and Mayo HCM Genotype Predictor ("the Mayo score") using the DeLong test and net reclassification improvement (NRI). Results - Overall, 63 of the 178 patients (35%) were genotype positive. The random forest ML model developed in the training set demonstrated an AUC of 0.92 (95% CI 0.85-0.99) in predicting genotype positivity in the test set, significantly outperforming the Toronto score (AUC 0.77, 95% CI 0.65-0.90, p=0.004, NRI: p<0.001) and the Mayo score (AUC 0.79, 95% CI 0.67-0.92, p=0.01, NRI: p=0.001). The gradient boosted decision tree ML model also achieved significant NRI over the Toronto score (p<0.001) and the Mayo score (p=0.03), with an AUC of 0.87 (95% CI 0.75-0.99). Compared to the Toronto and Mayo scores, all three ML models had higher sensitivity, positive predictive value, and negative predictive value. Conclusions - Our ML models demonstrated a superior ability to predict genotype positivity in patients with HCM compared to conventional scoring systems in an external validation test set.
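The AUC values compared in this abstract can be read as the probability that a randomly chosen genotype-positive patient receives a higher model score than a randomly chosen genotype-negative patient. The sketch below illustrates that pairwise (Mann-Whitney) formulation only; the function name `auc_from_scores` and the example labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Counts, over all positive/negative pairs, how often the positive case
    has the higher score; ties contribute 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set data: 1 = genotype positive, 0 = genotype negative.
example_labels = [1, 1, 1, 0, 0, 0, 0, 1]
example_scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]
example_auc = auc_from_scores(example_labels, example_scores)
print(f"AUC = {example_auc:.4f}")
```

The quadratic loop is acceptable for cohorts of this size (tens to hundreds of patients); a rank-based implementation would be preferable for large data sets.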

Circulation ◽  
2019 ◽  
Vol 140 (11) ◽  
pp. 899-909 ◽  
Author(s):  
Martin P. Than ◽  
John W. Pickering ◽  
Yader Sandoval ◽  
Anoop S.V. Shah ◽  
Athanasios Tsanas ◽  
...  

Background: Variations in cardiac troponin concentrations by age, sex, and time between samples in patients with suspected myocardial infarction are not currently accounted for in diagnostic approaches. We aimed to combine these variables through machine learning to improve the assessment of risk for individual patients. Methods: A machine learning algorithm (myocardial-ischemic-injury-index [MI³]) incorporating age, sex, and paired high-sensitivity cardiac troponin I concentrations, was trained on 3013 patients and tested on 7998 patients with suspected myocardial infarction. MI³ uses gradient boosting to compute a value (0–100) reflecting an individual’s likelihood of a diagnosis of type 1 myocardial infarction and estimates the sensitivity, negative predictive value, specificity and positive predictive value for that individual. Assessment was by calibration and area under the receiver operating characteristic curve. Secondary analysis evaluated example MI³ thresholds from the training set that identified patients as low risk (99% sensitivity) and high risk (75% positive predictive value), and performance at these thresholds was compared in the test set to the 99th percentile and European Society of Cardiology rule-out pathways. Results: Myocardial infarction occurred in 404 (13.4%) patients in the training set and 849 (10.6%) patients in the test set. MI³ was well calibrated with a very high area under the receiver operating characteristic curve of 0.963 [0.956–0.971] in the test set and similar performance in early and late presenters. Example MI³ thresholds identifying low- and high-risk patients in the training set were 1.6 and 49.7, respectively. In the test set, MI³ values were <1.6 in 69.5% with a negative predictive value of 99.7% (99.5–99.8%) and sensitivity of 97.8% (96.7–98.7%), and were ≥49.7 in 10.6% with a positive predictive value of 71.8% (68.9–75.0%) and specificity of 96.7% (96.3–97.1%). Using these thresholds, MI³ performed better than the European Society of Cardiology 0/3-hour pathway (sensitivity, 82.5% [74.5–88.8%]; specificity, 92.2% [90.7–93.5%]) and the 99th percentile at any time point (sensitivity, 89.6% [87.4–91.6%]; specificity, 89.3% [88.6–90.0%]). Conclusions: Using machine learning, MI³ provides an individualized and objective assessment of the likelihood of myocardial infarction, which can be used to identify low- and high-risk patients who may benefit from earlier clinical decisions. Clinical Trial Registration: URL: https://www.anzctr.org.au . Unique identifier: ACTRN12616001441404.
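The rule-out and rule-in evaluation described above amounts to applying two cut-offs to the index value and tabulating outcomes on each side. The following sketch shows how sensitivity and negative predictive value at a low threshold, and positive predictive value and specificity at a high threshold, could be computed. The defaults 1.6 and 49.7 come from the abstract; the function name `rule_out_rule_in` and the example patients are hypothetical, and the code does not reproduce the published algorithm.

```python
from typing import Dict, Sequence


def rule_out_rule_in(
    labels: Sequence[int],
    values: Sequence[float],
    low: float = 1.6,
    high: float = 49.7,
) -> Dict[str, float]:
    """Evaluate a low-risk (value < low) and a high-risk (value >= high) cut-off.

    labels: 1 = myocardial infarction, 0 = no myocardial infarction.
    values: index values on a 0-100 scale.
    """
    pairs = list(zip(labels, values))
    # Low-risk threshold: values below `low` are treated as test-negative.
    tn_low = sum(1 for y, v in pairs if v < low and y == 0)
    fn_low = sum(1 for y, v in pairs if v < low and y == 1)
    tp_low = sum(1 for y, v in pairs if v >= low and y == 1)
    # High-risk threshold: values at or above `high` are treated as test-positive.
    tp_high = sum(1 for y, v in pairs if v >= high and y == 1)
    fp_high = sum(1 for y, v in pairs if v >= high and y == 0)
    tn_high = sum(1 for y, v in pairs if v < high and y == 0)
    return {
        "rule_out_sensitivity": tp_low / (tp_low + fn_low),
        "rule_out_npv": tn_low / (tn_low + fn_low),
        "rule_in_ppv": tp_high / (tp_high + fp_high),
        "rule_in_specificity": tn_high / (tn_high + fp_high),
    }


# Hypothetical patients, for illustration only.
example_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
example_values = [0.5, 1.0, 1.2, 3.0, 20.0, 60.0, 1.5, 30.0, 55.0, 90.0]
example_metrics = rule_out_rule_in(example_labels, example_values)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

Patients with values between the two thresholds fall into neither group, which is why the two pairs of metrics are reported separately. The sketch assumes each denominator is non-zero.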


Author(s):  
Giulia Lorenzoni ◽  
Nicolò Sella ◽  
Annalisa Boscolo ◽  
Danila Azzolina ◽  
Patrizia Bartolotta ◽  
...  

Abstract Background Since the beginning of coronavirus disease 2019 (COVID-19), the development of predictive models has sparked considerable interest due to the initial lack of knowledge about diagnosis, treatment, and prognosis. The present study aimed at developing a model, through a machine learning approach, to predict intensive care unit (ICU) mortality in COVID-19 patients based on predefined clinical parameters. Results Observational multicenter cohort study. All COVID-19 adult patients admitted to 25 ICUs belonging to the VENETO ICU network (February 28th, 2020 to April 4th, 2021) were enrolled. Patients admitted to the ICUs before 4th March 2021 were used for model training (“training set”), while patients admitted after the 5th of March 2021 were used for external validation (“test set 1”). A further group of patients (“test set 2”), admitted to the ICU of IRCCS Ca’ Granda Ospedale Maggiore Policlinico of Milan, was used for external validation. A SuperLearner machine learning algorithm was applied for model development, and both internal and external validation were performed. Clinical variables available for the model were (i) age, gender, sequential organ failure assessment score, Charlson Comorbidity Index score (not adjusted for age), Palliative Performance Score; (ii) need of invasive mechanical ventilation, non-invasive mechanical ventilation, O2 therapy, vasoactive agents, extracorporeal membrane oxygenation, continuous veno-venous hemofiltration, tracheostomy, re-intubation, prone position during ICU stay; and (iii) re-admission in ICU. One thousand two hundred ninety-three (80%) patients were included in the “training set”, while 124 (8%) and 199 (12%) patients were included in the “test set 1” and “test set 2,” respectively. Three different predictive models were developed. Each model included different sets of clinical variables. The three models showed similar predictive performances, with a training balanced accuracy that ranged between 0.72 and 0.90, while the cross-validation performance ranged from 0.75 to 0.85. Age was the leading predictor for all the considered models. Conclusions Our study provides a useful and reliable tool, through a machine learning approach, for predicting ICU mortality in COVID-19 patients. In all the estimated models, age was the variable showing the most important impact on mortality.


2018 ◽  
Vol 7 (2.21) ◽  
pp. 339 ◽  
Author(s):  
K Ulaga Priya ◽  
S Pushpa ◽  
K Kalaivani ◽  
A Sartiha

In the banking industry, loan processing is a tedious task, particularly the identification of customers who are likely to default. Manual prediction of default customers may result in bad loans in the future. Banks possess huge volumes of behavioral data, yet they are unable to use it to judge which customers will default. Modern techniques such as machine learning support analytical processing through supervised and unsupervised learning. A data model for predicting default customers using the random forest technique is proposed. The model is evaluated on the training set and, based on the performance parameters, the final prediction is made on the test set. The results indicate that the random forest technique can help banks predict loan defaulters with high accuracy.


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 332
Author(s):  
Ernest Kwame Ampomah ◽  
Zhiguang Qin ◽  
Gabriel Nyame

Forecasting the direction and trend of stock prices is an important task that helps investors make prudent financial decisions in the stock market. Investment in the stock market carries substantial risk. Minimizing prediction error reduces the investment risk. Machine learning (ML) models typically perform better than statistical and econometric models. Also, ensemble ML models have been shown in the literature to perform better than single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Eight stock data sets from three stock exchanges (NYSE, NASDAQ, and NSE) are randomly collected and used for the study. Each data set is split into a training set and a test set. Ten-fold cross-validation accuracy is used to evaluate the ML models on the training set. In addition, the ML models are evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC). The Kendall W test of concordance is used to rank the performance of the tree-based ML algorithms. For the training set, the AdaBoost model performed better than the rest of the models. For the test set, the accuracy, precision, F1-score, and AUC metrics produced rankings of the models that were statistically significant, and the Extra Trees classifier outperformed the other models in all the rankings.
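Kendall's W measures how consistently several rankers order the same items, with 0 meaning no agreement and 1 meaning identical rankings. The sketch below computes W for untied ranks using the standard formula W = 12S / (m²(n³ − n)). Treating each data set as a ranker and each model as a ranked item is an assumption about how the study applied the test, and the example ranks are hypothetical.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's coefficient of concordance for m rankers and n items (no ties).

    rankings[j][i] is the rank (1..n) that ranker j assigns to item i.
    S is the sum of squared deviations of the per-item rank totals
    from their mean, m * (n + 1) / 2.
    """
    m = len(rankings)
    n = len(rankings[0])
    if any(len(r) != n for r in rankings):
        raise ValueError("every ranker must rank the same number of items")
    totals = [sum(ranker[i] for ranker in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical example: three data sets each rank four models by test accuracy.
example_rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
example_w = kendalls_w(example_rankings)
print(f"Kendall's W = {example_w:.4f}")
```

A significance test for W (for example, via the chi-square approximation) is omitted here, as is the correction for tied ranks.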


Molecules ◽  
2019 ◽  
Vol 24 (10) ◽  
pp. 2006 ◽  
Author(s):  
Liadys Mora Lagares ◽  
Nikola Minovski ◽  
Marjana Novič

P-glycoprotein (P-gp) is a transmembrane protein that actively transports a wide variety of chemically diverse compounds out of the cell. It is highly associated with the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of drugs/drug candidates and contributes to decreasing toxicity by eliminating compounds from cells, thereby preventing intracellular accumulation. Therefore, in the drug discovery and toxicological assessment process it is advisable to pay attention to whether a compound under development could be transported by P-gp or not. In this study, an in silico multiclass classification model capable of predicting the probability that a compound interacts with P-gp was developed using a counter-propagation artificial neural network (CP ANN) based on a set of 2D molecular descriptors, as well as an extensive dataset of 2512 compounds (1178 P-gp inhibitors, 477 P-gp substrates and 857 P-gp non-active compounds). The model provided good classification performance, producing non-error rate (NER) values of 0.93 for the training set and 0.85 for the test set, while the average precision (AvPr) was 0.93 for the training set and 0.87 for the test set. An external validation set of 385 compounds was used to challenge the model’s performance. On the external validation set the NER and AvPr values were 0.70 for both indices. We believe that this in silico classifier could be effectively used as a reliable virtual screening tool for identifying potential P-gp ligands.
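The abstract reports NER and AvPr without defining them. In the chemometrics literature, NER is commonly taken as the mean of the per-class sensitivities and AvPr as the mean of the per-class precisions. The sketch below follows that convention, which is an assumption about the authors' exact definition; the function name `ner_and_avpr` and the example labels are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def ner_and_avpr(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (NER, AvPr) for a multiclass classification result.

    NER  = mean over classes of TP / (number of true members of the class)
    AvPr = mean over classes of TP / (number of predictions of the class)
    A class that is never predicted contributes a precision of 0.
    """
    classes = sorted(set(y_true), key=str)
    sensitivities = []
    precisions = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        actual = sum(1 for t in y_true if t == c)
        predicted = sum(1 for p in y_pred if p == c)
        sensitivities.append(tp / actual)
        precisions.append(tp / predicted if predicted else 0.0)
    return sum(sensitivities) / len(classes), sum(precisions) / len(classes)


# Hypothetical three-class example mirroring the inhibitor/substrate/non-active split.
example_true = ["inhibitor"] * 4 + ["substrate"] * 2 + ["inactive"] * 4
example_pred = [
    "inhibitor", "inhibitor", "inhibitor", "substrate",
    "substrate", "inactive",
    "inactive", "inactive", "inactive", "inhibitor",
]
example_ner, example_avpr = ner_and_avpr(example_true, example_pred)
print(f"NER = {example_ner:.3f}, AvPr = {example_avpr:.3f}")
```

Because both indices average over classes rather than over compounds, the smaller substrate class weighs as much as the larger inhibitor class.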


2018 ◽  
Vol 36 (30_suppl) ◽  
pp. 314-314 ◽  
Author(s):  
Robert Michael Daly ◽  
Dmitriy Gorenshteyn ◽  
Lior Gazit ◽  
Stefania Sokolowski ◽  
Kevin Nicholas ◽  
...  

Background: Acute care accounts for half of cancer expenditures and is a measure of poor quality care. Identifying patients at high risk for ED visits enables institutions to target symptom management resources to those most likely to benefit. Risk stratification models developed to date have not been meaningfully employed in oncology, and there is a need for clinically relevant models to improve patient care. Methods: We established a predictive analytics framework for clinical use with attention to the modeling technique, clinician feedback, and application metrics. The model employs EHR data from initial visit to first antineoplastic administration for new patients at our institution from January 2014 to June 2017. The binary dependent variable is occurrence of an ED visit within the first 6 months of treatment. From over 1,400 data features, the model was refined to include 400 clinically relevant ones spanning demographics, pathology, clinician notes, labs, medications, and psychosocial information. Clinician review was performed to confirm EHR data input validity. The final regularized multivariate logistic regression model was chosen based on clinical and statistical significance. Parameter selection and model evaluation utilized the positive predictive value for the top 25% of observations ranked by model-determined risk. The final model was evaluated using a test set containing 20% of randomly held out data. The model was calibrated based on a 5-fold cross-validation scheme over the training set. Results: There are 5,752 antineoplastic starts in our training set, and 1,457 in our test set. The positive predictive value of this model for the top 25% riskiest new start antineoplastic patients is 0.53. The 400 clinically relevant features draw from multiple areas in the EHR. For example, those features found to increase risk include: combination chemotherapy, low albumin, social work needs, and opioid use, whereas those found to decrease risk include stage 1 disease, never smoker status, and oral antineoplastic therapy. Conclusions: We have constructed a framework to build a clinically relevant model. We are now piloting it to identify those likely to benefit from a home-based, digital symptom management intervention.
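The evaluation metric named in the Methods, positive predictive value among the top 25% of observations ranked by predicted risk, can be sketched as follows. The function name `ppv_top_fraction`, the rounding rule for the cut-off (ceiling), and the example data are assumptions made for illustration; the abstract does not specify how ties or fractional cut-offs were handled.

```python
import math
from typing import Sequence


def ppv_top_fraction(
    labels: Sequence[int], risks: Sequence[float], fraction: float = 0.25
) -> float:
    """Positive predictive value among the highest-risk `fraction` of observations.

    labels: 1 = event occurred (e.g., an ED visit), 0 = no event.
    risks:  model-predicted risk for each observation.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(labels) != len(risks) or not labels:
        raise ValueError("labels and risks must be non-empty and equal in length")
    k = max(1, math.ceil(len(labels) * fraction))
    # Sort by risk, highest first, and keep the top k observations.
    ranked = sorted(zip(risks, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Hypothetical cohort of eight treatment starts.
example_risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0, 0, 1, 0, 0]
example_ppv = ppv_top_fraction(example_labels, example_risks)
print(f"PPV in top 25% = {example_ppv:.2f}")
```

A metric of this shape matches the intended use described in the abstract: an intervention with limited capacity is offered to the highest-risk quarter of patients, so what matters is how many of those patients actually have the event.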


2019 ◽  
Author(s):  
Maxime Thibault ◽  
Denis Lebel

Abstract: The objective of this study was to determine if it is feasible to use machine learning to evaluate how a medication order is contextually appropriate for a patient, in order to assist order review by pharmacists. A neural network was constructed using as input the sequence of word2vec embeddings of the 30 previous orders, as well as the currently active medications, pharmacological classes and ordering department, to predict the next order. The model was trained with data from 2013 to 2017, optimized using 5-fold cross-validation, and tested on orders from 2018. A survey was developed to obtain pharmacist ratings on a sample of 20 orders, which were compared with predictions. The training set included 1 022 272 orders. The test set included 95 310 orders. Baseline training set top 1, top 10 and top 30 accuracy using a dummy classifier were respectively 4.5%, 23.6% and 44.1%. Final test set accuracies were, respectively, 44.4%, 69.9% and 80.4%. Populations in which the model performed the best were obstetrics and gynecology patients and newborn babies (either in or out of neonatal intensive care). Pharmacists agreed poorly on their ratings of sampled orders with a Fleiss kappa of 0.283. The breakdown of metrics by population showed better performance in patients following less variable order patterns, indicating potential usefulness in triaging routine orders to less extensive pharmacist review. We conclude that machine learning has potential for helping pharmacists review medication orders. Future studies should aim at evaluating the clinical benefits of using such a model in practice.
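The top 1, top 10 and top 30 accuracies reported above count a prediction as correct when the order that was actually placed appears among the model's k highest-ranked candidates. A minimal sketch of that metric follows; the function name `top_k_accuracy` and the medication names are hypothetical, and the code is independent of the neural network described in the abstract.

```python
from typing import Sequence


def top_k_accuracy(
    true_items: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of cases whose true item is among the first k ranked predictions.

    ranked_predictions[i] lists candidate items for case i, most likely first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(true_items) != len(ranked_predictions) or not true_items:
        raise ValueError("inputs must be non-empty and equal in length")
    hits = sum(
        1
        for truth, ranked in zip(true_items, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_items)


# Hypothetical next-order predictions for four orders.
example_true = ["acetaminophen", "ondansetron", "morphine", "cefazolin"]
example_ranked = [
    ["acetaminophen", "ibuprofen", "morphine"],
    ["morphine", "ondansetron", "cefazolin"],
    ["ibuprofen", "acetaminophen", "cefazolin"],
    ["ondansetron", "morphine", "cefazolin"],
]
example_top1 = top_k_accuracy(example_true, example_ranked, k=1)
example_top2 = top_k_accuracy(example_true, example_ranked, k=2)
example_top3 = top_k_accuracy(example_true, example_ranked, k=3)
print(example_top1, example_top2, example_top3)
```

Top-k accuracy can only stay the same or increase as k grows, which is consistent with the rising figures reported for k = 1, 10 and 30.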


2022 ◽  
Vol 8 ◽  
Author(s):  
Jinzhang Li ◽  
Ming Gong ◽  
Yashutosh Joshi ◽  
Lizhong Sun ◽  
Lianjun Huang ◽  
...  

Background: Acute renal failure (ARF) is the most common major complication following cardiac surgery for acute aortic syndrome (AAS) and worsens the postoperative prognosis. Our aim was to establish a machine learning prediction model for ARF occurrence in AAS patients. Methods: We included AAS patient data from nine medical centers (n = 1,637) and analyzed the incidence of ARF and the risk factors for postoperative ARF. We used data from six medical centers to compare the performance of four machine learning models and performed internal validation to identify AAS patients who developed postoperative ARF. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to compare the performance of the predictive models. We compared the performance of the optimal machine learning prediction model with that of traditional prediction models. Data from three medical centers were used for external validation. Results: The eXtreme Gradient Boosting (XGBoost) algorithm performed best in the internal validation process (AUC = 0.82), which was better than both the logistic regression (LR) prediction model (AUC = 0.77, p < 0.001) and the traditional scoring systems. Upon external validation, the XGBoost prediction model (AUC = 0.81) also performed better than both the LR prediction model (AUC = 0.75, p = 0.03) and the traditional scoring systems. We created an online application based on the XGBoost prediction model. Conclusions: We have developed a machine learning model that has better predictive performance than traditional LR prediction models as well as other existing risk scoring systems for postoperative ARF. This model can be utilized to provide early warnings when high-risk patients are found, enabling clinicians to take prompt measures.


Author(s):  
Mohd Riyaz Lattoo ◽  
Shabir Ahmad Mir ◽  
Nayeemul Hassan Ganie ◽  
Shabir Hussain Rather

Background: Acute appendicitis is one of the most common causes of surgery for acute abdomen. Several scoring systems have been adopted by physicians to aid in the diagnosis and decrease the negative appendicectomy rate. The Tzanakis scoring system is one such score. The objective of the present study was to validate this scoring system in our population and compare its accuracy with histopathological examination (HPE). Methods: A retrospective study was carried out at the Department of Surgery at Mohammad Afzal Beigh Memorial Hospital Anantnag India. The Tzanakis score was calculated in 288 patients who underwent appendicectomy from September 2016-2018 and HPE results were analysed. Results: 276 patients were eligible for the study. The sensitivity and specificity of the Tzanakis score in diagnosing appendicitis were 90.66% and 73.68% respectively. The overall diagnostic accuracy was 86.23% with positive predictive value of 97.89% and negative predictive value of 36.84%. Conclusions: The Tzanakis scoring system is an accurate modality in establishing the diagnosis of acute appendicitis and preventing a negative laparotomy.


2021 ◽  
Vol 11 ◽  
Author(s):  
Yinghao Meng ◽  
Hao Zhang ◽  
Qi Li ◽  
Fang Liu ◽  
Xu Fang ◽  
...  

Purpose: To develop and validate a machine learning classifier based on multidetector computed tomography (MDCT), for the preoperative prediction of tumor–stroma ratio (TSR) expression in patients with pancreatic ductal adenocarcinoma (PDAC). Materials and Methods: In this retrospective study, 227 patients with PDAC underwent an MDCT scan and surgical resection. We quantified the TSR by using hematoxylin and eosin staining and extracted 1409 arterial and portal venous phase radiomics features for each patient, respectively. Moreover, we used the least absolute shrinkage and selection operator logistic regression algorithm to reduce the features. The extreme gradient boosting (XGBoost) was developed using a training set consisting of 167 consecutive patients, admitted between December 2016 and December 2017. The model was validated in 60 consecutive patients, admitted between January 2018 and April 2018. We determined the XGBoost classifier performance based on its discriminative ability, calibration, and clinical utility. Results: We observed low and high TSR in 91 (40.09%) and 136 (59.91%) patients, respectively. A log-rank test revealed significantly longer survival for patients in the TSR-low group than those in the TSR-high group. The prediction model revealed good discrimination in the training (area under the curve [AUC] = 0.93) and moderate discrimination in the validation set (AUC = 0.63). While the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value for the training set were 94.06%, 81.82%, 0.89, 0.89, and 0.90, respectively, those for the validation set were 85.71%, 48.00%, 0.70, 0.70, and 0.71, respectively. Conclusions: The CT radiomics-based XGBoost classifier provides a potentially valuable noninvasive tool to predict TSR in patients with PDAC and optimize risk stratification.

