Machine Learning and Statistical Approaches for Classification of Risk of Coronary Artery Disease using Plasma Cytokines

2020 ◽  
Author(s):  
Seema Singh Saharan ◽  
Pankaj Nagar ◽  
Kate Townsend Creasy ◽  
Eveline O. Stock ◽  
James Feng ◽  
...  

Abstract Background: As per the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the primary cause of death in the world and accounts for 31% of total fatalities. The unprecedented 17.6 million deaths caused by CAD in 2016 underscore the urgent need to facilitate proactive and accelerated pre-emptive diagnosis. Innovative and emerging Machine Learning (ML) techniques can be leveraged to facilitate early detection of CAD, which is a crucial factor in saving lives. Standard techniques like angiography, which provide reliable evidence, are invasive and typically very expensive and risky. In contrast, ML model-generated diagnosis is non-invasive, fast, accurate and affordable. Therefore, it can be used as a supplement or precursor to the conventional methods. This research demonstrates the implementation of K Nearest Neighbor (k-NN) and Random Forest ML algorithms to achieve a targeted “At Risk” CAD classification using an emerging set of 35 cytokine biomarkers that are strongly indicative predictive variables and potential targets for therapy. To ensure better generalizability, mechanisms such as data balancing, k-fold cross validation for hyperparameter tuning, and feature selection via feature importance identification were integrated within the models. Results: A total of 5 classifiers were developed, with two built using 35 cytokine predictive features and three built using a subset of cytokines selected by variable importance techniques, namely Random Forest, ReliefF and Boruta. The best Area under Receiver Operating Characteristic (AUROC) based accuracy of .99 was achieved by the Random Forest classifier with 35 cytokine biomarkers. The second-best AUROC accuracy was achieved by the k-NN model using cytokines selected by the Random Forest variable importance selection mechanism. Conclusions: Presently, as large-scale efforts are gaining momentum to enable early, fast, reliable, affordable, and accessible detection of individuals at risk for CAD, the application of powerful ML algorithms can be leveraged as a supplement to conventional methods such as angiography. Early detection can be further improved by incorporating 65 novel and sensitive cytokine biomarkers. Investigation of the emerging role of cytokines in CAD can materially enhance the detection of risk and the discovery of mechanisms of disease that can lead to new therapeutic modalities.
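The k-fold cross validation for hyperparameter tuning mentioned in this abstract can be illustrated with a minimal sketch that chooses the number of neighbours for a small k-NN classifier. The toy data, the function names `knn_predict` and `cross_val_accuracy`, and the candidate values of k are all hypothetical; this is not the authors' pipeline, which also involved data balancing and feature selection.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean held-out accuracy of k-NN over n_folds shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        tr_X = [X[i] for i in idx if i not in held]
        tr_y = [y[i] for i in idx if i not in held]
        correct = sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / n_folds


# Hypothetical toy data: two well separated 2-D clusters standing in for
# "Control" (label 0) and "At Risk" (label 1).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(30)]
y = [0] * 30 + [1] * 30

# Evaluate each candidate k on the same folds and keep the best one.
cv_scores = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5, 7)}
best_k = max(cv_scores, key=cv_scores.get)
```

Every candidate k is scored on the same folds because the seed is fixed, so the comparison between candidates is not confounded by different splits.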

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Seema Singh Saharan ◽  
Pankaj Nagar ◽  
Kate Townsend Creasy ◽  
Eveline O. Stock ◽  
James Feng ◽  
...  

Abstract Background: As per the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the primary cause of death in the world and accounts for 31% of total fatalities. The unprecedented 17.6 million deaths caused by CAD in 2016 underscore the urgent need to facilitate proactive and accelerated pre-emptive diagnosis. Innovative and emerging Machine Learning (ML) techniques can be leveraged to facilitate early detection of CAD, which is a crucial factor in saving lives. Standard techniques like angiography, which provide reliable evidence, are invasive and typically expensive and risky. In contrast, ML model-generated diagnosis is non-invasive, fast, accurate and affordable. Therefore, ML algorithms can be used as a supplement or precursor to the conventional methods. This research demonstrates the implementation and comparative analysis of K Nearest Neighbor (k-NN) and Random Forest ML algorithms to achieve a targeted “At Risk” CAD classification using an emerging set of 35 cytokine biomarkers that are strongly indicative predictive variables and potential targets for therapy. To ensure better generalizability, mechanisms such as data balancing and repeated k-fold cross validation for hyperparameter tuning were integrated within the models. To determine how well the models separate “At Risk” CAD from Control, the Area under Receiver Operating Characteristic (AUROC) metric is used, which discriminates the classes by exhibiting the tradeoff between the false positive and true positive rates. Results: A total of 2 classifiers were developed, both built using 35 cytokine predictive features. The best AUROC score of .99 with a 95% Confidence Interval (CI) (.982, .999) was achieved by the Random Forest classifier using 35 cytokine biomarkers. The second-best AUROC score of .954 with a 95% CI (.929, .979) was achieved by the k-NN model using 35 cytokines. A p-value of less than 7.481e-10 obtained by an independent t-test validated that the Random Forest classifier was significantly better than the k-NN classifier with regard to the AUROC score. Conclusions: Presently, as large-scale efforts are gaining momentum to enable early, fast, reliable, affordable, and accessible detection of individuals at risk for CAD, the application of powerful ML algorithms can be leveraged as a supplement to conventional methods such as angiography. Early detection can be further improved by incorporating 65 novel and sensitive cytokine biomarkers. Investigation of the emerging role of cytokines in CAD can materially enhance the detection of risk and the discovery of mechanisms of disease that can lead to new therapeutic modalities.
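To make the AUROC metric in this abstract concrete, the sketch below computes it in two equivalent ways: as the fraction of (positive, negative) pairs that the scores rank correctly, and as the list of (false positive rate, true positive rate) points traced by sweeping the decision threshold. The labels and scores are made-up illustrative values, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over all distinct scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= t)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Illustrative values only: 1 = "At Risk", 0 = "Control".
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

area = auroc(labels, scores)        # 15 of 16 pairs are ranked correctly
curve = roc_points(labels, scores)  # starts at (0, 0) and ends at (1, 1)
```

Lowering the threshold moves along the curve toward (1, 1): more true positives are captured at the cost of more false positives, which is the tradeoff the abstract refers to.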


2020 ◽  
Author(s):  
Seema Singh Saharan ◽  
Pankaj Nagar ◽  
Kate Townsend Creasy ◽  
Eveline O. Stock ◽  
James Feng ◽  
...  

Abstract Background: As per the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the primary cause of death in the world and accounts for 31% of total fatalities. The unprecedented 17.6 million deaths caused by CAD in 2016 underscore the urgent need to facilitate proactive and accelerated pre-emptive diagnosis. The current research took an innovative approach to implement K Nearest Neighbor (k-NN) and ensemble Random Forest Machine Learning algorithms to achieve a targeted “At Risk” Coronary Artery Disease (CAD) classification. To ensure better generalizability, mechanisms like k-fold cross validation, hyperparameter tuning and statistical significance (p<.05) were employed. The classification is also unique in incorporating 35 cytokines as biomarkers within the predictive feature space of the Machine Learning algorithms. Results: A total of seven classifiers were developed, with four built using 35 cytokine predictive features and three built using 9 cytokines that were statistically significant (p<.05) across CAD versus Control groups, as determined by independent two-sample t tests. The best prediction accuracy of 100% was achieved by the Random Forest ensemble using the nine significant cytokines. Significant cytokines were selected to decrease the noise level of the data, allowing for better classification. Additionally, from the bio-medical perspective, it was enlightening to empirically observe the interplay of the cytokines. Compared to Controls, the moderately correlated (correlation coefficient r=.5) cytokines “IL1-β” and “IL-10” were both significant and down-regulated in the CAD group. Both cytokines were primarily responsible for the Random Forest-generated 100% classification. In conjunction with Machine Learning (ML) algorithms, traditional statistical techniques like correlation and t tests were leveraged to obtain insights that brought forth a role for cytokines in the investigation of CAD risk. Conclusions: Presently, as large-scale efforts are gaining momentum to enable early detection of individuals at risk for CAD by the application of novel and powerful ML algorithms, detection can be further improved by incorporating additional biomarkers. Investigation of the emerging role of cytokines in CAD can materially enhance the detection of risk and the discovery of mechanisms of disease that can lead to new therapeutic approaches.
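The feature screening by independent two-sample t tests described in this abstract can be sketched as follows. The abstract does not say whether equal variances were assumed, so the unequal-variance (Welch) form is used here as an assumption. The function name `welch_t` and the sample values are hypothetical, and the sketch stops at the test statistic and degrees of freedom, since converting them to a p-value needs the Student t distribution from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical measurements of one cytokine in two groups (arbitrary units).
cad_group = [1.0, 2.0, 3.0, 4.0, 5.0]
control_group = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(cad_group, control_group)
# A negative t_stat means the first group has the lower mean,
# i.e. the marker would be down-regulated in that group.
```

Repeating this per cytokine and keeping those whose p-value falls below .05 would give a reduced feature set of the kind the abstract describes.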


2016 ◽  
Vol 16 (01) ◽  
pp. 1640010 ◽  
Author(s):  
YING-TSANG LO ◽  
HAMIDO FUJITA ◽  
TUN-WEN PAI

Background: Coronary artery disease (CAD) is one of the most representative cardiovascular diseases. Early and accurate prediction of CAD based on physiological measurements can reduce the risk of heart attack through drug therapy, healthy diet, and regular physical activity. Methods: Four heart disease datasets from the UC Irvine Machine Learning Repository were combined and re-examined to remove incomplete entries, and a total of 822 cases were utilized in this study. Seven machine learning methods, including Naïve Bayes, artificial neural networks (ANNs), sequential minimal optimization (SMO), k-nearest neighbor (KNN), AdaBoost, J48, and random forest, were adopted to analyze the collected datasets for CAD prediction. By combining co-expressed observations and an ensemble voting mechanism, we designed and evaluated a new medical decision classifier for CAD prediction. The TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) algorithm was applied to determine the best prediction method for CAD diagnosis. Results: Systolic blood pressure, cholesterol, heart rate, and ST depression are considered to be the features that differ most significantly between patients with and without CAD. We show that the prediction capability of seven machine learning classifiers can be enhanced by integrating combinations of observed co-expressed features. Finally, compared to the use of any single classifier, the proposed voting mechanism achieved optimal performance according to TOPSIS.
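For readers unfamiliar with TOPSIS, the sketch below shows the standard procedure in its textbook form: vector-normalise each criterion, apply weights, find the ideal and anti-ideal alternatives, and score each alternative by its relative closeness to the ideal. The criteria, weights and numbers are hypothetical and are not taken from the paper, which does not list them in the abstract.

```python
import math


def topsis(matrix, weights, benefit):
    """Relative closeness to the ideal solution for each row (alternative).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    Assumes the alternatives are not all identical (else the ratio is 0/0).
    """
    n_crit = len(weights)
    # Vector normalisation of each column, followed by weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal (best) and anti-ideal (worst) value per criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical classifiers scored on accuracy, sensitivity (benefit criteria)
# and training time in seconds (cost criterion).
classifiers = ["A", "B", "C"]
matrix = [
    [0.90, 0.88, 2.0],
    [0.85, 0.80, 5.0],
    [0.80, 0.75, 9.0],
]
closeness = topsis(matrix, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
ranking = [name for _, name in sorted(zip(closeness, classifiers), reverse=True)]
```

In this toy input classifier A is best on every criterion, so it coincides with the ideal solution and receives a closeness of 1, while C coincides with the anti-ideal and receives 0.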


2021 ◽  
Vol 8 ◽  
Author(s):  
Chen Wang ◽  
Yue Zhao ◽  
Bingyu Jin ◽  
Xuedong Gan ◽  
Bin Liang ◽  
...  

Early identification of coronary artery disease (CAD) can prevent the progression of CAD and effectively lower the mortality rate, so we intended to construct and validate a machine learning model to predict the risk of CAD based on conventional risk factors and lab test data. There were 3,112 CAD patients and 3,182 controls enrolled from three centers in China. We compared the baseline and clinical characteristics between the two groups. Then, the Random Forest algorithm was used to construct a model to predict CAD, and the model was assessed by receiver operating characteristic (ROC) curve. In the development cohort, the Random Forest model showed a good AUC of 0.948 (95%CI: 0.941–0.954) for identifying CAD patients from controls, with a sensitivity of 90%, a specificity of 85.4%, a positive predictive value of 0.863 and a negative predictive value of 0.894. Validation of the model also yielded a favorable discriminatory ability, with the AUC, sensitivity, specificity, positive predictive value, and negative predictive value of 0.944 (95%CI: 0.934–0.955), 89.5%, 85.8%, 0.868, and 0.886 in validation cohort 1, respectively, and 0.940 (95%CI: 0.922–0.960), 79.5%, 94.3%, 0.932, and 0.823 in validation cohort 2, respectively. An easy-to-use tool that combines 15 indexes to assess CAD risk was constructed and validated using the Random Forest algorithm, and it showed favorable predictive capability (http://45.32.120.149:3000/randomforest). Our model is extremely valuable for clinical practice and will be helpful for the management and primary prevention of CAD.
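The four threshold-dependent metrics reported in this abstract all derive from a 2x2 confusion matrix. The sketch below shows the definitions; the counts are hypothetical round numbers chosen for readability, not the study's data, and `diagnostic_metrics` is an illustrative name.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are true cases
        "npv": tn / (tn + fn),          # share of negative calls that are true controls
    }


# Hypothetical counts: 100 patients and 100 controls.
m = diagnostic_metrics(tp=90, fp=15, tn=85, fn=10)
```

Unlike sensitivity and specificity, PPV and NPV depend on the ratio of cases to controls in the cohort, so values from a roughly balanced study population would not carry over directly to a setting with a different disease prevalence.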


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 961
Author(s):  
Yu-Cheng Hsu ◽  
I-Jung Tsai ◽  
Hung Hsu ◽  
Po-Wen Hsu ◽  
Ming-Hui Cheng ◽  
...  

Machine learning (ML) algorithms have been applied to predicting coronary artery disease (CAD). Our purpose was to utilize autoantibody isotypes against four different unmodified and malondialdehyde (MDA)-modified peptides among Taiwanese with CAD and healthy controls (HCs) for CAD prediction. In this study, levels of MDA, MDA-modified protein (MDA-protein) adducts, and autoantibody isotypes against unmodified peptides and MDA-modified peptides were measured with enzyme-linked immunosorbent assay (ELISA). To improve the performance of ML, we used decision tree (DT), random forest (RF), and support vector machine (SVM) classifiers coupled with five-fold cross validation and parameter optimization. Levels of plasma MDA and MDA-protein adducts were higher in CAD patients than in HCs. IgM anti-IGKC76–99 MDA and IgM anti-A1AT284–298 MDA decreased the most in patients with CAD compared to HCs. In the experimental results of CAD prediction, the decision tree classifier achieved an area under the curve (AUC) of 0.81, the random forest classifier achieved an AUC of 0.94, and the support vector machine achieved an AUC of 0.65 for differentiating between CAD patients with stenosis rates of 70% and HCs. In this study, we demonstrated that autoantibody isotypes imported into machine learning algorithms can lead to accurate models for clinical use.


2020 ◽  
Vol 15 ◽  
Author(s):  
Elham Shamsara ◽  
Sara Saffar Soflaei ◽  
Mohammad Tajfard ◽  
Ivan Yamshchikov ◽  
Habibollah Esmaili ◽  
...  

Background: Coronary artery disease (CAD) is an important cause of mortality and morbidity globally. Objective: The early prediction of CAD would be valuable in identifying individuals at risk, and in focusing resources on its prevention. In this paper, we aimed to establish a diagnostic model to predict CAD by using three approaches of ANN (pattern recognition-ANN, LVQ-ANN, and competitive ANN). Methods: One promising method for early prediction of disease based on risk factors is machine learning. Among different machine learning algorithms, the artificial neural network (ANN) algorithms have been applied widely in medicine and a variety of real-world classifications. ANN is a non-linear computational model, inspired by the human brain, that analyzes and processes complex datasets. Results: The different ANN methods investigated in this paper indicate that, in both the pattern recognition-ANN and LVQ-ANN methods, the predictions of the Angiography+ class have high accuracy. Moreover, in CNN the correlation between the individuals in cluster “c” and the Angiography+ class is very high. This accuracy indicates a significant difference in some of the input features between the Angiography+ class and the other two output classes. A comparison among the chosen weights in these three methods in separating the control class and Angiography+ shows that hs-CRP, FSG, and WBC are the most substantial excitatory weights in recognizing the Angiography+ individuals, whereas HDL-C and MCH are determined to be inhibitory weights. Furthermore, the effect of decomposing a multi-class problem into a set of binary classes, and of random sampling, on the accuracy of the diagnostic model is investigated. Conclusion: This study confirms that pattern recognition-ANN was the most accurate among the different ANN methods. This is due to the back-propagation procedure, in which the network classifies input variables based on labeled classes. The results of binarization show that decomposition of the multi-class set to binary sets could achieve higher accuracy.
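The decomposition of a multi-class problem into binary ones can be sketched in a few lines. The abstract does not state which decomposition scheme was used, so the one-vs-rest form below is an assumption. The label "other" is a hypothetical stand-in for the third output class, which the abstract does not name.

```python
def one_vs_rest(labels):
    """Map each class to a binary target vector: 1 for that class, 0 for all others."""
    return {c: [1 if l == c else 0 for l in labels] for c in sorted(set(labels))}


# Hypothetical three-class labels; "other" is a placeholder class name.
labels = ["Angiography+", "control", "other", "control", "Angiography+"]
binary_targets = one_vs_rest(labels)

# One binary classifier would then be trained per entry of binary_targets,
# for example Angiography+ versus everything else.
angio_vs_rest = binary_targets["Angiography+"]
```

Each binary problem is simpler than the original three-way problem, which is one plausible reason the abstract reports higher accuracy after binarization.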

