Analysis of Potential Genetic Biomarkers Using Machine Learning Methods and Immune Infiltration Regulatory Mechanisms Underlying Atrial Fibrillation
Running Title: Identification of Biomarkers for AF via Machine Learning

Author(s):  
Li-Da Wu ◽  
Feng Li ◽  
Jia-Yi Chen ◽  
Jie Zhang ◽  
Ling-Ling Qian ◽  
...  

Abstract Objective: We aimed to identify biomarkers for atrial fibrillation (AF) using machine learning methods and to evaluate in detail the degree of immune infiltration in AF patients. Methods: Two AF-related datasets (GSE41177 and GSE79768) from the GEO database were included. Differentially expressed genes (DEGs) were screened using the “limma” package. Candidate biomarkers for AF were identified using two machine learning methods, the LASSO regression algorithm and the SVM-RFE algorithm. Receiver operating characteristic (ROC) curves were used to assess the diagnostic effectiveness of the biomarkers, which was further validated in the GSE14795 dataset. Moreover, we used CIBERSORT to estimate the proportion of infiltrating immune cells in each sample, and the Spearman method was used to explore the correlation between biomarkers and immune cells. Results: A total of 129 DEGs were identified, and CYBB, CXCR2, and S100A4 were identified as key biomarkers of AF using the LASSO regression and SVM-RFE algorithms; their diagnostic value was further validated in GSE14795. Immune infiltration analysis indicated that, compared with sinus rhythm (SR) samples, atrial samples from patients with AF contained higher proportions of gamma delta T cells, neutrophils, and resting mast cells, whereas the proportion of follicular helper T cells was relatively lower. Correlation analysis demonstrated that CYBB, CXCR2, and S100A4 were significantly correlated with the infiltrating immune cells. Conclusions: This study suggests that CYBB, CXCR2, and S100A4 are key biomarkers correlated with infiltrating immune cells in AF, and that infiltrating immune cells play pivotal roles in AF.
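The Spearman correlation step mentioned in the abstract above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names are hypothetical and the expression values and immune cell fractions are made-up numbers used only to show the calculation (rank both variables, then take the Pearson correlation of the ranks).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustration data (NOT from the study): one gene's expression per
# sample and an estimated immune cell fraction for the same samples.
gene_expression = [5.1, 6.3, 4.8, 7.2, 6.9]
neutrophil_fraction = [0.02, 0.05, 0.01, 0.09, 0.07]
print(spearman(gene_expression, neutrophil_fraction))
```

Because the statistic depends only on ranks, it captures monotone but non-linear relationships between a biomarker and a cell fraction, which is one common reason for preferring it over Pearson correlation in this kind of analysis.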

2019 ◽  
Vol 109 (2) ◽  
pp. 251-277 ◽  
Author(s):  
Nastasiya F. Grinberg ◽  
Oghenejokpeme I. Orhobor ◽  
Ross D. King

Abstract In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are central to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods (genomic BLUP and a two-step sequential method based on linear regression). Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present.
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which method is likely to perform well on any given problem is elusive and non-trivial.


2017 ◽  
Author(s):  
Nastasiya F. Grinberg ◽  
Oghenejokpeme I. Orhobor ◽  
Ross D. King

Abstract In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies are of the highest societal importance, as they are central to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods (including genomic BLUP). Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. When applied to the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forest. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure, which suggests one way to improve standard machine learning methods when population structure is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise.


Author(s):  
Guanghan Li ◽  
Jian Liu ◽  
Jingping Wu ◽  
Yan Tian ◽  
Liyong Ma ◽  
...  

Background: The incidence of renal disease is high, and renal disease can lead to end-stage renal disease. Ultrasound is a commonly used imaging method that includes conventional ultrasound, color ultrasound, and elastography. Machine learning is a promising method that has been widely used in clinical practice. Objective: To compare the diagnostic performance of different ultrasonic image measurement parameters for kidney diseases, and to compare different machine learning methods with the human-reading method. Methods: A total of 94 patients with pathologically diagnosed renal diseases and 109 normal controls were included in this study. The patients were examined by conventional ultrasound, color ultrasound, and shear wave elasticity. Ultrasonic data were analyzed by support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and artificial neural network (ANN), and compared with the human-reading method. Results: Only the ultrasound elastography data had diagnostic value for renal diseases. The accuracies of the SVM, RF, KNN, and ANN methods were 80.98%, 80.32%, 78.03%, and 79.67%, respectively, while the accuracy of human reading was 78.33%. In the machine learning analysis of the ultrasound elastography data, the elastic hardness parameters of the renal cortex were the most important. Conclusion: Ultrasound elastography has the highest diagnostic value in machine learning for nephropathy; the diagnostic efficiency of the machine learning method is slightly higher than that of the human-reading method, and the diagnostic ability of the SVM method is higher than that of the other methods.
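As a rough illustration of one of the classifiers named above, the following standard-library sketch shows how a K-nearest neighbor classifier assigns a label and how its accuracy can be estimated by leave-one-out evaluation. The abstract does not describe the study's implementation or validation scheme, so everything here is an assumption: the function names are hypothetical, and the two features and their values are invented and do not come from the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Classify each sample using all the others; return the fraction correct."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Invented two-feature measurements (hypothetical, for illustration only).
features = [
    (4.0, 1.0), (4.2, 1.1), (3.9, 0.9), (4.1, 1.0),
    (7.8, 2.0), (8.1, 2.2), (7.9, 2.1), (8.0, 1.9),
]
labels = ["normal"] * 4 + ["disease"] * 4

print(knn_predict(features, labels, (4.05, 1.0)))
print(leave_one_out_accuracy(features, labels))
```

In practice the features would be scaled before computing distances, since K-nearest neighbor is sensitive to the units of each measurement.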


2020 ◽  
Vol 10 (3) ◽  
pp. 82
Author(s):  
Man Hung ◽  
Evelyn Lauren ◽  
Eric Hon ◽  
Julie Xu ◽  
Bianca Ruiz-Negrón ◽  
...  

Atrial fibrillation (AF) cases are expected to increase over the next several decades, due to the rise in the elderly population. One promising treatment option for AF is catheter ablation, which is increasing in use. We investigated the hospital readmissions data for AF patients undergoing catheter ablation, and used machine learning models to explore the risk factors behind these readmissions. We analyzed data from the 2013 Nationwide Readmissions Database on cases with AF, and determined the relative importance of factors in predicting 30-day readmissions for AF with catheter ablation. Various machine learning methods, such as k-nearest neighbors, decision tree, and support vector machine were utilized to develop predictive models with their accuracy, precision, sensitivity, specificity, and area under the curve computed and compared. We found that the most important variables in predicting 30-day hospital readmissions in patients with AF undergoing catheter ablation were the age of the patient, the total number of discharges from a hospital, and the number of diagnoses on the patient’s record, among others. Out of the methods used, k-nearest neighbor had the highest prediction accuracy of 85%, closely followed by decision tree, while support vector machine was less desirable for these data. Hospital readmissions for AF with catheter ablation can be predicted with relatively high accuracy, utilizing machine learning methods. As patient age, the total number of hospital discharges, and the total number of patient diagnoses increase, the risk of hospital readmissions increases.
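The comparison metrics listed above (accuracy, precision, sensitivity, specificity) all derive from the four cells of a binary confusion matrix. The sketch below shows those definitions under the assumption that labels are coded 1 for readmitted and 0 for not readmitted; the function names and example labels are hypothetical and are not taken from the study. Area under the curve is omitted because it requires predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels coded as 0 or 1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, sensitivity, and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: of the predicted positives, how many were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Sensitivity (recall): of the true positives, how many were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: of the true negatives, how many were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 1 = readmitted within the window, 0 = not readmitted.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(classification_metrics(actual, predicted))
```

When readmissions are a minority of cases, accuracy alone can look high even for a model that rarely predicts readmission, which is why sensitivity and specificity are usually reported alongside it.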


2020 ◽  
Vol 7 ◽  
pp. 233339282096188
Author(s):  
Man Hung ◽  
Eric S. Hon ◽  
Evelyn Lauren ◽  
Julie Xu ◽  
Gary Judd ◽  
...  

Background: Atrial fibrillation (AF) in the elderly population is projected to increase over the next several decades. Catheter ablation shows promise as a treatment option and is becoming increasingly available. We examined 90-day hospital readmission for AF patients undergoing catheter ablation and utilized machine learning methods to explore the risk factors associated with these readmission trends. Methods: Data from the 2013 Nationwide Readmissions Database on AF cases were used to predict 90-day readmissions for AF with catheter ablation. Multiple machine learning methods such as k-Nearest Neighbors, Decision Tree, and Support Vector Machine were employed to determine variable importance and build risk prediction models. Accuracy, precision, sensitivity, specificity, and area under the curve were compared for each model. Results: The 90-day hospital readmission rate was 17.6%; the average age of the patients was 64.9 years; 62.9% of patients were male. Important variables in predicting 90-day hospital readmissions in patients with AF undergoing catheter ablation included the age of the patient, number of diagnoses on the patient’s record, and the total number of discharges from a hospital. The k-Nearest Neighbor had the best performance with a prediction accuracy of 85%. This was closely followed by Decision Tree, but Support Vector Machine was less ideal. Conclusions: Machine learning methods can produce accurate models in predicting hospital readmissions for patients with AF. The likelihood of readmission to the hospital increases as the patient age, total number of hospital discharges, and total number of patient diagnoses increase. Findings from this study can inform quality improvement in healthcare and in achieving patient-centered care.

