Improving the prediction of cardiovascular risk with machine-learning and DNA methylation data

Author(s):  
Giovanni Cugliari ◽  
Silvia Benevenuta ◽  
Simonetta Guarrera ◽  
Carlotta Sacerdote ◽  
Salvatore Panico ◽  
...  
2021 ◽  
Vol 8 ◽  
Author(s):  
Ayşegül Kutlay ◽  
Yeşim Aydin Son

Introduction: Despite the significant progress in understanding cancer biology, the deduction of metastasis is still a challenge in the clinic. Transcriptional regulation is one of the critical mechanisms underlying cancer development. Even though mRNA, microRNA, and DNA methylation mechanisms have a crucial impact on the metastatic outcome, there are no comprehensive data mining models that combine all transcriptional regulation aspects for metastasis prediction. This study focused on identifying the regulatory impact of genetic biomarkers for monitoring metastatic molecular signatures of melanoma by investigating the consolidated effect of miRNA, mRNA, and DNA methylation. Method: We developed multiple machine learning models to distinguish the metastasis by integrating miRNA, mRNA, and DNA methylation markers. We used the TCGA melanoma dataset to differentiate between metastatic melanoma samples by assessing a set of predictive models. For this purpose, machine learning models using a support vector machine with different kernels, artificial neural networks, random forests, AdaBoost, and Naïve Bayes are compared. An iterative combination of differentially expressed miRNA, mRNA, and methylation signatures is used as a candidate marker to reveal each new biomarker category’s impact. In each iteration, the performances of the combined models are calculated. During all comparisons, the choice of the feature selection method and under- and oversampling approaches are analyzed. Selected biomarkers of the highest performing models are further analyzed for the biological interpretation of functional enrichment. Results: In the initial model, miRNA biomarkers can identify metastatic melanoma with an 81% F-score. The addition of mRNA markers upon miRNA increased the F-score to 92%. In the final integrated model, the addition of the methylation data resulted in a similar F-score of 92% but produced a stable model with low variance across multiple trials. Conclusion: Our results support the role of miRNA regulation in metastatic melanoma as miRNA markers model metastasis outcomes with high accuracy. Moreover, the integrated evaluation of miRNA with mRNA and methylation biomarkers increases the model’s power. It populates selected biomarkers on the metastasis-associated pathways of melanoma, such as the “osteoclast”, “Rap1 signaling”, and “chemokine signaling” pathways. Source Code: https://github.com/aysegul-kt/MelonomaMetastasisPrediction/
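The abstract above reports model quality as an F-score and model stability as variance across repeated trials. As a minimal sketch of how those two quantities are commonly computed (assuming the F-score is the standard F1 measure, which the abstract does not state explicitly), the following uses only the Python standard library. The labels and per-trial scores are hypothetical toy values, not TCGA data or results from the study.

```python
from statistics import mean, pstdev


def f_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_trials(scores):
    """Mean and population standard deviation of scores from repeated trials."""
    return mean(scores), pstdev(scores)


# Hypothetical toy labels: 1 = metastatic, 0 = non-metastatic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f_score(y_true, y_pred)

# Hypothetical F-scores from three repeated trials of one model.
trial_mean, trial_sd = summarize_trials([0.6, 0.8, 0.7])
print(score, trial_mean, trial_sd)
```

Two models with the same mean F-score can then be compared on the spread of their per-trial scores, which is one plausible reading of the "stable model with low variance" statement.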


Epigenomics ◽  
2019 ◽  
Vol 11 (13) ◽  
pp. 1469-1486 ◽  
Author(s):  
Sailalitha Bollepalli ◽  
Tellervo Korhonen ◽  
Jaakko Kaprio ◽  
Simon Anders ◽  
Miina Ollikainen

Aim: Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles. Methods: To advance the practical applicability of the smoking-associated methylation signals, we used machine learning methodology to train a classifier for smoking status prediction. Results: We show the prediction performance of our classifier on three independent whole-blood datasets demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. Conclusion: The major contribution of our classifier is its global applicability without a need for users to determine a threshold value for each dataset to predict the smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), facilitating the use of our classifier to predict smoking status in future studies.


2020 ◽  
Vol 110 ◽  
pp. 101976
Author(s):  
Laura Macías-García ◽  
María Martínez-Ballesteros ◽  
José María Luna-Romera ◽  
José M. García-Heredia ◽  
Jorge García-Gutiérrez ◽  
...  

Author(s):  
Eliana Portilla-Fernández ◽  
Shih-Jen Hwang ◽  
Rory Wilson ◽  
Jane Maddock ◽  
W. David Hill ◽  
...  

Abstract: Common carotid intima-media thickness (cIMT) is an index of subclinical atherosclerosis that is associated with ischemic stroke and coronary artery disease (CAD). We undertook a cross-sectional epigenome-wide association study (EWAS) of measures of cIMT in 6400 individuals. Mendelian randomization analysis was applied to investigate the potential causal role of DNA methylation in the link between atherosclerotic cardiovascular risk factors and cIMT or clinical cardiovascular disease. The CpG site cg05575921 was associated with cIMT (beta = −0.0264, p value = 3.5 × 10⁻⁸) in the discovery panel and was replicated in the replication panel (beta = −0.07, p value = 0.005). This CpG is located at chr5:81649347 in intron 3 of the aryl hydrocarbon receptor repressor gene (AHRR). Our results indicate that DNA methylation at cg05575921 might be in the pathway between smoking, cIMT and stroke. Moreover, in a region-based analysis, 34 differentially methylated regions (DMRs) were identified, of which a DMR upstream of ALOX12 showed the strongest association with cIMT (p value = 1.4 × 10⁻¹³). In conclusion, our study suggests that DNA methylation may play a role in the link between cardiovascular risk factors, cIMT and clinical cardiovascular disease.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hanyu Zhang ◽  
Ruoyi Cai ◽  
James Dai ◽  
Wei Sun

Abstract: We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows a cell type with known proportions but unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells while bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and tumor cell proportion can be estimated by copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.
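To illustrate what "reference-based" deconvolution means in general, here is a minimal sketch, not EMeth itself. It assumes the usual linear mixing model, in which the bulk methylation at each CpG is a proportion-weighted sum of cell type-specific reference values, and fits the proportions by plain constrained least squares using projected gradient descent. The down-weighting of inconsistent CpGs that the abstract describes is deliberately omitted. All function names and the toy reference matrix are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the shift from the last index that stays positive
    return [max(x - theta, 0.0) for x in v]


def mix(reference, proportions):
    """Bulk methylation per CpG under the linear mixing model."""
    return [sum(r * p for r, p in zip(row, proportions)) for row in reference]


def estimate_proportions(reference, bulk, steps=2000):
    """Least-squares proportions with non-negativity and sum-to-one constraints."""
    n_types = len(reference[0])
    # Step size 1 / ||R||_F^2 is a safe bound on the inverse Lipschitz constant.
    lr = 1.0 / sum(r * r for row in reference for r in row)
    rho = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [m - b for m, b in zip(mix(reference, rho), bulk)]
        grad = [
            sum(row[k] * res for row, res in zip(reference, residual))
            for k in range(n_types)
        ]
        rho = project_to_simplex([r - lr * g for r, g in zip(rho, grad)])
    return rho


# Hypothetical reference: rows are CpGs, columns are three cell types.
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.2, 0.7, 0.3],
    [0.5, 0.5, 0.1],
]
true_proportions = [0.5, 0.3, 0.2]
bulk = mix(reference, true_proportions)  # noise-free toy mixture
estimated = estimate_proportions(reference, bulk)
print([round(p, 4) for p in estimated])
```

On this noise-free toy mixture the estimate recovers the mixing proportions. With real data, noise and CpGs that violate the mixing model would bias a plain least-squares fit, which is the situation the abstract says EMeth is designed to handle.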


2010 ◽  
Vol 20 (12) ◽  
pp. 1719-1729 ◽  
Author(s):  
M. D. Robinson ◽  
C. Stirzaker ◽  
A. L. Statham ◽  
M. W. Coolen ◽  
J. Z. Song ◽  
...  

2016 ◽  
Vol 118 (1) ◽  
pp. 119-131 ◽  
Author(s):  
Jia Zhong ◽  
Golareh Agha ◽  
Andrea A. Baccarelli

2021 ◽  
Author(s):  
Nawar Shara ◽  
Kelley M. Anderson ◽  
Noor Falah ◽  
Maryam F. Ahmad ◽  
Darya Tavazoei ◽  
...  

BACKGROUND Healthcare data are fragmenting as patients seek care from diverse sources. Consequently, patient care is negatively impacted by disparate health records. Machine learning (ML) offers a disruptive force in its ability to inform and improve patient care and outcomes [6]. However, the differences that exist in each individual’s health records, combined with the lack of health-data standards, in addition to systemic issues that render the data unreliable and that fail to create a single view of each patient, create challenges for ML. While these problems exist throughout healthcare, they are especially prevalent within maternal health, and exacerbate the maternal morbidity and mortality (MMM) crisis in the United States. OBJECTIVE Maternal patient records were extracted from the electronic health records (EHRs) of a large tertiary healthcare system and made into patient-specific, complete datasets through a systematic method so that a machine-learning-based (ML-based) risk-assessment algorithm could effectively identify maternal cardiovascular risk prior to evidence of diagnosis or intervention within the patient’s record. METHODS We outline the effort that was required to define the specifications of the computational systems, the dataset, and access to relevant systems, while ensuring data security, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for its use by a proprietary risk-stratification algorithm designed to establish patient-specific baselines to identify and establish cardiovascular risk based on deviations from the patient’s baselines to inform early interventions. RESULTS Patient records can be made actionable for the goal of effectively employing machine learning (ML), specifically to identify cardiovascular risk in pregnant patients. 
CONCLUSIONS Upon acquiring data, including the concatenation, anonymization, and normalization of said data across multiple EHRs, the use of a machine-learning-based (ML-based) tool can provide early identification of cardiovascular risk in pregnant patients. CLINICALTRIAL N/A
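The risk-stratification algorithm described above is proprietary and its details are not given. Purely to illustrate the general idea of flagging deviations from a patient-specific baseline, the sketch below assumes a simple z-score rule over one numeric measurement. The function name, the threshold, and the readings are all hypothetical and do not reflect the study's method or data.

```python
from statistics import mean, stdev


def flag_deviations(baseline_readings, new_readings, z_threshold=2.0):
    """Flag new readings that deviate strongly from a patient's own baseline.

    Returns one (reading, z_score, is_flagged) tuple per new reading.
    The z-score rule and the threshold are illustrative assumptions.
    """
    mu = mean(baseline_readings)
    sigma = stdev(baseline_readings)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")
    results = []
    for reading in new_readings:
        z = (reading - mu) / sigma
        results.append((reading, z, abs(z) > z_threshold))
    return results


# Hypothetical readings for one patient (for example, systolic blood pressure).
baseline = [110, 112, 108, 111, 109]
results = flag_deviations(baseline, [111, 118])
flags = [is_flagged for _, _, is_flagged in results]
print(flags)
```

A per-patient baseline of this kind only makes sense once records from multiple EHRs have been concatenated and normalized into one timeline per patient, which is the preparation step the abstract focuses on.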

