A novel machine learning framework for phenotype prediction based on genome-wide DNA methylation data

Author(s):  
Vinay Vittal Karagod ◽  
Kaushik Sinha
2010 ◽  
Vol 20 (12) ◽  
pp. 1719-1729 ◽  
Author(s):  
M. D. Robinson ◽  
C. Stirzaker ◽  
A. L. Statham ◽  
M. W. Coolen ◽  
J. Z. Song ◽  
...  

Data in Brief ◽  
2018 ◽  
Vol 19 ◽  
pp. 1046-1057 ◽  
Author(s):  
Giovanni Scala ◽  
Veer Marwah ◽  
Pia Kinaret ◽  
Jukka Sund ◽  
Vittorio Fortino ◽  
...  

Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Xiaoling Wang ◽  
Yue Pan ◽  
Haidong Zhu ◽  
Guang Hao ◽  
Xin Wang ◽  
...  

Background: Several large-scale epigenome-wide association studies of obesity-related DNA methylation changes have been published and have together identified 46 CpG sites. These studies were conducted in leukocytes from middle-aged and older Caucasian and African American (AA) adults. To what extent these signals are independent of cell composition, and to what extent they influence gene expression, has not been systematically investigated. Furthermore, the high prevalence of obesity comorbidities in middle-aged and older populations may mask or bias the DNA methylation changes related to obesity itself. Methods: In this study of healthy AA youth and young adults, genome-wide DNA methylation data from leukocytes were obtained from three independent studies using the Infinium HumanMethylation450 BeadChip: the EpiGO study (96 obese cases vs. 92 lean controls, aged 14-21, 50% females, test of interest is obesity status), the LACHY study (284 participants from the general population, aged 14-18, 50% females, test of interest is BMI), and the Georgia Stress and Heart study (298 participants from the general population, aged 18-38, 52% females, test of interest is BMI). Genome-wide DNA methylation data from purified neutrophils, as well as genome-wide gene expression data from leukocytes using the Illumina HT12 V4 array, were also obtained for the EpiGO samples. Results: The meta-analysis of the 3 cohorts identified 76 obesity-related CpG sites in leukocytes with p<1×10^-7. Out of the 46 previously identified CpG sites, 36 could be replicated in this AA youth and young adult sample with the same direction of effect and p<0.05. Out of the 107 CpG sites comprising the 36 replicated ones and the 71 newly identified ones, 71 CpG sites (66%) had their relationship with obesity replicated in purified neutrophils (p<0.05). 
Analysis of cis regulation of gene expression by the 107 CpG sites showed that 59 CpG sites had at least one gene within 250 kb that was differentially expressed between obese cases and lean controls. Furthermore, out of the 59 CpG sites, 6 showed significantly negative correlations and 1 showed a significantly positive correlation with the differentially expressed genes. These CpG sites are located in the SOCS3, CISH, ABCG1, PIM3 and PTGDS genes. Conclusion: In this study of AA youth and young adults, we identified novel CpG sites associated with obesity and replicated the majority of the CpG sites previously identified in middle-aged and older adults. For the first time, we showed that the majority of the obesity-related CpG sites identified in leukocytes are not driven by cell composition, and we provided a direct link between DNA methylation, gene expression and obesity status for 7 CpG sites in 5 genes.
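The abstract above does not say how the three cohorts were combined. Because the test of interest differs between cohorts (obesity status in EpiGO, BMI in the other two), one common option is a sample-size weighted z-score meta-analysis. The following is a minimal sketch under that assumption, not the authors' method. The function name and the per-cohort z-scores are hypothetical; only the cohort sizes (188, 284, 298) come from the abstract.

```python
import math

def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-cohort z-scores for one CpG site.

    Each cohort is weighted by the square root of its sample size
    (Stouffer's method). This is an assumed approach, not taken from the abstract.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z = numerator / denominator
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Cohort sizes from the abstract: EpiGO (96 + 92), LACHY, Georgia Stress and Heart.
sample_sizes = [188, 284, 298]
# Hypothetical z-scores for a single CpG site (illustrative, not real data).
z_scores = [3.1, 3.4, 3.6]

z, p = weighted_z_meta(z_scores, sample_sizes)
print(f"combined z = {z:.2f}, p = {p:.2e}")
print("meets p < 1e-7:", p < 1e-7)
```

Weighting by sample size rather than by inverse variance avoids pooling effect sizes that are on different scales, which matters here because one cohort is case-control and two are quantitative.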


Author(s):  
Xiangyu Luo ◽  
Joel Schwartz ◽  
Andrea Baccarelli ◽  
Zhonghua Liu

Abstract Epigenome-wide mediation analysis aims to identify DNA methylation CpG sites that mediate the causal effects of genetic/environmental exposures on health outcomes. However, DNA methylation in peripheral blood is usually measured at the bulk level, in a heterogeneous population of white blood cells. Using bulk-level DNA methylation data in mediation analysis might cause confounding bias and reduce study power. Therefore, it is crucial to obtain fine-grained results by detecting mediating CpG sites in a cell-type-specific way. However, there is a lack of methods and software to achieve this goal. We propose a novel method (Mediation In a Cell-type-Specific fashion, MICS) to identify cell-type-specific mediation effects in genome-wide epigenetic studies using only the bulk-level DNA methylation data. MICS follows the standard mediation analysis paradigm and consists of three key steps. In step 1, we assess the exposure-mediator association for each cell type; in step 2, we assess the mediator-outcome association for each cell type; in step 3, we combine the cell-type-specific exposure-mediator and mediator-outcome associations using a multiple testing procedure named MultiMed [Sampson JN, Boca SM, Moore SC, et al. FWER and FDR control when testing multiple mediators. Bioinformatics 2018;34:2418–24] to identify significant CpGs with cell-type-specific mediation effects. We conduct simulation studies to demonstrate that our method has correct FDR control. We also apply the MICS procedure to the Normative Aging Study and identify nine DNA methylation CpG sites in the lymphocytes that might mediate the effect of cigarette smoking on lung function.
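The logic of step 3 can be illustrated with a simpler stand-in. The sketch below is not MultiMed and not the MICS implementation. It assumes a joint-significance rule (a CpG is a candidate mediator only if both associations are significant, so the larger of the two p-values is used), followed by a Benjamini-Hochberg correction. All function names and p-values are hypothetical.

```python
def max_p(p_exposure_mediator, p_mediator_outcome):
    """Joint-significance p-value per CpG: the larger of the two step p-values."""
    return [max(a, b) for a, b in zip(p_exposure_mediator, p_mediator_outcome)]

def benjamini_hochberg(p_values, fdr=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values for four CpG sites within one cell type.
p_em = [1e-6, 0.2, 0.001, 0.03]   # step 1: exposure-mediator
p_mo = [1e-4, 1e-5, 0.002, 0.6]   # step 2: mediator-outcome

joint = max_p(p_em, p_mo)
selected = benjamini_hochberg(joint, fdr=0.05)
print("joint p-values:", joint)
print("selected CpG indices:", selected)
```

The max-P rule is known to be conservative when most CpGs have no association in either step, which is one motivation for dedicated procedures such as the one cited in the abstract.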


2021 ◽  
Vol 8 ◽  
Author(s):  
Ayşegül Kutlay ◽  
Yeşim Aydin Son

Introduction: Despite the significant progress in understanding cancer biology, the deduction of metastasis is still a challenge in the clinic. Transcriptional regulation is one of the critical mechanisms underlying cancer development. Even though mRNA, microRNA, and DNA methylation mechanisms have a crucial impact on the metastatic outcome, there are no comprehensive data mining models that combine all transcriptional regulation aspects for metastasis prediction. This study focused on identifying the regulatory impact of genetic biomarkers for monitoring metastatic molecular signatures of melanoma by investigating the consolidated effect of miRNA, mRNA, and DNA methylation. Method: We developed multiple machine learning models to distinguish metastasis by integrating miRNA, mRNA, and DNA methylation markers. We used the TCGA melanoma dataset to differentiate between metastatic melanoma samples by assessing a set of predictive models. For this purpose, machine learning models using support vector machines with different kernels, artificial neural networks, random forests, AdaBoost, and Naïve Bayes were compared. An iterative combination of differentially expressed miRNA, mRNA, and methylation signatures was used as the candidate marker set to reveal the impact of each new biomarker category. In each iteration, the performance of the combined models was calculated. In all comparisons, the choice of feature selection method and of under- and oversampling approaches was analyzed. Selected biomarkers of the highest performing models were further analyzed by functional enrichment for biological interpretation. Results: In the initial model, miRNA biomarkers can identify metastatic melanoma with an 81% F-score. The addition of mRNA markers to the miRNA markers increased the F-score to 92%. 
In the final integrated model, the addition of the methylation data resulted in a similar F-score of 92% but produced a stable model with low variance across multiple trials. Conclusion: Our results support the role of miRNA regulation in metastatic melanoma, as miRNA markers model metastasis outcomes with high accuracy. Moreover, the integrated evaluation of miRNA with mRNA and methylation biomarkers increases the model’s power. The selected biomarkers map onto metastasis-associated pathways of melanoma, such as the “osteoclast”, “Rap1 signaling”, and “chemokine signaling” pathways. Source Code: https://github.com/aysegul-kt/MelonomaMetastasisPrediction/
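For readers unfamiliar with the metric reported above, the following sketch shows how an F-score is computed from a confusion matrix and how stability across repeated trials might be summarized. It is illustrative only and does not reproduce the authors' pipeline. The confusion-matrix counts and per-trial scores are hypothetical, and the abstract does not state which F-score variant was used, so F1 (beta = 1) is assumed.

```python
import statistics

def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def summarize_trials(scores):
    """Mean and sample standard deviation of F-scores across repeated trials."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical counts for one trial (not taken from the study).
print(f"F1 = {f_score(tp=46, fp=4, fn=4):.2f}")

# Hypothetical F-scores from repeated trials of one model.
mean_f, sd_f = summarize_trials([0.91, 0.92, 0.93])
print(f"mean F1 = {mean_f:.2f}, sd = {sd_f:.3f}")
```

Two models can share the same mean F-score while differing in spread across trials, which is the distinction the abstract draws between the mRNA+miRNA model and the final integrated model.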


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yadollah Shahryary ◽  
Aikaterini Symeonidi ◽  
Rashmi R. Hazarika ◽  
Johanna Denkena ◽  
Talha Mubeen ◽  
...  

Abstract Stochastic changes in DNA methylation (i.e., spontaneous epimutations) contribute to methylome diversity in plants. Here, we describe AlphaBeta, a computational method for estimating the precise rate of such stochastic events using pedigree-based DNA methylation data as input. We demonstrate how AlphaBeta can be employed to study transgenerationally heritable epimutations in clonal or sexually derived mutation accumulation lines, as well as somatic epimutations in long-lived perennials. Application of our method to published and new data reveals that spontaneous epimutations accumulate neutrally at the genome-wide scale, originate mainly during somatic development and that they can be used as a molecular clock for age-dating trees.


2020 ◽  
Author(s):  
Arunima Roy ◽  
Christopher J. Earley ◽  
Richard P. Allen ◽  
Zachary A. Kaminsky

Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 214-214
Author(s):  
Shaobo Li ◽  
Pagna Sok ◽  
Keren Xu ◽  
Ivo S Muskens ◽  
Natalina Elliott ◽  
...  

Abstract Background: Down syndrome (DS) is associated with an up to 30-fold increased risk of B-cell acute lymphoblastic leukemia (ALL), and DS-ALL patients have worse overall survival and increased long-term treatment-related health conditions compared with non-DS ALL patients. In a recent genome-wide association study of DS-ALL, established ALL genetic risk loci were associated with DS-ALL, with several single nucleotide polymorphisms (SNPs) conferring a larger effect on ALL risk in the context of DS than in euploidy. We performed an epigenome-wide association study (EWAS) to elucidate whether epigenetic differences at birth are associated with risk of subsequent DS-ALL. Methods: The DS-ALL Discovery Study included 147 DS-ALL cases and 198 DS controls from the International Study of Down Syndrome Acute Leukemia, with newborn dried bloodspots (DBS) obtained from California (n=326) and Washington state (n=19) biobanks. The DS-ALL Replication Study included 24 DS-ALL cases and 24 DS controls with newborn DBS from the Michigan Neonatal Biobank. DNA was isolated from DBS, bisulfite converted, and assayed using Illumina Infinium MethylationEPIC Beadchip genome-wide DNA methylation arrays. Raw data were processed using "minfi" and "noob" packages in R. Reference-based deconvolution of blood cell proportions was performed using the Identifying Optimal DNA methylation Libraries (IDOL) algorithm, using DNA methylation data from cord blood reference samples, to estimate proportions of B cells, T cells (CD4+ and CD8+), monocytes, granulocytes, natural killer cells, and nucleated red blood cells. We compared each cell type proportion between DS-ALL cases and DS controls using linear regression adjusting for sex, plate, and principal components (PCs) to account for genetic ancestry. 
To identify single CpG probes associated with DS-ALL risk, we performed a multiethnic EWAS of DS-ALL in each study using linear regression adjusting for sex, plate, and PCs related to: 1) cell-type proportions and 2) genetic ancestry. Differentially methylated regions (DMRs) were identified using DMRcate and comb-p methods. In the Discovery Study, genome-wide SNP array data were available for 131 cases and 130 controls, and data from targeted sequencing of somatic mutations in exons 2/3 of GATA1 were available for 184/198 DS controls. Results: Deconvolution of blood cell proportions in the DS-ALL Discovery Study showed significantly higher B cell proportions in newborns with DS who later developed ALL (mean=0.0128, sd=0.0151) compared with DS controls (mean=0.00826, sd=0.0115) (P=6.4×10^-4, coefficient=0.0052). A significantly higher B cell proportion at birth was also found in DS-ALL cases in the independent Replication Study (cases mean=0.048, sd=0.024; controls mean=0.039, sd=0.028; P=0.03, coefficient=0.015). In the Discovery Study, the B cell difference remained significant (P=5.8×10^-3) with a similar effect size (coefficient=0.0045) after removal of GATA1 mutation-positive DS controls (n=30). We also investigated whether DS-ALL risk SNPs at ARID5B, IKZF1, GATA3, and CDKN2A may confound the association, but the increased B cell proportions in DS-ALL remained significant and effect estimates slightly increased in SNP genotype-adjusted models (coefficient range: 0.0055-0.0059). In the EWAS of DS-ALL, 9 CpGs reached epigenome-wide significance (P<7.67×10^-8), including 2 CpGs overlapping the promoter of the tumor suppressor gene TRIM13, frequently deleted in B-CLL, although none of these showed evidence of association (P<0.05) in the Replication Study. We identified 125 DMRs associated with DS-ALL in the Discovery Study. 
For 3 DMRs, overlapping genes HOPX, SMIM24, and PPP1R10, all implicated in normal and leukemic stem cell function, there were multiple significant CpGs in the Replication Study (P<0.05) all with effects in the same direction as the Discovery Study DMRs. Conclusions: Increased B cell proportions in newborns with DS may be a risk factor for development of DS-ALL in childhood. This finding, based on DNA methylation data, requires confirmation using conventional cell count measures, and should be explored as a novel biomarker for ALL risk in the non-DS population. Single CpGs and DMRs associated with DS-ALL risk in our Discovery Study require further investigation, including in additional ALL case-control studies in DS and non-DS populations. Disclosures Ma: Celgene/Bristol Myers Squibb: Consultancy, Research Funding.
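The abstract does not state how the epigenome-wide significance threshold (P<7.67×10^-8) was derived. If it is a Bonferroni correction at a family-wise alpha of 0.05, which is an assumption, the number of probes tested can be back-calculated as sketched below. Both function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def implied_tests(alpha, threshold):
    """Number of tests implied by a Bonferroni threshold (inverse of the above)."""
    return alpha / threshold

# Threshold reported in the abstract; alpha = 0.05 is assumed, not stated.
n_probes = implied_tests(alpha=0.05, threshold=7.67e-8)
print(f"implied number of CpG probes tested: about {n_probes:,.0f}")
```

Under this assumption the threshold corresponds to roughly 650,000 probes, which would be consistent with a MethylationEPIC array after quality-control filtering, though the abstract does not report the filtered probe count.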

