Revisiting genetic artifacts on DNA methylation microarrays exposes novel biological implications

2021, Vol 22 (1)
Author(s): Benjamin Planterose Jiménez, Manfred Kayser, Athina Vidaki

Background: Illumina DNA methylation microarrays enable epigenome-wide analyses that are widely used to discover novel DNA methylation variation in health and disease. However, the microarrays' probe design cannot fully account for the vast human genetic diversity, which leads to genetic artifacts. Distinguishing genuine from artifactual genetic influence is particularly relevant to the study of DNA methylation heritability and methylation quantitative trait loci. Yet despite its importance, current strategies to account for genetic artifacts lag behind, owing to a limited mechanistic understanding of how such artifacts operate. Results: To address this, we develop and benchmark UMtools, an R package containing novel methods for the quantification and qualification of genetic artifacts based on fluorescence intensity signals. With our approach, we model and validate known SNPs/indels on a genetically controlled dataset of monozygotic twins, estimate minor allele frequency from DNA methylation data, and empirically detect variants not included in dbSNP. Moreover, we identify examples where genetic artifacts interact with each other or with imprinting, X-inactivation, or tissue-specific regulation. Finally, we propose a novel strategy based on co-methylation that can discern between genetic artifacts and genuine genomic influence. Conclusions: We provide an atlas to navigate the huge diversity of genetic artifacts encountered on DNA methylation microarrays. Overall, our study lays the groundwork for a paradigm shift in the study of the genetic component of epigenetic variation in DNA methylation microarrays.
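To make the idea of estimating minor allele frequency from methylation data concrete, here is a minimal Python sketch. It is not UMtools, which works directly on fluorescence intensity signals with its own methods. It assumes the common Illumina beta-value definition β = M / (M + U + offset) with an offset of 100, and hypothetical fixed cut-offs (0.2 and 0.8) under which a probe disrupted by a SNP shows three genotype-like clusters near 0, 0.5 and 1.

```python
def beta_values(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset), the common Illumina definition;
    the offset keeps the ratio stable when both intensities are low."""
    return [m / (m + u + offset) for m, u in zip(meth, unmeth)]


def call_dosages(betas, low=0.2, high=0.8):
    """Map each beta to a genotype-like dosage (0, 1 or 2 copies of the
    allele that yields a high beta). The cut-offs are hypothetical."""
    dosages = []
    for b in betas:
        if b < low:
            dosages.append(0)
        elif b > high:
            dosages.append(2)
        else:
            dosages.append(1)
    return dosages


def minor_allele_frequency(dosages):
    """Allele frequency = allele copies / (2 * samples), folded so that the
    returned value is the frequency of the less common allele."""
    if not dosages:
        raise ValueError("need at least one sample")
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


# Hypothetical intensities for one SNP-affected probe across six samples.
meth = [150, 180, 4800, 5100, 9500, 200]
unmeth = [9800, 9100, 5000, 4700, 300, 9900]
dosages = call_dosages(beta_values(meth, unmeth))
print(dosages, minor_allele_frequency(dosages))
```

Real data rarely separate this cleanly; fixed cut-offs are used here only because they make the three-cluster logic easy to follow.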

2021
Author(s): Maria Derakhshan, Noah J. Kessler, Miho Ishida, Charalambos Demetriou, Nicolas Brucato, ...

We analysed DNA methylation data from 30 datasets comprising 3,474 individuals, 19 tissues and 8 ethnicities at CpGs covered by the Illumina 450K array. We identified 4,143 hypervariable CpGs ('hvCpGs') whose methylation ranked among the top 5% most variable sites across multiple tissues and ethnicities. hvCpG methylation was influenced, but not determined, by genetic variation, and was not linked to probe reliability, epigenetic drift, age, sex or cell-heterogeneity effects. hvCpG methylation tended to covary across tissues derived from different germ layers, and hvCpGs were enriched for associations with the periconceptional environment, proximity to ERV1 and ERVK retrovirus elements, and parent-of-origin-specific methylation. They also showed distinctive methylation signatures in monozygotic twins. Together, these properties position hvCpGs as strong candidates for studying how stochastic and/or environmentally influenced DNA methylation states that are established in the early embryo and stably maintained thereafter can influence lifelong health and disease.
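The selection rule above (top 5% most variable sites, recurring across multiple datasets) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: variability is measured as the population variance of methylation values within each dataset, and "multiple" is represented by a hypothetical `min_datasets` parameter.

```python
from statistics import pvariance


def top_variable_cpgs(dataset, fraction=0.05):
    """dataset maps CpG ID -> methylation values across samples.
    Returns the IDs whose variance ranks in the top `fraction` (at least one)."""
    ranked = sorted(dataset, key=lambda cpg: pvariance(dataset[cpg]), reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_top])


def hypervariable_cpgs(datasets, fraction=0.05, min_datasets=2):
    """CpGs in the top `fraction` of variability in at least `min_datasets`
    datasets (e.g. different tissues or ethnicities). The recurrence
    threshold is a hypothetical choice for illustration."""
    counts = {}
    for dataset in datasets:
        for cpg in top_variable_cpgs(dataset, fraction):
            counts[cpg] = counts.get(cpg, 0) + 1
    return {cpg for cpg, n in counts.items() if n >= min_datasets}
```

Counting recurrence across independent datasets is what separates a site that is variable everywhere from one that is variable only in a single cohort or tissue.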


2015, Vol 32 (4), pp. 593-595
Author(s): Kathleen Oros Klein, Stepan Grinek, Sasha Bernatsky, Luigi Bouchard, Antonio Ciampi, ...

2020
Author(s): John T. Lawson, Jason P. Smith, Stefan Bekiranov, Francine E. Garrett-Bakelman, Nathan C. Sheffield

A key challenge in epigenetics is to determine the biological significance of epigenetic variation among individuals. Here, we present Coordinate Covariation Analysis (COCOA), a computational framework that uses the covariation of epigenetic signals across individuals, together with a database of region sets, to annotate epigenetic heterogeneity. COCOA is the first such tool for DNA methylation data and can also analyze any epigenetic signal with genomic coordinates. We demonstrate COCOA’s utility by analyzing DNA methylation, ATAC-seq, and multi-omic data in supervised and unsupervised analyses, showing that COCOA provides new understanding of inter-sample epigenetic variation. COCOA is available as a Bioconductor R package (http://bioconductor.org/packages/COCOA).
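The core idea, scoring a region set by how strongly the CpGs inside it covary with a target variable across individuals, can be sketched as follows. This is a simplified illustration, not COCOA's API or exact scoring: it assumes the target is one value per sample (for example a phenotype or a principal-component score), uses Pearson correlation per CpG, and aggregates with the mean absolute correlation of CpGs that fall inside the region set.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def score_region_set(cpgs, methylation, target, regions):
    """cpgs: list of (chrom, pos); methylation: per-CpG lists of sample values;
    target: one value per sample; regions: list of (chrom, start, end),
    inclusive. Returns the mean |r| of CpGs inside the regions (NaN if none)."""
    scores = []
    for (chrom, pos), values in zip(cpgs, methylation):
        if any(c == chrom and start <= pos <= end for c, start, end in regions):
            scores.append(abs(pearson(values, target)))
    return sum(scores) / len(scores) if scores else float("nan")
```

Ranking many region sets (for example transcription-factor binding sites) by such a score points to the biology that best explains the inter-sample variation captured by the target.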


2019
Author(s): Victor Yuan, E Magda Price, Giulia F Del Gobbo, Sara Mostafavi, Brian Cox, ...

Background: The influence of genetics on variation in DNA methylation (DNAme) is well documented. Yet confounding from population stratification is often unaccounted for in DNAme association studies. Existing approaches to address confounding by population stratification using DNAme data may not generalize to populations or tissues outside those in which they were developed. To aid future placental DNAme studies in assessing population stratification, we developed an ethnicity classifier, PlaNET (Placental DNAme Elastic Net Ethnicity Tool), using five cohorts with Infinium Human Methylation 450K BeadChip array (HM450K) data from placental samples; the classifier is also compatible with the newer EPIC platform. Results: Data from 509 placental samples were used to develop PlaNET and to show that it accurately predicts (accuracy = 0.938, kappa = 0.823) major classes of self-reported ethnicity/race (African: n = 58, Asian: n = 53, Caucasian: n = 389), and that it produces ethnicity probabilities highly correlated with genetic ancestry inferred from genome-wide SNP arrays (>2.5 million SNPs) and ancestry-informative markers (n = 50 SNPs). PlaNET’s ethnicity classification relies on 1,860 HM450K microarray sites, over half of which (n = 955) were linked to nearby genetic polymorphisms. Our placental-optimized method outperforms existing approaches in assessing population stratification in placental samples from individuals of Asian, African, and Caucasian ethnicities. Conclusion: PlaNET provides an improved approach to addressing population stratification in placental DNAme association studies. The method can predict ethnicity as a discrete or continuous variable and will be especially useful when self-reported ethnicity information is missing and genotyping markers are unavailable. PlaNET is available as an R package at https://github.com/wvictor14/planet.
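Accuracy and Cohen's kappa are reported together because, with imbalanced classes like those listed above, a classifier can reach high accuracy largely by chance; kappa subtracts the agreement expected from the row and column totals alone. The Python sketch below computes both from a confusion matrix. The matrix in the test is a hypothetical toy example, not PlaNET's results.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.
    Returns (observed agreement, Cohen's kappa)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the true labels.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


print(accuracy_and_kappa([[20, 5], [10, 15]]))
```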


Epigenomics, 2019, Vol 11 (13), pp. 1469-1486
Author(s): Sailalitha Bollepalli, Tellervo Korhonen, Jaakko Kaprio, Simon Anders, Miina Ollikainen

Aim: Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles. Methods: To advance the practical applicability of smoking-associated methylation signals, we used machine learning to train a classifier for smoking status prediction. Results: We show the prediction performance of our classifier on three independent whole-blood datasets, demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. Conclusion: The major contribution of our classifier is its global applicability: users do not need to determine a dataset-specific threshold value to predict smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), facilitating the use of our classifier to predict smoking status in future studies.
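The threshold-free property follows naturally from a multiclass model that assigns each sample to the class with the highest predicted probability. The Python sketch below assumes a multinomial logistic (softmax) model with three classes (current, former, never; the class set is an assumption here) and entirely hypothetical CpG names, weights and intercepts. It is not EpiSmokEr's fitted model or API.

```python
import math


def softmax(scores):
    """Convert class scores to probabilities (shifted by the max for stability)."""
    m = max(scores.values())
    exps = {cls: math.exp(s - m) for cls, s in scores.items()}
    z = sum(exps.values())
    return {cls: e / z for cls, e in exps.items()}


def predict_status(betas, coefficients, intercepts):
    """betas: {cpg: beta value}; coefficients: {class: {cpg: weight}};
    intercepts: {class: intercept}. Returns (predicted class, probabilities).
    The argmax replaces any user-chosen, dataset-specific cut-off."""
    scores = {
        cls: intercepts[cls] + sum(w * betas[cpg] for cpg, w in weights.items())
        for cls, weights in coefficients.items()
    }
    probs = softmax(scores)
    return max(probs, key=probs.get), probs


# Hypothetical one-CpG model: low methylation at "cpg_1" points to current smoking.
coefficients = {"current": {"cpg_1": -8.0}, "former": {"cpg_1": 0.0}, "never": {"cpg_1": 8.0}}
intercepts = {"current": 4.0, "former": 0.0, "never": -4.0}
print(predict_status({"cpg_1": 0.2}, coefficients, intercepts)[0])
```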


2012, Vol 29 (2), pp. 189-196
Author(s): Andrew E. Teschendorff, Francesco Marabita, Matthias Lechner, Thomas Bartlett, Jesper Tegner, ...

2019
Author(s): Yadollah Shahryary, Aikaterini Symeonidi, Rashmi R. Hazarika, Johanna Denkena, Talha Mubeen, ...

Introduction: Heritable changes in cytosine methylation can arise stochastically in plant genomes independently of DNA sequence alterations. These so-called ‘spontaneous epimutations’ appear to be a byproduct of imperfect DNA methylation maintenance during mitotic or meiotic cell divisions. Accurate estimates of the rate and spectrum of these stochastic events are necessary to quantify how epimutational processes shape methylome diversity in the context of plant evolution, development and aging. Method: Here we describe AlphaBeta, a computational method for estimating epimutation rates and spectra from pedigree-based high-throughput DNA methylation data. The approach requires that the topology of the pedigree is known, which is typically the case in the experimental construction of mutation accumulation lines (MA-lines) in sexually or clonally reproducing species. However, the method also works for inferring somatic epimutation rates in long-lived perennials, such as trees, using leaf methylomes and coring data as input. In this case, we treat the tree branching structure as an intra-organismal phylogeny of somatic lineages and leverage information about the epimutational history of each branch. Results: To illustrate the method, we applied AlphaBeta to multi-generational data from selfing- and asexually derived MA-lines in Arabidopsis and dandelion, as well as to intra-generational leaf methylome data from a single poplar tree. Our results show that the epimutation landscape in plants is deeply conserved across angiosperm species, and that heritable epimutations originate mainly during somatic development rather than from DNA methylation reinforcement errors during sexual reproduction. Finally, we provide the first evidence that DNA methylation data, in conjunction with statistical epimutation models, can be used as a molecular clock for age-dating trees. Conclusion: AlphaBeta facilitates unprecedented quantitative insights into epimutational processes in a wide range of plant systems. Software implementing our method is available as a Bioconductor R package at http://bioconductor.org/packages/3.10/bioc/html/AlphaBeta.html
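The molecular-clock idea can be illustrated with a deliberately simplified model; AlphaBeta's statistical epimutation models are more involved and are not reproduced here. The Python sketch assumes that methylation divergence between two samples grows roughly linearly with their divergence time over short time scales, fits that line by ordinary least squares on sample pairs of known divergence time, and inverts it to date a sample pair of unknown age. All inputs are hypothetical.

```python
def fit_linear(times, divergences):
    """Ordinary least-squares fit of divergence = intercept + slope * time."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two sample pairs")
    mean_t = sum(times) / n
    mean_d = sum(divergences) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("divergence times must not all be equal")
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, divergences))
    slope = s_td / s_tt
    return mean_d - slope * mean_t, slope


def estimate_time(divergence, intercept, slope):
    """Invert the fitted line to date a pair from its observed divergence."""
    return (divergence - intercept) / slope


intercept, slope = fit_linear([0, 10, 20, 30], [0.01, 0.03, 0.05, 0.07])
print(estimate_time(0.05, intercept, slope))
```

Because both lineages accumulate changes independently after they split, the fitted slope under these simplifications is roughly twice the per-lineage epimutation rate, and the linear approximation breaks down once back-mutations become common.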


2019, Vol 35 (19), pp. 3635-3641
Author(s): Yue Wang, Jennifer M Franks, Michael L Whitfield, Chao Cheng

Motivation: The accumulation of publicly available DNA methylation datasets has created a need for tools that interpret specific cellular phenotypes in bulk tissue data. Current approaches use either single differentially methylated CpG sites or differentially methylated regions that map to genes. However, these approaches may bias downstream biological interpretation because gene length varies. Effective approaches for interpreting DNA methylation are lacking. Therefore, we developed computational models that provide biological interpretation of relevant gene sets using DNA methylation data in the context of The Cancer Genome Atlas. Results: We show that BioMethyl (Biological interpretation of DNA Methylation) uses the complete DNA methylation data for a given cancer type to reflect the corresponding gene expression profiles and performs pathway enrichment analyses, providing unique biological insight. Using breast cancer as an example, BioMethyl shows high consistency between the biological pathways enriched in DNA methylation data and those computed from RNA sequencing data. For estrogen receptor (ER)-positive samples, 12 of the 14 pathways identified by BioMethyl are shared with those identified from RNA-seq data (Jaccard score = 0.8). For ER-negative samples, three pathways are shared between the two enrichments, with slightly lower similarity (Jaccard score = 0.6). With BioMethyl, hidden biological pathways can be identified from DNA methylation data when gene expression profiles are lacking. Availability and implementation: The BioMethyl R package is freely available in the GitHub repository (https://github.com/yuewangpanda/BioMethyl). Supplementary information: Supplementary data are available at Bioinformatics online.
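The Jaccard score compares two pathway lists as the size of their intersection divided by the size of their union. The abstract does not state how many pathways the RNA-seq enrichment returned; if, hypothetically, it returned 13 pathways of which 12 overlap the 14 found by BioMethyl, the union would hold 15 pathways and J = 12/15 = 0.8. A minimal Python sketch, with placeholder pathway names:

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of pathway names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("Jaccard index is undefined for two empty sets")
    return len(a & b) / len(union)


# Hypothetical pathway names: 14 from methylation, 13 from RNA-seq, 12 shared.
methylation_hits = {f"pathway_{i}" for i in range(14)}
rnaseq_hits = {f"pathway_{i}" for i in range(12)} | {"pathway_x"}
print(jaccard(methylation_hits, rnaseq_hits))
```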

