Low variability in the underlying cellular landscape adversely affects the performance of interaction-based approaches for conducting cell-specific analyses of DNA methylation in bulk samples

Author(s):  
Richard Meier ◽  
Emily Nissen ◽  
Devin C. Koestler

Abstract Statistical methods that allow for cell-type-specific DNA methylation (DNAm) analyses based on bulk-tissue methylation data have great potential to improve our understanding of human disease and have created unprecedented opportunities for new insights using the wealth of publicly available bulk-tissue methylation data. These methodologies involve incorporating interaction terms formed between the phenotypes/exposures of interest and proportions of the cell types underlying the bulk-tissue sample used for DNAm profiling. Despite growing interest in such “interaction-based” methods, there has been no comprehensive assessment of how variability in the cellular landscape across study samples affects their performance. To answer this question, we used numerous publicly available whole-blood DNAm data sets along with extensive simulation studies and evaluated the performance of interaction-based approaches in detecting cell-specific methylation effects. Our results show that low cell proportion variability results in large estimation error and low statistical power for detecting cell-specific effects of DNAm. Further, we identified that many studies targeting methylation profiling in whole blood may be at risk of being underpowered due to low variability in the cellular landscape across study samples. Finally, we discuss guidelines for researchers seeking to conduct studies utilizing interaction-based approaches to help ensure that their studies are adequately powered.
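The link between cell proportion variability and estimation error can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: the two-cell-type mixture, the baseline methylation values (0.3 and 0.7), the effect size, the noise level, and the function names `invert` and `simulate_and_fit` are all assumptions made for illustration. It fits an ordinary least squares model with phenotype-by-proportion interaction terms and reports the standard error of the cell-type-1 effect.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def simulate_and_fit(prop_sd, n=200, effect=0.1, noise_sd=0.02, seed=0):
    """Simulate one CpG in a two-cell-type mixture and fit an interaction model.

    Returns the estimated phenotype effect in cell type 1 and its standard error.
    All numeric settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows, ys = [], []
    for i in range(n):
        # Proportion of cell type 1; cell type 2 takes the remainder.
        p1 = min(max(rng.gauss(0.5, prop_sd), 0.01), 0.99)
        p2 = 1.0 - p1
        x = float(i % 2)  # balanced binary phenotype
        # The phenotype shifts methylation in cell type 1 only.
        y = p1 * (0.3 + effect * x) + p2 * 0.7 + rng.gauss(0.0, noise_sd)
        # No intercept because p1 + p2 = 1; last two columns are the interactions.
        rows.append([p1, p2, p1 * x, p2 * x])
        ys.append(y)
    k = 4
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    sigma2 = rss / (n - k)
    return beta[2], (sigma2 * inv[2][2]) ** 0.5


# Same sample size and noise; only the spread of cell proportions differs.
est_low, se_low = simulate_and_fit(prop_sd=0.02)
est_high, se_high = simulate_and_fit(prop_sd=0.15)
print(f"low variability:  estimate={est_low:.3f}, SE={se_low:.3f}")
print(f"high variability: estimate={est_high:.3f}, SE={se_high:.3f}")
```

When the proportions barely vary, the two interaction columns are nearly collinear, so the standard error of the cell-specific effect grows even though the sample size is unchanged.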

Author(s):  
Weiwei Zhang ◽  
Hao Wu ◽  
Ziyi Li

Abstract Motivation: It is common practice in epigenetics research to profile DNA methylation on tissue samples, which are usually mixtures of different cell types. To properly account for the mixture, estimating cell compositions has been recognized as an important first step. Many methods have been developed for quantifying cell compositions from DNA methylation data, but most have limited applicability because they require reference or prior information that is often unavailable. Results: We develop Tsisal, a novel complete deconvolution method that accurately estimates cell compositions from DNA methylation data without any prior knowledge of cell types or their proportions. Tsisal is a full pipeline that estimates the number of cell types, estimates cell compositions, and identifies cell-type-specific CpG sites. It can also assign cell type labels when a full or partial reference panel is available. Extensive simulation studies and analyses of seven real data sets demonstrate the favorable performance of our proposed method compared with existing deconvolution methods serving a similar purpose. Availability: The proposed method Tsisal is implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hanyu Zhang ◽  
Ruoyi Cai ◽  
James Dai ◽  
Wei Sun

Abstract We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it accommodates a cell type with known proportions but an unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells, where bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.
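The general idea behind reference-based deconvolution can be sketched as a constrained least squares problem. The code below is not EMeth: it omits the down-weighting of inconsistent CpGs and the handling of a cell type with an unknown reference. It is a generic projected gradient descent that finds non-negative proportions summing to one, and the reference matrix, the true proportions, and the function names are made up for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(reference, bulk, n_iter=2000):
    """Minimise 0.5 * ||reference @ w - bulk||^2 over the probability simplex.

    reference: list of rows (one per CpG), each with one methylation value per cell type.
    bulk: list of bulk methylation values, one per CpG.
    """
    k = len(reference[0])
    # The sum of squared entries bounds the largest eigenvalue of R^T R,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(r * r for row in reference for r in row)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        resid = [sum(r * wj for r, wj in zip(row, w)) - y
                 for row, y in zip(reference, bulk)]
        grad = [sum(row[j] * e for row, e in zip(reference, resid)) for j in range(k)]
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w


# Hypothetical reference methylation for 6 CpGs in 3 cell types.
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.2, 0.8, 0.5],
    [0.5, 0.5, 0.5],
]
true_props = [0.6, 0.3, 0.1]
# Noise-free bulk signal: a proportion-weighted average of the reference profiles.
bulk = [sum(r * p for r, p in zip(row, true_props)) for row in reference]
estimated = estimate_proportions(reference, bulk)
print([round(p, 4) for p in estimated])
```

With real data the bulk signal is noisy and some CpGs violate the mixture model, which is the situation the abstract says EMeth is designed to handle.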


2017 ◽  
Author(s):  
John Dou ◽  
Rebecca J. Schmidt ◽  
Kelly S. Benke ◽  
Craig Newschaffer ◽  
Irva Hertz-Picciotto ◽  
...  

Abstract Background: Cord blood DNA methylation is associated with numerous health outcomes and environmental exposures. Whole cord blood DNA reflects all nucleated blood cell types, while centrifuging whole blood separates red blood cells by generating a white blood cell buffy coat. Both sample types are used in DNA methylation studies. Cell types have unique methylation patterns and processing can impact cell distributions, which may influence comparability. Objectives: To evaluate differences in cell composition and DNA methylation between buffy coat and whole cord blood samples. Methods: Cord blood DNA methylation was measured with the Infinium EPIC BeadChip (Illumina) in 8 individuals, each contributing buffy coat and whole blood samples. We analyzed principal components (PC) of methylation, performed hierarchical clustering, and computed correlations of mean-centered methylation between pairs. We conducted moderated t-tests on single sites and estimated cell composition. Results: DNA methylation PCs were associated with individual (P for PC1 = 1.4 x 10^-9; PC2 = 2.9 x 10^-5; PC3 = 3.8 x 10^-5; PC4 = 4.2 x 10^-6; PC5 = 9.9 x 10^-13), and not with sample type (P for PC1-5 > 0.7). Samples hierarchically clustered by individual. Pearson correlations of mean-centered methylation between paired individual samples ranged from r = 0.66 to r = 0.87. No individual site significantly differed between buffy coat and whole cord blood when adjusting for multiple comparisons (5 sites had unadjusted P < 10^-5). Estimated cell type proportions did not differ by sample type (P = 0.86), and estimated cell counts were highly correlated between paired samples (r = 0.99). Conclusions: Differences in methylation and cell composition between buffy coat and whole cord blood are much lower than inter-individual variation, demonstrating that both sample preparation types can be analytically combined and compared.


2015 ◽  
Vol 16 (Suppl 15) ◽  
pp. P7 ◽  
Author(s):  
Akhilesh Kaushal ◽  
Hongmei Zhang ◽  
Wilfried JJ Karmaus ◽  
Julie SL Wang

2019 ◽  
Author(s):  
Lara Nonell ◽  
Juan R González

Abstract DNA methylation plays an important role in the development and progression of disease. Beta-values are the standard methylation measures. Different statistical methods have been proposed to assess differences in methylation between conditions. However, most of them do not completely account for the distribution of beta-values. The simplex distribution can accommodate beta-value data. We hypothesize that the simplex is a flexible distribution that is able to model methylation data. To test our hypothesis, we conducted several analyses using four real data sets obtained from microarray and sequencing technologies. Standard data distributions were studied and modelled in comparison to the simplex. In addition, simulations were conducted in different scenarios encompassing several distribution assumptions, regression models and sample sizes. Finally, we compared DNA methylation between females and males in order to benchmark the assessed methodologies under different scenarios. According to the results obtained from the simulations and real data analyses, DNA methylation data are concordant with the simplex distribution in many situations. Simplex regression models work well in small sample size data sets. However, when sample size increases, other models such as the beta regression or even the linear regression can be employed to assess group comparisons and obtain unbiased results. Based on these results, we can provide some practical recommendations when analyzing methylation data: 1) use data sets of at least 10 samples per studied condition for microarray data sets or 30 in NGS data sets, 2) apply a simplex or beta regression model for microarray data, 3) apply a linear model in any other case.


2016 ◽  
Author(s):  
E. Andres Houseman ◽  
Molly L. Kile ◽  
David C. Christiani ◽  
Tan A. Ince ◽  
Karl T. Kelsey ◽  
...  

Abstract We propose a simple method for reference-free deconvolution that provides the proportions of putative cell types defined by their underlying methylomes, the number of these constituent cell types, and a method for evaluating the extent to which the underlying methylomes reflect specific types of cells. We have demonstrated these methods in an analysis of 23 Infinium data sets from 13 distinct data collection efforts; these empirical evaluations show that our algorithm can reasonably estimate the number of constituent types, return cell proportion estimates that demonstrate anticipated associations with underlying phenotypic data, and return methylomes that reflect the underlying biology of constituent cell types. Thus the methodology permits an explicit quantitation of the mediation of phenotypic associations with DNA methylation by cell composition effects. Although more work is needed to investigate functional information related to estimated methylomes, our proposed method provides a novel and useful foundation for conducting DNA methylation studies on heterogeneous tissues lacking reference data.


Genes ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 778 ◽  
Author(s):  
Liu ◽  
Liu ◽  
Pan ◽  
Li ◽  
Yang ◽  
...  

Many DNA methylation markers have been identified for cancer diagnosis. However, few studies have tried to identify DNA methylation markers that diagnose diverse cancer types simultaneously, i.e., pan-cancer markers. In this study, we sought to identify DNA methylation markers that differentiate cancer samples from the respective normal samples across cancer types. We collected whole genome methylation data of 27 cancer types containing 10,140 cancer samples and 3386 normal samples, and divided all samples into five data sets, including one training data set, one validation data set and three test data sets. We applied machine learning to identify DNA methylation markers, and specifically, we constructed diagnostic prediction models by deep learning. We identified two categories of markers: 12 CpG markers and 13 promoter markers. Three of the 12 CpG markers and four of the 13 promoter markers are located in cancer-related genes. With the CpG markers, our model achieved an average sensitivity and specificity on the test data sets of 92.8% and 90.1%, respectively. For promoter markers, the average sensitivity and specificity on the test data sets were 89.8% and 81.1%, respectively. Furthermore, in cell-free DNA methylation data of 163 prostate cancer samples, the CpG markers achieved a sensitivity of 100%, and the promoter markers achieved 92%. For both marker types, the specificity in normal whole blood was 100%. To conclude, we identified methylation markers to diagnose pan-cancers, which might be applied to liquid biopsy of cancers.


Epigenomics ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1429-1439 ◽  
Author(s):  
Eleftheria Theodoropoulou ◽  
Lars Alfredsson ◽  
Fredrik Piehl ◽  
Francesco Marabita ◽  
Maja Jagodic

Aim: Accumulating evidence links epigenetic age to diseases and age-related conditions, but little is known about its association with multiple sclerosis (MS). Materials & methods: We estimated epigenetic age acceleration measures using DNA methylation from blood or sorted cells of MS patients and controls. Results: In blood, sex (p = 4.39E-05) and MS (p = 2.99E-03) explained the variation in age acceleration, and isolated blood cell types showed different epigenetic ages. Intrinsic epigenetic age acceleration and extrinsic epigenetic age acceleration were only associated with sex (p = 2.52E-03 and p = 1.58E-04, respectively), while PhenoAge Acceleration displayed a positive association with MS (p = 3.40E-02). Conclusion: Different age acceleration measures are distinctly influenced by phenotypic factors, and they might measure separate pathophysiological aspects of MS. Data deposition: DNA methylation data can be accessed at the Gene Expression Omnibus database under accession numbers GSE35069, GSE43976, GSE106648, GSE130029, GSE130030.


2016 ◽  
Author(s):  
Timothy J. Triche ◽  
Peter W. Laird ◽  
Kimberly D. Siegmund

Abstract Background: DNA methylation is the most readily assayed epigenetic mark, possessing confirmed relationships with gene expression, imprinting, and chromatin accessibility. Given the increasingly widespread use of DNA methylation microarrays in population-scale epidemiological applications, we sought to determine which methods provided the greatest statistical power to reproducibly detect differences in DNA methylation across various conditions, using publicly available data sets on tissue type and aging. Results: Beta regression, as proposed originally by Ferrari and Cribari-Neto, yielded more validated hits in each of our comparisons than any other method under consideration, both in a regression setting and in comparisons to two-group tests such as the Wilcoxon-Mann-Whitney, Student t, and Welch t tests. In large cohorts of whole blood samples, we corrected for compositional differences and batch effects, and found that marginal likelihood ratio tests from beta regression models uniformly dominate popular alternatives based on linear models. The superior sensitivity and specificity exhibited by beta regression in epidemiologically relevant cohort sizes corresponded to approximately a 2% increase in sensitivity at the same specificity when compared to linear models fitted on raw beta values (proportion of signal intensity due to the methylated allele), M-values, or rankquantile normalized values. Conclusions: Investigators should consider beta regression to maximize statistical power in studies of DNA methylation using microarrays. At epidemiologically relevant sample sizes, with typical quality control procedures (compositional and batch effect correction), cross-cohort agreement uniformly favors beta regression over popular alternatives.
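For readers unfamiliar with the two measurement scales compared above, the following sketch shows how beta values and M-values are commonly related. It is not taken from the paper: the function names are hypothetical, and the stabilising offset of 100 is a widely used array convention assumed here, not a value stated in the abstract.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Proportion of signal intensity from the methylated allele.

    The offset stabilises the ratio at low intensities; 100 is an assumed default.
    """
    return meth / (meth + unmeth + offset)


def m_value(beta):
    """Base-2 logit of a beta value; unbounded and roughly symmetric around 0."""
    return math.log2(beta / (1.0 - beta))


def beta_from_m(m):
    """Inverse of m_value: map an M-value back to the (0, 1) beta scale."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# Hypothetical probe intensities, used only to exercise the functions.
example_beta = beta_value(meth=3000.0, unmeth=900.0)
example_m = m_value(example_beta)
print(f"beta={example_beta:.3f}, M={example_m:.3f}")
```

Beta values are bounded between 0 and 1, which is why bounded-response models such as beta regression are natural candidates, whereas linear models are often fitted on the unbounded M-value scale.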


2016 ◽  
Author(s):  
Elior Rahmani ◽  
Liat Shenhav ◽  
Regev Schweiger ◽  
Paul Yousefi ◽  
Karen Huen ◽  
...  

Abstract Genetic data are known to harbor information about human demographics, and genotyping data are commonly used for capturing ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by whole-genome DNA methylation data. We demonstrate, using three large cohort 450K methylation array data sets, that ancestry information signal is mirrored in genome-wide DNA methylation data, and that it can be further isolated more effectively by leveraging the correlation structure of CpGs with cis-located SNPs. Based on these insights, we propose a method, EPISTRUCTURE, for the inference of ancestry from methylation data, without the need for genotype data. EPISTRUCTURE can be used to infer ancestry information of individuals based on their methylation data in the absence of corresponding genetic data. Although genetic data are often collected in epigenetic studies of large cohorts, these are typically not made publicly available, making the application of EPISTRUCTURE especially useful for anyone working on public data. Implementation of EPISTRUCTURE is available in GLINT, our recently released toolset for DNA methylation analysis at: http://glint-epigenetics.readthedocs.io.

