Reference-free deconvolution of DNA methylation data and mediation by cell composition effects

2016 ◽  
Author(s):  
E. Andres Houseman ◽  
Molly L. Kile ◽  
David C. Christiani ◽  
Tan A. Ince ◽  
Karl T. Kelsey ◽  
...  

Abstract We propose a simple method for reference-free deconvolution that provides the proportions of putative cell types defined by their underlying methylomes and the number of these constituent cell types, as well as a method for evaluating the extent to which the underlying methylomes reflect specific types of cells. We have demonstrated these methods in an analysis of 23 Infinium data sets from 13 distinct data collection efforts; these empirical evaluations show that our algorithm can reasonably estimate the number of constituent types, return cell proportion estimates that demonstrate anticipated associations with underlying phenotypic data, and return methylomes that reflect the underlying biology of constituent cell types. Thus the methodology permits an explicit quantitation of the mediation of phenotypic associations with DNA methylation by cell composition effects. Although more work is needed to investigate functional information related to estimated methylomes, our proposed method provides a novel and useful foundation for conducting DNA methylation studies on heterogeneous tissues lacking reference data.


2017 ◽  
Author(s):  
John Dou ◽  
Rebecca J. Schmidt ◽  
Kelly S. Benke ◽  
Craig Newschaffer ◽  
Irva Hertz-Picciotto ◽  
...  

Abstract Background Cord blood DNA methylation is associated with numerous health outcomes and environmental exposures. Whole cord blood DNA reflects all nucleated blood cell types, while centrifuging whole blood separates red blood cells by generating a white blood cell buffy coat. Both sample types are used in DNA methylation studies. Cell types have unique methylation patterns and processing can impact cell distributions, which may influence comparability. Objectives To evaluate differences in cell composition and DNA methylation between buffy coat and whole cord blood samples. Methods Cord blood DNA methylation was measured with the Infinium EPIC BeadChip (Illumina) in 8 individuals, each contributing buffy coat and whole blood samples. We analyzed principal components (PC) of methylation, performed hierarchical clustering, and computed correlations of mean-centered methylation between pairs. We conducted moderated t-tests on single sites and estimated cell composition. Results DNA methylation PCs were associated with individual (PC1: P = 1.4 × 10⁻⁹; PC2: P = 2.9 × 10⁻⁵; PC3: P = 3.8 × 10⁻⁵; PC4: P = 4.2 × 10⁻⁶; PC5: P = 9.9 × 10⁻¹³), and not with sample type (PC1–5: P > 0.7). Samples hierarchically clustered by individual. Pearson correlations of mean-centered methylation between paired individual samples ranged from r = 0.66 to r = 0.87. No individual site significantly differed between buffy coat and whole cord blood when adjusting for multiple comparisons (5 sites had unadjusted P < 10⁻⁵). Estimated cell type proportions did not differ by sample type (P = 0.86), and estimated cell counts were highly correlated between paired samples (r = 0.99). Conclusions Differences in methylation and cell composition between buffy coat and whole cord blood are much lower than inter-individual variation, demonstrating that both sample preparation types can be analytically combined and compared.
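The paired-correlation step named in the Methods can be illustrated with a minimal Python sketch. It assumes that "mean-centered" means subtracting each site's mean across all samples, which the abstract does not spell out. The beta values, sample labels, and the helper names `center_by_site` and `pearson` are hypothetical and exist only for illustration.

```python
from math import sqrt

def center_by_site(rows):
    """Subtract each site's mean across samples (rows = samples, columns = CpG sites)."""
    n = len(rows)
    site_means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, site_means)] for row in rows]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical beta values: two individuals, each with a buffy coat and a whole blood sample.
labels = [("A", "buffy"), ("A", "whole"), ("B", "buffy"), ("B", "whole")]
betas = [
    [0.10, 0.80, 0.50, 0.30],
    [0.12, 0.78, 0.52, 0.31],
    [0.20, 0.70, 0.40, 0.45],
    [0.19, 0.72, 0.41, 0.43],
]

centered = center_by_site(betas)
by_label = dict(zip(labels, centered))

# Correlation between the two sample types within each individual.
paired_r = {
    ind: pearson(by_label[(ind, "buffy")], by_label[(ind, "whole")])
    for ind in ("A", "B")
}
print(paired_r)
```

Centering matters here because raw methylation values at a given site tend to be similar in every sample, so uncentered correlations would be high for any pair, matched or not.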



2019 ◽  
Vol 21 (5) ◽  
pp. 1581-1595 ◽  
Author(s):  
Xinlei Zhao ◽  
Shuang Wu ◽  
Nan Fang ◽  
Xiao Sun ◽  
Jue Fan

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
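The ensemble voting mentioned in the abstract can be sketched as a per-cell majority vote. This is a minimal illustration, not the evaluation's actual procedure: the tie rule (fall back to 'unassigned'), the tool names `tool_a` to `tool_c`, and the predicted labels are all assumptions made for the example.

```python
from collections import Counter

def majority_vote(labels, unassigned="unassigned"):
    """Return the most frequent label, or `unassigned` when the top two labels tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return unassigned
    return counts[0][0]

# Hypothetical per-cell predictions from three classifiers (one list entry per cell).
predictions = {
    "tool_a": ["T cell", "B cell", "NK cell"],
    "tool_b": ["T cell", "B cell", "T cell"],
    "tool_c": ["T cell", "monocyte", "B cell"],
}

# zip(*...) regroups the predictions so that each tuple holds all votes for one cell.
consensus = [majority_vote(votes) for votes in zip(*predictions.values())]
print(consensus)
```

Returning 'unassigned' on ties is one possible design choice; it keeps the ensemble from favouring whichever tool happens to be listed first.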



2018 ◽  
Author(s):  
Jacob Bergstedt ◽  
Alejandra Urrutia ◽  
Darragh Duffy ◽  
Matthew L. Albert ◽  
Lluís Quintana-Murci ◽  
...  

DNA methylation is a stable epigenetic alteration that plays a key role in cellular differentiation and gene regulation, and that has been proposed to mediate environmental effects on disease risk. Epigenome-wide association studies have identified and replicated associations between methylation sites and several disease conditions, which could serve as biomarkers in predictive medicine and forensics. Nevertheless, heterogeneity in cellular proportions between the compared groups could complicate interpretation. Reference-based cell-type deconvolution methods have proven useful in correcting epigenomic studies for cellular heterogeneity, but they rely on reference libraries of sorted cells and only predict a limited number of cell populations. Here we leverage >850,000 methylation sites included in the MethylationEPIC array and use elastic net regularized and stability selected regression models to predict the circulating levels of 70 blood cell subsets, measured by standardized flow cytometry in 962 healthy donors of western European descent. We show that our predictions, based on a hundred methylation sites or fewer, are less error-prone than other existing methods, and extend the number of cell types that can be accurately predicted. Application of the same methods to age, smoking consumption and several serological responses to pathogen antigens also provides accurate estimations. Together, our study substantially improves predictions of blood cell composition based on methylation profiles, which will be critical in the emerging field of medical epigenomics.



2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Nicholas D. Johnson ◽  
Xiumei Wu ◽  
Christopher D. Still ◽  
Xin Chu ◽  
Anthony T. Petrick ◽  
...  

Abstract Background Non-alcoholic fatty liver disease (NAFLD) is characterized by changes in cell composition that occur throughout disease pathogenesis, which includes the development of fibrosis in a subset of patients. DNA methylation (DNAm) is a plausible mechanism underlying these shifts, considering that DNAm profiles differ across tissues and cell types, and DNAm may play a role in cell-type differentiation. Previous work investigating the relationship between DNAm and fibrosis in NAFLD has been limited by sample size and the number of CpG sites interrogated. Results Here, we performed an epigenome-wide analysis using Infinium MethylationEPIC array data from 325 individuals with NAFLD, including 119 with severe fibrosis and 206 with no histological evidence of fibrosis. After adjustment for latent confounders, we identified 7 CpG sites whose DNAm was associated with fibrosis (p < 5.96 × 10⁻⁸). Analysis of RNA-seq data collected from a subset of individuals (N = 56) revealed that expression of 288 genes was associated with DNAm at one or more of the 7 fibrosis-related CpGs. DNAm-based estimates of cell-type proportions showed that estimated proportions of natural killer cells increased, while epithelial cell proportions decreased with disease stage. Finally, we used an elastic net regression model to assess DNAm as a biomarker of fibrotic stage and found that our model predicted fibrosis with a sensitivity of 0.93 and provided information beyond a model based solely on cell-type proportions. Conclusion These findings are consistent with DNAm as a mechanism underpinning or marking fibrosis-related shifts in cell composition and demonstrate the potential of DNAm as a possible biomarker of NAFLD fibrosis.



Author(s):  
Richard Meier ◽  
Emily Nissen ◽  
Devin C. Koestler

Abstract Statistical methods that allow for cell type specific DNA methylation (DNAm) analyses based on bulk-tissue methylation data have great potential to improve our understanding of human disease and have created unprecedented opportunities for new insights using the wealth of publicly available bulk-tissue methylation data. These methodologies involve incorporating interaction terms formed between the phenotypes/exposures of interest and proportions of the cell types underlying the bulk-tissue sample used for DNAm profiling. Despite growing interest in such “interaction-based” methods, there has been no comprehensive assessment of how variability in the cellular landscape across study samples affects their performance. To answer this question, we used numerous publicly available whole-blood DNAm data sets along with extensive simulation studies and evaluated the performance of interaction-based approaches in detecting cell-specific methylation effects. Our results show that low cell proportion variability results in large estimation error and low statistical power for detecting cell-specific effects of DNAm. Further, we identified that many studies targeting methylation profiling in whole blood may be at risk of being underpowered due to low variability in the cellular landscape across study samples. Finally, we discuss guidelines for researchers seeking to conduct studies utilizing interaction-based approaches to help ensure that their studies are adequately powered.
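The link between proportion variability and estimation error can be illustrated with a small deterministic sketch. It assumes a two-cell-type linear model in which each sample's methylation is p(μ1 + β1·x) + (1 − p)(μ2 + β2·x) plus noise, fitted by ordinary least squares. This parametrization, the toy proportions, and the function names are assumptions for illustration and are not taken from the study. The sketch compares the diagonal entry of (XᵀX)⁻¹ for the cell-type-1 interaction term, which scales the variance of that estimate, under wide and narrow proportion ranges.

```python
def design_row(p, x):
    """Columns: cell-type means (p, 1 - p) and phenotype-by-proportion interactions."""
    return [p, 1.0 - p, p * x, (1.0 - p) * x]

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def interaction_variance_factor(props, phenos):
    """Entry of (X'X)^-1 for the cell-type-1 interaction; multiply by noise variance to get Var(beta1)."""
    X = [design_row(p, x) for p, x in zip(props, phenos)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    return invert(xtx)[2][2]

# Hypothetical design: 4 controls (x = 0) and 4 cases (x = 1), same mean proportion of 0.5.
phenos = [0, 0, 0, 0, 1, 1, 1, 1]
wide = [0.2, 0.4, 0.6, 0.8] * 2        # high variability in cell proportion
narrow = [0.47, 0.49, 0.51, 0.53] * 2  # low variability in cell proportion

vf_wide = interaction_variance_factor(wide, phenos)
vf_narrow = interaction_variance_factor(narrow, phenos)
print(vf_wide, vf_narrow)
```

With identical sample size and mean proportion, the narrow design yields a much larger variance factor, which is consistent with the abstract's statement that low proportion variability inflates estimation error.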



2019 ◽  
Vol 51 (6) ◽  
pp. 241-253 ◽  
Author(s):  
Wenhui Wang ◽  
Li Wang ◽  
Percio S. Gulko ◽  
Jun Zhu

Osteoarthritis (OA) and rheumatoid arthritis (RA) are the most common forms of arthritis. The synovial tissue is the major site of inflammation in OA and RA and consists of diverse cells. Changes in synovial tissue cell composition during arthritis pathogenesis and progression have not been systematically characterized and may provide critical insights into disease processes. In this study we aimed to systematically examine cellular changes in synovial tissue. Publicly available synovial tissue transcriptomic data sets were used. We computationally estimated cell compositions in synovial tissue based on transcriptomic data and compared cell compositions in different diseases or at different disease stages. Synovial fibroblasts, macrophages, adipocytes, and immune cells were the major cell types in all synovial tissue. Both OA and RA patients had a significantly lower adipocyte fraction compared with healthy controls. This decreasing trend was also observed during OA and RA progression. The fraction of monocytes was increased in both OA and RA patients, consistent with the observation that inflammation is involved in both OA and RA. However, the monocyte fraction in RA was much higher than in healthy controls and OA. The M2 macrophage fraction was reduced in RA compared with OA, and the reduction continued during RA progression from the early to the late stage. There were consistent cell composition differences between different types or stages of arthritis. In both RA and OA, the newly discovered changes in the adipocyte and M2 macrophage fractions could lead to novel therapeutic development.



Author(s):  
Weiwei Zhang ◽  
Hao Wu ◽  
Ziyi Li

Abstract Motivation It is common practice in epigenetics research to profile DNA methylation in tissue samples, which are usually mixtures of different cell types. To properly account for the mixture, estimating cell compositions has been recognized as an important first step. Many methods have been developed for quantifying cell compositions from DNA methylation data, but they mostly have limited applications due to a lack of reference or prior information. Results We develop Tsisal, a novel complete deconvolution method which accurately estimates cell compositions from DNA methylation data without any prior knowledge of cell types or their proportions. Tsisal is a full pipeline to estimate the number of cell types, estimate cell compositions, and identify cell-type-specific CpG sites. It can also assign cell type labels when a full or partial reference panel is available. Extensive simulation studies and analyses of seven real data sets demonstrate the favorable performance of our proposed method compared with existing deconvolution methods serving a similar purpose. Availability The proposed method Tsisal is implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information Supplementary data are available at Bioinformatics online.



BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kimberly C Paul ◽  
Alexandra M Binder ◽  
Steve Horvath ◽  
Cynthia Kusters ◽  
Qi Yan ◽  
...  

Abstract Background Aging and inflammation are important components of Parkinson’s disease (PD) pathogenesis and both are associated with changes in hematopoiesis and blood cell composition. DNA methylation (DNAm) presents a mechanism to investigate inflammation, aging, and hematopoiesis in PD, using epigenetic mitotic aging and aging clocks. Here, we aimed to define the influence of blood cell lineage on epigenetic mitotic age and then investigate mitotic age acceleration in PD, while considering epigenetic age acceleration biomarkers. Results We estimated epigenetic mitotic age using the “epiTOC” epigenetic mitotic clock in 10 different blood cell populations and in a population-based study of PD using whole blood. Within-subject analysis of DNAm from flow-sorted purified blood cell types showed a clear separation of epigenetic mitotic age by cell lineage, with the mitotic age significantly lower in myeloid versus lymphoid cells (p = 2.1e-11). PD status was strongly associated with accelerated epigenetic mitotic aging (AccelEpiTOC) after controlling for cell composition (OR = 2.11, 95% CI = 1.56, 2.86, p = 1.6e-6). AccelEpiTOC was also positively correlated with extrinsic epigenetic age acceleration, a DNAm aging biomarker related to immune system aging (with cell composition adjustment: R = 0.27, p = 6.5e-14), and both were independently associated with PD. Among PD patients, AccelEpiTOC measured at baseline was also associated with longitudinal motor and cognitive symptom decline. Conclusions The current study presents a first look at epigenetic mitotic aging in PD, and our findings suggest accelerated hematopoietic cell mitosis in early PD, possibly reflecting immune pathway imbalances, that may also be related to motor and cognitive progression.



2016 ◽  
Vol 17 (1) ◽  
Author(s):  
E. Andres Houseman ◽  
Molly L. Kile ◽  
David C. Christiani ◽  
Tan A. Ince ◽  
Karl T. Kelsey ◽  
...  


2015 ◽  
Vol 16 (1) ◽  
Author(s):  
E Andres Houseman ◽  
Karl T Kelsey ◽  
John K Wiencke ◽  
Carmen J Marsit

