Reference-Based Versus Reference-Free Cell Type Estimation In DNA Methylation Studies Using Human Placental Tissue

Abstract The placenta is a central organ during early development, influencing trajectories of health and disease. DNA methylation (DNAm) studies of human placenta improve our understanding of how its function relates to disease risk. However, DNAm studies can be biased by cell type heterogeneity, so it is essential to control for this in order to reduce confounding and increase precision. Computational cell type deconvolution approaches have proven to be very useful for this purpose. For human placenta, however, an assessment of the performance of these estimation methods is still lacking. Here, we compare the predictive performance of reference-based versus reference-free estimated proportions of cell types from genome-wide DNAm in placental samples taken at birth and from chorion villus biopsies early in pregnancy using three independent studies comprising over 1,000 samples. We found both reference-free and reference-based estimated cell type proportions to have predictive value for DNAm; however, reference-based cell type estimation outperformed reference-free estimation for the majority of data sets. Reference-based cell type estimations mirror previous histological knowledge on changes in cell type proportions through gestation. Further, CpGs whose variation in DNAm was largely explained by reference-based estimated cell type proportions were in the proximity of genes that are highly tissue-specific for placenta. This was not the case for reference-free estimated cell type proportions. We provide a list of these CpGs as a resource to help researchers interpret results of existing studies and improve future DNAm studies of human placenta.

DNA methylation and gene expression integration in cardiovascular disease

Clinical Epigenetics ◽

10.1186/s13148-021-01064-y ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Guillermo Palou-Márquez ◽

Isaac Subirana ◽

Lara Nonell ◽

Alba Fernández-Sanlés ◽

Roberto Elosua

Keyword(s):

Gene Expression ◽

Cardiovascular Disease ◽

Dna Methylation ◽

Cardiovascular Diseases ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Risk Function ◽

Predictive Biomarkers ◽

Independent Study ◽

Cell Type

Abstract Background The integration of different layers of omics information is an opportunity to tackle the complexity of cardiovascular diseases (CVD) and to identify new predictive biomarkers and potential therapeutic targets. Our aim was to integrate DNA methylation and gene expression data in an effort to identify biomarkers related to cardiovascular disease risk in a community-based population. We accessed data from the Framingham Offspring Study, a cohort study with data on DNA methylation (Infinium HumanMethylation450 BeadChip; Illumina) and gene expression (Human Exon 1.0 ST Array; Affymetrix). Using the MOFA2 R package, we integrated these data to identify biomarkers related to the risk of presenting a cardiovascular event. Results Four independent latent factors (9, 19, 21 [only in women] and 27), driven by DNA methylation, were associated with cardiovascular disease independently of classical risk factors and cell-type counts. In a sensitivity analysis, we also identified factor 21 as associated with CVD in women. Factors 9, 21 and 27 were also associated with coronary heart disease risk. Moreover, in a replication effort in an independent study three of the genes included in factor 27 were also present in a factor identified to be associated with myocardial infarction (CDC42BPB, MAN2A2 and RPTOR). Factor 9 was related to age and cell-type proportions; factor 19 was related to age and B-cell count; factor 21 pointed to human immunodeficiency virus infection-related pathways and inflammation; and factor 27 was related to lifestyle factors such as alcohol consumption, smoking and body mass index. Inclusion of factor 21 (only in women) improved the discriminative and reclassification capacity of the Framingham classical risk function and factor 27 improved its discrimination. Conclusions Unsupervised multi-omics data integration methods have the potential to provide insights into the pathogenesis of cardiovascular diseases. We identified four independent factors (one only in women) pointing to inflammation, endothelium homeostasis, visceral fat, cardiac remodeling and lifestyles as key players in the determination of cardiovascular risk. Moreover, two of these factors improved the predictive capacity of a classical risk function.

EMeth: An EM algorithm for cell type decomposition based on DNA methylation data

Scientific Reports ◽

10.1038/s41598-021-84864-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hanyu Zhang ◽

Ruoyi Cai ◽

James Dai ◽

Wei Sun

Keyword(s):

Dna Methylation ◽

Tumor Cells ◽

T Regulatory Cells ◽

Simulated Data ◽

Cell Types ◽

Computational Method ◽

Methylation Data ◽

Cell Type ◽

A Cell ◽

Type Decomposition

Abstract We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows a cell type with known proportions but unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells while bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and tumor cell proportion can be estimated by copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.

Differences in DNA methylation of white blood cell types at birth and in adulthood reflect postnatal immune maturation and influence accuracy of cell type prediction

10.1101/399279 ◽

2018 ◽

Cited By ~ 3

Author(s):

Meaghan J Jones ◽

Louie Dinh ◽

Hamid Reza Razzaghian ◽

Olivia de Goede ◽

Julia L MacIsaac ◽

...

Keyword(s):

Dna Methylation ◽

Immune System ◽

Blood Cell ◽

Cord Blood ◽

White Blood Cell ◽

Blood Cells ◽

Cell Types ◽

Cell Type ◽

Cell Type Specific ◽

The Impact

Abstract Background DNA methylation profiling of peripheral blood leukocytes has many research applications, and characterizing the changes in DNA methylation of specific white blood cell types between newborn and adult could add insight into the maturation of the immune system. As a consequence of developmental changes, DNA methylation profiles derived from adult white blood cells are poor references for prediction of cord blood cell types from DNA methylation data. We thus examined cell-type specific differences in DNA methylation in leukocyte subsets between cord and adult blood, and assessed the impact of these differences on prediction of cell types in cord blood. Results Though all cell types showed differences between cord and adult blood, some specific patterns stood out that reflected how the immune system changes after birth. In cord blood, lymphoid cells showed less variability than in adult, potentially demonstrating their naïve status. In fact, cord CD4 and CD8 T cells were so similar that genetic effects on DNA methylation were greater than cell type effects in our analysis, and CD8 T cell frequencies remained difficult to predict, even after optimizing the library used for cord blood composition estimation. Myeloid cells showed fewer changes between cord and adult and also less variability, with monocytes showing the fewest sites of DNA methylation change between cord and adult. Finally, including nucleated red blood cells in the reference library was necessary for accurate cell type predictions in cord blood. Conclusion Changes in DNA methylation with age were highly cell type specific, and those differences paralleled what is known about the maturation of the postnatal immune system.

Integration of DNA methylation and gene transcription across nineteen cell types reveals cell type-specific and genomic region-dependent regulatory patterns

Scientific Reports ◽

10.1038/s41598-017-03837-z ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 13

Author(s):

Binhua Tang ◽

Yufan Zhou ◽

Chiou-Miin Wang ◽

Tim H.-M. Huang ◽

Victor X. Jin

Keyword(s):

Dna Methylation ◽

Gene Transcription ◽

Cell Types ◽

Genomic Region ◽

Cell Type ◽

Cell Type Specific

DNA Methylation Atlas of the Mouse Brain at Single-Cell Resolution

10.1101/2020.04.30.069377 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hanqing Liu ◽

Jingtian Zhou ◽

Wei Tian ◽

Chongyuan Luo ◽

Anna Bartlett ◽

...

Keyword(s):

Dna Methylation ◽

Mouse Brain ◽

Spatial Organization ◽

Brain Area ◽

Cell Types ◽

Regulatory Elements ◽

Mammalian Brain ◽

Open Chromatin ◽

Cell Type ◽

Single Nucleus

Summary Mammalian brain cells are remarkably diverse in gene expression, anatomy, and function, yet the regulatory DNA landscape underlying this extensive heterogeneity is poorly understood. We carried out a comprehensive assessment of the epigenomes of mouse brain cell types by applying single nucleus DNA methylation sequencing to profile 110,294 nuclei from 45 regions of the mouse cortex, hippocampus, striatum, pallidum, and olfactory areas. We identified 161 cell clusters with distinct spatial locations and projection targets. We constructed taxonomies of these epigenetic types, annotated with signature genes, regulatory elements, and transcription factors. These features indicate the potential regulatory landscape supporting the assignment of putative cell types, and reveal repetitive usage of regulators in excitatory and inhibitory cells for determining subtypes. The DNA methylation landscape of excitatory neurons in the cortex and hippocampus varied continuously along spatial gradients. Using this deep dataset, an artificial neural network model was constructed that precisely predicts single neuron cell-type identity and brain area spatial location. Integration of high-resolution DNA methylomes with single-nucleus chromatin accessibility data allowed prediction of high-confidence enhancer-gene interactions for all identified cell types, which were subsequently validated by cell-type-specific chromatin conformation capture experiments. By combining multi-omic datasets (DNA methylation, chromatin contacts, and open chromatin) from single nuclei and annotating the regulatory genome of hundreds of cell types in the mouse brain, our DNA methylation atlas establishes the epigenetic basis for neuronal diversity and spatial organization throughout the mouse brain.

Accurate prediction of cell composition, age, smoking consumption and infection serostatus based on blood DNA methylation profiles

10.1101/456996 ◽

2018 ◽

Author(s):

Jacob Bergstedt ◽

Alejandra Urrutia ◽

Darragh Duffy ◽

Matthew L. Albert ◽

Lluís Quintana-Murci ◽

...

Keyword(s):

Dna Methylation ◽

Blood Cell ◽

Regression Models ◽

Disease Risk ◽

Association Studies ◽

Cell Types ◽

Cellular Heterogeneity ◽

Cell Composition ◽

Healthy Donors ◽

Circulating Levels

DNA methylation is a stable epigenetic alteration that plays a key role in cellular differentiation and gene regulation, and that has been proposed to mediate environmental effects on disease risk. Epigenome-wide association studies have identified and replicated associations between methylation sites and several disease conditions, which could serve as biomarkers in predictive medicine and forensics. Nevertheless, heterogeneity in cellular proportions between the compared groups could complicate interpretation. Reference-based cell-type deconvolution methods have proven useful in correcting epigenomic studies for cellular heterogeneity, but they rely on reference libraries of sorted cells and only predict a limited number of cell populations. Here we leverage >850,000 methylation sites included in the MethylationEPIC array and use elastic net regularized and stability selected regression models to predict the circulating levels of 70 blood cell subsets, measured by standardized flow cytometry in 962 healthy donors of western European descent. We show that our predictions, based on a hundred methylation sites or fewer, are less error-prone than other existing methods, and extend the number of cell types that can be accurately predicted. Application of the same methods to age, smoking consumption and several serological responses to pathogen antigens also provides accurate estimations. Together, our study substantially improves predictions of blood cell composition based on methylation profiles, which will be critical in the emerging field of medical epigenomics.

Cell-type specific cis-eQTLs in eight brain cell-types identifies novel risk genes for human brain disorders

10.1101/2021.10.09.21264604 ◽

2021 ◽

Author(s):

Julien Bryois ◽

Daniela Calini ◽

Will Macnair ◽

Lynette Foo ◽

Eduard Urich ◽

...

Keyword(s):

Gene Expression ◽

Disease Risk ◽

Regulation Of Gene Expression ◽

Cell Types ◽

Specific Cell ◽

Eqtl Analysis ◽

Cell Type ◽

Disease Etiology ◽

Risk Genes ◽

Cell Type Specific

Most expression quantitative trait loci (eQTL) studies to date have been performed in heterogeneous brain tissues as opposed to specific cell types. To investigate the genetics of gene expression in adult human cell types from the central nervous system (CNS), we performed an eQTL analysis using single nuclei RNA-seq from 196 individuals in eight CNS cell types. We identified 6108 eGenes, a substantial fraction (43%, 2620 out of 6108) of which show cell-type specific effects, with strongest effects in microglia. Integration of CNS cell-type eQTLs with GWAS revealed novel relationships between expression and disease risk for neuropsychiatric and neurodegenerative diseases. For most GWAS loci, a single gene colocalized in a single cell type providing new clues into disease etiology. Our findings demonstrate substantial contrast in genetic regulation of gene expression among CNS cell types and reveal genetic mechanisms by which disease risk genes influence neurological disorders.

DNA Methylation Profiles of Purified Cell Types in Bronchoalveolar Lavage: Applications for Mixed Cell Paediatric Pulmonary Studies

Frontiers in Immunology ◽

10.3389/fimmu.2021.788705 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shivanthan Shanthikumar ◽

Melanie R. Neeland ◽

Richard Saffery ◽

Sarath C. Ranganathan ◽

Alicia Oshlack ◽

...

Keyword(s):

Dna Methylation ◽

Bronchoalveolar Lavage ◽

Association Studies ◽

Cell Types ◽

Alveolar Epithelial Cells ◽

Cell Type ◽

Mixed Cell ◽

Alveolar Epithelial ◽

Cell Type Composition ◽

Type Composition

In epigenome-wide association studies analysing DNA methylation from samples containing multiple cell types, it is essential to adjust the analysis for cell type composition. One well established strategy for achieving this is reference-based cell type deconvolution, which relies on knowledge of the DNA methylation profiles of purified constituent cell types. These are then used to estimate the cell type proportions of each sample, which can then be incorporated to adjust the association analysis. Bronchoalveolar lavage is commonly used to sample the lung in clinical practice and contains a mixture of different cell types that can vary in proportion across samples, affecting the overall methylation profile. A current barrier to the use of bronchoalveolar lavage in DNA methylation-based research is the lack of reference DNA methylation profiles for each of the constituent cell types, thus making reference-based cell composition estimation difficult. Herein, we use bronchoalveolar lavage samples collected from children with cystic fibrosis to define DNA methylation profiles for the four most common and clinically relevant cell types: alveolar macrophages, granulocytes, lymphocytes and alveolar epithelial cells. We then demonstrate the use of these methylation profiles in conjunction with an established reference-based methylation deconvolution method to estimate the cell type composition of two different tissue types: a publicly available dataset derived from artificial blood-based cell mixtures and further bronchoalveolar lavage samples. The reference DNA methylation profiles developed in this work can be used for future reference-based cell type composition estimation of bronchoalveolar lavage. This will facilitate the use of this tissue in studies examining the role of DNA methylation in lung health and disease.
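
The idea behind reference-based deconvolution described in this abstract can be illustrated with a minimal sketch. It is not the established method the authors used: it assumes a bulk methylation value is a proportion-weighted sum of the purified cell type profiles and solves the resulting non-negative least-squares problem with simple multiplicative updates. The function name `estimate_proportions`, the toy reference matrix and the mixing proportions are all hypothetical and chosen only for illustration.

```python
def estimate_proportions(reference, sample, n_iter=500):
    """Estimate cell type proportions for one bulk sample.

    reference: list of rows, one per CpG; each row holds the beta value
               of that CpG in each purified cell type (all values > 0).
    sample:    list of bulk beta values, one per CpG.
    Assumption: bulk value = sum over cell types of proportion * profile.
    """
    n_cpgs = len(reference)
    n_types = len(reference[0])
    # Start from equal proportions for every cell type.
    w = [1.0 / n_types] * n_types
    # R^T y is constant across iterations, so compute it once.
    rt_y = [sum(reference[i][k] * sample[i] for i in range(n_cpgs))
            for k in range(n_types)]
    for _ in range(n_iter):
        # Bulk profile predicted by the current proportions.
        fitted = [sum(reference[i][k] * w[k] for k in range(n_types))
                  for i in range(n_cpgs)]
        rt_fit = [sum(reference[i][k] * fitted[i] for i in range(n_cpgs))
                  for k in range(n_types)]
        # Multiplicative update: weights stay non-negative by construction.
        w = [w[k] * rt_y[k] / max(rt_fit[k], 1e-12) for k in range(n_types)]
    # Rescale so the estimated proportions sum to one.
    total = sum(w)
    return [x / total for x in w]


# Hypothetical reference: 6 CpGs (rows) x 3 purified cell types (columns).
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]  # hypothetical mixing proportions

# Simulate a noise-free bulk sample from the reference and proportions.
bulk = [sum(r * p for r, p in zip(row, TRUE_PROPORTIONS)) for row in REFERENCE]

print([round(x, 3) for x in estimate_proportions(REFERENCE, bulk)])
```

The estimated proportions can then be included as covariates in the association model. Real data are noisy and use many more CpGs, so dedicated, validated deconvolution software is the appropriate choice in practice.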

Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software

10.1101/698050 ◽

2019 ◽

Cited By ~ 4

Author(s):

Clementine Decamps ◽

Florian Privé ◽

Raphael Bacher ◽

Daniel Jost ◽

Arthur Waguet ◽

...

Keyword(s):

Dna Methylation ◽

Cell Types ◽

R Package ◽

Lessons Learned ◽

Future Research ◽

Cell Type ◽

Software Packages ◽

Response To Chemotherapy ◽

Key Factor ◽

Pre Treatment

Abstract Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30-35%, and that selection of cell-type informative probes has a similar effect. We show that Cattell’s rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-treatment steps are achieved, the three deconvolution methods provide comparable results. We observe that all the algorithms’ performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
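
The pre-treatment step of removing probes correlated with confounders can be sketched as follows. This is only an illustration and not the medepir implementation: it assumes a plain Pearson correlation against a single confounder and an arbitrary cut-off. The function names, the `max_abs_r` threshold, the probe names and the toy values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def drop_confounded_probes(probes, confounder, max_abs_r=0.5):
    """Keep only probes whose |r| with the confounder is at most max_abs_r.

    probes:     dict mapping probe name -> beta values across samples.
    confounder: confounder values (e.g. age) for the same samples.
    max_abs_r:  hypothetical cut-off, chosen here only for illustration.
    """
    return {name: values for name, values in probes.items()
            if abs(pearson(values, confounder)) <= max_abs_r}


# Toy data: five samples, one confounder (age) and two hypothetical probes.
AGE = [30, 40, 50, 60, 70]
PROBES = {
    "cg_age_like": [0.20, 0.30, 0.40, 0.50, 0.60],  # tracks age closely
    "cg_stable": [0.50, 0.70, 0.40, 0.70, 0.50],    # unrelated to age
}

kept = drop_confounded_probes(PROBES, AGE)
print(sorted(kept))
```

Only the retained probes would then be passed to the reference-free deconvolution step. A real pipeline would typically test several confounders at once and correct for multiple testing rather than rely on a fixed correlation cut-off.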

Differential DNA methylation and changing cell-type proportions as fibrotic stage progresses in NAFLD

Clinical Epigenetics ◽

10.1186/s13148-021-01129-y ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Nicholas D. Johnson ◽

Xiumei Wu ◽

Christopher D. Still ◽

Xin Chu ◽

Anthony T. Petrick ◽

...

Keyword(s):

Dna Methylation ◽

Cell Types ◽

Disease Stage ◽

Rna Seq ◽

Cell Type ◽

Alcoholic Fatty Liver ◽

Cell Composition ◽

Cpg Sites ◽

Plausible Mechanism ◽

The Relationship

Abstract Background Non-alcoholic fatty liver disease (NAFLD) is characterized by changes in cell composition that occur throughout disease pathogenesis, which includes the development of fibrosis in a subset of patients. DNA methylation (DNAm) is a plausible mechanism underlying these shifts, considering that DNAm profiles differ across tissues and cell types, and DNAm may play a role in cell-type differentiation. Previous work investigating the relationship between DNAm and fibrosis in NAFLD has been limited by sample size and the number of CpG sites interrogated. Results Here, we performed an epigenome-wide analysis using Infinium MethylationEPIC array data from 325 individuals with NAFLD, including 119 with severe fibrosis and 206 with no histological evidence of fibrosis. After adjustment for latent confounders, we identified 7 CpG sites whose DNAm associated with fibrosis (p < 5.96 × 10⁻⁸). Analysis of RNA-seq data collected from a subset of individuals (N = 56) revealed that gene expression at 288 genes associated with DNAm at one or more of the 7 fibrosis-related CpGs. DNAm-based estimates of cell-type proportions showed that estimated proportions of natural killer cells increased, while epithelial cell proportions decreased with disease stage. Finally, we used an elastic net regression model to assess DNAm as a biomarker of fibrotic stage and found that our model predicted fibrosis with a sensitivity of 0.93 and provided information beyond a model based solely on cell-type proportions. Conclusion These findings are consistent with DNAm as a mechanism underpinning or marking fibrosis-related shifts in cell composition and demonstrate the potential of DNAm as a possible biomarker of NAFLD fibrosis.
