Correcting for cell-type heterogeneity in epigenome-wide association studies: premature analyses and conclusions

2017 ◽  
Author(s):  
Shijie C Zheng ◽  
Stephan Beck ◽  
Andrew E. Jaffe ◽  
Devin C. Koestler ◽  
Kasper D. Hansen ◽  
...  

Abstract. Recently, a study by Rahmani et al [1] claimed that a reference-free cell-type deconvolution method, called ReFACTor, leads to improved power and improved estimates of cell-type composition compared to competing reference-free and reference-based methods in the context of Epigenome-Wide Association Studies (EWAS). However, we identified many critical flaws (both conceptual and statistical in nature), which seriously call into question the validity of their claims. We outlined constructive criticism in a recent correspondence letter, Zheng et al [2]. The purpose of this letter is two-fold. First, to present additional analyses, which demonstrate that our original criticism is statistically sound. Second, to highlight additional serious concerns, which Rahmani et al have not yet addressed. In summary, we find that ReFACTor has not been demonstrated to outperform state-of-the-art reference-free methods such as SVA or RefFreeEWAS, nor state-of-the-art reference-based methods. Thus, the claim by Rahmani et al (a claim reiterated in their recent response letter [3]) that ReFACTor represents an advance over the state-of-the-art is not supported by an objective and rigorous statistical analysis of the data.

2021 ◽  
Vol 12 ◽  
Author(s):  
Shivanthan Shanthikumar ◽  
Melanie R. Neeland ◽  
Richard Saffery ◽  
Sarath C. Ranganathan ◽  
Alicia Oshlack ◽  
...  

In epigenome-wide association studies analysing DNA methylation from samples containing multiple cell types, it is essential to adjust the analysis for cell type composition. One well-established strategy for achieving this is reference-based cell type deconvolution, which relies on knowledge of the DNA methylation profiles of purified constituent cell types. These profiles are used to estimate the cell type proportions of each sample, which can then be incorporated to adjust the association analysis. Bronchoalveolar lavage is commonly used to sample the lung in clinical practice and contains a mixture of different cell types that can vary in proportion across samples, affecting the overall methylation profile. A current barrier to the use of bronchoalveolar lavage in DNA methylation-based research is the lack of reference DNA methylation profiles for each of the constituent cell types, which makes reference-based cell composition estimation difficult. Herein, we use bronchoalveolar lavage samples collected from children with cystic fibrosis to define DNA methylation profiles for the four most common and clinically relevant cell types: alveolar macrophages, granulocytes, lymphocytes and alveolar epithelial cells. We then demonstrate the use of these methylation profiles in conjunction with an established reference-based methylation deconvolution method to estimate the cell type composition of two different tissue types: a publicly available dataset derived from artificial blood-based cell mixtures and further bronchoalveolar lavage samples. The reference DNA methylation profiles developed in this work can be used for future reference-based cell type composition estimation of bronchoalveolar lavage. This will facilitate the use of this tissue in studies examining the role of DNA methylation in lung health and disease.
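The idea of reference-based deconvolution described above can be illustrated with a minimal sketch. It assumes the bulk methylation value at each CpG is a proportion-weighted sum of the purified cell-type values, and it estimates non-negative weights by projected gradient descent before rescaling them to sum to one. The function name `estimate_proportions` and the toy reference values are hypothetical; this is a generic constrained least-squares illustration, not the specific deconvolution method used in the study.

```python
def estimate_proportions(reference, bulk, n_iter=2000):
    """Estimate cell-type proportions for one bulk sample.

    reference: list of rows, one per CpG, each holding the methylation
               value of that CpG in every purified cell type.
    bulk:      list of bulk methylation values, one per CpG.
    Minimises 0.5 * ||R w - y||^2 subject to w >= 0 (projected gradient),
    then rescales w so the proportions sum to one.
    """
    n_cpg = len(reference)
    k = len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so its inverse is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        resid = [sum(r * wj for r, wj in zip(row, w)) - y
                 for row, y in zip(reference, bulk)]
        grad = [sum(reference[i][j] * resid[i] for i in range(n_cpg))
                for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wj - step * g) for wj, g in zip(w, grad)]
    total = sum(w)
    return [wj / total for wj in w] if total > 0 else w


# Toy reference: 6 CpGs x 3 cell types (made-up values for illustration).
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.2, 0.8],
]
true_w = [0.6, 0.3, 0.1]
bulk = [sum(r * w for r, w in zip(row, true_w)) for row in reference]
estimated = estimate_proportions(reference, bulk)
```

In practice the estimated proportions would be added as covariates in the association model; dedicated tools typically select informative CpGs first and use more robust solvers than this sketch.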


Epigenomics ◽  
2020 ◽  
Author(s):  
Yen-Chen A Feng ◽  
Yichen Guo ◽  
Lucile Pain ◽  
G Mark Lathrop ◽  
Catherine Laprise ◽  
...  

Aim: To develop a method for estimating cell-specific effects in epigenomic association studies in the presence of cell type heterogeneity. Materials & methods: We utilized a Monte Carlo Expectation-Maximization (MCEM) algorithm with a Metropolis–Hastings sampler to reconstruct the ‘missing’ cell-specific methylations and to estimate their associations with phenotypes free of confounding by cell type proportions. Results: Simulations showed reliable performance of the method under various settings, including when the cell type is rare. Application to a real dataset recapitulated the directly measured cell-specific methylation pattern in whole blood. Conclusion: This work provides a framework to identify important cell groups and account for cell type composition, which is useful for studying the role of epigenetic changes in human traits and diseases.


2020 ◽  
Author(s):  
Miao Rui ◽  
Dang Qi ◽  
Huang Hai Hui ◽  
Xia Liang Yong ◽  
Yong Liang

Abstract. Background: In epigenome-wide association studies (EWAS), the mixed methylation signal that arises when a sample combines different cell types can lead researchers to identify methylation sites that are falsely associated with the phenotype of interest. To address this problem, researchers have proposed reference-free methods based on sparse principal component analysis (PCA) to correct for false discoveries in EWAS. However, the existing model assumes that all methylation sites have the same prior probability in each PC loading, even though the genes corresponding to the methylation sites are known to have network structure. In this paper, we show that the results of the existing EWAS correction model are still not good enough, and that integrating the existing methylation network as prior knowledge into the sparse PCA model effectively improves its correction ability. Results: Based on these ideas, we propose GN-ReFAEWAS, a model that incorporates the prior methylation gene network structure into the PCA framework for feature extraction. The model can be used to correct for false discoveries in EWAS. GN-ReFAEWAS does not need cell count data and can estimate cell type composition from the methylation principal components. The key to the model is solving a sparse regularization problem on the methylation network, which this paper does using a regularization and random sampling algorithm. We ran experiments on one simulated data set and three real data sets, and compared GN-ReFAEWAS with four existing EWAS correction models. The experimental results show that GN-ReFAEWAS is superior to the existing models. Conclusion: The results show that GN-ReFAEWAS can provide a better estimate of cell-type composition and reduce false positives in EWAS.
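To make the sparse PCA building block mentioned above more concrete, here is a minimal sketch of a first sparse principal component computed by power iteration with soft-thresholding. It is a generic illustration only: it does not include the gene-network prior or the random sampling step of GN-ReFAEWAS, and the names `sparse_first_pc` and `lam` as well as the toy covariance matrix are assumptions made for this example.

```python
import math


def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_first_pc(cov, lam, n_iter=100):
    """Approximate a sparse leading eigenvector of a covariance matrix.

    Each iteration multiplies by the covariance matrix, soft-thresholds the
    result to zero out small loadings, and renormalises to unit length.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        z = [soft_threshold(sum(c * vj for c, vj in zip(row, v)), lam)
             for row in cov]
        norm = math.sqrt(sum(x * x for x in z))
        if norm == 0.0:
            break  # lam is too large: every loading was thresholded away
        v = [x / norm for x in z]
    return v


# Toy covariance: identity plus a planted component on the first two sites.
theta = 4.0
u = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, 0.0]
cov = [[(1.0 if i == j else 0.0) + theta * u[i] * u[j] for j in range(5)]
       for i in range(5)]
loadings = sparse_first_pc(cov, lam=0.5)
```

In this toy case the loadings of the three uninformative sites are driven exactly to zero, which is the property sparse PCA relies on to pick out a small set of informative sites.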


2018 ◽  
Author(s):  
Nelson Johansen ◽  
Gerald Quon

Abstract. scRNA-seq dataset integration occurs in different contexts, such as the identification of cell type-specific differences in gene expression across conditions or species, or batch effect correction. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping or a complete set of cell labels, and estimate per-cell differences in gene expression across datasets. scAlign performance is state-of-the-art and robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign identifies a rare cell population likely to drive malaria transmission. Our framework is widely applicable to integration challenges in other domains.


Author(s):  
Shijie C Zheng ◽  
Charles E Breeze ◽  
Stephan Beck ◽  
Danyue Dong ◽  
Tianyu Zhu ◽  
...  

Abstract. Summary: It is well recognized that cell-type heterogeneity hampers the interpretation of Epigenome-Wide Association Studies (EWAS). Many tools have emerged to address this issue, including several R/Bioconductor packages that infer cell-type composition. Here we present a web application for cell-type deconvolution, which offers the functionality of our EpiDISH Bioconductor/R package in a user-friendly GUI environment. Users can upload their data to infer cell-type composition and differentially methylated cytosines in individual cell-types (DMCTs) for a range of different tissues. Availability and implementation: The EpiDISH web server is implemented with Shiny in R, and is freely available at https://www.biosino.org/EpiDISH/.


2014 ◽  
Vol 11 (3) ◽  
pp. 309-311 ◽  
Author(s):  
James Zou ◽  
Christoph Lippert ◽  
David Heckerman ◽  
Martin Aryee ◽  
Jennifer Listgarten

2020 ◽  
Author(s):  
Bryce Rowland ◽  
Ruth Huh ◽  
Zoey Hou ◽  
Ming Hu ◽  
Yin Shen ◽  
...  

Abstract. Hi-C data provide population-averaged estimates of three-dimensional chromatin contacts across cell types and states in bulk samples. To effectively leverage Hi-C data for biological insights, we need to control for the confounding factor of differential cell type proportions across heterogeneous bulk samples. We propose a novel unsupervised deconvolution method for inferring cell type composition from bulk Hi-C data, the Two-step Hi-c UNsupervised DEconvolution appRoach (THUNDER). We conducted extensive real-data-based simulations to test THUNDER constructed from published single-cell Hi-C (scHi-C) data. THUNDER more accurately estimates the underlying cell type proportions when compared to both supervised and unsupervised deconvolution methods including CIBERSORT, TOAST, and NMF. THUNDER will be a useful tool in adjusting for varying cell type composition in population samples, facilitating valid and more powerful downstream analysis such as differential chromatin organization studies. Additionally, THUNDER estimates cell-type-specific chromatin contact profiles for all cell types in bulk Hi-C mixtures. These estimated contact profiles provide a useful exploratory framework to investigate cell-type-specificity of the chromatin interactome while experimental data is still sparse.
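As a rough illustration of unsupervised deconvolution, the sketch below factorises a non-negative bulk matrix V (features by samples) into cell-type profiles W and per-sample weights H using the classic multiplicative-update form of NMF, one of the baseline methods named above. It is not an implementation of THUNDER. The helper names, the toy data, and the step of rescaling the columns of H into proportions are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, k, n_iter=200, seed=0, eps=1e-12):
    """Factorise V ~ W H with non-negative W (features x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update H: H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update W: W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def to_proportions(h):
    """Rescale each sample's column of H so its weights sum to one."""
    col_sums = [sum(col) for col in zip(*h)]
    return [[x / s for x, s in zip(row, col_sums)] for row in h]


# Toy bulk data: 5 features x 4 samples mixed from 2 made-up cell types.
w_true = [[1.0, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 1.0], [0.5, 0.5]]
h_true = [[0.9, 0.6, 0.3, 0.1], [0.1, 0.4, 0.7, 0.9]]
v = matmul(w_true, h_true)

w0, h0 = nmf(v, k=2, n_iter=0)   # random initialisation only
w_fit, h_fit = nmf(v, k=2)       # after 200 multiplicative updates
proportions = to_proportions(h_fit)
```

Because NMF solutions are only identified up to scaling and relabelling of components, the fitted proportions should be read as relative weights that still need matching to known cell types.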

