Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes

DNA methylation sequencing is becoming increasingly popular, yielding genome-wide methylome data at single-base-pair resolution through novel cost- and labor-optimized protocols. It has tremendous potential for cell-type heterogeneity analysis, particularly in tumors, owing to its intrinsic read-level information. Although diverse deconvolution methods have been developed to infer cell-type composition from bulk sequencing-based methylomes, they have not yet been systematically evaluated. Here, we review and evaluate five previously published deconvolution methods: Bayesian epiallele detection (BED), PRISM, csmFinder + coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. Sequencing-based deconvolution methods consist of two main steps: informative region selection and cell-type composition estimation. Accordingly, we assessed the performance of each step individually and demonstrated the impact of the first step on the performance of the second. We identify the method with the highest accuracy in different samples and find that the factors influencing deconvolution performance depend on the number of cell types in the mixture. Whereas selecting genomic regions similar to differentially methylated regions (DMRs) generally improved performance in bi-component mixtures, the uniformity of the cell-type distribution correlated strongly with performance in five-cell-type bulk analyses.
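
The value of read-level information for a bi-component mixture can be illustrated with a deliberately simplified sketch. It is not an implementation of any of the methods named above; the reads, the threshold, and the function names are hypothetical. Each read is classified by its own methylation pattern, and the share of mostly methylated reads serves as a crude estimate of one component's fraction.

```python
# Hypothetical reads over one informative region. Each string is the methylation
# state of consecutive CpGs on a single read (1 = methylated, 0 = unmethylated).
reads = ["1111", "1111", "0000", "1110", "0000",
         "1111", "0001", "1111", "0000", "1111"]


def methylated_fraction(read):
    """Fraction of methylated CpGs on a single read."""
    return read.count("1") / len(read)


def estimate_component_fraction(reads, threshold=0.5):
    """Share of reads whose methylated fraction exceeds the (assumed) threshold.

    Assumes one component is mostly methylated in this region and the other
    mostly unmethylated, which is a strong simplification.
    """
    methylated_like = sum(1 for r in reads if methylated_fraction(r) > threshold)
    return methylated_like / len(reads)


fraction = estimate_component_fraction(reads)
print(f"estimated fraction of the methylated component: {fraction:.2f}")
```

An array-style average over the region would report a single intermediate methylation level, whereas the per-read patterns show two distinct populations of reads.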


Benchmarking of cell type deconvolution pipelines for transcriptomics data

Nature Communications ◽

10.1038/s41467-020-19015-1 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Francisco Avila Cobos ◽

José Alquicira-Hernandez ◽

Joseph E. Powell ◽

Pieter Mestdagh ◽

Katleen De Preter

Keyword(s):

Cell Types ◽

Cell Type ◽

Factors Affecting ◽

Marker Selection ◽

Cell Type Composition ◽

Type Composition ◽

Comparable Performance ◽

Transcriptomics Data ◽

Combined Impact ◽

The Impact

Abstract: Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
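
The pseudo-bulk idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: the signature matrix, the proportions, and the projected-gradient solver are hypothetical and are not taken from the benchmark. Cell-type profiles are mixed with known proportions, and the proportions are then recovered by non-negative least squares.

```python
# Hypothetical signature matrix in linear scale: rows = genes, columns = cell types.
SIGNATURE = [
    [100.0, 5.0, 2.0],
    [3.0, 80.0, 4.0],
    [1.0, 6.0, 60.0],
    [20.0, 20.0, 20.0],
]


def mix(signature, proportions):
    """Pseudo-bulk profile: proportion-weighted sum of the cell-type profiles."""
    return [sum(s * p for s, p in zip(row, proportions)) for row in signature]


def nnls_projected_gradient(signature, bulk, steps=2000):
    """Minimise 0.5 * ||S p - b||^2 subject to p >= 0 by projected gradient descent."""
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [sum(s * x for s, x in zip(row, p)) - b
                    for row, b in zip(signature, bulk)]
        grad = [sum(row[k] * r for row, r in zip(signature, residual))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, x - step * g) for x, g in zip(p, grad)]
    return p


def normalise(p):
    """Rescale the estimates so that they sum to one."""
    total = sum(p)
    return [x / total for x in p]


true_props = [0.6, 0.3, 0.1]
bulk = mix(SIGNATURE, true_props)
estimated = normalise(nnls_projected_gradient(SIGNATURE, bulk))
print([round(x, 4) for x in estimated])
```

Because the mixture here is noise-free and every cell type is present in the reference, the estimates match the true proportions. Dropping a column from the signature matrix is a simple way to explore the missing-cell-type scenario mentioned in the abstract.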


Comprehensive benchmarking of computational deconvolution of transcriptomics data

10.1101/2020.01.10.897116 ◽

2020 ◽

Cited By ~ 2

Author(s):

Francisco Avila Cobos ◽

José Alquicira-Hernandez ◽

Joseph Powell ◽

Pieter Mestdagh ◽

Katleen De Preter

Keyword(s):

Single Cell ◽

Cell Types ◽

Cell Type ◽

Factors Affecting ◽

Marker Selection ◽

Cell Type Composition ◽

Type Composition ◽

Comparable Performance ◽

Transcriptomics Data ◽

Combined Impact

Abstract: Many computational methods to infer cell type proportions from bulk transcriptomics data have been developed. Attempts comparing these methods revealed that the choice of reference marker signatures is far more important than the method itself. However, a thorough evaluation of the combined impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the results is still lacking. Using different single-cell RNA-sequencing (scRNA-seq) datasets, we generated hundreds of pseudo-bulk mixtures to evaluate the combined impact of these factors on the deconvolution results. Along with methods to perform deconvolution of bulk RNA-seq data we also included five methods specifically designed to infer the cell type composition of bulk data using scRNA-seq data as reference. Both bulk and single-cell deconvolution methods perform best when applied to data in linear scale and the choice of normalization can have a dramatic impact on the performance of some, but not all methods. Overall, single-cell methods have comparable performance to the best performing bulk methods and bulk methods based on semi-supervised approaches showed higher error and lower correlation values between the computed and the expected proportions. Moreover, failure to include cell types in the reference that are present in a mixture always led to substantially worse results, regardless of any of the previous choices. Taken together, we provide a thorough evaluation of the combined impact of the different factors affecting the computational deconvolution task across different datasets and propose general guidelines to maximize its performance.
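
One plausible intuition for the linear-scale finding (an interpretation, not a claim made in the abstract) is that a bulk sample averages expression in linear scale, so a linear mixing model no longer holds after a log transformation. The toy calculation below uses a single hypothetical marker gene and two hypothetical cell types to show the resulting bias.

```python
import math

# Hypothetical expression of one marker gene in two pure cell types (linear scale).
expr_a, expr_b = 1000.0, 10.0
frac_a = 0.5  # assumed true proportion of cell type A in the mixture

# A bulk sample measures the proportion-weighted average in linear scale.
bulk_linear = frac_a * expr_a + (1 - frac_a) * expr_b


def solve_fraction(bulk, pure_a, pure_b):
    """Invert bulk = f * pure_a + (1 - f) * pure_b for f."""
    return (bulk - pure_b) / (pure_a - pure_b)


# Linear scale: the mixing model holds exactly.
frac_from_linear = solve_fraction(bulk_linear, expr_a, expr_b)

# Log scale: the same linear model applied to log2 values is biased,
# because the log of an average is not the average of the logs.
frac_from_log = solve_fraction(
    math.log2(bulk_linear), math.log2(expr_a), math.log2(expr_b)
)

print(f"linear scale: {frac_from_linear:.3f}, log scale: {frac_from_log:.3f}")
```

In this example the linear-scale estimate recovers 0.5, while the log-scale estimate is roughly 0.85, overstating the contribution of the highly expressing cell type.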


Systematic evaluation of transcriptomics-based deconvolution methods and references using thousands of clinical samples

10.1101/2021.03.09.434660 ◽

2021 ◽

Author(s):

Brian B. Nadel ◽

Meritxell Oliva ◽

Benjamin L. Shou ◽

Keith Mitchell ◽

Feiyang Ma ◽

...

Keyword(s):

Expression Profiles ◽

Clinical Care ◽

Cell Types ◽

Cell Counting ◽

Systematic Evaluation ◽

Clinical Samples ◽

Cell Type ◽

Cell Type Composition ◽

Type Composition

Abstract: Estimating cell type composition of blood and tissue samples is a biological challenge relevant in both laboratory studies and clinical care. In recent years, a number of computational tools have been developed to estimate cell type abundance using gene expression data. While these tools use a variety of approaches, they all leverage expression profiles from purified cell types to evaluate the cell type composition within samples. In this study, we compare ten deconvolution tools and evaluate their performance while using each of eleven separate reference profiles. Specifically, we have run deconvolution tools on over 4,000 samples with known cell type proportions, spanning both immune and stromal cell types. Twelve of these represent in vitro synthetic mixtures and 300 represent in silico synthetic mixtures prepared using single cell data. A final 3,728 clinical samples have been collected from the Framingham Cohort, for which cell populations have been quantified using electrical impedance cell counting. When tools are applied to the Framingham dataset, the tool EPIC produces the highest correlation while GEDIT produces the lowest error. The best tool for other datasets is varied, but CIBERSORT and GEDIT most consistently produce accurate results. In terms of reference choice, we find that the Human Primary Cell Atlas (HPCA) and references published by the EPIC authors produce accurate results for the largest number of tools and datasets. When applying deconvolution to blood samples, the leukocyte reference matrix LM22 is also a suitable choice, usually (but not always) outperforming HPCA and EPIC. Running time varies substantially across tools. For as many as 5052 samples, SaVanT and dtangle reliably finish in under one minute, while slower tools may require up to two hours. However, when using custom references, CIBERSORT can run very slowly, taking over 24 hours to complete for large datasets.
We conclude that combining the best tools with optimal reference datasets can provide significant gains in accuracy when carrying out deconvolution tasks.
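
The abstract reports that one tool achieves the highest correlation while another achieves the lowest error. The sketch below, using entirely hypothetical numbers, shows why these two metrics can rank tools differently: a systematically biased estimate can correlate perfectly with the truth yet carry a large error.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical known fractions of one cell type across five samples.
truth = [0.10, 0.20, 0.30, 0.40, 0.50]
# Hypothetical tool A: perfectly correlated but biased upward by 0.20.
tool_a = [t + 0.20 for t in truth]
# Hypothetical tool B: roughly unbiased but slightly noisy.
tool_b = [0.12, 0.17, 0.33, 0.38, 0.51]

for name, est in (("A", tool_a), ("B", tool_b)):
    print(f"tool {name}: r = {pearson(truth, est):.3f}, RMSE = {rmse(truth, est):.3f}")
```

Here tool A wins on correlation and tool B wins on error, so reporting both metrics gives a fuller picture than either alone.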


Transit-amplifying cells coordinate changes in intestinal epithelial cell-type composition

10.1101/840371 ◽

2019 ◽

Author(s):

Laura E. Sanman ◽

Ina W. Chen ◽

Jake M. Bieber ◽

Veronica Steri ◽

Byron Hann ◽

...

Keyword(s):

Quantitative Imaging ◽

Cell Types ◽

Culture Conditions ◽

Tissue Cell ◽

Specific Cell ◽

Cell Type ◽

Cell Type Composition ◽

Type Composition ◽

Coordinate Changes

Abstract: Renewing tissues have the remarkable ability to continually produce both proliferative progenitor and specialized differentiated cell-types. How are complex milieus of microenvironmental signals interpreted to coordinate tissue cell-type composition? Here, we develop a high-throughput approach that combines organoid technology and quantitative imaging to address this question in the context of the intestinal epithelium. Using this approach, we comprehensively survey enteroid responses to individual and paired perturbations to eight epithelial signaling pathways. We uncover culture conditions that enrich for specific cell-types, including Lgr5+ stem and enteroendocrine cells. We analyze interactions between perturbations and dissect mechanisms underlying an unexpected mutual antagonism between EGFR and IL-4 signals. Finally, we show that, across diverse perturbations, modulating proliferation of transit-amplifying cells also consistently changes the composition of differentiated secretory and absorptive cell-types. This property is conserved in vivo and can arise from differential amplification of secretory and absorptive progenitor cells. Taken together, the observations highlight an underappreciated role for transit-amplifying cells in which proliferation of these short-lived progenitors provides a lineage-based mechanism for tuning differentiated cell-type composition.


Adipose tissue in health and disease through the lens of its building blocks

10.1101/316083 ◽

2018 ◽

Cited By ~ 2

Author(s):

Michael Lenz ◽

Ilja C.W. Arts ◽

Ralf L.M. Peeters ◽

Theo M. de Kok ◽

Gökhan Ertaylan

Keyword(s):

Adipose Tissue ◽

Cell Types ◽

Cellular Heterogeneity ◽

Tissue Cell ◽

Cell Type ◽

Tissue Samples ◽

Cell Type Composition ◽

Type Composition ◽

Adipose Tissue Cell ◽

Cellular Markers

Abstract:

Background: Highly specialized cells work in synergy forming tissues to perform functions required for the survival of organisms. Understanding this tissue-specific cellular heterogeneity and homeostasis is essential to comprehend the development of diseases within the tissue and also for developing regenerative therapies. Cellular subpopulations in the adipose tissue have been related to disease development, but efforts towards characterizing the adipose tissue cell type composition are limited due to lack of robust cell surface markers, limited access to tissue samples, and the labor-intensive process required to identify them.

Results: We propose a framework, identifying cellular heterogeneity while providing state-of-the-art cellular markers for each cell type present in tissues using transcriptomics level analysis. We validate our approach with an independent dataset and present the most comprehensive study of adipose tissue cell type composition to date, determining the relative amounts of 21 different cell types in 779 adipose tissue samples detailing differences across four adipose tissue depots, between genders, across ranges of BMI and in different stages of type-2 diabetes. We also highlight the heterogeneity in reported marker-based studies of adipose tissue cell type composition and provide novel cellular markers to distinguish different cell types within the adipose tissue.

Conclusions: Our study provides a systematic framework for studying cell type composition in a given tissue and valuable insights into adipose tissue cell type heterogeneity in health and disease.


LRcell: detecting the source of differential expression at the sub-cell type level from bulk RNA-seq data

10.1101/2021.08.10.455821 ◽

2021 ◽

Author(s):

Wenjing Ma ◽

Sumeet Sharma ◽

Peng Jin ◽

Shannon L Gourley ◽

Zhaohui Qin

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Bioconductor Package ◽

RNA-Seq ◽

Cell Type ◽

Reference Dataset ◽

Cell Type Composition ◽

Type Composition ◽

Differential Gene

The rapid proliferation of single-cell RNA-sequencing (scRNA-seq) datasets has revealed cell heterogeneity at unprecedented scales. Several deconvolution methods have been developed to decompose bulk experiments to reveal cell type contributions. However, these methods lack power in identifying the accurate cell type composition when the reference dataset contains a considerable number of sub-cell types. Here, we present LRcell, an R Bioconductor package (http://bioconductor.org/packages/release/bioc/html/LRcell.html) that aims to identify the specific sub-cell type(s) driving the changes observed in a bulk RNA-seq differential gene expression experiment. In addition, LRcell provides pre-embedded marker genes computed from putative single-cell RNA-seq experiments as options to execute the analyses.


DNA Methylation Profiles of Purified Cell Types in Bronchoalveolar Lavage: Applications for Mixed Cell Paediatric Pulmonary Studies

Frontiers in Immunology ◽

10.3389/fimmu.2021.788705 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shivanthan Shanthikumar ◽

Melanie R. Neeland ◽

Richard Saffery ◽

Sarath C. Ranganathan ◽

Alicia Oshlack ◽

...

Keyword(s):

DNA Methylation ◽

Bronchoalveolar Lavage ◽

Association Studies ◽

Cell Types ◽

Alveolar Epithelial Cells ◽

Cell Type ◽

Mixed Cell ◽

Alveolar Epithelial ◽

Cell Type Composition ◽

Type Composition

In epigenome-wide association studies analysing DNA methylation from samples containing multiple cell types, it is essential to adjust the analysis for cell type composition. One well-established strategy for achieving this is reference-based cell type deconvolution, which relies on knowledge of the DNA methylation profiles of purified constituent cell types. These profiles are used to estimate the cell type proportions of each sample, which can then be incorporated to adjust the association analysis. Bronchoalveolar lavage is commonly used to sample the lung in clinical practice and contains a mixture of different cell types that can vary in proportion across samples, affecting the overall methylation profile. A current barrier to the use of bronchoalveolar lavage in DNA methylation-based research is the lack of reference DNA methylation profiles for each of the constituent cell types, which makes reference-based cell composition estimation difficult. Herein, we use bronchoalveolar lavage samples collected from children with cystic fibrosis to define DNA methylation profiles for the four most common and clinically relevant cell types: alveolar macrophages, granulocytes, lymphocytes and alveolar epithelial cells. We then demonstrate the use of these methylation profiles in conjunction with an established reference-based methylation deconvolution method to estimate the cell type composition of two different tissue types: a publicly available dataset derived from artificial blood-based cell mixtures and further bronchoalveolar lavage samples. The reference DNA methylation profiles developed in this work can be used for future reference-based cell type composition estimation of bronchoalveolar lavage. This will facilitate the use of this tissue in studies examining the role of DNA methylation in lung health and disease.
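
Why the association analysis needs adjusting can be shown with a small constructed example. Everything in it is hypothetical: the data, the variable names, and the use of ordinary least squares solved through the normal equations, which merely stands in for whatever model a real study would fit. Methylation at one CpG depends only on the macrophage proportion, yet an unadjusted comparison attributes a difference to case status.

```python
def ols(columns, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(columns)
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical samples: cases happen to contain more macrophages.
case = [0, 0, 0, 1, 1, 1]
macrophage = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
# Methylation is driven purely by composition; there is no true case effect.
methylation = [0.1 + 0.8 * m for m in macrophage]
intercept = [1.0] * len(case)

unadjusted = ols([intercept, case], methylation)
adjusted = ols([intercept, case, macrophage], methylation)

print(f"case effect, unadjusted: {unadjusted[1]:.3f}")
print(f"case effect, adjusted for composition: {adjusted[1]:.3f}")
```

The unadjusted case effect is 0.24, whereas including the estimated proportion as a covariate moves it to essentially zero, which is the behaviour the adjustment step is meant to achieve.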


Notch signalling patterns retinal composition by regulating atoh7 during post-embryonic growth

10.1101/363010 ◽

2018 ◽

Cited By ~ 1

Author(s):

Alicia Pérez Saturnino ◽

Katharina Lust ◽

Joachim Wittbrodt

Keyword(s):

De Novo ◽

Cell Types ◽

Notch Signalling ◽

Cell Type ◽

Lineage Specification ◽

Cell Type Composition ◽

Type Composition ◽

Cell Niche ◽

Functional Relevance ◽

Type Specification

Abstract: Patterning of a continuously growing naive field in the context of a life-long growing organ, the teleost eye, is of highest functional relevance. Intrinsic and extrinsic signals have been proposed to regulate lineage specification in progenitors that exit the stem cell niche in the ciliary marginal zone (CMZ). The proper cell type composition arising from those progenitors is a prerequisite for retinal function. Our findings in the teleost medaka (Oryzias latipes) reveal that the Notch–Atoh7 axis continuously patterns the CMZ. The complement of cell-types originating from the two juxtaposed progenitors marked by Notch or Atoh7 activity contains all constituents of a retinal column. Modulation of Notch signalling specifically in Atoh7-expressing cells demonstrates the crucial role of this axis in generating the correct cell type proportions. After transiently blocking Notch signalling, retinal patterning and differentiation are reinitiated de novo. Taken together, we show that Notch activity in the CMZ continuously structures the growing retina by juxtaposing Notch and Atoh7 progenitors giving rise to distinct, complementary lineages, revealing a coupling of de novo patterning and cell-type specification in the respective lineages.


The effect of tissue composition on gene co-expression

Briefings in Bioinformatics ◽

10.1093/bib/bbz135 ◽

2019 ◽

Cited By ~ 6

Author(s):

Yun Zhang ◽

Jonavelle Cuerdo ◽

Marc K Halushka ◽

Matthew N McCall

Keyword(s):

Expression Patterns ◽

Real Data ◽

Cell Types ◽

Tissue Level ◽

Tissue Composition ◽

Cell Type ◽

Tissue Samples ◽

Cell Type Composition ◽

Type Composition ◽

Component Cell

Abstract: Variable cellular composition of tissue samples represents a significant challenge for the interpretation of genomic profiling studies. Substantial effort has been devoted to modeling and adjusting for compositional differences when estimating differential expression between sample types. However, relatively little attention has been given to the effect of tissue composition on co-expression estimates. In this study, we illustrate the effect of variable cell-type composition on correlation-based network estimation and provide a mathematical decomposition of the tissue-level correlation. We show that a class of deconvolution methods developed to separate tumor and stromal signatures can be applied to two-component cell-type mixtures. In simulated and real data, we identify conditions in which a deconvolution approach would be beneficial. Our results suggest that uncorrelated cell-type-specific markers are ideally suited to deconvolute both the expression and co-expression patterns of an individual cell type. We provide a Shiny application for users to interactively explore the effect of cell-type composition on correlation-based co-expression estimation for any cell types of interest.
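
The effect of variable composition on co-expression can be reproduced in a small simulation. This is a sketch under assumed parameters (two hypothetical genes, two hypothetical cell types, Gaussian noise, uniformly varying proportions) and does not reproduce the paper's decomposition. The two genes are generated independently within each cell type, yet they appear strongly correlated at the tissue level because both track the changing cell-type proportion.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


random.seed(0)
N_SAMPLES = 2000
NOISE_SD = 0.3
# Hypothetical mean expression of two genes in two cell types (A and B).
MEAN = {"gene1": {"A": 10.0, "B": 2.0}, "gene2": {"A": 2.0, "B": 9.0}}

pure = {g: {"A": [], "B": []} for g in MEAN}
tissue = {g: [] for g in MEAN}
for _ in range(N_SAMPLES):
    frac_a = random.uniform(0.1, 0.9)  # proportion of cell type A in this sample
    for g in MEAN:
        # Noise is drawn independently for every gene and cell type.
        a = random.gauss(MEAN[g]["A"], NOISE_SD)
        b = random.gauss(MEAN[g]["B"], NOISE_SD)
        pure[g]["A"].append(a)
        pure[g]["B"].append(b)
        tissue[g].append(frac_a * a + (1 - frac_a) * b)

within_a = pearson(pure["gene1"]["A"], pure["gene2"]["A"])
within_b = pearson(pure["gene1"]["B"], pure["gene2"]["B"])
tissue_level = pearson(tissue["gene1"], tissue["gene2"])

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, tissue: {tissue_level:.2f}")
```

The within-cell-type correlations stay near zero while the tissue-level correlation is strongly negative, since gene1 marks cell type A and gene2 marks cell type B.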


Single-Nucleus Sequencing of an Entire Mammalian Heart: Cell Type Composition and Velocity

Cells ◽

10.3390/cells9020318 ◽

2020 ◽

Vol 9 (2) ◽

pp. 318 ◽

Cited By ~ 6

Author(s):

Markus Wolfien ◽

Anne-Marie Galow ◽

Paula Müller ◽

Madeleine Bartsch ◽

Ronald M. Brunner ◽

...

Keyword(s):

Cell Types ◽

Cellular Level ◽

Heart Cell ◽

Cell Type ◽

Mammalian Heart ◽

Cell Type Composition ◽

Type Composition ◽

A Cell ◽

Single Nucleus ◽

First Time

Analyses on the cellular level are indispensable to expand our understanding of complex tissues like the mammalian heart. Single-nucleus sequencing (snRNA-seq) allows for the exploration of cellular composition and cell features without major hurdles of single-cell sequencing. We used snRNA-seq to investigate for the first time an entire adult mammalian heart. Single-nucleus quantification and clustering led to an accurate representation of cell types, revealing 24 distinct clusters with endothelial cells (28.8%), fibroblasts (25.3%), and cardiomyocytes (22.8%) constituting the major cell populations. An additional RNA velocity analysis allowed us to study transcription kinetics and was utilized to visualize the transitions between mature and nascent cellular states of the cell types. We identified subgroups of cardiomyocytes with distinct marker profiles. For example, the expression of Hand2os1 distinguished immature cardiomyocytes from differentiated cardiomyocyte populations. Moreover, we found a cell population that comprises endothelial markers as well as markers clearly related to cardiomyocyte function. Our velocity data support the idea that this population is in a trans-differentiation process from an endothelial cell-like phenotype towards a cardiomyocyte-like phenotype. In summary, we present the first report of sequencing an entire adult mammalian heart, providing realistic cell-type distributions combined with RNA velocity kinetics hinting at interrelations.
