DEBay: a computational tool for deconvolution of quantitative PCR data for estimation of cell type-specific gene expression in a mixed population

2020 ◽  
Author(s):  
Vimalathithan Devaraj ◽  
Biplab Bose

Abstract The expression of a gene is commonly estimated by quantitative PCR (qPCR) using RNA isolated from a large number of pooled cells. Such pooled samples often have subpopulations of cells with different levels of expression of the target gene. Estimation of gene expression from an ensemble of cells obscures the pattern of expression in different subpopulations. Physical separation of various subpopulations is a demanding task. We have developed a computational tool, Deconvolution of Ensemble through Bayes-approach (DEBay), to estimate cell type-specific gene expression from qPCR data of a mixed population. DEBay estimates Normalized Gene Expression Coefficient (NGEC), which is a relative measure of the expression of the target gene in each cell type in a population. NGEC has a direct algebraic correspondence with the normalized fold change in gene expression measured by qPCR. DEBay can deconvolute both time-dependent and time-independent gene expression profiles. It uses the Bayesian method of model selection and parameter estimation. We have evaluated DEBay using synthetic and real experimental data. DEBay is implemented in Python. A GUI of DEBay and its source code are available for download at SourceForge (https://sourceforge.net/projects/debay).
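The idea of recovering cell type-specific expression from pooled measurements can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that each pooled measurement is a fraction-weighted sum of per-cell-type expression values and that the cell type fractions of each sample are known. It uses ordinary least squares, not DEBay's Bayesian model selection and parameter estimation, and all function and variable names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_cell_type_expression(fractions, bulk):
    """Least-squares estimate of per-cell-type expression.

    fractions[k][j]: assumed known fraction of cell type j in sample k.
    bulk[k]: pooled (ensemble) expression measured for sample k.
    """
    n_types = len(fractions[0])
    # Normal equations: (F^T F) g = F^T y
    ftf = [[sum(row[i] * row[j] for row in fractions) for j in range(n_types)]
           for i in range(n_types)]
    fty = [sum(row[i] * y for row, y in zip(fractions, bulk))
           for i in range(n_types)]
    return solve_linear(ftf, fty)


# Synthetic example: two cell types, three pooled samples with different compositions.
fractions = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
true_expression = [1.0, 4.0]  # made-up values used only to generate the pooled data
bulk = [sum(f * g for f, g in zip(row, true_expression)) for row in fractions]

estimate = estimate_cell_type_expression(fractions, bulk)
print(estimate)
```

With noise-free synthetic data the estimate recovers the values used to generate the pooled measurements; with real qPCR data, measurement noise and uncertain fractions are what motivate a probabilistic treatment.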

2013 ◽  
Vol 14 (1) ◽  
pp. 89 ◽  
Author(s):  
Yi Zhong ◽  
Ying-Wooi Wan ◽  
Kaifang Pang ◽  
Lionel ML Chow ◽  
Zhandong Liu

Author(s):  
Johan Gustafsson ◽  
Felix Held ◽  
Jonathan Robinson ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Abstract Background Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. Results We evaluated different normalization methods, quantified the magnitude of variation introduced by different sources, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We applied methods such as random forest regression to a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is of the same magnitude as the biological variation across cell types. Tissue of origin and cell subtype are less important but still substantial factors, while the difference between individuals is relatively small. We also show that much of the difference between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Conclusions Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.


2020 ◽  
Author(s):  
Johan Gustafsson ◽  
Felix Held ◽  
Jonathan Robinson ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Abstract Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. We evaluated different normalization methods, quantified the variance explained by different factors, evaluated the effect on deconvolution of cell type fractions, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We investigated a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is substantial, even for genes specifically selected for deconvolution, and has a confounding effect on deconvolution. Tissue of origin is also a substantial factor, highlighting the challenge of applying cell type profiles derived from blood to mixtures from other tissues. We also show that much of the difference between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.


2018 ◽  
Author(s):  
Kai Kang ◽  
Qian Meng ◽  
Igor Shats ◽  
David M. Umbach ◽  
Melissa Li ◽  
...  

Abstract The cell type composition of many biological tissues varies widely across samples. Such sample heterogeneity hampers efforts to probe the role of each cell type in the tissue microenvironment. Current approaches that address this issue have drawbacks. Cell sorting or single-cell based experimental techniques disrupt in situ interactions and alter the physiological status of cells in tissues. Computational methods are flexible and promising, but they often estimate either sample-specific proportions of each cell type or cell-type-specific gene expression profiles, not both, by requiring the other as input. We introduce a computational Complete Deconvolution method (CDSeq) that can estimate both sample-specific proportions of each cell type and cell-type-specific gene expression profiles simultaneously using bulk RNA-Seq data only. We assessed our method's performance using several synthetic and experimental mixtures of varied but known cell type composition and compared its performance to that of two state-of-the-art deconvolution methods on the same mixtures. The results showed CDSeq can estimate both sample-specific proportions of each component cell type and cell-type-specific gene expression profiles with high accuracy. CDSeq holds promise for computationally deciphering complex mixtures of cell types, each with differing expression profiles, using RNA-seq data measured in bulk tissue (MATLAB code is available at https://github.com/kkang7/CDSeq_011).
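The limitation described above, where one quantity must be supplied to estimate the other, can be illustrated with a minimal sketch of partial deconvolution. It assumes a linear mixture of exactly two cell types whose expression profiles are already known, and it estimates only the mixing proportion. This is not the CDSeq algorithm, which estimates proportions and profiles together; the function name and the numbers are hypothetical.

```python
def estimate_proportion(bulk, profile_a, profile_b):
    """Estimate the proportion p of cell type A in a two-type mixture.

    Assumes bulk[g] is approximately p * profile_a[g] + (1 - p) * profile_b[g]
    for every gene g, and returns the least-squares p clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("profiles are identical; the proportion is not identifiable")
    p = sum((y - b) * d for y, b, d in zip(bulk, profile_b, diff)) / denom
    return min(1.0, max(0.0, p))


# Made-up reference profiles over three genes (required as input in this setting).
profile_a = [10.0, 2.0, 0.5]
profile_b = [1.0, 8.0, 0.5]

# Synthetic bulk sample mixed at 30% type A and 70% type B.
bulk = [0.3 * a + 0.7 * b for a, b in zip(profile_a, profile_b)]

p_a = estimate_proportion(bulk, profile_a, profile_b)
print(p_a)
```

Genes with identical expression in both profiles, such as the third gene here, contribute nothing to the estimate, which is why marker genes that differ between cell types carry the information in this kind of approach.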


2020 ◽  
Author(s):  
Abolfazl Doostparast Torshizi ◽  
Jubao Duan ◽  
Kai Wang

Abstract The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, the vast majority of gene expression studies are conducted on bulk tissues, necessitating computational approaches to infer biological insights on cell type-specific contribution to diseases. Several computational methods are available for cell type deconvolution (that is, inference of cellular composition) from bulk RNA-Seq data, but they cannot impute cell type-specific expression profiles. We hypothesize that with external prior information such as single cell RNA-seq (scRNA-seq) and population-wide expression profiles, it can be computationally tractable and identifiable to estimate both cellular composition and cell type-specific expression from bulk RNA-Seq data. Here we introduce CellR, which addresses cross-individual gene expression variations by employing genome-wide tissue-wise expression signatures from GTEx to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model while taking into account inter/intra cellular correlations, and uses a multi-variate stochastic search algorithm to estimate the expression level of each gene in each cell type. Extensive analyses on several complex diseases such as schizophrenia, Alzheimer's disease, Huntington's disease, and type 2 diabetes validated the efficiency of CellR, while revealing how specific cell types contribute to different diseases. We conducted numerical simulations on human cerebellum to generate pseudo-bulk RNA-seq data and demonstrated its efficiency in inferring cell-specific expression profiles. Moreover, we inferred cell-specific expression levels from bulk RNA-seq data on schizophrenia and computed differentially expressed genes within certain cell types. Using the predicted gene expression profile of excitatory neurons, we were able to reproduce our recently published findings on TCF4 being a master regulator in schizophrenia and showed how this gene and its targets are enriched in excitatory neurons. In summary, CellR not only compares favorably (in both accuracy and stability of inference) against competing approaches on inferring cellular composition from bulk RNA-seq data, but also allows direct imputation of cell type-specific gene expression, opening new doors to re-analyze gene expression data on bulk tissues in complex diseases.

