Accurate estimation of cell-type composition from gene expression data

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Daphne Tsoucas ◽  
Rui Dong ◽  
Haide Chen ◽  
Qian Zhu ◽  
Guoji Guo ◽  
...  
GigaScience ◽  
2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Brian B Nadel ◽  
David Lopez ◽  
Dennis J Montoya ◽  
Feiyang Ma ◽  
Hannah Waddel ◽  
...  

Abstract. Background: The cell type composition of heterogeneous tissue samples can be a critical variable in both clinical and laboratory settings. However, current experimental methods of cell type quantification (e.g., cell flow cytometry) are costly, time-consuming, and have the potential to introduce bias. Computational approaches that use expression data to infer cell type abundance offer an alternative solution. While these methods have gained popularity, most fail to produce accurate predictions for the full range of platforms currently used by researchers or for the wide variety of tissue types often studied. Results: We present the Gene Expression Deconvolution Interactive Tool (GEDIT), a flexible tool that uses gene expression data to accurately predict cell type abundances. Using both simulated and experimental data, we extensively evaluate the performance of GEDIT and demonstrate that it returns robust results under a wide variety of conditions. These conditions include multiple platforms (microarray and RNA-seq), tissue types (blood and stromal), and species (human and mouse). Finally, we provide reference data from 8 sources spanning a broad range of stromal and hematopoietic types in both human and mouse. GEDIT also accepts user-submitted reference data, thus allowing the estimation of any cell type or subtype, provided that reference data are available. Conclusions: GEDIT is a powerful method for evaluating the cell type composition of tissue samples and provides excellent accuracy and versatility compared to similar tools. The reference database provided here also allows users to obtain estimates for a wide variety of tissue samples without having to provide their own data.


2019 ◽  
Author(s):  
Gregory J. Hunt ◽  
Johann A. Gagnon-Bartsch

Abstract. Complex tissues are composed of a large number of different types of cells, each involved in a multitude of biological processes. Consequently, an important component of understanding such processes is understanding the cell-type composition of the tissues. Estimating cell-type composition using high-throughput gene expression data is known as cell-type deconvolution. In this paper, we first summarize the extensive deconvolution literature by identifying a common regression-like approach to deconvolution. We call this approach the Unified Deconvolution-as-Regression (UDAR) framework. While methods that fall under this framework all use a similar model, they fit it using data on different scales. Two popular scales for gene expression data are logarithmic and linear. Unfortunately, each of these scales has problems in the UDAR framework. Using log-scale gene expression implies a biologically implausible model, while using linear-scale gene expression leads to statistically inefficient estimators. To overcome these problems, we propose a new approach for cell-type deconvolution that works on a hybrid of the two scales. This new approach is biologically plausible and improves statistical efficiency. We compare the hybrid approach to other methods on simulations as well as a collection of eleven real benchmark datasets. Here, we find the hybrid approach to be accurate and robust. Keywords: deconvolution, gene expression, microarray, RNA-seq
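The regression-like view of deconvolution described in this abstract can be illustrated with a minimal sketch. It assumes a known signature matrix (genes by cell types) and noise-free linear-scale data, fits proportions by ordinary least squares, then clips and renormalizes them. It is not the authors' hybrid-scale method, and every name and number below is hypothetical. The final lines compare the two scales: a bulk sample is a linear mixture on the linear scale, but the log of a mixture is not the mixture of the logs.

```python
import numpy as np


def deconvolve_linear(signature, bulk):
    """Estimate cell-type proportions by least squares on linear-scale data.

    signature: (genes x cell types) reference expression, linear scale.
    bulk: (genes,) expression of the mixed sample, linear scale.
    """
    coef, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)  # proportions cannot be negative
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are zero")
    return coef / total  # proportions sum to one


# Hypothetical signature matrix: 4 genes, 3 cell types.
signature = np.array([
    [100.0, 5.0, 2.0],
    [4.0, 80.0, 6.0],
    [3.0, 7.0, 120.0],
    [50.0, 40.0, 10.0],
])
true_props = np.array([0.6, 0.3, 0.1])

# On the linear scale, bulk expression is a weighted sum of the profiles.
bulk = signature @ true_props
estimated = deconvolve_linear(signature, bulk)

# On the log scale the same relationship does not hold.
log_of_mixture = np.log2(bulk)
mixture_of_logs = np.log2(signature) @ true_props
```

With noise-free input the least-squares fit recovers `true_props`, while `log_of_mixture` exceeds `mixture_of_logs` for every gene because the logarithm is concave.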


2014 ◽  
Vol 23 (10) ◽  
pp. 2721-2728 ◽  
Author(s):  
S. De Jong ◽  
M. Neeleman ◽  
J. J. Luykx ◽  
M. J. Ten Berg ◽  
E. Strengman ◽  
...  

2021 ◽  
Vol 8 ◽  
Author(s):  
Marianthi Kalafati ◽  
Michael Lenz ◽  
Gökhan Ertaylan ◽  
Ilja C. W. Arts ◽  
Chris T. Evelo ◽  
...  

Background: Macrophages play an important role in regulating adipose tissue function, while their frequencies in adipose tissue vary between individuals. Adipose tissue infiltration by high frequencies of macrophages has been linked to changes in adipokine levels and low-grade inflammation, frequently associated with the progression of obesity. The objective of this project was to assess the contribution of relative macrophage frequencies to the overall subcutaneous adipose tissue gene expression using publicly available datasets. Methods: Seven publicly available microarray gene expression datasets from human subcutaneous adipose tissue biopsies (n = 519) were used together with TissueDecoder to determine the adipose tissue cell-type composition of each sample. We divided the subjects into four groups based on their relative macrophage frequencies. Differential gene expression analysis between the high and low relative macrophage frequency groups was performed, adjusting for sex and study. Finally, biological processes were identified using pathway enrichment and network analysis. Results: We observed lower frequencies of adipocytes and higher frequencies of adipose stem cells in individuals characterized by high macrophage frequencies. We additionally studied whether, within subcutaneous adipose tissue, interindividual differences in the relative frequencies of macrophages were reflected in transcriptional differences in metabolic and inflammatory pathways. Adipose tissue of individuals with high macrophage frequencies had a higher expression of genes involved in complement activation, chemotaxis, focal adhesion, and oxidative stress. Similarly, we observed a lower expression of genes involved in lipid metabolism, fatty acid synthesis and oxidation, and mitochondrial respiration. Conclusion: We present an approach that combines publicly available subcutaneous adipose tissue gene expression datasets with a deconvolution algorithm to calculate subcutaneous adipose tissue cell-type composition. The results showed the expected increased inflammation gene expression profile accompanied by decreased gene expression in pathways related to lipid metabolism and mitochondrial respiration in subcutaneous adipose tissue in individuals characterized by high macrophage frequencies. This approach demonstrates the hidden strength of reusing publicly available data to gain cell-type-specific insights into adipose tissue function.
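The grouping step in the Methods can be sketched as follows. The abstract says only that subjects were split into four groups by relative macrophage frequency; this sketch assumes equal-sized, rank-based groups, which may differ from what the authors did. Sample names and frequencies are made up for illustration.

```python
def assign_quartile_groups(macrophage_freq):
    """Map each sample to a group 0..3 by the rank of its macrophage frequency.

    Group 0 holds the lowest frequencies and group 3 the highest.
    """
    ordered = sorted(macrophage_freq, key=macrophage_freq.get)
    n = len(ordered)
    return {sample: rank * 4 // n for rank, sample in enumerate(ordered)}


# Hypothetical relative macrophage frequencies from a deconvolution step.
freqs = {
    "s1": 0.02, "s2": 0.11, "s3": 0.05, "s4": 0.08,
    "s5": 0.15, "s6": 0.03, "s7": 0.09, "s8": 0.20,
}
groups = assign_quartile_groups(freqs)

# The differential expression comparison would use only the extreme groups.
low = sorted(s for s, g in groups.items() if g == 0)
high = sorted(s for s, g in groups.items() if g == 3)
```

The `low` and `high` lists would then feed a differential expression model that adjusts for sex and study, which is outside the scope of this sketch.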


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Julien Racle ◽  
Kaat de Jonge ◽  
Petra Baumgaertner ◽  
Daniel E Speiser ◽  
David Gfeller

Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
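The renormalization by cell-type-specific mRNA content mentioned above can be illustrated with a small sketch. It assumes that deconvolution first yields the share of total mRNA contributed by each cell type, and that dividing by a per-cell mRNA content and rescaling converts those shares into cell fractions. The function name, the "other" category standing in for uncharacterized cells, and all numbers are hypothetical and are not taken from the tool itself.

```python
def mrna_to_cell_fractions(mrna_props, mrna_per_cell):
    """Convert mRNA proportions to cell fractions.

    mrna_props: share of total mRNA attributed to each cell type.
    mrna_per_cell: relative mRNA content of one cell of each type.
    """
    scaled = {ct: mrna_props[ct] / mrna_per_cell[ct] for ct in mrna_props}
    total = sum(scaled.values())
    return {ct: value / total for ct, value in scaled.items()}


# Hypothetical inputs: "other" stands for uncharacterized cells.
mrna_props = {"T cell": 0.2, "macrophage": 0.3, "other": 0.5}
mrna_per_cell = {"T cell": 0.5, "macrophage": 1.5, "other": 1.0}

fractions = mrna_to_cell_fractions(mrna_props, mrna_per_cell)
```

In this example macrophages contribute more mRNA than T cells, yet T cells make up the larger cell fraction because each T cell is assumed to carry less mRNA.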


2020 ◽  
Vol 17 (6) ◽  
pp. 621-628 ◽  
Author(s):  
Zhichao Miao ◽  
Pablo Moreno ◽  
Ni Huang ◽  
Irene Papatheodorou ◽  
Alvis Brazma ◽  
...  

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia D. van Asten ◽  
Ji Won Oh ◽  
Arantza Farina-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Abstract. Deconvolution of bulk gene expression profiles into their cellular components is pivotal to portraying a tissue’s complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types, which can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle >20 types of cells owing to efficient variational inference. In an intensive evaluation with >700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular in reconstructing the gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.


2020 ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia van Asten ◽  
Ji-won Oh ◽  
Arantza Fariña-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Abstract. High-resolution deconvolution of bulk gene expression profiles is pivotal to characterizing the complex cellular make-up of tissues, such as the tumor microenvironment. Single-cell RNA-seq provides reliable prior knowledge for deconvolution; however, a comprehensive statistical model is required to use it efficiently because of the inherently variable nature of gene expression. We introduce BLADE (Bayesian Log-normAl Deconvolution), a comprehensive probabilistic framework to estimate both cellular make-up and gene expression profiles of each cell type in each sample. Unlike previous comprehensive statistical approaches, BLADE can handle >20 cell types thanks to efficient variational inference. In an intensive evaluation using >700 datasets, BLADE showed enhanced robustness against gene expression variability and better completeness than conventional methods, in particular in reconstructing the gene expression profiles of each cell type. All in all, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems based on standard bulk gene expression data.

