CytoPro: A computational platform for accurate and robust assessment of cell contributions in bulk expression from diverse tissues and conditions
Abstract
Cells are the quantal units of biology, and their relative composition in a tissue is the prime driver of variation in bulk tissue gene expression. When cell-level information is unavailable, deconvolution is an effective tool for achieving cell-level resolution, providing important information for understanding disease complexity and its interactions with treatments, drugs and/or the environment in a wide variety of contexts. Here we present CytoPro, a production-level, tissue- and condition-specific deconvolution platform based on a large collection of human tissue-specific signatures derived from single and sorted cells. CytoPro infers the per-sample composition of multiple cell types from input bulk gene expression. CytoPro includes a rigorous QC pipeline for learning, generating and selecting signatures, and performs internal automated validation using multiple QC test criteria, including: comparison to ground-truth cytometry and pure sorted-cell data; performance evaluation on simulated data, including robustness to noise; and agreement with biological expectations regarding genes and cells in validation datasets. We demonstrate that CytoPro outperforms existing deconvolution tools in both accuracy and robustness. By exploring multiple datasets with predefined disease phenotypes, and by analyzing a use case of biological treatment response, we show the ability of CytoPro to uncover relevant cell biology in real pathological conditions.
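To make the general idea of signature-based deconvolution concrete, the sketch below models one bulk sample as a non-negative mixture of per-cell-type signature profiles and recovers the mixing fractions. It is a generic illustration, not CytoPro's algorithm, which the abstract does not specify. The function name `deconvolve`, the toy signature matrix, and the choice of multiplicative-update non-negative least squares are all assumptions made for this example.

```python
def deconvolve(signature, bulk, n_iter=2000, eps=1e-12):
    """Estimate cell-type fractions for one bulk sample (illustrative only).

    signature: list of gene rows, each a list of non-negative expression
               values, one per cell type (genes x cell types).
    bulk:      list of non-negative bulk expression values, one per gene.

    Solves min ||S x - b||^2 subject to x >= 0 with multiplicative updates,
    then rescales x so the fractions sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])

    # Precompute S^T b (length n_types) and S^T S (n_types x n_types).
    stb = [sum(signature[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]
    sts = [[sum(signature[g][j] * signature[g][k] for g in range(n_genes))
            for k in range(n_types)]
           for j in range(n_types)]

    # Start from a strictly positive guess; updates preserve non-negativity.
    x = [1.0] * n_types
    for _ in range(n_iter):
        x = [x[j] * stb[j] / (sum(sts[j][k] * x[k] for k in range(n_types)) + eps)
             for j in range(n_types)]

    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Hypothetical toy data: 4 genes x 3 cell types (values are made up).
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.5, 0.3, 0.2]

# Simulate a noise-free bulk sample as the fraction-weighted sum of signatures.
bulk = [sum(row[k] * true_fractions[k] for k in range(3)) for row in signature]

fractions = deconvolve(signature, bulk)
print([round(f, 3) for f in fractions])
```

On real data the bulk profile is noisy and the signature genes must be chosen carefully, which is where the signature selection and QC steps described above come in; this toy example only shows the mixture model itself.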