Nabo – a framework to define leukemia-initiating cells and differentiation in single-cell RNA-sequencing data

2020
Author(s): Parashar Dhapola, Mohamed Eldeeb, Amol Ugale, Rasmus Olofzon, Eva Erlandsson, ...

Abstract: Single-cell transcriptomics facilitates innovative approaches to define and identify cell types within tissues and cell populations. An emerging interest in the cancer field is to assess the heterogeneity of transformed cells, including the identification of tumor-initiating cells based on similarities to their normal counterparts. However, such cell mapping is often confounded by the large effects on total gene expression programs introduced by strong perturbations such as an oncogenic event. Here, we present Nabo, a novel computational method that allows mapping of cells from one population to the most similar cells in a reference population, independently of confounding changes to gene expression programs initiated by perturbation. We validated this method on multiple datasets from different sources and platforms and show that Nabo achieves higher accuracy than conventional classification methods. Nabo is available as an integrated toolkit for preprocessing, cell mapping, differential gene expression identification, and visualization of single-cell RNA-Seq data. For exploratory studies, Nabo includes methods to help evaluate the reliability of cell mapping results. We applied Nabo to droplet-based single-cell RNA-Seq data of healthy and oncogene-induced (MLL-ENL) hematopoietic progenitor cells (GMLPs) differentiating in vitro. Despite substantial cellular heterogeneity resulting from differentiation of GMLPs and the large transcriptional effects induced by the fusion oncogene, Nabo could pinpoint the specific cell stage where differentiation arrest occurs, which included an immunophenotypic definition of the tumor-initiating population. Thus, Nabo allows for relevant comparison between target and control cells, without being confounded by differences in population heterogeneity.
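
The core idea of mapping each target cell to its most similar reference cells can be illustrated with a generic nearest-neighbour search. The sketch below is an assumption-laden toy, not Nabo's algorithm or API: the abstract does not specify the similarity measure or data structures, so the function name `map_to_reference`, the Euclidean distance, and the three-gene expression profiles are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_to_reference(target_cells, reference_cells, k=1):
    """For each target cell, list the names of its k closest reference cells.

    Hypothetical helper for illustration only; both arguments are dicts
    mapping a cell name to a list of expression values.
    """
    mapping = {}
    for name, profile in target_cells.items():
        # Rank reference cells by distance to this target cell.
        ranked = sorted(
            reference_cells,
            key=lambda ref: euclidean(profile, reference_cells[ref]),
        )
        mapping[name] = ranked[:k]
    return mapping


# Made-up profiles over three genes (illustrative values, not real data).
reference = {"ref_stem": [5.0, 0.5, 0.1], "ref_myeloid": [0.2, 4.0, 3.5]}
target = {"t1": [4.6, 0.8, 0.0], "t2": [0.1, 3.7, 3.9]}

result = map_to_reference(target, reference)
print(result)
```

A plain distance on raw expression values is exactly the kind of comparison that a strong perturbation can confound, which is the problem the abstract says Nabo is designed to avoid; the sketch only shows the shape of the mapping task.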

2020
Author(s): Lin Li, Hao Dai, Zhaoyuan Fang, Luonan Chen

Abstract: The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared with bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) method is able to quantify the overall associations between genes for each cell, yet it suffers from overestimation related to indirect effects. To overcome this problem, we propose the “conditional cell-specific network” (CCSN) method, which can measure the direct associations between genes by eliminating the indirect associations. CCSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less “reliable” gene expression to more “reliable” gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: (1) one direct association network for one cell; (2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to CCSN-transformed degree matrices; (3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. CCSN is publicly available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip.


2020
Author(s): Matthew N. Bernstein, Zijian Ni, Michael Collins, Mark E. Burkard, Christina Kendziorski, ...

Abstract Background: Single-cell RNA-seq (scRNA-seq) enables the profiling of genome-wide gene expression at the single-cell level and in so doing provides insight into cellular heterogeneity within a tissue. Perhaps nowhere is this more important than in cancer, where tumor and tumor microenvironment heterogeneity directly impact development, maintenance, and progression of disease. While publicly available scRNA-seq cancer datasets offer an unprecedented opportunity to better understand the mechanisms underlying tumor progression, metastasis, drug resistance, and immune evasion, much of the available information has been underutilized, in part due to the lack of tools available for aggregating and analyzing these data. Results: We present CHARacterizing Tumor Subpopulations (CHARTS), a computational pipeline and web application for analyzing, characterizing, and integrating publicly available scRNA-seq cancer datasets. CHARTS enables the exploration of individual gene expression, cell type, malignancy status, differentially expressed genes, and gene set enrichment results in subpopulations of cells across multiple tumors and datasets. Conclusion: CHARTS is an easy-to-use, comprehensive platform for exploring single-cell subpopulations within tumors across the ever-growing collection of public scRNA-seq cancer datasets. CHARTS is freely available at charts.morgridge.org.


2019
Author(s): Alemu Takele Assefa, Jo Vandesompele, Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.
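
To make the contrast with parametric simulators concrete, the sketch below draws simulated values directly from the observed values of one gene instead of from a fitted distribution family. It is a deliberately simplified stand-in, assuming plain resampling with replacement; it is not the semi-parametric estimator the summary describes and not the SPsimSeq R API. The function name `simulate_from_empirical` and the counts are hypothetical.

```python
import random


def simulate_from_empirical(observed, n_samples, seed=0):
    """Draw n_samples values from the empirical distribution of `observed`.

    Hypothetical helper: resampling with replacement is the simplest way
    to sample from an empirical distribution without assuming a
    parametric family (e.g. negative binomial).
    """
    rng = random.Random(seed)  # seeded for reproducible illustrations
    return [rng.choice(observed) for _ in range(n_samples)]


# Made-up counts for a single gene across eight source cells.
source_counts = [0, 0, 1, 3, 0, 7, 2, 0]

simulated = simulate_from_empirical(source_counts, n_samples=20)
print(simulated)
```

Plain resampling can only reproduce values already present in the source data, which is one reason a smoothed density estimate is attractive in practice.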


2019
Author(s): Gabriela S. Kinker, Alissa C. Greenwald, Rotem Tal, Zhanna Orlova, Michael S. Cuoco, ...

Abstract: Cultured cell lines are the workhorse of cancer research, but it is unclear to what extent they recapitulate the cellular heterogeneity observed among malignant cells in tumors, given the absence of a native tumor microenvironment. Here, we used multiplexed single-cell RNA-seq to profile ~200 cancer cell lines. We uncovered expression programs that are recurrently heterogeneous within many cancer cell lines and are largely independent of observed genetic diversity. These programs of heterogeneity are associated with diverse biological processes, including cell cycle, senescence, stress and interferon responses, epithelial-to-mesenchymal transition, and protein maturation and degradation. Notably, some of these recurrent programs recapitulate those seen in human tumors, suggesting a prominent role of intrinsic plasticity in generating intra-tumoral heterogeneity. Moreover, the data allowed us to prioritize specific cell lines as model systems of cellular plasticity. We used two such models to demonstrate the dynamics, regulation and drug sensitivities associated with a cancer senescence program also observed in human tumors. Our work describes the landscape of cellular heterogeneity in diverse cancer cell lines, and identifies recurrent patterns of expression heterogeneity that are shared between tumors and specific cell lines and can thus be further explored in follow-up studies.


2022
Author(s): Sofya Lipnitskaya, Yang Shen, Stefan Legewie, Holger Klein, Kolja Becker

Abstract Background: Recent transcriptomics studies performed at single-cell and population levels reveal noticeable variability in gene expression measurements provided by different RNA sequencing technologies. Due to the increased noise and complexity of single-cell RNA-Seq (scRNA-Seq) data compared with bulk experiments, there is a substantial number of variably expressed genes and so-called dropouts, which challenge the subsequent computational analysis and potentially lead to false positive discoveries. In order to investigate factors affecting technical variability between RNA sequencing experiments of different technologies, we performed a systematic assessment of single-cell and bulk RNA-Seq data that have undergone the same pre-processing and sample preparation procedures. Results: Our analysis indicates that variability between gene expression measurements as well as dropout events are not exclusively caused by biological variability, low expression levels, or random variation. Furthermore, we propose FAVSeq, a machine learning-assisted pipeline for detection of factors contributing to gene expression variability in matched RNA-Seq data provided by two technologies. Based on the analysis of the matched bulk and single-cell dataset, we found the 3'-UTR and transcript lengths to be the most relevant effectors of the observed variation between RNA-Seq experiments, while the same factors together with cellular compartments were shown to be associated with dropouts. Conclusions: Here, we investigated the sources of variation in RNA-Seq profiles of matched single-cell and bulk experiments. In addition, we proposed the FAVSeq pipeline for analyzing multimodal RNA sequencing data, which allowed us to identify factors affecting quantitative differences in gene expression measurements as well as the presence of dropouts. The derived knowledge can be used to improve the interpretation of RNA-Seq data and to identify genes that can be affected by assay-based deviations. Source code is available under the MIT license at https://github.com/slipnitskaya/FAVSeq.


2021
Author(s): Manuel Neumann, Xiaocai Xu, Cezary Smaczniak, Julia Schumacher, Wenhao Yan, ...

The identity and functions of plant cells are influenced by their precise location within the plant body. Cellular heterogeneity in growth and differentiation trajectories results in organ patterning. Therefore, assessing this heterogeneity at the molecular scale is a major question in developmental biology. Single-cell transcriptomics (scRNA-seq) allows characterization and quantification of gene expression heterogeneity in developing organs at unprecedented resolution. However, the original physical location of the cell is lost during the scRNA-seq procedure. Recovering the original location of cells is essential to link gene activity with cellular function and morphology. Here, we reconstruct genome-wide gene expression patterns of individual cells in a floral meristem by combining single-nuclei RNA-seq with 3D spatial reconstruction. In this way, gene expression differences among meristematic domains giving rise to different tissue and organ types can be determined. As a proof of principle, the data are used to trace the initiation of vascular identity within the floral meristem. Our work demonstrates the power of spatially reconstructed single-cell transcriptome atlases to understand plant morphogenesis. The floral meristem 3D gene expression atlas can be accessed at http://threed-flower-meristem.herokuapp.com


2019
Author(s): Haruka Ozaki, Tetsutaro Hayashi, Mana Umeda, Itoshi Nikaido

Abstract Background: Read coverage of RNA sequencing data reflects gene expression and RNA processing events. Single-cell RNA sequencing (scRNA-seq) methods, particularly “full-length” ones, provide read coverage of many individual cells and have the potential to reveal cellular heterogeneity in RNA transcription and processing. However, visualization tools suited to highlighting cell-to-cell heterogeneity in read coverage are still lacking. Results: Here, we have developed Millefy, a tool for visualizing read coverage of scRNA-seq data in genomic contexts. Millefy is designed to show read coverage of all individual cells at once in genomic contexts and to highlight cell-to-cell heterogeneity in read coverage. By visualizing read coverage of all cells as a heat map and dynamically reordering cells based on diffusion maps, Millefy facilitates discovery of “local” region-specific, cell-to-cell heterogeneity in read coverage, including variability of transcribed regions. Conclusions: Millefy simplifies the examination of cellular heterogeneity in RNA transcription and processing events using scRNA-seq data. Millefy is available as an R package (https://github.com/yuifu/millefy) and a Docker image to help use Millefy on the Jupyter notebook (https://hub.docker.com/r/yuifu/datascience-notebook-millefy).


2018
Author(s): Yang Liu, Tao Wang, Deyou Zheng

Abstract: Single-cell RNA-seq (scRNA-seq) has remarkably advanced our understanding of cellular heterogeneity and dynamics in tissue development, diseases, and cancers. Integrated data analysis can often uncover molecular and cellular links among individual datasets and thus provide new biological insights, such as developmental relationships. Due to differences in experimental platforms and biological sample batches, the integration of multiple scRNA-seq datasets is challenging. To address this, we developed a novel computational method for robust integration of scRNA-seq (RISC) datasets using principal component regression (PCR). Because of the natural compatibility of eigenvectors between the PCR model and dimension reduction, RISC can accurately integrate scRNA-seq datasets and avoid over-integration. Compared to existing software, RISC shows particular improvement in integrating datasets that contain cells of the same types (more accurately, clusters) but at distinct functional states. To demonstrate the value of RISC in finding small groups of cells common between otherwise heterogeneous datasets, we applied it to scRNA-seq datasets of normal and malignant cells and successfully identified small clusters of cells in healthy kidney tissues that may be related to the origin of renal tumors.


2018
Author(s): Karen Verboom, Celine Everaert, Nathalie Bolduc, Kenneth J. Livak, Nurten Yigit, ...

Abstract: Single-cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing only the 3' end of the transcript, an excessive fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single-cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single-cell total RNA sequencing is very well suited for any single-cell sequencing experiment in which transcript-level information is needed beyond polyadenylated genes.


2018
Author(s): Davis J. McCarthy, Raghd Rostom, Yuanhua Huang, Daniel J. Kunz, Petr Danecek, ...

Abstract: Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA-sequencing, either using bulk or using single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution. Key findings: (1) a novel approach for integrating DNA-seq and single-cell RNA-seq data to reconstruct clonal substructure for single-cell transcriptomes; (2) evidence for non-neutral evolution of clonal populations in human fibroblasts; (3) proliferation and cell cycle pathways are commonly distorted in mutated clonal populations.

