scholarly journals A Pan-cancer Blueprint of the Heterogeneous Tumour Microenvironment Revealed by Single-Cell Profiling

Author(s):  
Junbin Qian ◽  
Siel Olbrecht ◽  
Bram Boeckx ◽  
Hanne Vos ◽  
Damya Laoui ◽  
...  

AbstractThe stromal compartment of the tumour microenvironment consists of a heterogeneous set of tissue-resident and tumour-infiltrating cells, which are profoundly moulded by cancer cells. An outstanding question is to what extent this heterogeneity is similar between cancers affecting different organs. Here, we profile 233,591 single cells from patients with lung, colorectal, ovary and breast cancer (n=36) and construct a pan-cancer blueprint of stromal cell heterogeneity using different single-cell RNA and protein-based technologies. We identify 68 stromal cell populations, of which 46 are shared between cancer types and 22 are unique. We also characterise each population phenotypically by highlighting its marker genes, transcription factors, metabolic activities and tissue-specific expression differences. Resident cell types are characterised by substantial tissue specificity, while tumour-infiltrating cell types are largely shared across cancer types. Finally, by applying the blueprint to melanoma tumours treated with checkpoint immunotherapy and identifying a naïve CD4+ T-cell phenotype predictive of response to checkpoint immunotherapy, we illustrate how it can serve as a guide to interpret scRNA-seq data. In conclusion, by providing a comprehensive blueprint through an interactive web server, we generate a first panoramic view on the shared complexity of stromal cells in different cancers.


2018 ◽  
Author(s):  
Douglas Abrams ◽  
Parveen Kumar ◽  
R. Krishna Murthy Karuturi ◽  
Joshy George

AbstractBackgroundThe advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies and analytical packages available to guide experimental design and to devise suitable analysis procedure for cell type identification.ResultsWe have developed an empirical methodology to address this important gap in single cell experimental design and analysis into an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, user can choose a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying number of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment.ConclusionsBased on our study, we found that when marker genes are expressed at fold change of 4 or more than the rest of the genes, either Seurat or Simlr algorithm can be used to analyze single cell dataset for any number of single cells isolated (minimum 1000 single cells were tested). However, when marker genes are expected to be only up to fC 2 upregulated, choice of the single cell algorithm is dependent on the number of single cells isolated and proportion of rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.



2018 ◽  
Author(s):  
Nikos Konstantinides ◽  
Katarina Kapuralin ◽  
Chaimaa Fadil ◽  
Luendreo Barboza ◽  
Rahul Satija ◽  
...  

SummaryTranscription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.



Author(s):  
Ling-Ling Zheng ◽  
Jing-Hua Xiong ◽  
Wu-Jian Zheng ◽  
Jun-Hao Wang ◽  
Zi-Liang Huang ◽  
...  

Abstract Although long noncoding RNAs (lncRNAs) have significant tissue specificity, their expression and variability in single cells remain unclear. Here, we developed ColorCells (http://rna.sysu.edu.cn/colorcells/), a resource for comparative analysis of lncRNAs expression, classification and functions in single-cell RNA-Seq data. ColorCells was applied to 167 913 publicly available scRNA-Seq datasets from six species, and identified a batch of cell-specific lncRNAs. These lncRNAs show surprising levels of expression variability between different cell clusters, and has the comparable cell classification ability as known marker genes. Cell-specific lncRNAs have been identified and further validated by in vitro experiments. We found that lncRNAs are typically co-expressed with the mRNAs in the same cell cluster, which can be used to uncover lncRNAs’ functions. Our study emphasizes the need to uncover lncRNAs in all cell types and shows the power of lncRNAs as novel marker genes at single cell resolution.



2017 ◽  
Author(s):  
Giovanni Iacono ◽  
Elisabetta Mereu ◽  
Amy Guillaumet-Adkins ◽  
Roser Corominas ◽  
Ivon Cuscó ◽  
...  

AbstractSingle-cell RNA sequencing significantly deepened our insights into complex tissues and latest techniques are capable processing ten-thousands of cells simultaneously. With bigSCale, we provide an analytical framework being scalable to analyze millions of cells, addressing challenges of future large datasets. Unlike previous methods, bigSCale does not constrain data to fit an a priori-defined distribution and instead uses an accurate numerical model of noise. We evaluated the performance of bigSCale using a biological model of aberrant gene expression in patient derived neuronal progenitor cells and simulated datasets, which underlined its speed and accuracy in differential expression analysis. We further applied bigSCale to analyze 1.3 million cells from the mouse developing forebrain. Herein, we identified rare populations, such as Reelin positive Cajal-Retzius neurons, for which we determined a previously not recognized heterogeneity associated to distinct differentiation stages, spatial organization and cellular function. Together, bigSCale presents a perfect solution to address future challenges of large single-cell datasets.Extended AbstractSingle-cell RNA sequencing (scRNAseq) significantly deepened our insights into complex tissues by providing high-resolution phenotypes for individual cells. Recent microfluidic-based methods are scalable to ten-thousands of cells, enabling an unbiased sampling and comprehensive characterization without prior knowledge. Increasing cell numbers, however, generates extremely big datasets, which extends processing time and challenges computing resources. Current scRNAseq analysis tools are not designed to analyze datasets larger than from thousands of cells and often lack sensitivity and specificity to identify marker genes for cell populations or experimental conditions. With bigSCale, we provide an analytical framework for the sensitive detection of population markers and differentially expressed genes, being scalable to analyze millions of single cells. Unlike other methods that use simple or mixture probabilistic models with negative binomial, gamma or Poisson distributions to handle the noise and sparsity of scRNAseq data, bigSCale does not constrain the data to fit an a priori-defined distribution. Instead, bigSCale uses large sample sizes to estimate a highly accurate and comprehensive numerical model of noise and gene expression. The framework further includes modules for differential expression (DE) analysis, cell clustering and population marker identification. Moreover, a directed convolution strategy allows processing of extremely large data sets, while preserving the transcript information from individual cells.We evaluate the performance of bigSCale using a biological model for reduced or elevated gene expression levels. Specifically, we perform scRNAseq of 1,920 patient derived neuronal progenitor cells from Williams-Beuren and 7q11.23 microduplication syndrome patients, harboring a deletion or duplication of 7q11.23, respectively. The affected region contains 28 genes whose transcriptional levels vary in line with their allele frequency. BigSCale detects expression changes with respect to cells from a healthy donor and outperforms other methods for single-cell DE analysis in sensitivity. Simulated data sets, underline the performance of bigSCale in DE analysis as it is faster and more sensitive and specific than other methods. The probabilistic model of cell-distances within bigSCale is further suitable for unsupervised clustering and the identification of cell types and subpopulations. Using bigSCale, we identify all major cell types of the somatosensory cortex and hippocampus analyzing 3,005 cells from adult mouse brains. Remarkably, we increase the number of cell population specific marker genes 4-6-fold compared to the original analysis and, moreover, define markers of higher order cell types. These include CD90 (Thy1), a neuronal surface receptor, potentially suitable for isolating intact neurons from complex brain samples.To test its applicability for large data sets, we apply bigSCale on scRNAseq data from 1.3 million cells derived from the pallium of the mouse developing forebrain (E18, 10x Genomics). Our directed down-sampling strategy accumulates transcript counts from cells with similar transcriptional profiles into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters provide a rich resource of marker genes for the main brain cell types and less frequent subpopulations. Our analysis of rare populations includes poorly characterized developmental cell types, such as neuron progenitors from the subventricular zone and neocortical Reelin positive neurons known as Cajal-Retzius (CR) cells. The latter represent a transient population which regulates the laminar formation of the developing neocortex and whose malfunctioning causes major neurodevelopmental disorders like autism or schizophrenia. Most importantly, index cell cluster can be deconvoluted to individual cell level for targeted analysis of populations of interest. Through decomposition of Reelin positive neurons, we determined a previously not recognized heterogeneity among CR cells, which we could associate to distinct differentiation stages as well as spatial and functional differences in the developing mouse brain. Specifically, subtypes of CR cells identified by bigSCale express different compositions of NMDA, AMPA and glycine receptor subunits, pointing to subpopulations with distinct membrane properties. Furthermore, we found Cxcl12, a chemokine secreted by the meninges and regulating the tangential migration of CR cells, to be also expressed in CR cells located in the marginal zone of the neocortex, indicating a self-regulated migration capacity.Together, bigSCale presents a perfect solution for the processing and analysis of scRNAseq data from millions of single cells. Its speed and sensitivity makes it suitable to the address future challenges of large single-cell data sets.



2020 ◽  
Vol 36 (12) ◽  
pp. 3910-3912 ◽  
Author(s):  
Oscar Franzén ◽  
Johan L M Björkegren

Abstract Summary Single-cell RNA sequencing (scRNA-seq) is a technology to measure gene expression in single cells. It has enabled discovery of new cell types and established cell type atlases of tissues and organs. The widespread adoption of scRNA-seq has created a need for user-friendly software for data analysis. We have developed a web server, alona that incorporates several of the most popular single-cell analysis algorithms into a flexible pipeline. alona can perform quality filtering, normalization, batch correction, clustering, cell type annotation and differential gene expression analysis. Data are visualized in the web browser using an interface based on JavaScript, allowing the user to query genes of interest and visualize the cluster structure. alona accepts a compressed gene expression matrix and identifies cell clusters with a graph-based clustering strategy. Cell types are identified from a comprehensive collection of marker genes or by specifying a custom set of marker genes. Availability and implementation The service runs at https://alona.panglaodb.se and the Python package can be downloaded from https://oscar-franzen.github.io/adobo/. Supplementary information Supplementary data are available at Bioinformatics online.



2021 ◽  
Author(s):  
Michael J. Geuenich ◽  
Jinyu Hou ◽  
Sunyun Lee ◽  
Hartland W. Jackson ◽  
Kieran R. Campbell

AbstractThe creation of scalable single-cell and highly-multiplexed imaging technologies that profile the protein expression and phosphorylation status of heterogeneous cellular populations has led to multiple insights into disease processes including cancer initiation and progression. A major analytical challenge in interpreting the resulting data is the assignment of cells to a priori known cell types in a robust and interpretable manner. Existing approaches typically solve this by clustering cells followed by manual annotation of individual clusters or by strategies that gate protein expression at predefined thresholds. However, these often require several subjective analysis choices such as selecting the number of clusters and do not automatically assign cell types in line with prior biological knowledge. They further lack the ability to explicitly assign cells to an unknown or uncharacterized type, which exist in most highly multiplexed imaging experiments due to the limited number of markers quantified. To address these issues we present Astir, a probabilistic model to assign cells to cell types by integrating prior knowledge of marker proteins. Astir uses deep recognition neural networks for fast Bayesian inference, allowing for cell type annotations at the million-cell scale and in the absence of previously annotated reference data across multiple experimental modalities and antibody panels. We demonstrate that Astir outperforms existing approaches in terms of accuracy and robustness by applying it to over 2.1 million single cells from several suspension and imaging mass cytometry and microscopy datasets in multiple tissue contexts. We further showcase that Astir can be used for the fast analysis of the spatial architecture of the tumour microenvironment, automatically quantifying the immune influx and spatial heterogeneity of patient samples. Astir is freely available as an open source Python package at https://www.github.com/camlab-bioml/astir.



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ann J. Ligocki ◽  
Wen Fury ◽  
Christian Gutierrez ◽  
Christina Adler ◽  
Tao Yang ◽  
...  

AbstractBulk RNA sequencing of a tissue captures the gene expression profile from all cell types combined. Single-cell RNA sequencing identifies discrete cell-signatures based on transcriptomic identities. Six adult human corneas were processed for single-cell RNAseq and 16 cell clusters were bioinformatically identified. Based on their transcriptomic signatures and RNAscope results using representative cluster marker genes on human cornea cross-sections, these clusters were confirmed to be stromal keratocytes, endothelium, several subtypes of corneal epithelium, conjunctival epithelium, and supportive cells in the limbal stem cell niche. The complexity of the epithelial cell layer was captured by eight distinct corneal clusters and three conjunctival clusters. These were further characterized by enriched biological pathways and molecular characteristics which revealed novel groupings related to development, function, and location within the epithelial layer. Moreover, epithelial subtypes were found to reflect their initial generation in the limbal region, differentiation, and migration through to mature epithelial cells. The single-cell map of the human cornea deepens the knowledge of the cellular subsets of the cornea on a whole genome transcriptional level. This information can be applied to better understand normal corneal biology, serve as a reference to understand corneal disease pathology, and provide potential insights into therapeutic approaches.



2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A520-A520
Author(s):  
Son Pham ◽  
Tri Le ◽  
Tan Phan ◽  
Minh Pham ◽  
Huy Nguyen ◽  
...  

BackgroundSingle-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into the intratumoral heterogeneity, metastasis, therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a particular immuno-oncology study can produce multi-omics profiles for several thousands of individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, performing reanalysis for individual datasets and meta-analysis.MethodsN/AResultsWe present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform is currently hosting a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc. to be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com.ConclusionsN/A



2021 ◽  
Author(s):  
Qing Xie ◽  
Chengong Han ◽  
Victor Jin ◽  
Shili Lin

Single cell Hi-C techniques enable one to study cell to cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicate things further is the fact that not all zeros are created equal, as some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper, we propose HiCImpute, a Bayesian hierarchy model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute for identifying structural zeros with high sensitivity, and for accurate imputation of dropout values in sampling zeros. Downstream analyses using data improved from HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data has led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.



2020 ◽  
Author(s):  
Feng Tian ◽  
Fan Zhou ◽  
Xiang Li ◽  
Wenping Ma ◽  
Honggui Wu ◽  
...  

SummaryBy circumventing cellular heterogeneity, single cell omics have now been widely utilized for cell typing in human tissues, culminating with the undertaking of human cell atlas aimed at characterizing all human cell types. However, more important are the probing of gene regulatory networks, underlying chromatin architecture and critical transcription factors for each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic data base that collectively address the above needs with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods of high detectability and precision. We exemplified GeACT by first studying representative organs in human mid-gestation fetus. In particular, correlated gene modules (CGMs) are observed and found to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states, and found the key transcription factors for representative CGMs.HighlightsGenomic Architecture of Cells in Tissues (GeACT) data for human mid-gestation fetusDetermining correlated gene modules (CGMs) in different cell types by MALBAC-DTMeasuring chromatin open regions in single cells with high detectability by METATACIntegrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM



Sign in / Sign up

Export Citation Format

Share Document