SuperCT: A supervised-learning-framework to enhance the characterization of single-cell transcriptomic profiles

Abstract Characterization of individual cell types is fundamental to the study of multicellular samples such as tumor tissues. Single-cell RNA-seq techniques, which allow high-throughput expression profiling of individual cells, have significantly advanced our ability to perform this task. Currently, most scRNA-seq data analyses begin with unsupervised clustering of cells followed by visualization of the clusters in a low-dimensional space. Clusters are often assigned to different cell types based on canonical markers. However, characterizing known cell types in this way is inefficient and limited by the investigator's knowledge. In this study, we present a technical framework for training an expandable supervised classifier that reveals single-cell identities based on their RNA expression profiles. Using multiple scRNA-seq datasets, we demonstrate the superior accuracy, robustness, compatibility and expandability of this new solution compared to the traditional methods. We use two examples of model upgrades to demonstrate how the projected evolution of the cell-type classifier is realized.
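
The abstract does not say which classifier SuperCT uses, so the following is only a minimal sketch of the general idea of supervised cell-type classification from expression profiles. A nearest-centroid rule is swapped in as a stand-in, and the three-gene toy profiles, the labels, and the function names `train_centroids` and `predict` are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

def train_centroids(profiles, labels):
    """Average the expression vectors of each labelled cell type."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(profiles, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))

# Hypothetical toy data: each row is one cell, each column a made-up gene.
train_profiles = [[9, 1, 0], [8, 2, 1], [1, 9, 0], [0, 8, 2]]
train_labels = ["T cell", "T cell", "B cell", "B cell"]
centroids = train_centroids(train_profiles, train_labels)
prediction = predict(centroids, [7, 1, 1])

# One possible reading of "expandable": add a cell type without retraining
# the existing ones (an assumption, not the authors' upgrade procedure).
centroids.update(train_centroids([[0, 1, 9]], ["NK cell"]))
expanded_prediction = predict(centroids, [1, 0, 8])
```

In this toy setting a new cell type is added by computing one more centroid, which leaves the existing classes untouched.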

Discovering a sparse set of pairwise discriminating features in high-dimensional data

Bioinformatics ◽ 10.1093/bioinformatics/btaa690 ◽ 2020

Author(s): Samuel Melton ◽ Sharad Ramanathan

Keyword(s): Single Cell ◽ Dimensional Space ◽ Cell Types ◽ Dimensional Subspace ◽ Supplementary Information ◽ High Dimensional ◽ Technological Advances ◽ Data Points ◽ Low Dimensional ◽ Sparse Set

Abstract

Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions.

Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace.

Availability and implementation: https://github.com/smelton/SMD.

Supplementary information: Supplementary data are available at Bioinformatics online.
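
The idea of averaging over discriminators trained on an ensemble of proposed cluster configurations can be illustrated with a small sketch. It is not the authors' implementation (see the SMD repository for that). A simple mean-difference discriminator stands in for whatever discriminator the method actually trains, and the proposed two-cluster assignments, the toy data, and all function names are hypothetical.

```python
def mean_difference_weights(data, assignment):
    """Per-feature weight: |mean in cluster 1 - mean in cluster 0|, scaled to sum to 1."""
    n_features = len(data[0])
    sums = [[0.0] * n_features, [0.0] * n_features]
    counts = [0, 0]
    for row, cluster in zip(data, assignment):
        counts[cluster] += 1
        for j, value in enumerate(row):
            sums[cluster][j] += value
    diffs = [
        abs(sums[1][j] / counts[1] - sums[0][j] / counts[0])
        for j in range(n_features)
    ]
    total = sum(diffs) or 1.0  # guard against an all-zero difference vector
    return [d / total for d in diffs]

def feature_scores(data, proposed_assignments):
    """Average the discriminator weights over all proposed cluster configurations."""
    n_features = len(data[0])
    scores = [0.0] * n_features
    for assignment in proposed_assignments:
        weights = mean_difference_weights(data, assignment)
        scores = [s + w for s, w in zip(scores, weights)]
    return [s / len(proposed_assignments) for s in scores]

# Toy data: only feature 0 separates the first two points from the last two.
data = [
    [0.0, 5.0, 1.0],
    [0.1, 5.2, 0.9],
    [4.0, 5.1, 1.1],
    [4.2, 4.9, 1.0],
]
# Hypothetical ensemble of proposed two-cluster configurations (0/1 labels).
proposals = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
scores = feature_scores(data, proposals)
top_feature = max(range(len(scores)), key=scores.__getitem__)
```

Features that discriminate well under many proposed configurations accumulate a high average score, which gives a ranking from which a sparse subset could be selected.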

Single-cell transcriptome analysis of tumor and stromal compartments of pancreatic ductal adenocarcinoma primary tumors and metastatic lesions

Genome Medicine ◽ 10.1186/s13073-020-00776-9 ◽ 2020 ◽ Vol 12 (1)

Author(s): Wei Lin ◽ Pawan Noel ◽ Erkut H. Borazanci ◽ Jeeyun Lee ◽ Albert Amini ◽ ...

Keyword(s): Single Cell ◽ Tumor Cells ◽ Cell Types ◽ Ductal Adenocarcinoma ◽ Cellular Composition ◽ Primary Tumors ◽ Metastatic Lesions ◽ Tumor Tissues ◽ Cell Type Specific ◽ Different Cell Types

Abstract

Background: Solid tumors such as pancreatic ductal adenocarcinoma (PDAC) comprise not just tumor cells but also a microenvironment with which the tumor cells constantly interact. Detailed characterization of the cellular composition of the tumor microenvironment is critical to the understanding of the disease and treatment of the patient. Single-cell transcriptomics has been used to study the cellular composition of different solid tumor types including PDAC. However, almost all of those studies used primary tumor tissues.

Methods: In this study, we employed a single-cell RNA sequencing technology to profile the transcriptomes of individual cells from dissociated primary tumors or metastatic biopsies obtained from patients with PDAC. Unsupervised clustering analysis as well as a new supervised classification algorithm, SuperCT, was used to identify the different cell types within the tumor tissues. The expression signatures of the different cell types were then compared between primary tumors and metastatic biopsies. The expression of the cell type-specific signature genes was also correlated with patient survival using public datasets.

Results: Our single-cell RNA sequencing analysis revealed distinct cell types in primary and metastatic PDAC tissues including tumor cells, endothelial cells, cancer-associated fibroblasts (CAFs), and immune cells. The cancer cells showed high inter-patient heterogeneity, whereas the stromal cells were more homogeneous across patients. Immune infiltration varied significantly from patient to patient, with the majority of the immune cells being macrophages and exhausted lymphocytes. We found that the tumor cellular composition was an important factor in defining the PDAC subtypes. Furthermore, the expression levels of cell type-specific markers for EMT+ cancer cells, activated CAFs, and endothelial cells were significantly associated with patient survival.

Conclusions: Taken together, our work identifies significant heterogeneity in cellular compositions of PDAC tumors and between primary tumors and metastatic lesions. Furthermore, the cellular composition was an important factor in defining PDAC subtypes and significantly correlated with patient outcome. These findings provide valuable insights on the PDAC microenvironment and could potentially inform the management of PDAC patients.

Unsupervised cell functional annotation for single-cell RNA-Seq

10.1101/2021.11.20.469410 ◽ 2021

Author(s): Dongshunyi Li ◽ Jun Ding ◽ Ziv Bar-Joseph

Keyword(s): Single Cell ◽ Dimensional Space ◽ Cell Types ◽ Specific Gene ◽ RNA Seq ◽ Cell Type ◽ Sequencing Data ◽ Gene Sets ◽ Supervised Methods ◽ Low Dimensional

One of the first steps in the analysis of single-cell RNA-sequencing (scRNA-Seq) data is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in low-dimensional space and then assigning cell types to different clusters. To overcome noise and to improve cell type assignments we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines both a low-dimensional representation of all genes and cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets assigned by UNIFAN to different clusters provide strong evidence for the cell type that each cluster represents, making annotation easier.
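
To make the notion of a gene set activity score concrete, here is a minimal sketch. UNIFAN itself is a neural network and is not reproduced here: a plain mean of member-gene expression is swapped in as the activity score, and the gene sets, expression values, and function names are illustrative assumptions.

```python
def gene_set_scores(expression, gene_sets):
    """Score each gene set as the mean expression of its genes measured in the cell."""
    scores = {}
    for name, genes in gene_sets.items():
        values = [expression[g] for g in genes if g in expression]
        scores[name] = sum(values) / len(values) if values else 0.0
    return scores

def annotate_cluster(cells, gene_sets):
    """Label a cluster with the gene set that has the highest total score over its cells."""
    totals = {name: 0.0 for name in gene_sets}
    for cell in cells:
        for name, score in gene_set_scores(cell, gene_sets).items():
            totals[name] += score
    return max(totals, key=totals.get)

# Illustrative gene sets built from well-known marker genes.
gene_sets = {
    "T cell markers": ["CD3E", "CD3D"],
    "B cell markers": ["MS4A1", "CD79A"],
}
# Hypothetical cluster of two cells with made-up expression values.
cluster = [
    {"CD3E": 5.0, "CD3D": 4.0, "MS4A1": 0.0},
    {"CD3E": 6.0, "CD3D": 3.0, "CD79A": 1.0},
]
label = annotate_cluster(cluster, gene_sets)
first_cell_scores = gene_set_scores(cluster[0], gene_sets)
```

Scoring at the gene-set level pools evidence across several genes, so a dropout in any single marker has less influence on the assigned label.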

Single-cell atlas of the first intra-mammalian developmental stage of the human parasite Schistosoma mansoni

10.1101/754713 ◽ 2019 ◽ Cited By ~ 7

Author(s): Carmen Lidia Diaz Soria ◽ Jayhun Lee ◽ Tracy Chong ◽ Avril Coghlan ◽ Alan Tracey ◽ ...

Keyword(s): Schistosoma Mansoni ◽ Single Cell ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Cell Types ◽ Mammalian Development ◽ Transcriptional Dynamics ◽ Swimming Water ◽ Different Cell Types

Abstract Over 250 million people suffer from schistosomiasis, a tropical disease caused by parasitic flatworms known as schistosomes. Humans become infected by free-swimming, water-borne larvae, which penetrate the skin. The earliest intra-mammalian stage, called the schistosomulum, undergoes a series of developmental transitions. These changes are critical for the parasite to adapt to its new environment as it navigates through host tissues to reach its niche, where it will grow to reproductive maturity. Unravelling the mechanisms that drive intra-mammalian development requires knowledge of the spatial organisation and transcriptional dynamics of the different cell types that comprise the schistosomulum body. To fill these important knowledge gaps, we performed single-cell RNA sequencing on two-day-old schistosomula of Schistosoma mansoni. We identified likely gene expression profiles for muscle, nervous system, tegument, parenchymal/primordial gut cells, and stem cells. In addition, we validated cell markers for all these clusters by in situ hybridisation in schistosomula and adult parasites. Taken together, this study provides a comprehensive cell-type atlas for the early intra-mammalian stage of this devastating metazoan parasite.

SC2disease: a manually curated database of single-cell transcriptome for human diseases

Nucleic Acids Research ◽ 10.1093/nar/gkaa838 ◽ 2020 ◽ Vol 49 (D1) ◽ pp. D1413-D1419 ◽ Cited By ~ 1

Author(s): Tianyi Zhao ◽ Shuxuan Lyu ◽ Guilin Lu ◽ Liran Juan ◽ Xi Zeng ◽ ...

Keyword(s): Gene Expression ◽ Single Cell ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Cell Types ◽ Cellular Level ◽ Human Diseases ◽ Cell Type ◽ Cell Type Specific ◽ Different Cell Types

Abstract SC2disease (http://easybioai.com/sc2disease/) is a manually curated database that aims to provide a comprehensive and accurate resource of gene expression profiles in various cell types for different diseases. With the development of single-cell RNA sequencing (scRNA-seq) technologies, uncovering cellular heterogeneity of different tissues for different diseases has become feasible by profiling transcriptomes across cell types at the cellular level. In particular, comparing gene expression profiles between different cell types and identifying cell-type-specific genes in various diseases offers new possibilities to address biological and medical questions. However, systematic, hierarchical and vast databases of gene expression profiles in human diseases at the cellular level are lacking. Thus, we reviewed the literature prior to March 2020 for studies that used scRNA-seq to study diseases with human samples, and developed the SC2disease database to summarize all the data by disease, tissue and cell type. SC2disease documents 946,481 entries, corresponding to 341 cell types, 29 tissues and 25 diseases. Each entry in the SC2disease database contains comparisons of differentially expressed genes between different cell types, tissues and disease-related health status. Furthermore, we reanalyzed the gene expression matrices with a unified pipeline to improve the comparability between different studies. For each disease, we also compare cell-type-specific genes with the corresponding genes of lead single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) to implicate cell type specificity of the traits.

Moana: A robust and scalable cell type classification framework for single-cell RNA-Seq data

10.1101/456129 ◽ 2018 ◽ Cited By ~ 24

Author(s): Florian Wagner ◽ Itai Yanai

Keyword(s): Single Cell ◽ Cell Types ◽ Specific Cell ◽ RNA Seq ◽ Cell Type ◽ Systematic Analysis ◽ Learning Framework ◽ Classification Framework ◽ Heterogeneous Tissues

Abstract Single-cell RNA-Seq (scRNA-Seq) enables the systematic molecular characterization of heterogeneous tissues at an unprecedented resolution and scale. However, it is currently unclear how to establish formal cell type definitions, which impedes the systematic analysis of scRNA-Seq data across experiments and studies. To address this challenge, we have developed Moana, a hierarchical machine learning framework that enables the construction of robust cell type classifiers from heterogeneous scRNA-Seq datasets. To demonstrate Moana’s capabilities, we construct cell type classifiers for human immune cells that accurately distinguish between closely related cell types in the presence of experimental perturbations and systematic differences between scRNA-Seq protocols. We show that Moana is generally applicable and scales to datasets with more than ten thousand cells, thus enabling the construction of tissue-specific cell type atlases that can be directly applied to analyze new scRNA-Seq datasets. A Python implementation of Moana can be found at https://github.com/yanailab/moana.

scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling

10.1101/2021.02.09.430550 ◽ 2021

Author(s): Dongyuan Song ◽ Kexin Aileen Li ◽ Zachary Hemminger ◽ Roy Wollman ◽ Jingyi Jessica Li

Keyword(s): Single Cell ◽ Gene Selection ◽ Spatial Information ◽ Dimensional Space ◽ Single Cells ◽ High Sensitivity ◽ Cell Types ◽ Gene Profiling ◽ Selection Methods ◽ Low Dimensional

Abstract Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. One question is therefore how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity, and extra (e.g., spatial) information; however, they can typically measure only up to a few hundred genes. Another challenging question is therefore how to select genes for targeted gene profiling based on existing scRNA-seq data. Here we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data.
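
For orientation, the projective NMF problem that scPNMF builds on is commonly written as below. The notation is an assumption introduced here, not taken from the abstract: X is a non-negative genes-by-cells matrix with p genes and n cells, W is a non-negative genes-by-bases weight matrix, and K is the number of bases. The modified initialization and the basis selection step mentioned in the abstract are not shown.

```latex
\min_{W \ge 0} \; \lVert X - W W^{\top} X \rVert_F^2,
\qquad X \in \mathbb{R}_{\ge 0}^{p \times n}, \quad W \in \mathbb{R}_{\ge 0}^{p \times K}
```

Under this formulation each column of W is a basis with one weight per gene, so genes carrying large weights in the retained bases are natural candidates for selection.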

Molecular, spatial and projection diversity of neurons in primary motor cortex revealed by in situ single-cell transcriptomics

10.1101/2020.06.04.105700 ◽ 2020 ◽ Cited By ~ 9

Author(s): Meng Zhang ◽ Stephen W. Eichhorn ◽ Brian Zingg ◽ Zizhen Yao ◽ Hongkui Zeng ◽ ...

Keyword(s): Gene Expression ◽ High Resolution ◽ Single Cell ◽ Primary Motor Cortex ◽ Expression Profiles ◽ Neuronal Cell ◽ Gene Expression Profiles ◽ Cell Types ◽ Different Cell Types

Abstract A mammalian brain is comprised of numerous cell types organized in an intricate manner to form functional neural circuits. Single-cell RNA sequencing provides a powerful approach to identify cell types based on their gene expression profiles and has revealed many distinct cell populations in the brain [1-3]. Single-cell epigenomic profiling [4,5] further provides information on gene-regulatory signatures of different cell types. Understanding how different cell types contribute to brain function, however, requires knowledge of their spatial organization and connectivity, which is not preserved in sequencing-based methods that involve cell dissociation [3,6]. Here, we used an in situ single-cell transcriptome-imaging method, multiplexed error-robust fluorescence in situ hybridization (MERFISH) [7], to generate a molecularly defined and spatially resolved cell atlas of the mouse primary motor cortex (MOp). We profiled ∼300,000 cells in the MOp, identified 95 neuronal and non-neuronal cell clusters, and revealed a complex spatial map in which not only excitatory neuronal clusters but also most inhibitory neuronal clusters adopted layered organizations. Notably, intratelencephalic (IT) cells, the largest branch of neurons in the MOp, formed a continuous spectrum of cells with gradual changes in both gene expression profiles and cortical depth positions in a highly correlated manner. Furthermore, we integrated MERFISH with retrograde tracing to probe the projection targets for different MOp neuronal cell types and found that projections of MOp neurons to other cortical regions formed a many-to-many network with each target region receiving input preferentially from a different composition of IT clusters. Overall, our results provide a high-resolution spatial and projection map of molecularly defined cell types in the MOp. We anticipate that the imaging platform described here can be broadly applied to create high-resolution cell atlases of a wide range of systems.

Ensemble dimensionality reduction and feature gene extraction for single-cell RNA-seq data

Nature Communications ◽ 10.1038/s41467-020-19465-7 ◽ 2020 ◽ Vol 11 (1)

Author(s): Xiaoxiao Sun ◽ Yiwen Liu ◽ Lingling An

Keyword(s): Dimensionality Reduction ◽ Single Cell ◽ Dimensional Space ◽ Essential Feature ◽ Empirical Studies ◽ Expression Patterns ◽ Cell Types ◽ Stochastic Gradient Descent ◽ Reduction Techniques ◽ Low Dimensional

Abstract Single-cell RNA sequencing (scRNA-seq) technologies allow researchers to uncover the biological states of a single cell at high resolution. For computational efficiency and easy visualization, dimensionality reduction is necessary to capture gene expression patterns in low-dimensional space. Here we propose an ensemble method for simultaneous dimensionality reduction and feature gene extraction (EDGE) of scRNA-seq data. Unlike existing dimensionality reduction techniques, the proposed method implements an ensemble learning scheme that utilizes massive weak learners for an accurate similarity search. Based on the similarity matrix constructed by those weak learners, the low-dimensional embedding of the data is estimated and optimized through spectral embedding and stochastic gradient descent. Comprehensive simulation and empirical studies show that EDGE is well suited for searching for meaningful organization of cells, detecting rare cell types, and identifying essential feature genes associated with certain cell types.
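
The abstract does not specify what EDGE's weak learners are, so the sketch below only illustrates the general pattern of building a cell-cell similarity matrix from many weak learners. Here each learner is assumed to be a random single-gene threshold split, and similarity is the fraction of learners that place two cells on the same side. The function name, parameters, and toy data are hypothetical, and the embedding step (spectral embedding plus stochastic gradient descent) is not shown.

```python
import random

def weak_learner_similarity(cells, n_learners=200, seed=0):
    """Similarity of two cells = share of random one-gene splits that group them together."""
    rng = random.Random(seed)
    n_cells, n_genes = len(cells), len(cells[0])
    agree = [[0] * n_cells for _ in range(n_cells)]
    for _ in range(n_learners):
        # Each weak learner looks at one randomly chosen gene ...
        gene = rng.randrange(n_genes)
        column = [cell[gene] for cell in cells]
        # ... and splits the cells at a random threshold within that gene's range.
        threshold = rng.uniform(min(column), max(column))
        sides = [value > threshold for value in column]
        for i in range(n_cells):
            for j in range(n_cells):
                if sides[i] == sides[j]:
                    agree[i][j] += 1
    return [[count / n_learners for count in row] for row in agree]

# Toy data: cells 0 and 1 are close, cell 2 is far from both.
cells = [[0.0, 0.0], [0.1, 0.1], [10.0, 10.0]]
similarity = weak_learner_similarity(cells)
```

No single split is informative on its own, but the agreement rate over many of them approximates how close two cells are, and it is a matrix of this kind that the abstract says is passed on to the embedding step.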

Single-cell atlas of the first intra-mammalian developmental stage of the human parasite Schistosoma mansoni

Nature Communications ◽ 10.1038/s41467-020-20092-5 ◽ 2020 ◽ Vol 11 (1)

Author(s): Carmen Lidia Diaz Soria ◽ Jayhun Lee ◽ Tracy Chong ◽ Avril Coghlan ◽ Alan Tracey ◽ ...

Keyword(s): Schistosoma Mansoni ◽ Single Cell ◽ Expression Profiles ◽ In Situ Hybridisation ◽ Gene Expression Profiles ◽ Cell Types ◽ Mammalian Development ◽ Transcriptional Dynamics ◽ Swimming Water ◽ Different Cell Types

Abstract Over 250 million people suffer from schistosomiasis, a tropical disease caused by parasitic flatworms known as schistosomes. Humans become infected by free-swimming, water-borne larvae, which penetrate the skin. The earliest intra-mammalian stage, called the schistosomulum, undergoes a series of developmental transitions. These changes are critical for the parasite to adapt to its new environment as it navigates through host tissues to reach its niche, where it will grow to reproductive maturity. Unravelling the mechanisms that drive intra-mammalian development requires knowledge of the spatial organisation and transcriptional dynamics of the different cell types that comprise the schistosomulum body. To fill these important knowledge gaps, we perform single-cell RNA sequencing on two-day-old schistosomula of Schistosoma mansoni. We identify likely gene expression profiles for muscle, nervous system, tegument, oesophageal gland, parenchymal/primordial gut cells, and stem cells. In addition, we validate cell markers for all these clusters by in situ hybridisation in schistosomula and adult parasites. Taken together, this study provides a comprehensive cell-type atlas for the early intra-mammalian stage of this devastating metazoan parasite.
