SuperCT: A supervised-learning-framework to enhance the characterization of single-cell transcriptomic profiles

2018 ◽  
Author(s):  
Peng Xie ◽  
Mingxuan Gao ◽  
Chunming Wang ◽  
Pawan Noel ◽  
Chaoyong Yang ◽  
...  

Abstract Characterization of individual cell types is fundamental to the study of multicellular samples such as tumor tissues. Single-cell RNA-seq techniques, which allow high-throughput expression profiling of individual cells, have significantly advanced our ability to perform this task. Currently, most scRNA-seq data analyses begin with unsupervised clustering of cells followed by visualization of the clusters in a low-dimensional space. Clusters are often assigned to different cell types based on canonical markers. However, characterizing known cell types in this way is inefficient and limited by the investigator's knowledge. In this study, we present a technical framework for training an expandable supervised classifier to reveal single-cell identities based on their RNA expression profiles. Using multiple scRNA-seq datasets, we demonstrate the superior accuracy, robustness, compatibility and expandability of this new solution compared to the traditional methods. We use two examples of model upgrade to demonstrate how the projected evolution of the cell-type classifier is realized.
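The supervised approach described above can be illustrated with a minimal sketch. It uses a nearest-centroid classifier as a simple stand-in, not the SuperCT model itself, whose architecture the abstract does not specify. The function names, toy expression values and cell-type labels are assumptions made for illustration. Retraining with one additional labeled cell type mimics the expandability the abstract mentions.

```python
import math
from collections import defaultdict

def train_centroids(profiles, labels):
    """Average the expression vectors of each labeled cell type."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, vec):
    """Assign a cell to the type whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Toy training set: rows are cells, columns are three hypothetical genes.
profiles = [[5, 0, 0], [4, 1, 0], [0, 6, 1], [0, 5, 0]]
labels = ["T cell", "T cell", "B cell", "B cell"]
model = train_centroids(profiles, labels)
first_call = predict(model, [3, 0, 0])

# "Upgrade" the model: add labeled cells of a new type and retrain.
profiles += [[0, 0, 7], [1, 0, 6]]
labels += ["Monocyte", "Monocyte"]
upgraded = train_centroids(profiles, labels)
second_call = predict(upgraded, [0, 1, 5])
```

In this toy setting the original model can only return the two types it was trained on, while the retrained model can also call the newly added type.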

Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Abstract Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.
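The idea of averaging discriminators over an ensemble of proposed cluster configurations can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' SMD implementation: the proposed two-cluster configurations are supplied by hand, and each "discriminator" is reduced to a per-feature separation score (difference of cluster means divided by the summed within-cluster spread). All names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def feature_scores(data, assignment):
    """Per-feature separation between the two proposed clusters (0 and 1)."""
    scores = []
    for j in range(len(data[0])):
        a = [row[j] for row, c in zip(data, assignment) if c == 0]
        b = [row[j] for row, c in zip(data, assignment) if c == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores

def ensemble_scores(data, assignments):
    """Average each feature's score over all proposed configurations."""
    per_config = [feature_scores(data, a) for a in assignments]
    return [mean(column) for column in zip(*per_config)]

# Six cells, three features; only feature 0 separates the cells into groups.
data = [
    [0.0, 1.0, 3.0],
    [0.2, 3.0, 1.0],
    [0.1, 2.0, 2.0],
    [9.8, 1.0, 2.0],
    [10.0, 3.0, 3.0],
    [10.1, 2.0, 1.0],
]
# Hand-written ensemble of proposed two-cluster configurations (an assumption).
assignments = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
scores = ensemble_scores(data, assignments)
best_feature = max(range(len(scores)), key=scores.__getitem__)
```

Features that keep a high average score across many proposed configurations are the candidates for the low-dimensional subspace in which clustering is then carried out.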


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Wei Lin ◽  
Pawan Noel ◽  
Erkut H. Borazanci ◽  
Jeeyun Lee ◽  
Albert Amini ◽  
...  

Abstract Background Solid tumors such as pancreatic ductal adenocarcinoma (PDAC) comprise not just tumor cells but also a microenvironment with which the tumor cells constantly interact. Detailed characterization of the cellular composition of the tumor microenvironment is critical to the understanding of the disease and treatment of the patient. Single-cell transcriptomics has been used to study the cellular composition of different solid tumor types including PDAC. However, almost all of those studies used primary tumor tissues. Methods In this study, we employed a single-cell RNA sequencing technology to profile the transcriptomes of individual cells from dissociated primary tumors or metastatic biopsies obtained from patients with PDAC. Unsupervised clustering analysis as well as a new supervised classification algorithm, SuperCT, was used to identify the different cell types within the tumor tissues. The expression signatures of the different cell types were then compared between primary tumors and metastatic biopsies. The expression levels of the cell type-specific signature genes were also correlated with patient survival using public datasets. Results Our single-cell RNA sequencing analysis revealed distinct cell types in primary and metastatic PDAC tissues including tumor cells, endothelial cells, cancer-associated fibroblasts (CAFs), and immune cells. The cancer cells showed high inter-patient heterogeneity, whereas the stromal cells were more homogeneous across patients. Immune infiltration varied significantly from patient to patient, with the majority of the immune cells being macrophages and exhausted lymphocytes. We found that the tumor cellular composition was an important factor in defining the PDAC subtypes. Furthermore, the expression levels of cell type-specific markers for EMT+ cancer cells, activated CAFs, and endothelial cells were significantly associated with patient survival.
Conclusions Taken together, our work identifies significant heterogeneity in cellular compositions of PDAC tumors and between primary tumors and metastatic lesions. Furthermore, the cellular composition was an important factor in defining PDAC subtypes and significantly correlated with patient outcome. These findings provide valuable insights on the PDAC microenvironment and could potentially inform the management of PDAC patients.


2021 ◽  
Author(s):  
Dongshunyi Li ◽  
Jun Ding ◽  
Ziv Bar-Joseph

One of the first steps in the analysis of single-cell RNA-Sequencing (scRNA-Seq) data is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in a low-dimensional space and then assigning cell types to the different clusters. To overcome noise and to improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets assigned by UNIFAN to the different clusters provide strong evidence for the cell type that each cluster represents, making annotation easier.
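The role of gene set activity scores can be shown with a small sketch. It is not UNIFAN's neural network: here the activity score is simply the mean expression of a set's member genes (an assumed stand-in), and a cluster is annotated with the gene set that scores highest across its cells. The function names, the two tiny gene sets and the expression values are illustrative assumptions.

```python
def gene_set_scores(cell, gene_sets):
    """Score each gene set as the mean expression of its genes in one cell.

    `cell` maps gene symbol -> expression; missing genes count as 0.
    """
    return {name: sum(cell.get(g, 0.0) for g in genes) / len(genes)
            for name, genes in gene_sets.items()}

def annotate_cluster(cells, gene_sets):
    """Label a cluster with the gene set that has the highest summed score."""
    totals = {name: 0.0 for name in gene_sets}
    for cell in cells:
        for name, score in gene_set_scores(cell, gene_sets).items():
            totals[name] += score
    return max(totals, key=totals.get)

# Toy gene sets built from well-known marker genes (chosen for illustration).
gene_sets = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
}
# Two cells assumed to belong to the same cluster.
cluster = [
    {"CD3D": 4.0, "CD3E": 3.0, "MS4A1": 0.5},
    {"CD3D": 5.0, "CD79A": 1.0},
]
scores = gene_set_scores(cluster[0], gene_sets)
label = annotate_cluster(cluster, gene_sets)
```

A per-cell score vector of this kind is the sort of extra input that, according to the abstract, is combined with the low-dimensional representation when determining the clustering.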


2019 ◽  
Author(s):  
Carmen Lidia Diaz Soria ◽  
Jayhun Lee ◽  
Tracy Chong ◽  
Avril Coghlan ◽  
Alan Tracey ◽  
...  

Abstract Over 250 million people suffer from schistosomiasis, a tropical disease caused by parasitic flatworms known as schistosomes. Humans become infected by free-swimming, water-borne larvae, which penetrate the skin. The earliest intra-mammalian stage, called the schistosomulum, undergoes a series of developmental transitions. These changes are critical for the parasite to adapt to its new environment as it navigates through host tissues to reach its niche, where it will grow to reproductive maturity. Unravelling the mechanisms that drive intra-mammalian development requires knowledge of the spatial organisation and transcriptional dynamics of the different cell types that comprise the schistosomulum body. To fill these important knowledge gaps, we performed single-cell RNA sequencing on two-day-old schistosomula of Schistosoma mansoni. We identified likely gene expression profiles for muscle, nervous system, tegument, parenchymal/primordial gut cells, and stem cells. In addition, we validated cell markers for all these clusters by in situ hybridisation in schistosomula and adult parasites. Taken together, this study provides a comprehensive cell-type atlas for the early intra-mammalian stage of this devastating metazoan parasite.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1413-D1419 ◽  
Author(s):  
Tianyi Zhao ◽  
Shuxuan Lyu ◽  
Guilin Lu ◽  
Liran Juan ◽  
Xi Zeng ◽  
...  

Abstract SC2disease (http://easybioai.com/sc2disease/) is a manually curated database that aims to provide a comprehensive and accurate resource of gene expression profiles in various cell types for different diseases. With the development of single-cell RNA sequencing (scRNA-seq) technologies, uncovering cellular heterogeneity of different tissues for different diseases has become feasible by profiling transcriptomes across cell types at the cellular level. In particular, comparing gene expression profiles between different cell types and identifying cell-type-specific genes in various diseases offers new possibilities to address biological and medical questions. However, systematic, hierarchical and large-scale databases of gene expression profiles in human diseases at the cellular level are lacking. Thus, we reviewed the literature prior to March 2020 for studies that used scRNA-seq to study diseases with human samples, and developed the SC2disease database to summarize all the data by disease, tissue and cell type. SC2disease documents 946 481 entries, corresponding to 341 cell types, 29 tissues and 25 diseases. Each entry in the SC2disease database contains comparisons of differentially expressed genes between different cell types, tissues and disease-related health status. Furthermore, we reanalyzed the gene expression matrices with a unified pipeline to improve the comparability between different studies. For each disease, we also compare cell-type-specific genes with the corresponding genes of lead single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) to implicate cell type specificity of the traits.


2018 ◽  
Author(s):  
Florian Wagner ◽  
Itai Yanai

Abstract Single-cell RNA-Seq (scRNA-Seq) enables the systematic molecular characterization of heterogeneous tissues at an unprecedented resolution and scale. However, it is currently unclear how to establish formal cell type definitions, which impedes the systematic analysis of scRNA-Seq data across experiments and studies. To address this challenge, we have developed Moana, a hierarchical machine learning framework that enables the construction of robust cell type classifiers from heterogeneous scRNA-Seq datasets. To demonstrate Moana’s capabilities, we construct cell type classifiers for human immune cells that accurately distinguish between closely related cell types in the presence of experimental perturbations and systematic differences between scRNA-Seq protocols. We show that Moana is generally applicable and scales to datasets with more than ten thousand cells, thus enabling the construction of tissue-specific cell type atlases that can be directly applied to analyze new scRNA-Seq datasets. A Python implementation of Moana can be found at https://github.com/yanailab/moana.


2021 ◽  
Author(s):  
Dongyuan Song ◽  
Kexin Aileen Li ◽  
Zachary Hemminger ◽  
Roy Wollman ◽  
Jingyi Jessica Li

Abstract Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. One question, then, is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity, and extra (e.g., spatial) information; however, they typically can only measure up to a few hundred genes. Another challenging question, then, is how to select genes for targeted gene profiling based on existing scRNA-seq data. Here we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data.


Author(s):  
Meng Zhang ◽  
Stephen W. Eichhorn ◽  
Brian Zingg ◽  
Zizhen Yao ◽  
Hongkui Zeng ◽  
...  

Abstract A mammalian brain is comprised of numerous cell types organized in an intricate manner to form functional neural circuits. Single-cell RNA sequencing provides a powerful approach to identify cell types based on their gene expression profiles and has revealed many distinct cell populations in the brain [1-3]. Single-cell epigenomic profiling [4,5] further provides information on gene-regulatory signatures of different cell types. Understanding how different cell types contribute to brain function, however, requires knowledge of their spatial organization and connectivity, which is not preserved in sequencing-based methods that involve cell dissociation [3,6]. Here, we used an in situ single-cell transcriptome-imaging method, multiplexed error-robust fluorescence in situ hybridization (MERFISH) [7], to generate a molecularly defined and spatially resolved cell atlas of the mouse primary motor cortex (MOp). We profiled ∼300,000 cells in the MOp, identified 95 neuronal and non-neuronal cell clusters, and revealed a complex spatial map in which not only excitatory neuronal clusters but also most inhibitory neuronal clusters adopted layered organizations. Notably, intratelencephalic (IT) cells, the largest branch of neurons in the MOp, formed a continuous spectrum of cells with gradual changes in both gene expression profiles and cortical depth positions in a highly correlated manner. Furthermore, we integrated MERFISH with retrograde tracing to probe the projection targets for different MOp neuronal cell types and found that projections of MOp neurons to other cortical regions formed a many-to-many network with each target region receiving input preferentially from a different composition of IT clusters. Overall, our results provide a high-resolution spatial and projection map of molecularly defined cell types in the MOp. We anticipate that the imaging platform described here can be broadly applied to create high-resolution cell atlases of a wide range of systems.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiaoxiao Sun ◽  
Yiwen Liu ◽  
Lingling An

Abstract Single-cell RNA sequencing (scRNA-seq) technologies allow researchers to uncover the biological states of a single cell at high resolution. For computational efficiency and easy visualization, dimensionality reduction is necessary to capture gene expression patterns in low-dimensional space. Here we propose an ensemble method for simultaneous dimensionality reduction and feature gene extraction (EDGE) of scRNA-seq data. Different from existing dimensionality reduction techniques, the proposed method implements an ensemble learning scheme that utilizes massive weak learners for an accurate similarity search. Based on the similarity matrix constructed by those weak learners, the low-dimensional embedding of the data is estimated and optimized through spectral embedding and stochastic gradient descent. Comprehensive simulation and empirical studies show that EDGE is well suited for searching for meaningful organization of cells, detecting rare cell types, and identifying essential feature genes associated with certain cell types.
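How a similarity matrix can emerge from many weak learners is sketched below. The abstract does not describe EDGE's weak learners, so this example assumes a simple form: each learner picks a few random genes, turns each cell into on/off bits by thresholding at the gene's mean, and two cells are counted as similar when they land in the same bucket. The similarity is the fraction of learners in which that happens. Function names, parameters and data are hypothetical, and the embedding step (spectral embedding plus stochastic gradient descent) is not shown.

```python
import random

def hash_cells(cells, genes, thresholds):
    """One weak learner: bucket each cell by on/off bits for a few genes."""
    return [tuple(cell[g] > thresholds[g] for g in genes) for cell in cells]

def ensemble_similarity(cells, n_learners=200, genes_per_learner=2, seed=0):
    """Fraction of weak learners that place each pair of cells together."""
    rng = random.Random(seed)
    n_cells, n_genes = len(cells), len(cells[0])
    # Threshold each gene at its mean expression across cells (an assumption).
    thresholds = [sum(c[g] for c in cells) / n_cells for g in range(n_genes)]
    counts = [[0] * n_cells for _ in range(n_cells)]
    for _ in range(n_learners):
        genes = rng.sample(range(n_genes), genes_per_learner)
        buckets = hash_cells(cells, genes, thresholds)
        for i in range(n_cells):
            for j in range(n_cells):
                if buckets[i] == buckets[j]:
                    counts[i][j] += 1
    return [[c / n_learners for c in row] for row in counts]

# Four toy cells over four genes: two identical, one opposite, one mixed.
cells = [
    [5, 5, 5, 5],
    [5, 5, 5, 5],
    [0, 0, 0, 0],
    [5, 5, 0, 0],
]
sim = ensemble_similarity(cells)
```

Identical cells always share a bucket and opposite cells never do, while the mixed cell receives an intermediate similarity that depends on which genes the learners happen to sample.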


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Carmen Lidia Diaz Soria ◽  
Jayhun Lee ◽  
Tracy Chong ◽  
Avril Coghlan ◽  
Alan Tracey ◽  
...  

Abstract Over 250 million people suffer from schistosomiasis, a tropical disease caused by parasitic flatworms known as schistosomes. Humans become infected by free-swimming, water-borne larvae, which penetrate the skin. The earliest intra-mammalian stage, called the schistosomulum, undergoes a series of developmental transitions. These changes are critical for the parasite to adapt to its new environment as it navigates through host tissues to reach its niche, where it will grow to reproductive maturity. Unravelling the mechanisms that drive intra-mammalian development requires knowledge of the spatial organisation and transcriptional dynamics of the different cell types that comprise the schistosomulum body. To fill these important knowledge gaps, we perform single-cell RNA sequencing on two-day-old schistosomula of Schistosoma mansoni. We identify likely gene expression profiles for muscle, nervous system, tegument, oesophageal gland, parenchymal/primordial gut cells, and stem cells. In addition, we validate cell markers for all these clusters by in situ hybridisation in schistosomula and adult parasites. Taken together, this study provides a comprehensive cell-type atlas for the early intra-mammalian stage of this devastating metazoan parasite.

