Accurately Clustering Single-cell RNA-seq data by Capturing Structural Relations between Cells through Graph Convolutional Network

2020 ◽  
Author(s):  
Yuansong Zeng ◽  
Xiang Zhou ◽  
Jiahua Rao ◽  
Yutong Lu ◽  
Yuedong Yang

Abstract: Recent advances in single-cell RNA sequencing (scRNA-seq) technologies provide a great opportunity to study gene expression at cellular resolution, and scRNA-seq is now routinely used to uncover cell heterogeneity and diversity. A critical step in scRNA-seq analysis is to cluster cells of the same type, and many methods have been developed for cell clustering. However, existing clustering methods are limited to extracting representations from the expression data of individual cells, while ignoring the high-order structural relations between cells. Here, we propose a new method (GraphSCC) to cluster cells based on scRNA-seq data by accounting for structural relations between cells through a graph convolutional network. The representation learned from the graph convolutional network, together with another representation output from a denoising autoencoder network, is optimized by a dual self-supervised module for better cell clustering. Extensive experiments indicate that the GraphSCC model outperforms state-of-the-art methods on various evaluation metrics on both simulated and real datasets. Further visualizations show that GraphSCC provides representations with better intra-cluster compactness and inter-cluster separability.
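To make the idea of "structural relations between cells" concrete, here is a minimal sketch of a single graph-convolution step on a toy cell graph. It is not GraphSCC's implementation: the abstract gives no architecture details, so the propagation rule used here (the commonly used symmetric normalization with self-loops followed by ReLU), the function names, the toy adjacency matrix, and the identity weight matrix are all assumptions chosen for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each cell keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Hypothetical toy data: 4 cells, 2 genes; cells 0-1 and cells 2-3 are neighbors.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the graph smoothing is visible

embedded = gcn_layer(adj, features, weights)
```

With identity weights, each cell's embedding becomes an average over itself and its neighbors, which is the sense in which neighboring cells are pulled toward a shared representation.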

Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed that rely on at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.
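As a small illustration of the partial correlations that Gaussian graphical models are built on, the sketch below converts a precision (inverse covariance) matrix into partial correlations using the standard identity rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj). The abstract does not describe how RGGC estimates or regularizes its precision matrix, so the function name and the toy matrix are assumptions, and the regularization and Louvain steps are not shown.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix into a matrix of partial correlations.

    Uses rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj) for i != j,
    with 1.0 on the diagonal.
    """
    p = len(precision)
    result = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


# Hypothetical precision matrix for three variables arranged in a chain:
# variables 0 and 2 interact only through variable 1.
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
pc = partial_correlations(precision)

# Nonzero off-diagonal partial correlations define the edges of the graph.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if abs(pc[i][j]) > 1e-12]
```

A zero entry in the precision matrix gives a zero partial correlation, so the pair has no edge in the resulting graph; a community detection algorithm would then operate on those edges.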


2020 ◽  
Author(s):  
Snehalika Lall ◽  
Abhik Ghosh ◽  
Sumanta Ray ◽  
Sanghamitra Bandyopadhyay

Abstract: Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and heavily dependent on high-quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method that brings the advantages ‘Renyi’ and ‘Tsallis’ entropy achieved in their original applications to single-cell clustering. Gene selection is thereby robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells and, beyond that, classifying independent and unknown samples with utmost accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
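For readers unfamiliar with the two entropies named above, the following sketch computes Renyi and Tsallis entropy from the textbook definitions for a discrete distribution. It does not reproduce the sc-REnF selection procedure, which the abstract does not describe; the helper `empirical_distribution`, the toy expression values, and the choice of order 2 are assumptions for illustration only.

```python
import math
from collections import Counter


def empirical_distribution(values):
    """Turn a list of discrete values into a list of relative frequencies."""
    counts = Counter(values)
    total = len(values)
    return [count / total for count in counts.values()]


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def tsallis_entropy(probs, q):
    """Tsallis entropy of order q (q != 1)."""
    if q == 1:
        raise ValueError("q must be different from 1")
    return (1 - sum(p ** q for p in probs if p > 0)) / (q - 1)


# Hypothetical discretized expression levels of two genes across 8 cells.
variable_gene = [0, 0, 1, 1, 2, 2, 3, 3]   # spread evenly over four levels
constant_gene = [5, 5, 5, 5, 5, 5, 5, 5]   # identical in every cell

variable_probs = empirical_distribution(variable_gene)
constant_probs = empirical_distribution(constant_gene)

renyi_variable = renyi_entropy(variable_probs, alpha=2)
tsallis_variable = tsallis_entropy(variable_probs, q=2)
renyi_constant = renyi_entropy(constant_probs, alpha=2)
tsallis_constant = tsallis_entropy(constant_probs, q=2)
```

Both measures are zero for the constant gene and positive for the variable one, which is the basic property an entropy-based gene score relies on.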


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Chunxiang Wang ◽  
Xin Gao ◽  
Juntao Liu

Abstract. Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Zhuo Wang ◽  
Shuilin Jin ◽  
Guiyou Liu ◽  
Xiurui Zhang ◽  
Nan Wang ◽  
...  

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xin Wang ◽  
Jane Frederick ◽  
Hongbin Wang ◽  
Sheng Hui ◽  
Vadim Backman ◽  
...  

Abstract: The transcriptional plasticity of cancer cells promotes intercellular heterogeneity in response to anticancer drugs and facilitates the generation of surviving cell subpopulations. Characterizing single-cell transcriptional heterogeneity after drug treatments can provide mechanistic insights into drug efficacy. Here, we used single-cell RNA-seq to examine transcriptomic profiles of cancer cells treated with paclitaxel, celecoxib and the combination of the two drugs. By normalizing the expression of endogenous genes to spike-in molecules, we found that cellular mRNA abundance shows dynamic regulation after drug treatment. Using a random forest model, we identified gene signatures classifying single cells into three states: transcriptional repression, amplification and control-like. Treatment with paclitaxel or celecoxib alone generally repressed gene transcription across single cells. Interestingly, the drug combination resulted in transcriptional amplification and hyperactivation of the mitochondrial oxidative phosphorylation pathway, which was linked to enhanced cell-killing efficiency. Finally, we identified a regulatory module enriched with metabolism and inflammation-related genes activated in a subpopulation of paclitaxel-treated cells, the expression of which predicted paclitaxel efficacy across cancer cell lines and in vivo patient samples. Our study highlights the dynamic global transcriptional activity driving single-cell heterogeneity during drug response and emphasizes the importance of adding spike-in molecules to study gene expression regulation using single-cell RNA-seq.


2016 ◽  
Author(s):  
Xun Zhu ◽  
Travers Ching ◽  
Xinghua Pan ◽  
Sherman Weissman ◽  
Lana Garmire

Single-cell RNA-Sequencing (scRNA-Seq) is a cutting-edge technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well-suited bioinformatics tools to analyze the data generated from this new technology are still lacking. Here we have investigated the performance of the non-negative matrix factorization (NMF) method in analyzing a wide variety of scRNA-Seq data sets, ranging from mouse hematopoietic stem cells to human glioblastoma data. In comparison to other unsupervised clustering methods including K-means and hierarchical clustering, NMF has higher accuracy even when the clustering results of K-means and hierarchical clustering are enhanced by t-SNE. Moreover, NMF successfully detects subpopulations, such as those in a single glioblastoma patient. Furthermore, in conjunction with the modularity detection method FEM, it reveals unique modules that are indicative of clinical subtypes. In summary, we propose that NMF is a desirable method to analyze heterogeneous single-cell RNA-Seq data, and the NMFEM pipeline is suitable for modularity detection among single-cell RNA-Seq data.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2888 ◽  
Author(s):  
Xun Zhu ◽  
Travers Ching ◽  
Xinghua Pan ◽  
Sherman M. Weissman ◽  
Lana Garmire

Single-cell RNA-Sequencing (scRNA-Seq) is a fast-evolving technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well-suited bioinformatics tools to analyze the data generated from this new technology are still lacking. Here we investigate the performance of the non-negative matrix factorization (NMF) method in analyzing a wide variety of scRNA-Seq datasets, ranging from mouse hematopoietic stem cells to human glioblastoma data. In comparison to other unsupervised clustering methods including K-means and hierarchical clustering, NMF has higher accuracy in separating similar groups in various datasets. We ranked genes by their importance scores (D-scores) in separating these groups, and discovered that NMF uniquely identifies genes expressed at intermediate levels as top-ranked genes. Finally, we show that in conjunction with the modularity detection method FEM, NMF reveals meaningful protein-protein interaction modules. In summary, we propose that NMF is a desirable method to analyze heterogeneous single-cell RNA-Seq data. The NMF-based subpopulation detection package is available at: https://github.com/lanagarmire/NMFEM.
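As a rough illustration of how NMF can group cells, the sketch below factorizes a small genes-by-cells matrix V into non-negative factors W and H using the classic multiplicative update rules for the squared-error objective, then assigns each cell to the factor with the largest loading. This is not the NMFEM package's code: the abstract does not state which NMF variant, rank, or initialization the authors used, so the update rule, the function names, the random seed, and the toy matrix are all assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize V (genes x cells) as W (genes x rank) @ H (rank x cells)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(n)]
    return w, h


# Hypothetical expression matrix: 4 genes (rows) x 4 cells (columns).
v = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
]
w, h = nmf(v, rank=2)
error = frobenius_error(v, w, h)

# Assign each cell to the factor with the largest loading in its column of H.
labels = [max(range(2), key=lambda a: h[a][j]) for j in range(len(v[0]))]
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what makes the factors readable as additive expression programs.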


Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 792 ◽  
Author(s):  
Liang Chen ◽  
Yuyao Zhai ◽  
Qiuyan He ◽  
Weinan Wang ◽  
Minghua Deng

As single-cell RNA sequencing technologies mature, massive gene expression profiles can be obtained. Consequently, cell clustering and annotation become two crucial and fundamental procedures affecting other specific downstream analyses. Most existing single-cell RNA-seq (scRNA-seq) data clustering algorithms do not take into account the available cell annotation results on the same tissues or organisms from other laboratories. Nonetheless, such data could assist and guide the clustering process on the target dataset. Identifying marker genes through differential expression analysis to manually annotate large amounts of cells also costs labor and resources. Therefore, in this paper, we propose a novel end-to-end cell supervised clustering and annotation framework called scAnCluster, which fully utilizes the cell type labels available from reference data to facilitate the cell clustering and annotation on the unlabeled target data. Our algorithm integrates deep supervised learning, self-supervised learning and unsupervised learning techniques together, and it outperforms other customized scRNA-seq supervised clustering methods in both simulation and real data. It is particularly worth noting that our method performs well on the challenging task of discovering novel cell types that are absent in the reference data.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1141 ◽  
Author(s):  
Angelo Duò ◽  
Mark D. Robinson ◽  
Charlotte Soneson

Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 12 clustering algorithms, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using 9 publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. The R scripts providing an extensible framework for the evaluation of new methods and data sets are available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison).
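One common way to score how well a clustering recovers known subpopulations is the adjusted Rand index (ARI). The abstract above does not name the metric used in the study, so treating ARI as the measure is an assumption here, as are the function name and the toy label vectors. The sketch computes ARI from the contingency counts of the two labelings.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree
    # Pairs of cells that share a cluster in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    # Pairs that share a cluster in each labeling separately.
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = together_truth * together_pred / total_pairs
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in a single cluster in both
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels for six cells.
known = [0, 0, 1, 1, 2, 2]
renamed = ["b", "b", "a", "a", "c", "c"]   # same partition, different cluster names
coarse = [0, 0, 0, 1, 1, 1]                # merges and splits the true groups

ari_renamed = adjusted_rand_index(known, renamed)
ari_coarse = adjusted_rand_index(known, coarse)
```

The index depends only on which cells are grouped together, not on cluster names, so a relabeled but identical partition scores 1 while a partition that mixes the true groups scores lower.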

