Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell

2020 ◽  
Vol 11 ◽  
Author(s):  
Xiaoshu Zhu ◽  
Jian Li ◽  
Hong-Dong Li ◽  
Miao Xie ◽  
Jianxin Wang

Clustering is an efficient way to analyze single-cell RNA sequencing data. It is commonly used to identify cell types, which helps in understanding cell differentiation processes. However, different single-cell clustering methods can produce different, sometimes conflicting, results, so biologists often struggle to obtain the right clustering and to interpret its biological significance. A cluster ensemble strategy can be an effective solution to this problem. Because graph partitioning-based clustering methods perform well on single-cell data, we developed Sc-GPE, a novel cluster ensemble method that combines five single-cell graph partitioning-based clustering methods: SNN-cliq, PhenoGraph, SC3, SSNN-Louvain, and MPGS-Louvain. In Sc-GPE, a consensus matrix is constructed from the five clustering solutions by calculating the probability that each pair of cells is assigned to the same cluster. This avoids a problem of the hypergraph-based ensemble approach, in which each individual clustering method assigns its own cluster labels and it is difficult to find the corresponding cluster labels across all methods. Then, to account for the different importance of each method in the ensemble, a weighted consensus matrix is constructed using an importance score strategy. Finally, hierarchical clustering is performed on the weighted consensus matrix to cluster the cells. To evaluate performance, we compared Sc-GPE with the individual clustering methods and the state-of-the-art SAME-clustering on 12 single-cell RNA-seq datasets. The results show that Sc-GPE obtained the best average performance and achieved the highest NMI and ARI values on five datasets.
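The consensus-matrix step described in this abstract can be illustrated with a minimal sketch. It assumes each method's output is a plain list of cluster labels per cell, and it takes the per-method weights as given inputs; the abstract does not say how Sc-GPE's importance scores are computed, so the function name `weighted_consensus` and the weights used here are hypothetical. Because only "same cluster or not" is compared within each method, labels never need to be matched across methods.

```python
def weighted_consensus(labelings, weights=None):
    """Weighted co-association matrix for several clusterings of the same cells.

    labelings: list of label lists, one per method, all of equal length.
    weights:   optional per-method weights (hypothetical stand-in for an
               importance score); defaults to equal weights.
    Entry [i][j] is the weighted fraction of methods that place cells i and j
    in the same cluster.
    """
    n_cells = len(labelings[0])
    if weights is None:
        weights = [1.0] * len(labelings)
    total_weight = sum(weights)

    matrix = [[0.0] * n_cells for _ in range(n_cells)]
    for labels, weight in zip(labelings, weights):
        for i in range(n_cells):
            for j in range(n_cells):
                # Labels are compared only within one method's solution.
                if labels[i] == labels[j]:
                    matrix[i][j] += weight

    # Normalise so every entry lies in [0, 1].
    return [[value / total_weight for value in row] for row in matrix]


# Three toy clusterings of four cells, each with its own label vocabulary.
labelings = [
    [0, 0, 1, 1],
    ["a", "a", "a", "b"],
    [2, 2, 5, 5],
]
consensus = weighted_consensus(labelings)
weighted = weighted_consensus(labelings, weights=[2.0, 1.0, 1.0])
```

A hierarchical clustering routine would then be run on a distance derived from this matrix (for example, one minus each entry); that step is omitted here to keep the sketch dependency-free.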

2019 ◽  
Vol 35 (22) ◽  
pp. 4827-4829 ◽  
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xing-Ming Zhao ◽  
Xiaohua Hu ◽  
...  

Abstract. Summary: Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experimental results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting noisy scRNA-seq data before performing downstream analysis. Availability and implementation: The R package and Shiny application are available through GitHub at https://github.com/Zhangxf-ccnu/EnImpute. Supplementary information: Supplementary data are available at Bioinformatics online.
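As a rough illustration of combining several imputation results, the sketch below averages the imputed expression matrices element by element. This is an assumption made for illustration only: the abstract does not state which combination rule EnImpute uses, and the function name `ensemble_impute` is hypothetical rather than part of the R package.

```python
def ensemble_impute(imputed_matrices):
    """Element-wise mean of several imputed expression matrices.

    imputed_matrices: list of matrices (lists of rows) with identical shape,
    one per imputation method. The plain mean is an illustrative choice,
    not the rule documented for EnImpute.
    """
    n_methods = len(imputed_matrices)
    n_rows = len(imputed_matrices[0])
    n_cols = len(imputed_matrices[0][0])
    return [
        [
            sum(matrix[r][c] for matrix in imputed_matrices) / n_methods
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Toy outputs of three hypothetical imputation methods (genes x cells).
method_a = [[0.0, 2.0], [4.0, 0.0]]
method_b = [[1.0, 2.0], [4.0, 2.0]]
method_c = [[2.0, 2.0], [4.0, 1.0]]
combined = ensemble_impute([method_a, method_b, method_c])
```

Entries on which the methods agree are left unchanged, while entries on which they disagree are pulled toward the middle of the individual estimates.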


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12570
Author(s):  
Yunqing Liu ◽  
Na Lu ◽  
Changwei Bi ◽  
Tingyu Han ◽  
Guo Zhuojun ◽  
...  

Background: One goal of expression data analysis is to discover the biological significance or function of genes that are differentially expressed. Gene Set Enrichment (GSE) analysis is one of the main tools for function mining and has been widely used. However, for single-cell RNA sequencing (scRNA-seq) data, every gene expressed in a cell is valuable information for GSE and should not be discarded. Methods: We developed the functional expression matrix (FEM) algorithm to utilize the information from all expressed genes. The algorithm converts the gene expression matrix (GEM) into an FEM. The FEM algorithm can provide insight into the biological significance of a single cell. It can also be integrated with the GEM for downstream analysis. Results: We found that FEM performed well in cell clustering and cell-type-specific function annotation in three datasets (peripheral blood mononuclear cells, human liver, and human pancreas).


2021 ◽  
Author(s):  
Helena L Crowell ◽  
Sarah X Morillo Leonardo ◽  
Charlotte Soneson ◽  
Mark D Robinson

With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyse aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant - on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task, and often use simulated data that provide a ground truth for evaluations. Thus, demanding a high quality standard for synthetically generated data is critical to make simulation study results credible and transferable to real data. Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects; they yield over-optimistic performance of integration, and potentially unreliable ranking of clustering methods; and, it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.


2020 ◽  
Author(s):  
Duanchen Sun ◽  
Xiangnan Guan ◽  
Amy E. Moran ◽  
David Z. Qian ◽  
Pepper Schedin ◽  
...  

Abstract. Single-cell sequencing yields novel discoveries by distinguishing cell types, states and lineages within the context of heterogeneous tissues. However, interpreting complex single-cell data from highly heterogeneous cell populations remains challenging. Currently, most existing single-cell data analyses focus on cell type clusters defined by unsupervised clustering methods, which cannot directly link cell clusters with specific biological and clinical phenotypes. Here we present Scissor, a novel approach that utilizes disease phenotypes to identify cell subpopulations from single-cell data that most highly correlate with a given phenotype. This “phenotype-to-cell within a single step” strategy enables the utilization of a large amount of clinical information that has been collected for bulk assays to identify the most highly phenotype-associated cell subpopulations. When applied to a lung cancer single-cell RNA-seq (scRNA-seq) dataset, Scissor identified a subset of cells exhibiting high hypoxia activities, which predicted worse survival outcomes in lung cancer patients. Furthermore, in a melanoma scRNA-seq dataset, Scissor discerned a T cell subpopulation with low PDCD1/CTLA4 and high TCF7 expressions, which is associated with a favorable immunotherapy response. Thus, Scissor provides a novel framework to identify the biologically and clinically relevant cell subpopulations from single-cell assays by leveraging the wealth of phenotypes and bulk-omics datasets.


BMC Cancer ◽  
2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Xiaozhi Li ◽  
Yutong Meng

Abstract. Background: Glioma is the most common malignant tumor of the brain. The existence of metastatic tumor cells is an important cause of recurrence even after radical glioma resection. Methods: Single-cell sequencing data and high-throughput data were downloaded from the GEO database and the TCGA/CGGA databases. By means of PCA and tSNE clustering methods, metastasis-associated genes in glioma were identified. GSEA was used to explore the biological functions in which these metastasis-associated genes may participate. Univariate and multivariate Cox regression were used to construct a prognostic model. Results: Glioma metastatic cells and metastasis-associated genes were identified. The prognostic model based on metastasis-associated genes had good sensitivity and specificity for the prognosis of glioma. These genes may be involved in signaling pathways such as the cellular protein catabolic process, the p53 signaling pathway, transcriptional misregulation in cancer, and the JAK-STAT signaling pathway. Conclusion: This study explored glioma metastasis-associated genes through single-cell sequencing data mining, aimed to identify prognostic metastasis-associated signatures for glioma, and may provide potential targets for further cancer research.


2017 ◽  
Author(s):  
Saskia Freytag ◽  
Ingrid Lonnstedt ◽  
Milica Ng ◽  
Melanie Bahlo

Abstract. The commercially available 10X Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or identical cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regard to which method offers the most accuracy. Answering this question is complicated by the fact that 10X Genomics data lack cell labels that would allow a direct performance evaluation. Thus, in this review, we focused on comparing the clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with the 10X Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignments generated with supervised labeling approaches. Furthermore, we demonstrate that all clustering methods tested clustered cells to a large degree according to the number of genes coding for ribosomal proteins in each cell.
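One common way to quantify how much two clustering solutions have in common is the adjusted Rand index (ARI). The sketch below is a plain-Python implementation offered as an illustration; the abstract does not say which agreement measure the review used, so treating ARI as the comparison metric is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    Requires at least two cells. Returns 1.0 for identical partitions
    (up to relabeling) and values near 0 for chance-level agreement.
    """
    n_cells = len(labels_a)
    # Contingency counts: cells sharing a cluster in both solutions.
    joint_counts = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)

    pairs_joint = sum(comb(count, 2) for count in joint_counts.values())
    pairs_a = sum(comb(count, 2) for count in counts_a.values())
    pairs_b = sum(comb(count, 2) for count in counts_b.values())
    pairs_total = comb(n_cells, 2)

    expected = pairs_a * pairs_b / pairs_total
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case: both solutions are trivial partitions.
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


same_up_to_names = adjusted_rand_index([0, 0, 1, 1], ["x", "x", "y", "y"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because ARI depends only on which cells are grouped together, it can compare methods that use unrelated cluster names or different numbers of clusters.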


Author(s):  
Hung Nguyen ◽  
Duc Tran ◽  
Bang Tran ◽  
Bahadir Pehlivan ◽  
Tin Nguyen

Abstract. A gene regulatory network is a complicated set of interactions between genetic materials, which dictates how cells develop in living organisms and react to their surrounding environment. Robust comprehension of these interactions would help explain how cells function as well as predict their reactions to external factors. This knowledge can benefit both developmental biology and clinical research such as drug development or epidemiology research. Recently, the rapid advance of single-cell sequencing technologies, which pushed the limit of transcriptomic profiling to the individual cell level, has opened up an entirely new area for regulatory network research. To exploit this abundant new source of data and take advantage of single-cell resolution, a number of computational methods have been proposed to uncover the interactions hidden by the averaging process in standard bulk sequencing. In this article, we review 15 such network inference methods developed for single-cell data. We discuss their underlying assumptions, inference techniques, usability, and pros and cons. In an extensive analysis using simulation, we also assess the methods’ performance, sensitivity to dropout and time complexity. The main objective of this survey is to assist not only life scientists in selecting suitable methods for their data and analysis purposes but also computational scientists in developing new methods, by highlighting outstanding challenges in the field that remain to be addressed in future development.


2021 ◽  
Author(s):  
David Peeney ◽  
Yu Fan ◽  
Sadeechya Gurung ◽  
Carolyn Lazaroff ◽  
Shashikala Ratnayake ◽  
...  

Tissue inhibitors of metalloproteinases (TIMPs/Timps) are an endogenous family of widely expressed matrisome-associated proteins that were initially identified as inhibitors of matrix metalloproteinase activity (Metzincin family proteases). Consequently, TIMPs are often considered simply as protease inhibitors by many investigators. However, an evolving list of new metalloproteinase-independent functions for TIMP family members suggests that this concept is outdated. These novel TIMP functions include direct agonism/antagonism of multiple transmembrane receptors, as well as interactions with matrisome targets. While the family was fully identified over 2 decades ago, there has yet to be an in-depth study describing the expression of TIMPs in normal tissues of adult mammals. An understanding of the tissues and cell types that express TIMPs 1 through 4, in both normal and disease states, is important to contextualize the growing functional capabilities of TIMP proteins, which are often dismissed as non-canonical. Using publicly available single cell RNA sequencing data from the Tabula Muris Consortium, we analyzed approximately 100,000 murine cells across nineteen tissues from non-diseased organs, representing seventy-three annotated cell types, to define the diversity in Timp gene expression across healthy tissues. We report that Timp genes display unique expression profiles across tissues and organ-specific cell types. Within annotated cell types, we identify clear and discrete cluster-specific patterns of Timp expression, particularly in cells of stromal and endothelial origins. Differential expression and gene set pathway analysis provide evidence of the biological significance of Timp expression in these identified cell sub-types, which is consistent with novel roles in normal tissue homeostasis and changing roles in disease progression.
This understanding of the tissues, specific cell types and conditions of the microenvironment in which Timp genes are expressed adds important physiological context to the growing array of novel TIMP protein functions.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Leah L. Weber ◽  
Mohammed El-Kebir

Abstract. Background: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. Conclusion: Phyolin is an accurate, easy to use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
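The distinction between linear and branched evolution can be made concrete for error-free data. The sketch below assumes a binary matrix with cells as rows and mutations as columns, and it assumes the usual characterization of a linear phylogeny: the sets of cells carrying each mutation are nested, so no two mutations occur in partly different sets of cells. It only checks this noise-free condition; it does not solve the LPPF problem, which additionally allows entries to be flipped, and the function name is hypothetical rather than taken from Phyolin.

```python
def is_linear_perfect_phylogeny(matrix):
    """Check whether a noise-free binary cell-by-mutation matrix is linear.

    matrix[r][c] == 1 means cell r carries mutation c. Under the assumed
    characterization, the matrix is linear when the cell sets of all
    mutations form a chain under set inclusion.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    cell_sets = [
        frozenset(r for r, row in enumerate(matrix) if row[c] == 1)
        for c in range(n_mutations)
    ]
    # Sort from largest to smallest; a chain exists exactly when each set
    # contains the next one (inclusion is transitive).
    cell_sets.sort(key=len, reverse=True)
    return all(
        smaller <= larger for larger, smaller in zip(cell_sets, cell_sets[1:])
    )


# Mutations accumulate along a single lineage: nested cell sets.
linear_example = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
# Two mutations arise in different cells after a shared one: branching.
branched_example = [
    [1, 1, 0],
    [1, 0, 1],
]
linear_result = is_linear_perfect_phylogeny(linear_example)
branched_result = is_linear_perfect_phylogeny(branched_example)
```

With real single-cell DNA sequencing data, dropout makes this exact check too strict, which is why the abstract frames the task as a flipping problem rather than a direct test.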

