cell clustering
Recently Published Documents


TOTAL DOCUMENTS

253
(FIVE YEARS 127)

H-INDEX

25
(FIVE YEARS 8)

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Gaoyang Li ◽  
Shaliu Fu ◽  
Shuguang Wang ◽  
Chenyu Zhu ◽  
Bin Duan ◽  
...  

Abstract Here, we present a multi-modal deep generative model, the single-cell Multi-View Profiler (scMVP), which is designed for handling sequencing data that simultaneously measure gene expression and chromatin accessibility in the same cell, including SNARE-seq, sci-CAR, Paired-seq, SHARE-seq, and Multiome from 10X Genomics. scMVP generates common latent representations for dimensionality reduction, cell clustering, and developmental trajectory inference and generates separate imputations for differential analysis and cis-regulatory element identification. scMVP can help mitigate data sparsity issues with imputation and accurately identify cell groups for different joint profiling techniques with common latent embedding, and we demonstrate its advantages on several realistic datasets.


2022 ◽  
Vol 12 ◽  
Author(s):  
Xin Duan ◽  
Wei Wang ◽  
Minghui Tang ◽  
Feng Gao ◽  
Xudong Lin

Identifying the phenotypes and interactions of various cells is the primary objective in cellular heterogeneity dissection. A key step of this methodology is unsupervised clustering, which, however, often suffers from high levels of noise and redundant information. To overcome these limitations, we propose self-diffusion on local scaling affinity (LSSD) to enhance metric learning of cell similarities for dissecting cellular heterogeneity. Local scaling self-tunes the cell-to-cell distances that are used to construct the cell affinity. Our approach implements the self-diffusion process by propagating the affinity matrices to further improve the cell similarities for downstream clustering analysis. To demonstrate its effectiveness and usefulness, we applied LSSD to two simulated and four real scRNA-seq datasets. Compared with other single-cell clustering methods, our approach demonstrates much better clustering performance, and the cell types identified in colorectal tumors show strong biological interpretability.
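The two ingredients named in this abstract, a locally scaled affinity and a diffusion over it, can be illustrated with a minimal sketch. It is not the authors' LSSD implementation. It assumes the common self-tuning form A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), with sigma_i the distance to the k-th nearest neighbor, and an assumed diffusion update W <- W P + I, where P is the row-normalized affinity. The function names, `k`, and `steps` are hypothetical.

```python
import math

def local_scaling_affinity(points, k=2):
    """Self-tuning affinity: A[i][j] = exp(-d_ij^2 / (sigma_i * sigma_j)).

    sigma_i is the distance from point i to its k-th nearest neighbor.
    Assumes no duplicate points, so every sigma_i is positive.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sorted(row)[0] is the zero self-distance, so index k is the k-th neighbor.
    sigma = [sorted(row)[k] for row in dist]
    return [
        [0.0 if i == j else math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
         for j in range(n)]
        for i in range(n)
    ]

def diffuse(affinity, steps=3):
    """Assumed diffusion rule (illustrative only): W <- W @ P + I."""
    n = len(affinity)
    # P is the row-normalized affinity (a random-walk transition matrix).
    P = [[a / sum(row) for a in row] for row in affinity]
    W = [row[:] for row in affinity]
    for _ in range(steps):
        W = [
            [sum(W[i][m] * P[m][j] for m in range(n)) + (1.0 if i == j else 0.0)
             for j in range(n)]
            for i in range(n)
        ]
    return W

# Toy "cells": two well-separated groups of three points each.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
A = local_scaling_affinity(cells, k=2)
W = diffuse(A, steps=3)
```

In this toy example the affinity between cells of the same group stays far larger than the affinity across groups, both before and after diffusion, which is the property a downstream clustering step would rely on.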


2022 ◽  
Author(s):  
Jiyuan Fang ◽  
Cliburn Chan ◽  
Kouros Owzar ◽  
Liuyang Wang ◽  
Diyuan Qin ◽  
...  

Single-cell RNA-sequencing (scRNA-seq) technology allows us to explore cellular heterogeneity in the transcriptome. Because most scRNA-seq data analyses begin with cell clustering, its accuracy considerably impacts the validity of downstream analyses. Although many clustering methods have been developed, few tools are available to evaluate the clustering "goodness-of-fit" to the scRNA-seq data. In this paper, we propose a new Clustering Deviation Index (CDI) that measures the deviation of any clustering label set from the observed single-cell data. We conduct in silico and experimental scRNA-seq studies to show that CDI can select the optimal clustering label set. In particular, CDI also informs the choice of optimal tuning parameters for any given clustering method and the correct number of cluster components.


2021 ◽  
Author(s):  
Wenjing Zhang ◽  
Yuting Tan ◽  
Fang-Xiang Wu

2021 ◽  
Author(s):  
Jiachen Li ◽  
Siheng Chen ◽  
Xiaoyong Pan ◽  
Ye Yuan ◽  
Hong-bin Shen

Abstract Spatial transcriptomics data can provide high-throughput gene expression profiling and the spatial structure of tissues simultaneously. An essential question in its initial analysis is cell clustering. However, most existing studies rely only on gene expression information and cannot utilize spatial information efficiently. Taking advantage of two recent technical developments, spatial transcriptomics and graph neural networks, we introduce CCST (Cell Clustering for Spatial Transcriptomics data with graph neural network), an unsupervised cell clustering method based on a graph convolutional network that improves ab initio cell clustering and the discovery of novel sub cell types based on curated cell category annotation. CCST is a general framework for dealing with various kinds of spatially resolved transcriptomics. With application to five in vitro and in vivo spatial datasets, we show that CCST outperforms other spatial clustering approaches on spatial transcriptomics datasets, can clearly identify all four cell cycle phases from MERFISH data of cultured cells, and can find novel functional sub cell types with different micro-environments from seqFISH+ data of brain. These findings are all validated experimentally, inspiring novel biological hypotheses about the underlying interactions among cell state, cell type and micro-environment.


2021 ◽  
Vol 11 ◽  
Author(s):  
Jujuan Zhuang ◽  
Changjing Ren ◽  
Dan Ren ◽  
Yu’ang Li ◽  
Danyang Liu ◽  
...  

Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical in revealing cell heterogeneity and identifying new cell subtypes, but it is challenging. Due to the high noise, sparsity, and poor annotation of scRNA-seq data, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, to analyze scRNA-seq data, taking advantage of known gene functions. Specifically, we first derive the functional gene sets based on Gene Ontology (GO) terms and reduce their redundancy by semantic similarity analysis and gene repetitive rate reduction. Then, we apply kernel principal component analysis to select features on each non-redundant functional gene set, and we combine the selected features (for each functional gene set) for subsequent clustering analysis. To test the performance of FEGFS, we apply agglomerative hierarchical clustering based on FEGFS and compare it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets like Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910, respectively, on Pollen and Goolam, and those of the second-best method are only 0.938 and 0.910, respectively. For large datasets, FEGFS also outperforms most methods. For example, the ARIs of FEGFS are 0.781 on both Klein and Zeisel, which are higher than those of all other methods but slightly lower than those of SC3 (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories.
As an application, taking glioma as an example, we demonstrated that our clustering methods could identify important cell clusters related to glioma and also inferred key marker genes related to these cell clusters.
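The adjusted Rand index (ARI) quoted in this abstract compares a predicted clustering with reference labels by counting agreeing pairs of cells and correcting for chance. The following is a small standard-library sketch of the usual pair-counting formula; it is not the evaluation code used in the study. The function name and toy labels are hypothetical, and at least two cells are assumed.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two labelings of the same cells."""
    n = len(labels_true)
    # Contingency counts: number of cells for each (true, predicted) label pair.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. all cells in one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Cluster names do not matter, only the grouping does.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

An ARI of 1 means the two groupings are identical up to renaming, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.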


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1847
Author(s):  
Jie Xia ◽  
Lequn Wang ◽  
Guijun Zhang ◽  
ZuoChun Man ◽  
Luonan Chen

Rapid advances in single-cell genomics sequencing (SCGS) have allowed researchers to characterize tumor heterozygosity with unprecedented resolution and reveal the phylogenetic relationships between tumor cells or clones. However, high sequencing error rates of current SCGS data, i.e., false positives, false negatives, and missing bases, severely limit its application. Here, we present a deep learning framework, RDAClone, to recover genotype matrices from noisy data with an extended robust deep autoencoder, cluster cells into subclones by the Louvain-Jaccard method, and further infer evolutionary relationships between subclones by the minimum spanning tree. Studies on both simulated and real datasets demonstrate its robustness and superiority in data denoising, cell clustering, and evolutionary tree reconstruction, particularly for large datasets.
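Two of the steps named in this abstract can be sketched in a few lines: Jaccard similarity between neighbor sets, the edge weight typically used in a Louvain-Jaccard graph, and a minimum spanning tree over subclone profiles. This is an illustration under stated assumptions, not RDAClone's implementation: the Louvain community detection itself is omitted, Hamming distance between binary genotype profiles is an assumed metric, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two neighbor sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hamming(u, v):
    """Number of positions at which two genotype profiles differ."""
    return sum(x != y for x, y in zip(u, v))

def minimum_spanning_tree(profiles):
    """Prim's algorithm over pairwise Hamming distances.

    profiles maps a subclone name to its (assumed binary) genotype profile.
    Returns a list of (parent, child, distance) edges.
    """
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge that connects the tree to a new subclone.
        d, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        in_tree.add(v)
        edges.append((u, v, d))
    return edges

# Hypothetical subclone genotypes (1 = mutation present).
subclones = {
    "A": [0, 0, 0, 0],
    "B": [1, 0, 0, 0],
    "C": [1, 1, 0, 0],
    "D": [1, 1, 1, 1],
}
tree = minimum_spanning_tree(subclones)
shared = jaccard({1, 2, 3}, {2, 3, 4})
```

Here the tree links each subclone to the one it differs from least, which is the intuition behind reading a minimum spanning tree as an evolutionary ordering.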


Cancers ◽  
2021 ◽  
Vol 13 (22) ◽  
pp. 5840
Author(s):  
Fabien Gava ◽  
Julie Pignolet ◽  
Sébastien Déjean ◽  
Odile Mondésert ◽  
Renaud Morin ◽  
...  

Characterization of the molecular mechanisms involved in tumor cell clustering could open the way to new therapeutic strategies. Towards this aim, we used an in vitro quantitative procedure to monitor the anchorage-independent cell aggregation kinetics in a panel of 25 cancer cell lines. Analysis of the relationship between selected aggregation dynamic parameters and the gene expression data for these cell lines from the CCLE database identified genes whose expression was significantly associated with variations in the aggregation parameters. Comparison of these transcripts with the perturbagen signatures from the Connectivity Map resource highlighted that they were strongly correlated with the transcriptional signature of most histone deacetylase (HDAC) inhibitors. Experimental evaluation of two HDAC inhibitors (SAHA and ISOX) showed that they inhibited the initial step of in vitro tumor cell aggregation. This validates our findings and reinforces the potential interest of HDAC inhibitors to prevent metastatic spread.


2021 ◽  
Author(s):  
Snehalika Lall ◽  
Sumanta Ray ◽  
Sanghamitra Bandyopadhyay

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Various issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as a small amount of starting RNA, limited per-cell sequenced reads, cell-to-cell variability due to cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data is susceptible to technical noise, which affects the quality of genes (or features) selected/extracted prior to clustering. Here we introduce sc-CGconv (copula based graph convolution network for single cell clustering), a stepwise robust unsupervised feature extraction and clustering approach that formulates and aggregates cell–cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv formulates a cell-cell graph using Ccor, which is learned by a graph-based artificial intelligence model, a graph convolution network. The learned representation (low dimensional embedding) is utilized for cell clustering. sc-CGconv features the following advantages: (a) it works with substantially smaller sample sizes to identify homogeneous clusters; (b) it can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering; (c) it preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through the copula correlation measure; (d) it provides a topology-preserving embedding of cells in low dimensional space. The source code and usage information are available at https://github.com/Snehalikalall/CopulaGCN .

