A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data

2021
Author(s): Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay

Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Several issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as the small amount of starting RNA, the limited number of sequenced reads per cell, cell-to-cell variability due to the cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data are susceptible to technical noise, which affects the quality of the genes (or features) selected or extracted prior to clustering. Here we introduce sc-CGconv (copula based graph convolution network for single cell clustering), a stepwise, robust, unsupervised feature extraction and clustering approach that formulates and aggregates cell–cell relationships using copula correlation (Ccor), followed by clustering based on a graph convolution network (GCN). sc-CGconv constructs a cell–cell graph using Ccor, and this graph is learned by a graph-based artificial intelligence model, the GCN. The learned representation (a low-dimensional embedding) is then used for cell clustering. sc-CGconv offers the following advantages: (a) it identifies homogeneous clusters with substantially smaller sample sizes; (b) it can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering; (c) it preserves the cell-to-cell variability within the selected gene set by constructing a cell–cell graph through the copula correlation measure; and (d) it provides a topology-preserving embedding of cells in a low-dimensional space. The source code and usage information are available at https://github.com/Snehalikalall/CopulaGCN.
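The two stages described above (a correlation-based cell–cell graph, then graph convolution) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: Spearman rank correlation stands in for Ccor (like copula-based measures, it depends only on ranks and is therefore unchanged by monotone transformations of a cell's expression profile), the k-nearest-neighbour rule and all function names are hypothetical, and the propagation step follows the widely used GCN rule ReLU(D^-1/2 (A + I) D^-1/2 H W).

```python
from typing import List

Matrix = List[List[float]]


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def spearman(x: List[float], y: List[float]) -> float:
    """Rank correlation: a stand-in for Ccor, NOT the copula correlation used by sc-CGconv."""
    return pearson(ranks(x), ranks(y))


def knn_graph(cells: Matrix, k: int) -> Matrix:
    """Symmetric 0/1 adjacency linking each cell to its k most rank-correlated cells."""
    n = len(cells)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scored = sorted(
            ((spearman(cells[i], cells[j]), j) for j in range(n) if j != i), reverse=True
        )
        for _, j in scored[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n, out_dim = len(adj), len(w[0])
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    hw = [[sum(h[i][t] * w[t][c] for t in range(len(w))) for c in range(out_dim)] for i in range(n)]
    return [
        [max(0.0, sum(norm[i][j] * hw[j][c] for j in range(n))) for c in range(out_dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Each row is one cell's expression across four genes; cells 0/1 and 2/3 share gene rankings.
    cells = [[5, 3, 0, 1], [50, 30, 0, 10], [0, 1, 5, 3], [0, 2, 9, 4]]
    adj = knn_graph(cells, k=1)
    h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hypothetical input features
    w = [[1.0, 0.0], [0.0, 1.0]]                          # identity weights for illustration
    print(adj)
    print(gcn_layer(adj, h, w))
```

Because rank correlation ignores the tenfold scale difference between cells 0 and 1, they are linked as neighbours; in a trained model the weight matrix w would be learned rather than fixed.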

2021
Author(s): Zixiang Luo, Chenyu Xu, Zhen Zhang, Wenfei Jin

ABSTRACT Dimensionality reduction is crucial for the visualization and interpretation of high-dimensional single-cell RNA sequencing (scRNA-seq) data. However, preserving the topological structure among cells in a low-dimensional space remains a challenge. Here, we present the single-cell graph autoencoder (scGAE), a dimensionality reduction method that preserves topological structure in scRNA-seq data. scGAE builds a cell graph and uses a multitask-oriented graph autoencoder to preserve topological structure information and feature information in scRNA-seq data simultaneously. We further extended scGAE for scRNA-seq data visualization, clustering, and trajectory inference. Analyses of simulated data showed that scGAE accurately reconstructs developmental trajectories and separates discrete cell clusters under different scenarios, outperforming recently developed deep learning methods. Furthermore, applying scGAE to empirical data showed that it provided novel insights into cell developmental lineages and preserved inter-cluster distances.
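A multitask graph autoencoder is trained to reconstruct both the cell graph and the expression features from one embedding. The pure-Python sketch below shows only such a combined loss, under assumptions of ours: an inner-product decoder sigmoid(z_i · z_j) for edges, mean squared error for features, and a hypothetical weight alpha balancing the two terms. scGAE's actual decoders and weighting may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def structure_loss(z: Matrix, adj: Matrix, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between adj and the inner-product decoder sigmoid(z_i . z_j)."""
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(sum(a * b for a, b in zip(z[i], z[j])))
            total -= adj[i][j] * math.log(p + eps) + (1 - adj[i][j]) * math.log(1 - p + eps)
    return total / (n * n)


def feature_loss(x: Matrix, x_hat: Matrix) -> float:
    """Mean squared error between observed and reconstructed features."""
    sq = [(a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat)]
    return sum(sq) / len(sq)


def multitask_loss(z: Matrix, adj: Matrix, x: Matrix, x_hat: Matrix, alpha: float = 0.5) -> float:
    """Hypothetical weighting: alpha * structure term + (1 - alpha) * feature term."""
    return alpha * structure_loss(z, adj) + (1 - alpha) * feature_loss(x, x_hat)
```

An embedding that places linked cells close together (large dot product) and unlinked cells apart yields a lower structure loss, which is how the graph term pushes the embedding to preserve topology.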


2019
Vol 85 (18)
Author(s): Yutaka Yawata, Tatsunori Kiyokawa, Yuhki Kawamura, Tomohiro Hirayama, Kyosuke Takabe, ...

ABSTRACT Here we analyzed the innate fluorescence signature of the single microbial cell, within both clonal and mixed populations of microorganisms. We found that even very similarly shaped cells differ noticeably in their autofluorescence features and that the innate fluorescence signatures change dynamically with growth phases. We demonstrated that machine learning models can be trained with a data set of single-cell innate fluorescence signatures to annotate cells according to their phenotypes and physiological status, for example, distinguishing a wild-type Aspergillus nidulans cell from its nitrogen metabolism mutant counterpart and log-phase cells from stationary-phase cells of Pseudomonas putida. We developed a minimally invasive method (confocal reflection microscopy-assisted single-cell innate fluorescence [CRIF] analysis) to optically extract and catalog the innate cellular fluorescence signatures of each of the individual live microbial cells in a three-dimensional space. This technique represents a step forward from traditional techniques, which analyze the innate fluorescence signatures at the population level and necessitate a clonal culture. Since the fluorescence signature is an innate property of a cell, our technique allows the prediction of the types or physiological status of intact and tag-free single cells within a cell population distributed in a three-dimensional space. Our study presents a blueprint for a streamlined cell analysis in which one can directly assess the potential phenotype of each single cell in a heterogeneous population by its autofluorescence signature under a microscope, without cell tagging. IMPORTANCE A cell’s innate fluorescence signature is an assemblage of fluorescence signals emitted by diverse biomolecules within a cell.
It is known that the innate fluorescence signature reflects various cellular properties and physiological statuses; thus, it can serve as a rich source of information for cell characterization as well as cell identification. However, conventional techniques analyze innate fluorescence signatures at the population level rather than at the single-cell level and thus necessitate a clonal culture. In the present study, we developed a technique to analyze the innate fluorescence signature of a single microbial cell. Using this novel method, we found that even very similarly shaped cells differ noticeably in their autofluorescence features and that the innate fluorescence signature changes dynamically with growth phases. We also demonstrated that different cell types can be classified accurately within a mixed population under a microscope at single-cell resolution, based solely on innate fluorescence signature information. We suggest that single-cell autofluorescence signature analysis is a promising tool to directly assess the taxonomic or physiological heterogeneity within a microbial population, without cell tagging.


Author(s): Samuel Melton, Sharad Ramanathan

Abstract Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as for making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to the discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.
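The central idea, scoring features by averaging discriminators over an ensemble of proposed cluster configurations, can be sketched in pure Python. This is a simplified illustration, not the code in the SMD repository: each proposal is a 2-means clustering of a random subsample of cells, the "discriminator" is simply the per-feature difference of the two cluster means (a crude linear discriminant that ignores covariance), features are assumed to be on comparable scales, and all names and parameters are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points: Matrix, rng: random.Random, iters: int = 20) -> List[int]:
    """Lloyd's 2-means; first centre random, second the point farthest from it."""
    first = rng.choice(points)
    second = max(points, key=lambda p: sq_dist(p, first))
    centres = [list(first), list(second)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1 for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(data: Matrix, n_proposals: int = 30, frac: float = 0.7, seed: int = 0) -> List[float]:
    """Average |difference of cluster means| per feature over an ensemble of proposed clusterings."""
    rng = random.Random(seed)
    d = len(data[0])
    totals, used = [0.0] * d, 0
    for _ in range(n_proposals):
        sub = rng.sample(data, max(2, int(frac * len(data))))  # random subsample of cells
        labels = two_means(sub, rng)
        groups = [[p for p, lab in zip(sub, labels) if lab == c] for c in (0, 1)]
        if not groups[0] or not groups[1]:
            continue  # degenerate proposal: skip it
        used += 1
        for f in range(d):
            m0 = sum(p[f] for p in groups[0]) / len(groups[0])
            m1 = sum(p[f] for p in groups[1]) / len(groups[1])
            totals[f] += abs(m0 - m1)  # weight of a mean-difference discriminator
    return [t / used for t in totals] if used else totals
```

Features with high averaged scores are candidates for spanning the low-dimensional subspace; the real method trains proper discriminators and proposes configurations more carefully.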


Author(s): Dongshunyi Li, Jun Ding, Ziv Bar-Joseph

Abstract Motivation Recent technological advances enable the profiling of spatial single-cell expression data. Such data present a unique opportunity to study cell–cell interactions and the signaling genes that mediate them. However, most current methods for the analysis of these data focus on unsupervised descriptive modeling, making it hard to identify key signaling genes and quantitatively assess their impact. Results We developed a Mixture of Experts for Spatial Signaling genes Identification (MESSI) method to identify active signaling genes within and between cells. The mixture of experts strategy enables MESSI to subdivide cells into subtypes. MESSI relies on multi-task learning using information from neighboring cells to improve the prediction of response genes within a cell. Applying the method to three spatial single-cell expression datasets, we show that MESSI accurately predicts the levels of response genes, improving upon prior methods, and provides useful biological insights about key signaling genes and subtypes of excitatory neuron cells. Availability and implementation MESSI is available at: https://github.com/doraadong/MESSI
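Two ingredients named here, features drawn from neighbouring cells and a mixture of experts, can be illustrated with a toy prediction step. This pure-Python sketch is not MESSI's model or API: the neighbourhood radius, the linear experts, and the softmax gate over linear scores are hypothetical choices of ours, and in practice all weights would be learned.

```python
import math
from typing import List, Tuple

Cell = Tuple[float, float, List[float]]  # (x, y, signalling-gene expression)


def neighbour_mean(cells: List[Cell], i: int, radius: float) -> List[float]:
    """Mean expression of cells within `radius` of cell i (excluding cell i itself)."""
    xi, yi, expr = cells[i]
    nbrs = [e for j, (x, y, e) in enumerate(cells) if j != i and math.hypot(x - xi, y - yi) <= radius]
    if not nbrs:
        return [0.0] * len(expr)
    return [sum(col) / len(nbrs) for col in zip(*nbrs)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def moe_predict(features: List[float], gate_weights: List[List[float]],
                expert_weights: List[List[float]]) -> float:
    """Mixture of linear experts: sum_k softmax(gate scores)_k * (w_k . features)."""
    gates = softmax([dot(g, features) for g in gate_weights])
    return sum(g * dot(w, features) for g, w in zip(gates, expert_weights))


if __name__ == "__main__":
    cells = [(0.0, 0.0, [1.0, 0.0]), (1.0, 0.0, [3.0, 2.0]), (10.0, 0.0, [9.0, 9.0])]
    # Features for cell 0: its own expression followed by its neighbourhood mean.
    feats = cells[0][2] + neighbour_mean(cells, 0, radius=2.0)
    gates = [[0.0] * 4, [0.0] * 4]  # untrained gate: both experts weighted equally
    experts = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    print(moe_predict(feats, gates, experts))
```

The gate is what lets a mixture of experts route different cells to different experts, which corresponds to the subdivision of cells into subtypes described in the abstract.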


2019
Author(s): Grace A McLaughlin, Erin M Langdon, John M Crutchley, Liam J Holt, Mark Gregory Forest, ...

The spatial structure and physical properties of the cytosol are not well understood. Measurements of the material state of the cytosol are challenging due to its spatial and temporal heterogeneity, the lack of truly passive probes, and the need for probes of many sizes to accurately describe the state across scales. Recent development of genetically encoded multimeric nanoparticles (GEMs) has opened up the study of the cytosol at the length scales of multiprotein complexes (20-60 nm). Using these probes to spatially resolve the diffusivity of a cytoplasmic volume within a cell requires accurate and automated 3D tracking methods. We developed an image analysis pipeline for whole-cell imaging of GEMs in the context of large, multinucleate fungi, where there is evidence of functional compartmentalization of the cytosol for both the nuclear division cycle and branching. We apply a neural network to track particles in 3D, generate surface meshes to project data on representations of the cell, and create dynamic visualizations of local diffusivities. Using this pipeline, we have found remarkable variability in the properties of the cytosol both within a single cell and between cells. By analyzing the spatial diffusivity patterns, we saw an enrichment of low-diffusivity zones at hyphal tips and near some nuclei. These results show that the physical state of the cytosol varies spatially within a single cell and exhibits significant cell-to-cell variability. Thus, molecular crowding contributes to heterogeneity within individual cells and across populations.
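Diffusivities of tracked particles are commonly estimated from the mean squared displacement (MSD): for free diffusion in d dimensions, MSD(τ) = 2dDτ, so in 3D, D ≈ MSD(τ)/(6τ). The sketch below applies this textbook relation to a single 3D track; the function names and the assumption of purely Brownian motion are ours, not details of the authors' pipeline.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def msd(track: List[Point], lag: int) -> float:
    """Mean squared displacement over all pairs of positions `lag` frames apart."""
    if lag < 1 or lag >= len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    disps = [
        sum((b - a) ** 2 for a, b in zip(track[t], track[t + lag]))
        for t in range(len(track) - lag)
    ]
    return sum(disps) / len(disps)


def diffusivity_3d(track: List[Point], dt: float, lag: int = 1) -> float:
    """Effective diffusion coefficient D = MSD(lag * dt) / (6 * lag * dt), assuming Brownian motion."""
    return msd(track, lag) / (6 * lag * dt)


if __name__ == "__main__":
    # Positions in micrometres sampled every 0.5 s give D in um^2/s.
    track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(msd(track, 1), diffusivity_3d(track, dt=0.5))
```

The toy track moves in a straight line, so it only checks the arithmetic; real GEM tracks would be averaged over many particles or spatial bins to map local diffusivity.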


2021
Vol 12
Author(s): Jianping Zhao, Na Wang, Haiyun Wang, Chunhou Zheng, Yansen Su

Dimensionality reduction of high-dimensional data is crucial for single-cell RNA sequencing (scRNA-seq) visualization and clustering. One prominent challenge in scRNA-seq studies comes from dropout events, which lead to zero-inflated data. To address this issue, in this paper we propose a scRNA-seq data dimensionality reduction algorithm based on a hierarchical autoencoder, termed SCDRHA. SCDRHA consists of two core modules: the first is a deep count autoencoder (DCA) that denoises the data, and the second is a graph autoencoder that projects the data into a low-dimensional space. Experimental results on five real scRNA-seq datasets demonstrate that SCDRHA performs better than existing state-of-the-art algorithms on dimension reduction and noise reduction. In addition, SCDRHA can dramatically improve the performance of data visualization and cell clustering.
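DCA-style denoising autoencoders model counts with a zero-inflated negative binomial (ZINB) distribution, whose extra point mass at zero absorbs dropout events. The sketch below computes the per-entry ZINB negative log-likelihood with mean mu (> 0), inverse dispersion theta, and dropout probability pi. It illustrates the likelihood only; whether SCDRHA's first module uses exactly this parameterisation is an assumption on our part, and the function names are hypothetical.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """log NB(x; mu, theta) with mean mu > 0 and inverse dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of one count under ZINB(mu, theta, pi), with 0 <= pi < 1."""
    if x == 0:
        # A zero is either a dropout (probability pi) or a genuine NB zero.
        return -math.log(pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return -(math.log(1 - pi) + nb_log_pmf(x, mu, theta))


if __name__ == "__main__":
    # Probability of observing a zero with and without zero inflation.
    print(math.exp(-zinb_nll(0, 2.0, 1.5, 0.0)), math.exp(-zinb_nll(0, 2.0, 1.5, 0.3)))
```

During training, mu, theta, and pi would be output by the decoder for every cell–gene entry and the summed negative log-likelihood minimised.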


2021
Author(s): Anjun Ma, Xiaoying Wang, Cankun Wang, Jingxian Li, Tong Xiao, ...

We present DeepMAPS, a deep learning platform for cell-type-specific biological gene network inference from single-cell multi-omics (scMulti-omics) data. DeepMAPS includes both cells and genes in a heterogeneous graph to infer cell-cell, cell-gene, and gene-gene relations simultaneously. Its graph attention neural network considers each cell and gene with both local and global information, making DeepMAPS more robust to noise in the data. We benchmarked DeepMAPS on 18 datasets for cell clustering and network inference, and the results showed that our method outperforms various existing tools. We further applied DeepMAPS to a case study of lung tumor leukocyte CITE-seq data, where it showed superior cell clustering performance and predicted biologically meaningful cell-cell communication pathways based on the inferred gene networks. To improve the feasibility and ensure the reproducibility of scMulti-omics data analysis, we deployed a multifunctional webserver with various visualizations. Overall, we regard DeepMAPS as a novel platform that brings state-of-the-art deep learning models to single-cell studies and that can promote the use of scMulti-omics data in the community.
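A heterogeneous cell–gene graph can be stored as a bipartite edge list, with attention deciding how much each gene contributes to a cell's updated representation. The pure-Python sketch below uses single-head, scaled dot-product attention over one cell's gene neighbours. It is a generic illustration, not DeepMAPS's architecture; the expression threshold, the embeddings, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cell_gene_edges(expr: List[List[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Edges (cell, gene) wherever a gene's expression in a cell exceeds `threshold`."""
    return [(c, g) for c, row in enumerate(expr) for g, v in enumerate(row) if v > threshold]


def attend(cell_vec: List[float], gene_vecs: Dict[int, List[float]],
           neighbours: List[int]) -> Tuple[List[float], Dict[int, float]]:
    """Scaled dot-product attention of one cell over its (non-empty) gene neighbours."""
    scale = math.sqrt(len(cell_vec))
    scores = {g: sum(a * b for a, b in zip(cell_vec, gene_vecs[g])) / scale for g in neighbours}
    m = max(scores.values())
    exps = {g: math.exp(s - m) for g, s in scores.items()}
    total = sum(exps.values())
    weights = {g: e / total for g, e in exps.items()}
    agg = [sum(weights[g] * gene_vecs[g][k] for g in neighbours) for k in range(len(cell_vec))]
    return agg, weights


if __name__ == "__main__":
    expr = [[0.0, 2.0, 1.0], [3.0, 0.0, 0.0]]  # 2 cells x 3 genes
    edges = cell_gene_edges(expr)
    genes_of_cell0 = [g for c, g in edges if c == 0]
    gene_vecs = {0: [1.0, 1.0], 1: [2.0, 0.0], 2: [0.0, 2.0]}  # hypothetical gene embeddings
    agg, weights = attend([1.0, 0.0], gene_vecs, genes_of_cell0)
    print(edges, weights, agg)
```

Cell–cell and gene–gene relations could be added as further edge types; handling several node and edge types with type-specific parameters is what makes such a graph heterogeneous.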


2021
Author(s): Dongshunyi Li, Jun Ding, Ziv Bar-Joseph

One of the first steps in the analysis of single-cell RNA-sequencing (scRNA-Seq) data is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in a low-dimensional space and then assigning cell types to the different clusters. To overcome noise and improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets that UNIFAN assigns to each cluster provide strong evidence for the cell type that cluster represents, making annotation easier.
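Gene set activity can be illustrated with a much simpler, fixed score than UNIFAN's learned one: the mean z-scored expression of a set's genes in each cell, with each cluster labelled by the set whose average activity is highest. The sketch below shows only this annotation logic; the scoring rule, the example gene sets, and the function names are assumptions for illustration, not UNIFAN's method.

```python
from typing import Dict, List


def zscore_columns(expr: List[List[float]]) -> List[List[float]]:
    """Z-score each gene (column) across cells; constant genes get a standard deviation of 1."""
    n, d = len(expr), len(expr[0])
    means = [sum(r[g] for r in expr) / n for g in range(d)]
    sds = [(sum((r[g] - means[g]) ** 2 for r in expr) / n) ** 0.5 or 1.0 for g in range(d)]
    return [[(r[g] - means[g]) / sds[g] for g in range(d)] for r in expr]


def gene_set_scores(expr: List[List[float]], genes: List[str],
                    gene_sets: Dict[str, List[str]]) -> List[Dict[str, float]]:
    """Per-cell activity: mean z-score of each set's genes that are present in `genes`."""
    z = zscore_columns(expr)
    index = {g: i for i, g in enumerate(genes)}
    scores = []
    for row in z:
        cell = {}
        for name, members in gene_sets.items():
            idx = [index[g] for g in members if g in index]
            cell[name] = sum(row[i] for i in idx) / len(idx) if idx else 0.0
        scores.append(cell)
    return scores


def annotate_clusters(scores: List[Dict[str, float]], labels: List[int]) -> Dict[int, str]:
    """Label each cluster with the gene set of highest mean activity among its cells."""
    result = {}
    for c in sorted(set(labels)):
        members = [s for s, lab in zip(scores, labels) if lab == c]
        result[c] = max(members[0], key=lambda name: sum(m[name] for m in members) / len(members))
    return result


if __name__ == "__main__":
    genes = ["CD3E", "CD19", "MS4A1", "CD3D"]
    expr = [[5, 0, 0, 4], [6, 1, 0, 5], [0, 5, 6, 0], [1, 6, 5, 0]]
    sets = {"T cell": ["CD3E", "CD3D"], "B cell": ["CD19", "MS4A1"]}
    print(annotate_clusters(gene_set_scores(expr, genes, sets), [0, 0, 1, 1]))
```

Here the cluster labels are given; UNIFAN instead uses the gene set activity scores to shape the clustering itself.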


2018
Author(s): Peng Xie, Mingxuan Gao, Chunming Wang, Pawan Noel, Chaoyong Yang, ...

Abstract Characterization of individual cell types is fundamental to the study of multicellular samples such as tumor tissues. Single-cell RNA-seq techniques, which allow high-throughput expression profiling of individual cells, have significantly advanced our ability to perform this task. Currently, most scRNA-seq data analyses begin with unsupervised clustering of cells, followed by visualization of the clusters in a low-dimensional space. Clusters are often assigned to different cell types based on canonical markers. However, characterizing known cell types in this way is inefficient and limited by the investigator's knowledge. In this study, we present a technical framework for training an expandable supervised classifier that reveals single-cell identities based on their RNA expression profiles. Using multiple scRNA-seq datasets, we demonstrate the superior accuracy, robustness, compatibility, and expandability of this new solution compared with traditional methods. We use two examples of model upgrades to demonstrate how the projected evolution of the cell-type classifier is realized.
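One simple way to make a supervised cell-type classifier expandable is a nearest-centroid model: each cell type is summarised by the mean expression profile of its labelled cells, so a new type can be added by computing one more centroid without retraining the existing ones. The sketch below illustrates that idea with cosine similarity; it is a hypothetical example of ours, not the framework described in the abstract.

```python
import math
from typing import Dict, List


class CentroidClassifier:
    """Nearest-centroid cell-type classifier that can grow one type at a time."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def add_type(self, name: str, profiles: List[List[float]]) -> None:
        """Add (or replace) a cell type from its labelled expression profiles."""
        n = len(profiles)
        self.centroids[name] = [sum(col) / n for col in zip(*profiles)]

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

    def predict(self, profile: List[float]) -> str:
        """Return the cell type whose centroid is most cosine-similar to `profile`."""
        return max(self.centroids, key=lambda t: self._cosine(profile, self.centroids[t]))


if __name__ == "__main__":
    clf = CentroidClassifier()
    clf.add_type("T", [[5.0, 0.0, 1.0], [6.0, 1.0, 0.0]])
    clf.add_type("B", [[0.0, 5.0, 1.0], [1.0, 6.0, 0.0]])
    print(clf.predict([4.0, 0.0, 0.0]))
    clf.add_type("NK", [[0.0, 0.0, 7.0], [0.0, 1.0, 8.0]])  # model upgrade: new type only
    print(clf.predict([0.0, 0.0, 5.0]))
```

The trade-off is expressiveness: a centroid cannot capture multimodal or nonlinear structure within a cell type, which richer classifiers handle at the cost of retraining.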


2021
Author(s): Snehalika Lall, Abhik Ghosh, Sumanta Ray, Sanghamitra Bandyopadhyay

Abstract Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Since single-cell data are susceptible to technical noise, the quality of the genes selected prior to clustering is of crucial importance in the preliminary steps of downstream analysis. Therefore, robust gene selection has gained considerable attention in recent years. We introduce sc-REnF (robust entropy based feature (gene) selection method), which aims to leverage the advantages of Rényi and Tsallis entropies in gene selection for single-cell clustering. Experiments demonstrate that, with a tuned parameter (q), Rényi and Tsallis entropies select genes that significantly improve clustering results over other competing methods. sc-REnF captures relevancy and redundancy among the features of noisy data extremely well owing to its robust objective function. Moreover, the selected features/genes can cluster unknown cells with high accuracy. Finally, sc-REnF yields good clustering performance on small-sample, large-feature scRNA-seq data.
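For a discrete distribution p, the Rényi entropy of order q is H_q(p) = log(Σ_i p_i^q)/(1 − q) and the Tsallis entropy is S_q(p) = (1 − Σ_i p_i^q)/(q − 1); both reduce to the Shannon entropy as q → 1. The sketch below computes them from a histogram of one gene's expression values. The equal-width binning and the function names are our assumptions, and sc-REnF's full objective, which also weighs relevance and redundancy among genes, is not reproduced here.

```python
import math
from typing import List


def histogram_probs(values: List[float], bins: int = 10) -> List[float]:
    """Non-zero empirical bin probabilities over equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max value goes in the last bin
    return [c / len(values) for c in counts if c > 0]


def shannon_entropy(p: List[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p: List[float], q: float) -> float:
    """Rényi entropy of order q (natural log); q = 1 falls back to Shannon entropy."""
    if q == 1:
        return shannon_entropy(p)
    return math.log(sum(x ** q for x in p if x > 0)) / (1 - q)


def tsallis_entropy(p: List[float], q: float) -> float:
    """Tsallis entropy with index q; q = 1 falls back to Shannon entropy."""
    if q == 1:
        return shannon_entropy(p)
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


if __name__ == "__main__":
    gene = [0.0, 0.0, 0.0, 1.2, 3.5, 0.0, 2.2, 0.0]  # hypothetical expression of one gene
    p = histogram_probs(gene, bins=4)
    print(renyi_entropy(p, 2.0), tsallis_entropy(p, 2.0))
```

The parameter q controls how strongly frequent bins dominate the entropy, which is why tuning q changes which genes are selected.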

