scholarly journals A Robust and Scalable Graph Neural Network for Accurate Single Cell Classification

2021 ◽  
Author(s):  
Yuansong Zeng ◽  
Xiang Zhou ◽  
Zixiang Pan ◽  
Yutong Lu ◽  
Yuedong Yang

Single-cell RNA sequencing (scRNA-seq) techniques provide high-resolution data on cellular heterogeneity in diverse tissues, and a critical step for the data analysis is cell type identification. Traditional methods usually cluster the cells and manually identify cell clusters through marker genes, which is time-consuming and subjective. With the launch of several large-scale single-cell projects, millions of sequenced cells have been annotated and it is promising to transfer labels from the annotated datasets to newly generated datasets. One powerful way for the transferring is to learn cell relations through the graph neural network (GNN), while vanilla GNN is difficult to process millions of cells due to the expensive costs of the message-passing procedure at each training epoch. Here, we have developed a robust and scalable GNN-based method for accurate single cell classification (GraphCS), where the graph is constructed to connect similar cells within and between labelled and unlabelled scRNA-seq datasets for propagation of shared information. To overcome the slow information propagation of GNN at each training epoch, the diffused information is pre-calculated via the approximate Generalized PageRank algorithm, enabling sublinear complexity for a high speed and scalability on millions of cells. Compared with existing methods, GraphCS demonstrates better performance on simulated, cross-platform, and cross-species scRNA-seq datasets. More importantly, our model can achieve superior performance on a large dataset with one million cells within 50 minutes.

Author(s):  
Young Hyun Kim ◽  
Eun-Gyu Ha ◽  
Kug Jin Jeon ◽  
Chena Lee ◽  
Sang-Sun Han

Objectives: This study aimed to develop a fully automated human identification method based on a convolutional neural network (CNN) with a large-scale dental panoramic radiograph (DPR) dataset. Methods: In total, 2,760 DPRs from 746 subjects who had 2 to 17 DPRs with various changes in image characteristics due to various dental treatments (tooth extraction, oral surgery, prosthetics, orthodontics, or tooth development) were collected. The test dataset included the latest DPR of each subject (746 images) and the other DPRs (2,014 images) were used for model training. A modified VGG16 model with two fully connected layers was applied for human identification. The proposed model was evaluated with rank-1, –3, and −5 accuracies, running time, and gradient-weighted class activation mapping (Grad-CAM)–applied images. Results: This model had rank-1,–3, and −5 accuracies of 82.84%, 89.14%, and 92.23%, respectively. All rank-1 accuracy values of the proposed model were above 80% regardless of changes in image characteristics. The average running time to train the proposed model was 60.9 sec per epoch, and the prediction time for 746 test DPRs was short (3.2 sec/image). The Grad-CAM technique verified that the model automatically identified humans by focusing on identifiable dental information. Conclusion: The proposed model showed good performance in fully automatic human identification despite differing image characteristics of DPRs acquired from the same patients. Our model is expected to assist in the fast and accurate identification by experts by comparing large amounts of images and proposing identification candidates at high speed.


2019 ◽  
Vol 21 (5) ◽  
pp. 1581-1595 ◽  
Author(s):  
Xinlei Zhao ◽  
Shuang Wu ◽  
Nan Fang ◽  
Xiao Sun ◽  
Jue Fan

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adapted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some lights on potential direction of future improvement on classification tools.


2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Qian Yang ◽  
Bing Wei ◽  
Linqian Li ◽  
Debiao Ge

The plasma sheath is known as a popular topic of computational electromagnetics, and the plasma case is more resource-intensive than the non-plasma case. In this paper, a parallel shift-operator discontinuous Galerkin time-domain method using the MPI (Message Passing Interface) library is proposed to solve the large-scale plasma problems. To demonstrate our algorithm, a plasma sheath model of the high-speed blunt cone was established based on the results of the multiphysics software, and our algorithm was used to extract the radar cross-section (RCS) versus different incident angles of the model.


2018 ◽  
Vol 29 (8) ◽  
pp. 2060-2068 ◽  
Author(s):  
Nikos Karaiskos ◽  
Mahdieh Rahmatollahi ◽  
Anastasiya Boltengagen ◽  
Haiyue Liu ◽  
Martin Hoehne ◽  
...  

Background Three different cell types constitute the glomerular filter: mesangial cells, endothelial cells, and podocytes. However, to what extent cellular heterogeneity exists within healthy glomerular cell populations remains unknown.Methods We used nanodroplet-based highly parallel transcriptional profiling to characterize the cellular content of purified wild-type mouse glomeruli.Results Unsupervised clustering of nearly 13,000 single-cell transcriptomes identified the three known glomerular cell types. We provide a comprehensive online atlas of gene expression in glomerular cells that can be queried and visualized using an interactive and freely available database. Novel marker genes for all glomerular cell types were identified and supported by immunohistochemistry images obtained from the Human Protein Atlas. Subclustering of endothelial cells revealed a subset of endothelium that expressed marker genes related to endothelial proliferation. By comparison, the podocyte population appeared more homogeneous but contained three smaller, previously unknown subpopulations.Conclusions Our study comprehensively characterized gene expression in individual glomerular cells and sets the stage for the dissection of glomerular function at the single-cell level in health and disease.


Author(s):  
Justin Lakkis ◽  
David Wang ◽  
Yuanchao Zhang ◽  
Gang Hu ◽  
Kui Wang ◽  
...  

AbstractRecent development of single-cell RNA-seq (scRNA-seq) technologies has led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effect, which is inevitable in studies involving human tissues. Most existing methods remove batch effect in a low-dimensional embedding space. Although useful for clustering, batch effect is still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effect. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Popular methods such as Seurat3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effect in the gene expression space, but MNN can only analyze two batches at a time and it becomes computationally infeasible when the number of batches is large. Here we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data, while correcting batch effect both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC consistently outperforms scVI, DCA, and MNN. With CarDEC denoising, those non-highly variable genes offer as much signal for clustering as the highly variable genes, suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC’s denoised and batch corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effect. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.


2021 ◽  
Author(s):  
Jiaxing Chen ◽  
Chinwang Cheong ◽  
Liang Lan ◽  
Xin Zhou ◽  
Jiming Liu ◽  
...  

AbstractSingle-cell RNA sequencing is used to capture cell-specific gene expression, thus allowing reconstruction of gene regulatory networks. The existing algorithms struggle to deal with dropouts and cellular heterogeneity, and commonly require pseudotime-ordered cells. Here, we describe DeepDRIM a supervised deep neural network that represents gene pair joint expression as images and considers the neighborhood context to eliminate the transitive interactions. Deep-DRIM yields significantly better performance than the other nine algorithms used on the eight cell lines tested, and can be used to successfully discriminate key functional modules between patients with mild and severe symptoms of coronavirus disease 2019 (COVID-19).


2021 ◽  
Author(s):  
Lorenzo Martini ◽  
Roberta Bardini ◽  
Stefano Di Carlo

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. In this regard, single-cell RNA-seq analysis is crucial to study cellular heterogeneity. To identify and analyze rare cell types, it is necessary to reliably label cells through known markers. In this way, all the related studies are dependent on the quality of the employed marker genes. Therefore, in this work, we investigate how a set of chosen inhibitory interneurons markers perform. The gene set consists of both immunohistochemistry-derived genes and single-cell RNA-seq taxonomy ones. We employed various human and mouse datasets of the brain cortex, consequently processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised cluster results and the marker expression. Specifically, we calculated the specificity, the fraction of cells expressing, and some metrics derived from decision tree analysis like entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, though, a correlation emerges between the general performances of the genes set and the experimental quality of the datasets. Therefore, the proposed method allows evaluating the quality of a dataset in relation to its reliability regarding the inhibitory interneurons cellular heterogeneity study.


2020 ◽  
Author(s):  
Xin Shao ◽  
Haihong Yang ◽  
Xiang Zhuang ◽  
Jie Liao ◽  
Yueren Yang ◽  
...  

AbstractAdvances in single-cell RNA sequencing (scRNA-seq) have furthered the simultaneous classification of thousands of cells in a single assay based on transcriptome profiling. In most analysis protocols, single-cell type annotation relies on marker genes or RNA-seq profiles, resulting in poor extrapolation. Here, we introduce scDeepSort (https://github.com/ZJUFanLab/scDeepSort), a reference-free cell-type annotation tool for single-cell transcriptomics that uses a deep learning model with a weighted graph neural network. Using human and mouse scRNA-seq data resources, we demonstrate the feasibility of scDeepSort and its high accuracy in labeling 764,741 cells involving 56 human and 32 mouse tissues. Significantly, scDeepSort outperformed reference-dependent methods in annotating 76 external testing scRNA-seq datasets, including 126,384 cells (85.79%) from ten human tissues and 134,604 cells from 12 mouse tissues (81.30%). scDeepSort accurately revealed cell identities without prior reference knowledge, thus potentially providing new insights into mechanisms underlying biological processes, disease pathogenesis, and disease progression at a single-cell resolution.


2018 ◽  
Author(s):  
Nikos Konstantinides ◽  
Katarina Kapuralin ◽  
Chaimaa Fadil ◽  
Luendreo Barboza ◽  
Rahul Satija ◽  
...  

SummaryTranscription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.


Sign in / Sign up

Export Citation Format

Share Document