A Robust and Scalable Graph Neural Network for Accurate Single Cell Classification

Mapping Intimacies ◽

10.1101/2021.06.24.449752 ◽

2021 ◽

Author(s):

Yuansong Zeng ◽

Xiang Zhou ◽

Zixiang Pan ◽

Yutong Lu ◽

Yuedong Yang

Keyword(s):

Neural Network ◽

Single Cell ◽

Message Passing ◽

High Speed ◽

Large Scale ◽

Cellular Heterogeneity ◽

Superior Performance ◽

Marker Genes ◽

Cell Classification ◽

High Resolution Data

Single-cell RNA sequencing (scRNA-seq) techniques provide high-resolution data on cellular heterogeneity in diverse tissues, and a critical step in the data analysis is cell type identification. Traditional methods usually cluster the cells and manually identify cell clusters through marker genes, which is time-consuming and subjective. With the launch of several large-scale single-cell projects, millions of sequenced cells have been annotated, and it is promising to transfer labels from the annotated datasets to newly generated datasets. One powerful way to perform this transfer is to learn cell relations through a graph neural network (GNN), but a vanilla GNN has difficulty processing millions of cells because of the expensive message-passing procedure at each training epoch. Here, we have developed a robust and scalable GNN-based method for accurate single-cell classification (GraphCS), in which the graph is constructed to connect similar cells within and between labelled and unlabelled scRNA-seq datasets for propagation of shared information. To overcome the slow information propagation of GNNs at each training epoch, the diffused information is pre-calculated via the approximate Generalized PageRank algorithm, enabling sublinear complexity for high speed and scalability on millions of cells. Compared with existing methods, GraphCS demonstrates better performance on simulated, cross-platform, and cross-species scRNA-seq datasets. More importantly, our model can achieve superior performance on a large dataset with one million cells within 50 minutes.
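
The idea of pre-calculating diffused information can be illustrated with a small sketch. The code below is not the GraphCS implementation: it computes an exact, dense Generalized PageRank propagation, sum_k w_k P^k X, on a toy graph, whereas the abstract describes an approximate algorithm with sublinear complexity. The function names, the symmetric normalisation, and the personalised-PageRank weights are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected cell-cell graph."""
    n = len(adj)
    # Add self-loops so every cell keeps part of its own signal.
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    degree = [sum(row) for row in with_loops]
    return [[with_loops[i][j] / (degree[i] * degree[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, adequate for a toy example."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def ppr_weights(alpha: float, steps: int) -> List[float]:
    """Hypothetical weight choice: personalised PageRank, w_k = alpha * (1 - alpha)^k."""
    return [alpha * (1.0 - alpha) ** k for k in range(steps + 1)]


def propagate_features(adj: Matrix, features: Matrix, weights: List[float]) -> Matrix:
    """Compute sum_k weights[k] * P^k X once, before any training epoch."""
    p = normalized_adjacency(adj)
    current = [row[:] for row in features]  # P^0 X
    out = [[weights[0] * v for v in row] for row in current]
    for w in weights[1:]:
        current = matmul(p, current)  # advance to the next power of P
        for i, row in enumerate(current):
            for j, v in enumerate(row):
                out[i][j] += w * v
    return out
```

Because the propagated matrix is computed once, a downstream classifier can be trained on it with ordinary mini-batches and no message passing per epoch, which is the motivation the abstract gives for the pre-calculation.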


Artificial Neural Network System for Cell Classification using Single Cell RNA Expression

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm49941.2020.9313498 ◽

2020 ◽

Author(s):

Xin Lin ◽

Jiahui Zhong ◽

Minjie Lyu ◽

Sen Lin ◽

Derin B. Keskin ◽

...

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Single Cell ◽

RNA Expression ◽

Network System ◽

Cell Classification ◽

Neural Network System ◽

Artificial Neural


A fully automated method of human identification based on dental panoramic radiographs using a convolutional neural network

Dentomaxillofacial Radiology ◽

10.1259/dmfr.20210383 ◽

2021 ◽

Author(s):

Young Hyun Kim ◽

Eun-Gyu Ha ◽

Kug Jin Jeon ◽

Chena Lee ◽

Sang-Sun Han

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

High Speed ◽

Large Scale ◽

Oral Surgery ◽

Human Identification ◽

Running Time ◽

Automated Method ◽

Image Characteristics ◽

Proposed Model

Objectives: This study aimed to develop a fully automated human identification method based on a convolutional neural network (CNN) with a large-scale dental panoramic radiograph (DPR) dataset. Methods: In total, 2,760 DPRs were collected from 746 subjects, each of whom had 2 to 17 DPRs with various changes in image characteristics due to dental treatments (tooth extraction, oral surgery, prosthetics, orthodontics) or tooth development. The test dataset included the latest DPR of each subject (746 images), and the other DPRs (2,014 images) were used for model training. A modified VGG16 model with two fully connected layers was applied for human identification. The proposed model was evaluated with rank-1, rank-3, and rank-5 accuracies, running time, and gradient-weighted class activation mapping (Grad-CAM)–applied images. Results: This model had rank-1, rank-3, and rank-5 accuracies of 82.84%, 89.14%, and 92.23%, respectively. All rank-1 accuracy values of the proposed model were above 80% regardless of changes in image characteristics. The average running time to train the proposed model was 60.9 sec per epoch, and the prediction time for 746 test DPRs was short (3.2 sec/image). The Grad-CAM technique verified that the model automatically identified humans by focusing on identifiable dental information. Conclusion: The proposed model showed good performance in fully automatic human identification despite differing image characteristics of DPRs acquired from the same patients. Our model is expected to assist experts in fast and accurate identification by comparing large numbers of images and proposing identification candidates at high speed.
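
The rank-k accuracies reported above count a test image as correct when the true subject appears among the k highest-scoring candidates. A minimal sketch of that metric follows; the function name and the score layout (one row of per-subject scores per test image) are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def rank_k_accuracy(scores: Sequence[Sequence[float]],
                    true_labels: Sequence[int],
                    k: int) -> float:
    """Fraction of samples whose true class index is among the k top-scoring classes."""
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k: List[int] = sorted(range(len(row)),
                                  key=lambda c: row[c],
                                  reverse=True)[:k]
        if truth in top_k:
            hits += 1
    return hits / len(true_labels)
```

With this definition, rank-1 accuracy is ordinary top-1 accuracy, and the value can only stay the same or grow as k increases.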


Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell RNA Sequencing

Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
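
The "simple ensemble voting" mentioned above can be sketched as a per-cell majority vote over the labels returned by several classifiers. The abstract does not specify how ties are handled, so the rule used here (a tie yields an "unassigned" label) is an assumption, as are the function and parameter names.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions_per_tool: Sequence[Sequence[str]],
                  unassigned: str = "unassigned") -> List[str]:
    """Combine per-cell labels from several tools by majority vote.

    predictions_per_tool[t][i] is the label tool t gives to cell i.
    A tie for first place returns the `unassigned` label (assumed rule).
    """
    n_cells = len(predictions_per_tool[0])
    consensus = []
    for i in range(n_cells):
        votes = Counter(tool[i] for tool in predictions_per_tool)
        ranked = votes.most_common()  # sorted by descending vote count
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append(unassigned if tied else top_label)
    return consensus
```

For example, three tools that label a cell "B", "B", and "NK" yield "B", while three tools that all disagree yield "unassigned".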


Analysis of the Calculation of a Plasma Sheath Using the Parallel SO-DGTD Method

International Journal of Antennas and Propagation ◽

10.1155/2019/7160913 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9

Author(s):

Qian Yang ◽

Bing Wei ◽

Linqian Li ◽

Debiao Ge

Keyword(s):

Discontinuous Galerkin ◽

Cross Section ◽

Time Domain ◽

Message Passing ◽

High Speed ◽

Large Scale ◽

Message Passing Interface ◽

Shift Operator ◽

Plasma Sheath ◽

Blunt Cone

The plasma sheath is a popular topic in computational electromagnetics, and the plasma case is more resource-intensive than the non-plasma case. In this paper, a parallel shift-operator discontinuous Galerkin time-domain method using the MPI (Message Passing Interface) library is proposed to solve large-scale plasma problems. To demonstrate our algorithm, a plasma sheath model of a high-speed blunt cone was established based on the results of multiphysics software, and our algorithm was used to extract the radar cross-section (RCS) of the model at different incident angles.


A Single-Cell Transcriptome Atlas of the Mouse Glomerulus

Journal of the American Society of Nephrology ◽

10.1681/asn.2018030238 ◽

2018 ◽

Vol 29 (8) ◽

pp. 2060-2068 ◽

Cited By ~ 53

Author(s):

Nikos Karaiskos ◽

Mahdieh Rahmatollahi ◽

Anastasiya Boltengagen ◽

Haiyue Liu ◽

Martin Hoehne ◽

...

Keyword(s):

Gene Expression ◽

Endothelial Cells ◽

Single Cell ◽

Mesangial Cells ◽

Transcriptional Profiling ◽

Wild Type Mouse ◽

Cell Types ◽

Cellular Heterogeneity ◽

Marker Genes ◽

Glomerular Cell

Background: Three different cell types constitute the glomerular filter: mesangial cells, endothelial cells, and podocytes. However, to what extent cellular heterogeneity exists within healthy glomerular cell populations remains unknown. Methods: We used nanodroplet-based highly parallel transcriptional profiling to characterize the cellular content of purified wild-type mouse glomeruli. Results: Unsupervised clustering of nearly 13,000 single-cell transcriptomes identified the three known glomerular cell types. We provide a comprehensive online atlas of gene expression in glomerular cells that can be queried and visualized using an interactive and freely available database. Novel marker genes for all glomerular cell types were identified and supported by immunohistochemistry images obtained from the Human Protein Atlas. Subclustering of endothelial cells revealed a subset of endothelium that expressed marker genes related to endothelial proliferation. By comparison, the podocyte population appeared more homogeneous but contained three smaller, previously unknown subpopulations. Conclusions: Our study comprehensively characterized gene expression in individual glomerular cells and sets the stage for the dissection of glomerular function at the single-cell level in health and disease.


A Joint Deep Learning Model for Simultaneous Batch Effect Correction, Denoising and Clustering in Single-Cell Transcriptomics

10.1101/2020.09.23.310003 ◽

2020 ◽

Cited By ~ 1

Author(s):

Justin Lakkis ◽

David Wang ◽

Yuanchao Zhang ◽

Gang Hu ◽

Kui Wang ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Single Cell ◽

Large Scale ◽

Nearest Neighbor ◽

Learning Model ◽

Batch Effect ◽

Marker Genes ◽

Deep Learning Model ◽

Variable Genes

Recent development of single-cell RNA-seq (scRNA-seq) technologies has led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effect, which is inevitable in studies involving human tissues. Most existing methods remove batch effect in a low-dimensional embedding space. Although useful for clustering, batch effect is still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effect. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Popular methods such as Seurat3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effect in the gene expression space, but MNN can only analyze two batches at a time and it becomes computationally infeasible when the number of batches is large. Here we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data, while correcting batch effect both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC consistently outperforms scVI, DCA, and MNN. With CarDEC denoising, those non-highly variable genes offer as much signal for clustering as the highly variable genes, suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC’s denoised and batch corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effect. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.
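
The mutual nearest neighbor (MNN) idea that the abstract contrasts CarDEC with can be sketched as follows: a cell in one batch and a cell in another form an MNN pair when each is among the other's k nearest neighbors across batches. This is a brute-force illustration of the pairing step only, with hypothetical function names and squared Euclidean distance assumed; it is neither the Seurat nor the CarDEC implementation, and it omits the correction step that uses the pairs.

```python
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def k_nearest(queries: Sequence[Point],
              reference: Sequence[Point],
              k: int) -> List[Set[int]]:
    """For each query cell, return the indices of its k closest reference cells."""
    neighbours = []
    for q in queries:
        # Squared Euclidean distance from q to every reference cell.
        dist = [sum((a - b) ** 2 for a, b in zip(q, r)) for r in reference]
        order = sorted(range(len(reference)), key=lambda j: dist[j])
        neighbours.append(set(order[:k]))
    return neighbours


def mutual_nearest_pairs(batch_a: Sequence[Point],
                         batch_b: Sequence[Point],
                         k: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where a_i and b_j are in each other's k-nearest sets."""
    a_to_b = k_nearest(batch_a, batch_b, k)
    b_to_a = k_nearest(batch_b, batch_a, k)
    return [(i, j)
            for i, nbrs in enumerate(a_to_b)
            for j in sorted(nbrs)
            if i in b_to_a[j]]
```

The pairing is defined between exactly two batches, which is consistent with the abstract's remark that MNN handles two batches at a time; with many batches the pairwise searches have to be repeated.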


DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-Seq Data

10.1101/2021.02.03.429484 ◽

2021 ◽

Author(s):

Jiaxing Chen ◽

Chinwang Cheong ◽

Liang Lan ◽

Xin Zhou ◽

Jiming Liu ◽

...

Keyword(s):

Neural Network ◽

Single Cell ◽

Regulatory Networks ◽

Deep Neural Network ◽

Neighborhood Context ◽

Cellular Heterogeneity ◽

Specific Gene ◽

RNA Seq ◽

Cell Type Specific ◽

Gene Regulatory

Single-cell RNA sequencing is used to capture cell-specific gene expression, thus allowing reconstruction of gene regulatory networks. The existing algorithms struggle to deal with dropouts and cellular heterogeneity, and commonly require pseudotime-ordered cells. Here, we describe DeepDRIM, a supervised deep neural network that represents gene pair joint expression as images and considers the neighborhood context to eliminate the transitive interactions. DeepDRIM yields significantly better performance than the other nine algorithms used on the eight cell lines tested, and can be used to successfully discriminate key functional modules between patients with mild and severe symptoms of coronavirus disease 2019 (COVID-19).


Meta-Analysis of cortical inhibitory interneurons markers landscape and their performances in scRNA-seq studies.

10.1101/2021.11.03.467049 ◽

2021 ◽

Author(s):

Lorenzo Martini ◽

Roberta Bardini ◽

Stefano Di Carlo

Keyword(s):

Single Cell ◽

Meta Analysis ◽

Cell Types ◽

Cellular Heterogeneity ◽

Marker Genes ◽

Inhibitory Interneurons ◽

RNA Seq ◽

Circuit Function ◽

The Brain

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. In this regard, single-cell RNA-seq analysis is crucial to study cellular heterogeneity. To identify and analyze rare cell types, it is necessary to reliably label cells through known markers. As a result, all the related studies depend on the quality of the employed marker genes. Therefore, in this work, we investigate how a set of chosen inhibitory interneuron markers performs. The gene set consists of both immunohistochemistry-derived genes and genes from single-cell RNA-seq taxonomies. We employed various human and mouse datasets of the brain cortex, subsequently processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised cluster results and the marker expression. Specifically, we calculated the specificity, the fraction of cells expressing, and some metrics derived from decision tree analysis like entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, though, a correlation emerges between the general performance of the gene set and the experimental quality of the datasets. Therefore, the proposed method allows evaluating the quality of a dataset in terms of its reliability for studying the cellular heterogeneity of inhibitory interneurons.
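
The decision-tree-derived metrics named above can be illustrated by treating one marker as a binary split (cell expresses the gene or not) and measuring how much that split reduces the impurity of the cluster labels. The sketch below uses the standard definitions of Shannon entropy and Gini impurity; how the study binarises expression and which exact formulas it uses are not stated in the abstract, so the function names and the boolean `expressing` input are assumptions.

```python
from collections import Counter
from math import log2
from typing import Callable, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of the cluster-label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of the cluster-label distribution."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(expressing: Sequence[bool],
               labels: Sequence[str],
               impurity: Callable[[Sequence[str]], float]) -> float:
    """Impurity of all cells minus the size-weighted impurity of the two groups
    obtained by splitting on whether each cell expresses the marker."""
    pos = [lab for e, lab in zip(expressing, labels) if e]
    neg = [lab for e, lab in zip(expressing, labels) if not e]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part) for part in (pos, neg) if part)
    return impurity(labels) - weighted
```

Calling `split_gain(expressing, labels, entropy)` gives the entropy gain and `split_gain(expressing, labels, gini)` the Gini impurity reduction; a marker expressed in exactly one cluster scores high on both, while a marker expressed uniformly across clusters scores near zero.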


Reference-free Cell-type Annotation for Single-cell Transcriptomics using Deep Learning with a Weighted Graph Neural Network

10.1101/2020.05.13.094953 ◽

2020 ◽

Author(s):

Xin Shao ◽

Haihong Yang ◽

Xiang Zhuang ◽

Jie Liao ◽

Yueren Yang ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Single Cell ◽

Weighted Graph ◽

Transcriptome Profiling ◽

Free Cell ◽

Marker Genes ◽

Cell Type ◽

Mouse Tissues ◽

Reference Knowledge

Advances in single-cell RNA sequencing (scRNA-seq) have furthered the simultaneous classification of thousands of cells in a single assay based on transcriptome profiling. In most analysis protocols, single-cell type annotation relies on marker genes or RNA-seq profiles, resulting in poor extrapolation. Here, we introduce scDeepSort (https://github.com/ZJUFanLab/scDeepSort), a reference-free cell-type annotation tool for single-cell transcriptomics that uses a deep learning model with a weighted graph neural network. Using human and mouse scRNA-seq data resources, we demonstrate the feasibility of scDeepSort and its high accuracy in labeling 764,741 cells involving 56 human and 32 mouse tissues. Significantly, scDeepSort outperformed reference-dependent methods in annotating 76 external testing scRNA-seq datasets, including 126,384 cells (85.79%) from ten human tissues and 134,604 cells from 12 mouse tissues (81.30%). scDeepSort accurately revealed cell identities without prior reference knowledge, thus potentially providing new insights into mechanisms underlying biological processes, disease pathogenesis, and disease progression at a single-cell resolution.


Phenotypic convergence in the brain: distinct transcription factors regulate common terminal neuronal characters

10.1101/243113 ◽

2018 ◽

Cited By ~ 2

Author(s):

Nikos Konstantinides ◽

Katarina Kapuralin ◽

Chaimaa Fadil ◽

Luendreo Barboza ◽

Rahul Satija ◽

...

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Deep Understanding ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Functional Specification ◽

Phenotypic Convergence

Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine-learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.
