An efficient scRNA-seq dropout imputation method using graph attention network

Abstract Background: Single-cell sequencing technology can handle large amounts of single-cell library data at the same time and reveal the heterogeneity among cells. However, analyzing single-cell data is computationally challenging. Because gene expression counts are low, a non-zero entry has a high chance of being recorded as zero; these cases are called dropout events. At present, mainstream dropout imputation methods such as DCA, MAGIC, scVI, scImpute and SAVER cannot effectively recover the true expression of cells from dropout noise. Results: In this paper, we propose an autoencoder-structured network named GNNImpute. GNNImpute uses graph attention convolution to aggregate information from multiple levels of similar cells and performs convolution operations in non-Euclidean space on scRNA-seq data. Unlike current imputation tools, GNNImpute can accurately and effectively impute dropouts and reduce dropout noise. We use mean square error (MSE), mean absolute error (MAE), Pearson correlation coefficient (PCC) and cosine similarity (CS) to compare the performance of GNNImpute with that of other methods. We analyze four real datasets, and our results show that GNNImpute achieves 3.0130 MSE, 0.6781 MAE, 0.9073 PCC and 0.9134 CS. Furthermore, we use the adjusted Rand index (ARI) and normalized mutual information (NMI) to measure the clustering effect; GNNImpute achieves 0.8199 (ARI) and 0.8368 (NMI). Conclusions: In this investigation, we propose a single-cell dropout imputation method (GNNImpute) that effectively uses shared information to impute the dropouts in scRNA-seq data. We test it on different real datasets and evaluate its effectiveness in terms of MSE, MAE, PCC and CS. The results show that graph attention convolution and the autoencoder structure have great potential in single-cell dropout imputation.
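
The four accuracy metrics named in this abstract can be illustrated with a minimal sketch. It assumes the true and imputed expression values have already been flattened into two equal-length lists; the abstract does not say how the values are aggregated across cells and genes, so the function names and the toy data below are illustrative only and are not taken from GNNImpute.

```python
import math

def mse(truth, imputed):
    # Mean square error: average of squared differences.
    return sum((t - p) ** 2 for t, p in zip(truth, imputed)) / len(truth)

def mae(truth, imputed):
    # Mean absolute error: average of absolute differences.
    return sum(abs(t - p) for t, p in zip(truth, imputed)) / len(truth)

def pearson(truth, imputed):
    # Pearson correlation coefficient: covariance over the product of spreads.
    n = len(truth)
    mt, mp = sum(truth) / n, sum(imputed) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, imputed))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in imputed))
    return cov / (st * sp)

def cosine(truth, imputed):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(t * p for t, p in zip(truth, imputed))
    nt = math.sqrt(sum(t * t for t in truth))
    np_ = math.sqrt(sum(p * p for p in imputed))
    return dot / (nt * np_)

# Hypothetical toy values, not data from the paper.
truth = [0.0, 2.0, 4.0, 6.0]
imputed = [1.0, 2.0, 3.0, 6.0]
print(mse(truth, imputed), mae(truth, imputed))
print(pearson(truth, imputed), cosine(truth, imputed))
```

Lower MSE and MAE and higher PCC and CS indicate imputed values closer to the reference expression.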

CMF-Impute: an accurate imputation tool for single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa109 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3139-3147 ◽

Cited By ~ 5

Author(s):

Junlin Xu ◽

Lijun Cai ◽

Bo Liao ◽

Wen Zhu ◽

JiaLiang Yang

Keyword(s):

Single Cell ◽

Pearson Correlation ◽

Cell Lineage ◽

Supplementary Information ◽

Adjusted Rand Index ◽

Clustering Methods ◽

Normalized Mutual Information ◽

Cell Subpopulations ◽

Matlab Package ◽

Expression Matrix

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) technology provides a powerful tool for investigating cell heterogeneity and cell subpopulations by allowing the quantification of gene expression at the single-cell level. However, scRNA-seq data analysis remains challenging because of various kinds of technical noise such as dropout events (i.e. excessive zero counts in the expression matrix). Results: By taking into consideration the associations among cells and genes, we propose a novel collaborative matrix factorization-based method called CMF-Impute to impute the dropout entries of a given scRNA-seq expression matrix. We test CMF-Impute and compare it with five other state-of-the-art methods on six popular real scRNA-seq datasets of various sizes and three simulated datasets. For simulated datasets, CMF-Impute outperforms the other methods in imputing dropouts closest to the original expression values, as evaluated by both the sum of squared error and the Pearson correlation coefficient. For real datasets, CMF-Impute achieves the most accurate cell classification results regardless of the clustering method chosen (SC3, or t-SNE followed by K-means), as evaluated by both the adjusted Rand index and normalized mutual information. Finally, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation, and in inferring cell lineage trajectories. Availability and implementation: CMF-Impute is written as a Matlab package which is available at https://github.com/xujunlin123/CMFImpute.git. Supplementary information: Supplementary data are available at Bioinformatics online.
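
The two clustering scores used here, the adjusted Rand index and normalized mutual information, can be sketched in a few lines. This is an illustrative implementation of the standard definitions, not code from CMF-Impute; in particular, normalizing the mutual information by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which normalization was used.

```python
from collections import Counter
from math import comb, log

def adjusted_rand_index(truth, pred):
    # Pair-counting agreement between two labelings, corrected for chance.
    n = len(truth)
    if n < 2:
        return 1.0
    pairs = Counter(zip(truth, pred))          # contingency table
    rows, cols = Counter(truth), Counter(pred)  # marginal cluster sizes
    together = sum(comb(c, 2) for c in pairs.values())
    row_pairs = sum(comb(c, 2) for c in rows.values())
    col_pairs = sum(comb(c, 2) for c in cols.values())
    expected = row_pairs * col_pairs / comb(n, 2)
    maximum = (row_pairs + col_pairs) / 2
    if maximum == expected:                     # degenerate: one trivial labeling
        return 1.0
    return (together - expected) / (maximum - expected)

def normalized_mutual_info(truth, pred):
    # Mutual information divided by the mean entropy (assumed normalization).
    n = len(truth)
    pairs = Counter(zip(truth, pred))
    rows, cols = Counter(truth), Counter(pred)
    mi = sum((c / n) * log(n * c / (rows[t] * cols[p]))
             for (t, p), c in pairs.items())
    h_t = -sum((c / n) * log(c / n) for c in rows.values())
    h_p = -sum((c / n) * log(c / n) for c in cols.values())
    if h_t + h_p == 0:                          # both labelings have one cluster
        return 1.0
    return mi / ((h_t + h_p) / 2)

# Hypothetical labels: the same partition under renamed cluster ids.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))
```

Both scores depend only on how cells are grouped, not on the cluster ids themselves, which is why they suit comparisons between a clustering result and reference cell-type labels.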

Expression Based Species Deconvolution and Realignment Removes Misalignment Error in Multi-species Single Cell Data

10.1101/2021.04.04.438147 ◽

2021 ◽

Author(s):

Jaeyong Choi ◽

Woochan Lee ◽

Jung-Ki Yoon ◽

Jong-Il Kim

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Cost Effective ◽

Cell Types ◽

Computational Method ◽

Cell Library ◽

Public Data ◽

Cell Data ◽

Misalignment Error ◽

Human And Mouse

Background: Although single cell RNAseq of xenograft samples is widely used, there is no comprehensive pipeline for human and mouse mixed single cell analysis. Method: We used public data to assess misalignment error when using a human and mouse combined reference, and generated a pipeline based on expression-based species deconvolution with species-matching reference realignment to remove errors. We also found false-positive signals presumed to originate from ambient RNA of the other species, and used a computational method to adequately remove them. Result: Misaligned reads account for 0.5% of total reads on average, but the expression of a few genes was greatly affected, leading to a 99.8% loss in expression. Human and mouse mixed single cell data analyzed by our pipeline clustered well with unmixed data. We also applied our pipeline to a multi-species multi-sample single cell library containing breast cancer xenograft tissue and successfully identified all identities along with the diverse cell types of the tumor microenvironment. Conclusion: We present our pipeline for mixed human and mouse single cell data, which can also be applied to pooled libraries to obtain cost-effective single cell data. We also discuss points to consider when analyzing mixed single cell data for future development.
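
The expression-based deconvolution step can be pictured with a rough sketch: each cell barcode is assigned to a species according to the share of its counts that come from human versus mouse genes after alignment to the combined reference. The function name, the 0.9 purity threshold and the labels below are hypothetical choices for illustration; the abstract does not give the authors' actual decision rule.

```python
def assign_species(human_counts, mouse_counts, purity=0.9):
    """Label one cell barcode from its per-species count totals.

    `purity` is a hypothetical threshold, not a value from the paper.
    """
    total = human_counts + mouse_counts
    if total == 0:
        return "empty"          # no counts at all for this barcode
    human_fraction = human_counts / total
    if human_fraction >= purity:
        return "human"
    if human_fraction <= 1 - purity:
        return "mouse"
    return "mixed"              # possible doublet or heavy ambient RNA

# Hypothetical per-barcode (human, mouse) count totals.
cells = {"AAAC": (950, 50), "AAAG": (10, 990), "AAAT": (500, 500)}
labels = {bc: assign_species(h, m) for bc, (h, m) in cells.items()}
print(labels)
```

In a pipeline of the kind described, barcodes labeled with one species would then be realigned against that species' reference alone, which is the step the abstract credits with removing misalignment error.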

Faculty Opinions recommendation of Systems biology. Conditional density-based analysis of T cell signaling in single-cell data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.723891088.793520867 ◽

2016 ◽

Author(s):

Anuj Kumar

Keyword(s):

Systems Biology ◽

T Cell ◽

Cell Signaling ◽

Single Cell ◽

Conditional Density ◽

T Cell Signaling ◽

Cell Data

Prioritization of cell types responsive to biological perturbations in single-cell data with Augur

Nature Protocols ◽

10.1038/s41596-021-00561-x ◽

2021 ◽

Author(s):

Jordan W. Squair ◽

Michael A. Skinnider ◽

Matthieu Gautier ◽

Leonard J. Foster ◽

Grégoire Courtine

Keyword(s):

Single Cell ◽

Cell Types ◽

Cell Data

Identifying cell types from single-cell data based on similarities and dissimilarities between cells

BMC Bioinformatics ◽

10.1186/s12859-020-03873-z ◽

2021 ◽

Vol 22 (S3) ◽

Author(s):

Yuanyuan Li ◽

Ping Luo ◽

Yi Lu ◽

Fang-Xiang Wu

Keyword(s):

Gene Expression ◽

Single Cell ◽

Spectral Clustering ◽

Incidence Matrix ◽

Expression Patterns ◽

Cell Types ◽

Clustering Method ◽

Different Types ◽

Cell Data ◽

Spectral Clustering Method

Abstract Background: With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data. Results: Inspired by the phenomenon that cells of the same type have similar gene expression patterns, whereas cells of different types have dissimilar gene expression patterns, we improve the existing spectral clustering method for clustering single-cell data so that it is based on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and, finally, uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low-dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets. Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
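
The three stages described in this abstract (measure similarity and dissimilarity, fuse them into one matrix, then embed and run K-means) can be sketched with NumPy. The fusion rule used here, subtracting a scaled dissimilarity from a Gaussian similarity and clipping at zero, is an illustrative assumption and not the paper's formula; the embedding follows standard normalized spectral clustering, and all function names and the `alpha` parameter are hypothetical.

```python
import numpy as np

def fused_affinity(X, alpha=0.5):
    # Pairwise squared Euclidean distances between cells (rows of X).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    off_diag = d2[~np.eye(len(X), dtype=bool)]
    sigma2 = np.median(off_diag)           # assumes cells are not all identical
    S = np.exp(-d2 / (2 * sigma2))         # similarity (Gaussian kernel)
    D = d2 / d2.max()                      # dissimilarity scaled to [0, 1]
    W = np.clip(S - alpha * D, 0.0, None)  # assumed fusion rule
    np.fill_diagonal(W, 0.0)
    return W

def spectral_embed(W, k):
    # Eigenvectors of the normalized Laplacian with the k smallest eigenvalues.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)            # eigenvalues returned in ascending order
    U = vecs[:, :k]
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

def kmeans(U, k, iters=50):
    # Lloyd's algorithm with deterministic farthest-point initialisation.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster_cells(X, k, alpha=0.5):
    return kmeans(spectral_embed(fused_affinity(X, alpha), k), k)

# Synthetic example: two well-separated groups of 10 "cells" with 5 "genes".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 5)), rng.normal(5, 0.1, (10, 5))])
print(cluster_cells(X, k=2))
```

Setting `alpha` to 0 recovers a plain similarity-only affinity, which makes it easy to compare the two variants on the same data.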

Integrated analysis of multimodal single-cell data

Cell ◽

10.1016/j.cell.2021.04.048 ◽

2021 ◽

Author(s):

Yuhan Hao ◽

Stephanie Hao ◽

Erica Andersen-Nissen ◽

William M. Mauck ◽

Shiwei Zheng ◽

...

Keyword(s):

Single Cell ◽

Integrated Analysis ◽

Cell Data

Multi-omics integration in the age of million single-cell data

Nature Reviews Nephrology ◽

10.1038/s41581-021-00463-x ◽

2021 ◽

Author(s):

Zhen Miao ◽

Benjamin D. Humphreys ◽

Andrew P. McMahon ◽

Junhyong Kim

Keyword(s):

Single Cell ◽

Omics Integration ◽

Cell Data

Mechanistic models of cell-fate transitions from single-cell data

Current Opinion in Systems Biology ◽

10.1016/j.coisb.2021.04.004 ◽

2021 ◽

Author(s):

Gabriel Torregrosa ◽

Jordi Garcia-Ojalvo

Keyword(s):

Single Cell ◽

Cell Fate ◽

Mechanistic Models ◽

Cell Data

High-throughput single cell data analysis – A tutorial

Analytica Chimica Acta ◽

10.1016/j.aca.2021.338872 ◽

2021 ◽

pp. 338872

Author(s):

Gerjen H. Tinnevelt ◽

Kristiaan Wouters ◽

Geert J. Postma ◽

Rita Folcarelli ◽

Jeroen J. Jansen

Keyword(s):

Data Analysis ◽

Single Cell ◽

High Throughput ◽

Cell Data

484 Bioturing browser: interactively explore public single cell sequencing data

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0484 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A520-A520

Author(s):

Son Pham ◽

Tri Le ◽

Tan Phan ◽

Minh Pham ◽

Huy Nguyen ◽

...

Keyword(s):

Single Cell ◽

Immune Cell ◽

Expression Profiles ◽

Meta Analysis ◽

Cell Types ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Data Formats ◽

Cancer Types ◽

Cell Data

Background: Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis and therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a particular immuno-oncology study can produce multi-omics profiles for several thousands of individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, performing reanalysis for individual datasets and meta-analysis. Methods: N/A. Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform is currently hosting a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc. to be instantly accessed and explored in a uniform visualization and analytics interface. Based on this massive curated database, BioTuring Browser supports searching similar expression profiles, querying a target across datasets and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com. Conclusions: N/A
