LRSK: a low-rank self-representation K-means method for clustering single-cell RNA-sequencing data

Ye-Sen Sun; Le Ou-Yang; Dao-Qing Dai

doi:10.1039/d0mo00034e

Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab098; 2021

Author(s): Yinlei Hu; Bin Li; Falai Chen; Kun Qu

Keyword(s): Single Cell; RNA Sequencing; Matrix Factorization; Data Clustering; Cell Types; Low Rank; Sequencing Data; Rank Matrix; Single Cell RNA Sequencing; Low Rank Matrix

Abstract Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.

Goals and approaches for each processing step for single-cell RNA sequencing data

Briefings in Bioinformatics; doi:10.1093/bib/bbaa314; 2020

Author(s): Zilong Zhang; Feifei Cui; Chunyu Wang; Lingling Zhao; Quan Zou

Keyword(s): Gene Expression; Single Cell; RNA Sequencing; Cellular Level; Sequencing Data; Analysis Tools; Processing Step; Study Gene Expression; Single Cell RNA Sequencing; Cell Data

Abstract Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at the cellular level. However, due to the extremely low levels of transcripts in a single cell and technical losses during reverse transcription, gene expression at a single-cell resolution is usually noisy and highly dimensional; thus, statistical analyses of single-cell data are a challenge. Although many scRNA-seq data analysis tools are currently available, a gold standard pipeline is not available for all datasets. Therefore, a general understanding of bioinformatics and associated computational issues would facilitate the selection of appropriate tools for a given set of data. In this review, we provide an overview of the goals and most popular computational analysis tools for the quality control, normalization, imputation, feature selection and dimension reduction of scRNA-seq data.

Joint Gene Network Construction by Single-Cell RNA Sequencing Data

doi:10.1101/2021.07.14.452387; 2021

Author(s): Meichen Dong; Yiping He; Yuchao Jiang; Fei Zou

Keyword(s): Single Cell; RNA Sequencing; Graphical Models; Gene Networks; Regulatory Networks; Single Gene; Matrix Completion; Low Rank; Sequencing Data; Single Cell RNA Sequencing

In contrast to differential gene expression analysis at the single-gene level, gene regulatory network (GRN) analysis depicts complex transcriptomic interactions among genes, giving a better understanding of the genetic architectures underlying human diseases and traits. Recently, single-cell RNA sequencing (scRNA-seq) data have started to be used for constructing GRNs at a much finer resolution than bulk RNA-seq and microarray data. However, scRNA-seq data are inherently sparse, which hinders the direct application of the popular Gaussian graphical models (GGMs). Furthermore, most existing approaches for constructing GRNs with scRNA-seq data only consider gene networks under one condition. To better understand GRNs under different but related conditions at single-cell resolution, we propose to construct Joint Gene Networks with scRNA-seq data (JGNsc) using the GGM framework. To facilitate the use of GGMs, JGNsc first applies a hybrid imputation procedure that combines a Bayesian zero-inflated Poisson (ZIP) model with an iterative low-rank matrix completion step to efficiently impute zero-inflated counts resulting from technical artifacts. JGNsc then transforms the imputed data via a nonparanormal transformation, based on which joint GGMs are constructed. We demonstrate JGNsc and assess its performance using synthetic data. The application of JGNsc to two cancer clinical studies, of medulloblastoma and glioblastoma, identifies novel findings in addition to confirming well-known biological results.
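
The nonparanormal transformation mentioned in this abstract can be illustrated with a simplified rank-based version: each gene's values are replaced by standard normal quantiles of their ranks. This is only a sketch under stated assumptions. The function name `normal_scores` is hypothetical, ties receive their average rank, and the truncation step of the published nonparanormal estimator is omitted. It is not taken from the JGNsc code.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map one gene's values to standard normal scores via their ranks.

    Simplified rank-based transform (assumption): rank r out of n is
    mapped to the standard normal quantile at r / (n + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(r / (n + 1)) for r in ranks]


scores = normal_scores([3.0, 1.0, 2.0])
```

Because only ranks are used, any monotone rescaling of a gene's values gives the same scores, which is the property that makes the transformed data more suitable for a Gaussian graphical model.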

Single-cell RNA Sequencing Data Clustering by Low-Rank Subspace Ensemble Framework

IEEE/ACM Transactions on Computational Biology and Bioinformatics; doi:10.1109/tcbb.2020.3029187; 2020; pp. 1-1

Author(s): ChuanYuan Wang; Ying-Lian Gao; Jin-Xing Liu; Xiong-Zhen Kong; Chun-Hou Zheng

Keyword(s): Single Cell; RNA Sequencing; Data Clustering; Low Rank; Sequencing Data; Single Cell RNA Sequencing

McImpute: Matrix completion based imputation for single cell RNA-seq data

doi:10.1101/361980; 2018; Cited By ~ 3

Author(s): Aanchal Mongia; Debarka Sengupta; Angshul Majumdar

Keyword(s): Single Cell; RNA Sequencing; Expression Analysis; Matrix Completion; Differential Expression Analysis; Low Rank; Specific Cell; Sequencing Data; Reduction Techniques; Single Cell RNA Sequencing

Abstract Motivation: Single cell RNA sequencing has proved revolutionary for its potential to zoom into complex biological systems. Genome wide expression analysis at single cell resolution provides a window into the dynamics of cellular phenotypes. This facilitates characterization of transcriptional heterogeneity in normal and diseased tissues under various conditions. It also sheds light on the development or emergence of specific cell populations and phenotypes. However, owing to the paucity of input RNA, a typical single cell RNA sequencing dataset features a high number of dropout events, where transcripts fail to get amplified. Results: We introduce mcImpute, a low-rank matrix completion based technique to impute dropouts in single cell expression data. On a number of real datasets, application of mcImpute yields significant improvements in separation of true zeros from dropouts, cell clustering, differential expression analysis, cell type separability, performance of dimensionality reduction techniques for cell visualization, and gene distribution. Availability and Implementation: https://github.com/aanchalMongia/McImpute_scRNAseq
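
The general idea of low-rank matrix completion can be sketched with a toy example: zero entries are treated as unobserved, a low-rank model is fitted to the observed entries only, and the model's predictions fill the gaps. The sketch below uses a rank-1 model fitted by alternating least squares, chosen purely for brevity. It should not be read as the solver used by mcImpute, and the function name `complete_rank1` is hypothetical.

```python
def complete_rank1(matrix, iterations=200):
    """Fill zero entries of a cells-by-genes matrix with a rank-1 fit.

    Assumptions of this sketch: every zero is a dropout, the matrix is
    well described by an outer product u[i] * v[j], and every row and
    column has at least one observed (nonzero) entry.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Update each cell factor using only that cell's observed genes.
        for i in range(n_rows):
            observed = [j for j in range(n_cols) if matrix[i][j] != 0]
            denominator = sum(v[j] ** 2 for j in observed)
            if denominator > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in observed) / denominator
        # Update each gene factor using only the cells that observed it.
        for j in range(n_cols):
            observed = [i for i in range(n_rows) if matrix[i][j] != 0]
            denominator = sum(u[i] ** 2 for i in observed)
            if denominator > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in observed) / denominator
    # Keep observed values as they are; predict only the missing ones.
    return [
        [matrix[i][j] if matrix[i][j] != 0 else u[i] * v[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Outer product of [1, 2, 3] and [1, 2, 4] with two entries zeroed out.
counts = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 8.0],
    [0.0, 6.0, 12.0],
]
filled = complete_rank1(counts)
```

In this toy matrix the two zeroed entries are recoverable because the remaining entries determine the rank-1 structure. Real expression data need a higher rank and some way to distinguish dropouts from genuine zeros.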

Zero-preserving imputation of single-cell RNA-seq data

Nature Communications; doi:10.1038/s41467-021-27729-z; 2022; Vol 13 (1)

Author(s): George C. Linderman; Jun Zhao; Manolis Roulis; Piotr Bielecki; Richard A. Flavell; ...

Keyword(s): Single Cell; RNA Sequencing; Low Rank; Matrix Approximation; RNA Seq; Sequencing Data; Theoretical Justification; Rank Matrix; Single Cell RNA Sequencing; Low Rank Matrix

Abstract A key challenge in analyzing single cell RNA-sequencing data is the large number of false zeros, where genes actually expressed in a given cell are incorrectly measured as unexpressed. We present a method based on low-rank matrix approximation which imputes these values while preserving biologically non-expressed genes (true biological zeros) at zero expression levels. We provide theoretical justification for this denoising approach and demonstrate its advantages relative to other methods on simulated and biological datasets.
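
One plausible way to keep true zeros at zero after a low-rank approximation is a per-gene threshold. If the approximation errors around a true zero are roughly symmetric, the most negative value in a gene's column indicates how large those errors get, so any entry no larger than that magnitude can be reset to zero. The sketch below illustrates this idea only. The symmetry assumption, the function name `restore_zeros`, and the input layout are assumptions made here, not details given in the abstract, and the low-rank approximation itself is taken as given.

```python
def restore_zeros(approximation):
    """Reset small entries of a low-rank approximation to zero, gene by gene.

    `approximation` is a cells-by-genes list of rows, assumed to come from
    some low-rank approximation of the expression matrix (not computed here).
    """
    n_cells, n_genes = len(approximation), len(approximation[0])
    result = [row[:] for row in approximation]
    for gene in range(n_genes):
        column = [approximation[cell][gene] for cell in range(n_cells)]
        # Magnitude of the most negative value; 0 if the column has none.
        threshold = abs(min(min(column), 0.0))
        for cell in range(n_cells):
            if result[cell][gene] <= threshold:
                result[cell][gene] = 0.0
    return result


approximation = [
    [-0.2, 1.5],
    [0.1, 2.0],
    [3.0, 0.4],
]
cleaned = restore_zeros(approximation)
```

For the first gene the threshold is 0.2, so the entries -0.2 and 0.1 are reset to zero while 3.0 is kept. The second gene has no negative values, so all of its positive entries are kept.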

Mixed Distribution Models Based on Single-Cell RNA Sequencing Data

Interdisciplinary Sciences Computational Life Sciences; doi:10.1007/s12539-021-00427-6; 2021

Author(s): Min Wu; Junhua Xu; Tao Ding; Jie Gao

Keyword(s): Single Cell; RNA Sequencing; Sequencing Data; Distribution Models; Mixed Distribution; Single Cell RNA Sequencing

IMMU-27. SINGLE CELL RNA-SEQUENCING IDENTIFIES NOVEL BONE MARROW DERIVED MYELOID CELLS IN GLIOBLASTOMA ASSOCIATED WITH TUMOR AGGRESSION

Neuro-Oncology; doi:10.1093/neuonc/noaa215.457; 2020; Vol 22 (Supplement_2); pp. ii110-ii110

Author(s): Christina Jackson; Christopher Cherry; Sadhana Bom; Hao Zhang; John Choi; ...

Keyword(s): Bone Marrow; Single Cell; Tumor Cells; RNA Sequencing; Metabolic Pathways; Myeloid Cells; Tumor Grade; Sequencing Data; Single Cell RNA Sequencing; Two Populations

Abstract BACKGROUND Glioma associated myeloid cells (GAMs) can be induced to adopt an immunosuppressive phenotype that can lead to inhibition of anti-tumor responses in glioblastoma (GBM). Understanding the composition and phenotypes of GAMs is essential to modulating the myeloid compartment as a therapeutic adjunct to improve anti-tumor immune response. METHODS We performed single-cell RNA-sequencing (sc-RNAseq) of 435,400 myeloid and tumor cells to identify transcriptomic and phenotypic differences in GAMs across glioma grades. We further correlated the heterogeneity of the GAM landscape with tumor cell transcriptomics to investigate interactions between GAMs and tumor cells. RESULTS sc-RNAseq revealed a diverse landscape of myeloid-lineage cells in gliomas, with an increasing preponderance of bone marrow derived myeloid cells (BMDMs) with increasing tumor grade. We identified two populations of BMDMs unique to GBMs: Mac-1 and Mac-2. Mac-1 demonstrates upregulation of an immature myeloid gene signature and altered metabolic pathways. Mac-2 is characterized by expression of the scavenger receptor MARCO. Pseudotime and RNA velocity analysis revealed the ability of Mac-1 to transition and differentiate to Mac-2 and other GAM subtypes. We further found that the presence of these two populations of BMDMs is associated with the presence of tumor cells with stem cell and mesenchymal features. Bulk RNA-sequencing data demonstrate that gene signatures of these populations are associated with worse survival in GBM. CONCLUSION We used sc-RNAseq to identify a novel population of immature BMDMs that is associated with higher glioma grades. This population exhibited altered metabolic pathways and stem-like potential to differentiate into other GAM populations, including GAMs with upregulation of immunosuppressive pathways. Our results elucidate unique interactions between BMDMs and GBM tumor cells that potentially drive GBM progression and the more aggressive mesenchymal subtype. Our discovery of these novel BMDMs has implications for new therapeutic targets to improve the efficacy of immune-based therapies in GBM.

Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data

Microbiology Research; doi:10.3390/microbiolres12020022; 2021; Vol 12 (2); pp. 317-334

Author(s): Omar Alaqeeli; Li Xing; Xuekui Zhang

Keyword(s): Single Cell; RNA Sequencing; Classification Tree; Area Under The Curve; Data Sets; Sequencing Data; Single Cell RNA Sequencing; Tree Algorithms; R Packages

The classification tree is a widely used machine learning method. It has multiple implementations as R packages: rpart, ctree, evtree, tree and C5.0. The details of these implementations are not the same, and hence their performance differs from one application to another. We are interested in their performance in the classification of cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compare the packages' prediction performance based on their Precision, Recall, F1-score and Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0 prefers more complex trees; tree is consistently much faster than the others, although its complexity is often higher than that of the others.
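
For readers unfamiliar with the metrics named in this abstract, the sketch below computes Precision, Recall and F1-score for one cell type treated as the positive class. It uses the standard definitions and is not taken from the benchmark itself. The function name `precision_recall_f1` and the toy labels are hypothetical.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    # Each ratio falls back to 0.0 when its denominator is zero.
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: five cells labelled as T cells ("T") or B cells ("B").
truth = ["T", "T", "B", "B", "T"]
predicted = ["T", "B", "B", "T", "T"]
precision, recall, f1 = precision_recall_f1(truth, predicted, positive="T")
```

Here two T cells are predicted correctly, one B cell is wrongly called a T cell, and one T cell is missed, so Precision, Recall and F1 all equal 2/3.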

Modeling dynamic correlation in zero‐inflated bivariate count data with applications to single‐cell RNA sequencing data

Biometrics; doi:10.1111/biom.13457; 2021

Author(s): Zhen Yang; Yen‐Yi Ho

Keyword(s): Single Cell; RNA Sequencing; Count Data; Sequencing Data; Dynamic Correlation; Single Cell RNA Sequencing
