Ensemble Adaptive Total Variation Graph Regularized NMF for Single-cell RNA-Seq Data Analysis

2021 ◽  
Vol 16 ◽  
Author(s):  
Ya-Li Zhu ◽  
Ying-Lian Gao ◽  
Jin-Xing Liu ◽  
Rong Zhu ◽  
Xiang-Zhen Kong

Background: Single-cell RNA sequencing techniques have emerged as effective approaches for finding the heterogeneity between cells and discovering the differentiation stage. Adaptive total variation graph regularized nonnegative matrix factorization (ATV-NMF) has been proposed to capture the inner geometric structure and determine whether to retain feature details or denoise, which makes it suitable for analyzing single-cell data. However, the rank of the matrix factorization significantly affects clustering performance, and determining the optimal rank remains challenging. Objective: To solve this problem, we propose an ensemble clustering method, ANMF-CE, that integrates several base clustering results corresponding to different rank values. Method: First, we use the ATV-NMF algorithm to obtain clustering results with different dimension reduction ranks. Second, a consensus function based on connected-triple-based similarity is applied to obtain the similarity matrix. Finally, spectral clustering is used to find the final optimal partition. Results: Clustering results on six single-cell sequencing datasets show that our method outperforms the individual ATV-NMF method and other comparison methods, which illustrates that it is effective in finding the heterogeneity in single-cell datasets. Moreover, the identification of gene markers also achieves accurate results. Conclusion: In summary, our method is effective for analyzing single-cell RNA sequencing datasets.
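
The consensus step in the Method paragraph above (several base clusterings, one similarity matrix) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the base label vectors are made-up toy data standing in for ATV-NMF output at different ranks, a plain co-association matrix is swapped in for the connected-triple-based similarity, and the final spectral clustering step is left out. The function name `co_association` is hypothetical.

```python
from itertools import combinations


def co_association(base_labelings):
    """Fraction of base clusterings in which each pair of cells shares a cluster.

    NOTE: this is a simple co-association matrix, used as a stand-in for the
    connected-triple-based similarity named in the abstract.
    """
    n_runs = len(base_labelings)
    n_cells = len(base_labelings[0])
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in base_labelings:
        for i in range(n_cells):
            counts[i][i] += 1  # a cell always co-clusters with itself
        for i, j in combinations(range(n_cells), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / n_runs for c in row] for row in counts]


# Toy base clusterings of five cells (hypothetical labels, one list per rank).
base = [
    [0, 0, 1, 1, 1],  # e.g. a rank-2 factorization
    [0, 0, 1, 1, 2],  # e.g. a rank-3 factorization
    [0, 1, 2, 2, 3],  # e.g. a rank-4 factorization
]
similarity = co_association(base)
```

A matrix of this kind could then be handed to a spectral clustering routine as a precomputed affinity to obtain the final partition.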

Author(s):  
Yinlei Hu ◽  
Bin Li ◽  
Falai Chen ◽  
Kun Qu

Abstract Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.


2019 ◽  
Vol 35 (22) ◽  
pp. 4827-4829 ◽  
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xing-Ming Zhao ◽  
Xiaohua Hu ◽  
...  

Abstract Summary Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experiment results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting the noisy scRNA-seq data before performing downstream analysis. Availability and implementation The R package and Shiny application are available through Github at https://github.com/Zhangxf-ccnu/EnImpute. Supplementary information Supplementary data are available at Bioinformatics online.
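
The idea of combining several imputation results into one can be illustrated with a small sketch. The abstract does not say how EnImpute merges the individual results, so the element-wise median used here is purely an assumption, and the three input matrices and the name `combine_imputations` are hypothetical.

```python
from statistics import median


def combine_imputations(matrices):
    """Element-wise median across equally shaped imputed matrices.

    ASSUMPTION: the median is an illustrative choice; the abstract does not
    state which combination rule EnImpute uses.
    """
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [median(m[r][c] for m in matrices) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Toy outputs of three imputation methods on a 2 x 2 expression matrix.
method_a = [[0.0, 2.0], [1.0, 4.0]]
method_b = [[0.5, 2.0], [1.0, 9.0]]
method_c = [[0.4, 3.0], [1.2, 5.0]]
consensus = combine_imputations([method_a, method_b, method_c])
```

A median is less sensitive than a mean to a single method producing an extreme value, which is one reason an ensemble might prefer a robust combination rule.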


2019 ◽  
Vol 15 (8) ◽  
pp. e1007040 ◽  
Author(s):  
Ming-Wen Hu ◽  
Dong Won Kim ◽  
Sheng Liu ◽  
Donald J. Zack ◽  
Seth Blackshaw ◽  
...  

Author(s):  
Lan Wu ◽  
Yan-Fei Li ◽  
Jun-wei Shen ◽  
Qian Zhu ◽  
Jing Jiang ◽  
...  

Previous studies have revealed the diversity of the whole cardiac cellulome but have not resolved the left ventricle in detail, which is essential for finding therapeutic targets. Here, we characterized single-cell transcriptional profiles of the mouse left ventricular cellular landscape using single-cell RNA sequencing (10× Genomics). Detailed t-distributed stochastic neighbor embedding (tSNE) analysis revealed the cell types of the left ventricle along with their gene markers. The left ventricular cellulome contained cardiomyocytes highly expressing Trdn, endothelial cells highly expressing Pcdh17, fibroblasts highly expressing Lama2 and macrophages highly expressing Hpgds, as also confirmed by in situ hybridization. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses (ListHits>2, p<0.05) were performed with the DAVID database to investigate the subtypes of each cell type and the underlying functions of differentially expressed genes (DEGs). Endothelial cells included five subtypes, fibroblasts comprised seven subtypes and macrophages contained eleven subtypes. The key representative DEGs (p<0.001) were Gja4 and Gja5 in cluster 3 of endothelial cells, Aqp2 and Thbs4 in cluster 2 of fibroblasts, and Clec4e and Trem-1 in cluster 3 of macrophages, which, according to a literature review, may be involved in the occurrence of atherosclerosis, heart failure and acute myocardial infarction. We also revealed extensive networks of intercellular communication in the left ventricle. We suggest possible therapeutic targets for cardiovascular disease and propose that autocrine and paracrine signaling underpins left ventricular homeostasis. This study provides new insights into the structure and function of the mammalian left ventricular cellulome and offers an important resource that will stimulate studies in cardiovascular research.


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Shuqin Zhang ◽  
Liu Yang ◽  
Jinwen Yang ◽  
Zhixiang Lin ◽  
Michael K Ng

Abstract Single cell RNA-sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at single cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses, such as dimensionality reduction and clustering. Cells sequenced to lower depths tend to have more dropouts than those sequenced to greater depths. In this study, we aimed to develop a dimensionality reduction method to address both dropouts and the non-negativity constraints in scRNA-seq data. The developed method simultaneously performs dimensionality reduction and dropout imputation under the non-negative matrix factorization (NMF) framework. The dropouts were modeled as a non-negative sparse matrix. Summation of the observed data matrix and dropout matrix was approximated by NMF. To ensure the sparsity pattern was maintained, a weighted ℓ1 penalty that took into account the dependency of dropouts on the sequencing depth in each cell was imposed. An efficient algorithm was developed to solve the proposed optimization problem. Experiments using both synthetic data and real data showed that dimensionality reduction via the proposed method afforded more robust clustering results compared with those obtained from the existing methods, and that dropout imputation improved the differential expression analysis.
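
For readers unfamiliar with the NMF framework that the abstract above builds on, here is a minimal sketch of plain NMF with the standard multiplicative updates for the Frobenius objective. It is not the proposed method: the sparse dropout matrix and the weighted ℓ1 penalty are deliberately omitted, the data are synthetic, and the function name and parameters (`nmf`, `rank`, `n_iter`, `eps`) are hypothetical.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF, X ~ W @ H, via multiplicative updates.

    NOTE: baseline NMF only; the dropout term and weighted l1 penalty
    described in the abstract are not modeled here.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic nonnegative genes-by-cells matrix of rank 4 (toy data).
rng = np.random.default_rng(1)
X = rng.random((30, 4)) @ rng.random((4, 20))
W, H = nmf(X, rank=4)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The columns of `H` give a low-dimensional representation of each cell that a downstream clustering step could use.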


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10469
Author(s):  
Daniel Dimitrov ◽  
Quan Gu

Background RNA sequencing is an indispensable research tool used in a broad range of transcriptome analysis studies. The most common application of RNA sequencing is differential expression analysis, which is used to determine genetic loci with distinct expression across different conditions. An emerging field called single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both of these approaches include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and the acquisition of meaningful results is that they require programming expertise. Although some effort has been directed toward the development of user-friendly RNA-Seq analysis tools, few have the flexibility to explore both bulk and single-cell RNA sequencing. Implementation BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both bulk and single-cell RNA-Seq experiments. This was achieved by building an interactive dashboard-like user interface which incorporates three state-of-the-art software packages for each of the two types of analysis. Furthermore, BingleSeq includes additional features such as visualization techniques, extensive functional annotation analysis and rank-based consensus for differential gene analysis results. As a result, BingleSeq puts some of the best reviewed and most widely used packages and tools for RNA-Seq analyses at the fingertips of biologists with no programming experience. Availability BingleSeq is an easy-to-install R package available on GitHub at https://github.com/dbdimitrov/BingleSeq/.


2021 ◽  
Author(s):  
He-zuo Lü ◽  
Xin-Yi Lyu ◽  
Jing-Lu Li ◽  
Shu-Qin Ding ◽  
Jian-Guo Hu

Abstract Background Myeloid cells play a vital role in the health and disease of the central nervous system (CNS). However, how to clearly distinguish them is still a knotty problem. At present, single-cell RNA sequencing (scRNA-Seq) technology can sequence thousands of cells at the single-cell level and then divide the cells into different clusters according to the similarity of gene expression, but it is still difficult to further identify these cell clusters. Generally, there are some specific marker genes for cell-type identities. However, it is difficult to distinguish the variety of myeloid cells in the CNS, because these cells often have the same or overlapping gene markers, and some markers change significantly in different pathological states. Therefore, establishing a simple and practical method to distinguish these cell populations is of great significance for the analysis of scRNA-Seq data. Methods Referring to CellMarker (http://biocc.hrbmu.edu.cn/CellMarker/), PanglaoDB (https://panglaodb.se/) and Mouse Cell Atlas (http://bis.zju.edu.cn/MCA/gallery.html), combined with the recent literature, a simple Excel template was designed, which includes a panel of gene markers corresponding to the myeloid cells. The 83 cell clusters from several recently reported single-cell datasets were used to verify the accuracy of this template. Results This template could easily distinguish myeloid cell subtypes and non-myeloid cells. Compared with the literature, the overall consistency rate was 93.98%. There was no statistically significant difference between the two groups (Bowker’s test, P > 0.05). Kappa symmetric measures showed a Kappa value of 0.642 (P < 0.01). Conclusions The cell identities of scRNA-Seq cluster data can be determined using our simple Excel formulae; a panel of gene markers and ideal cell clustering data are the basis for accurate identification of CNS myeloid cell subtypes.
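
The two agreement figures reported in the Results (a consistency rate and a Kappa value) can be illustrated with a short sketch. The label lists below are toy data, not the study's 83 clusters, and the function names are hypothetical. The count of 78 agreeing clusters is an inference from the reported 93.98% and 83 clusters, not a number stated in the abstract. Bowker's test is not covered here.

```python
from collections import Counter


def agreement_rate(a, b):
    """Share of items on which two annotations give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(a)
    p_o = agreement_rate(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy annotations for four clusters (hypothetical labels).
template = ["microglia", "microglia", "monocyte", "monocyte"]
literature = ["microglia", "microglia", "monocyte", "microglia"]
rate = agreement_rate(template, literature)
kappa = cohen_kappa(template, literature)

# Inferred, not stated in the abstract: 78 of 83 clusters agreeing
# reproduces the reported consistency rate.
reported_rate = round(100 * 78 / 83, 2)
```

Kappa is lower than the raw agreement rate because it discounts the matches two annotators would reach by chance given how often each label is used.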


2020 ◽  
Vol 32 (5) ◽  
pp. 111-120
Author(s):  
Maria Andreevna Akimenkova ◽  
Anna Anatolyevna Maznina ◽  
Anton Yurievich Naumov ◽  
Evgeny Andreevich Karpulevich

One of the main tasks in the analysis of single-cell RNA sequencing (scRNA-seq) data is the identification of cell types and subtypes, which is usually based on some method of clustering. There are a number of generally accepted approaches to the clustering problem, one of which is implemented in the Seurat package. In addition, the quality of clustering is influenced by preprocessing algorithms such as imputation, dimensionality reduction and feature selection. In this article, the HDBSCAN hierarchical clustering method is used to cluster scRNA-seq data. For a more complete comparison, experiments were carried out on two labeled datasets: Zeisel (3005 cells) and Romanov (2881 cells). To compare the quality of clustering, two external metrics were used: the Adjusted Rand index and V-measure. The experiments demonstrated a higher quality of clustering by the HDBSCAN method on the Zeisel dataset and a poorer quality on the Romanov dataset.
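
As an illustration of the first of the two external metrics named above, the following sketch computes the Adjusted Rand index from reference labels and predicted cluster labels using the standard pair-counting formula. The HDBSCAN step itself and V-measure are not shown, the labels are toy data rather than the Zeisel or Romanov datasets, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from the contingency table of two labelings.

    Assumes at least two items; cluster label values themselves are arbitrary.
    """
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: four cells, two reference types, three predicted clusters.
reference = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]
ari = adjusted_rand_index(reference, predicted)
```

Because the index is adjusted for chance, a score near 0 indicates agreement no better than random labeling, while 1 indicates identical partitions regardless of how the clusters are numbered.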

