Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

Abstract: Single-cell RNA-Seq measurements are commonly affected by high levels of technical noise, posing challenges for data analysis and visualization. A diverse array of methods has been proposed to computationally remove noise by sharing information across similar cells or genes; however, their respective accuracies have been difficult to establish. Here, we propose a simple denoising strategy based on principal component analysis (PCA). We show that while PCA performed on raw data is biased towards highly expressed genes, this bias can be mitigated with a cell aggregation step, allowing the recovery of denoised expression values for both highly and lowly expressed genes. We benchmark our resulting ENHANCE algorithm and three previously described methods on simulated data that closely mimic real datasets, showing that ENHANCE provides the best overall denoising accuracy, recovering modules of co-expressed genes and cell subpopulations. Implementations of our algorithm are available at https://github.com/yanailab/enhance.
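The two ideas named in the abstract, aggregating similar cells and then keeping only the leading principal components, can be illustrated with a minimal NumPy sketch. This is not the ENHANCE implementation: the function names, the neighbor count `k`, the number of components, and the median library-size normalization are all assumptions chosen for illustration.

```python
import numpy as np


def aggregate_neighbors(counts, k=5):
    """Sum each cell's counts with those of its k nearest neighbors.

    counts: (cells x genes) array of non-negative counts.
    Illustrative only; k and the distance measure are assumptions.
    """
    lib = counts.sum(axis=1, keepdims=True)
    norm = np.log1p(counts / lib * np.median(lib))
    # Pairwise squared Euclidean distances between normalized cells.
    sq = (norm ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * norm @ norm.T
    np.fill_diagonal(d2, -np.inf)  # guarantee each cell is its own first neighbor
    idx = np.argsort(d2, axis=1)[:, : k + 1]
    return counts[idx].sum(axis=1)


def pca_denoise(counts, n_components=2, k=5):
    """Aggregate cells, then reconstruct from the top principal components.

    Returns values on the normalized (median library size) scale of the
    aggregated data, not on the scale of the raw counts.
    """
    agg = aggregate_neighbors(counts, k)
    lib = agg.sum(axis=1, keepdims=True)
    x = np.log1p(agg / lib * np.median(lib))
    mean = x.mean(axis=0)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    # Low-rank reconstruction: keep only the leading components.
    low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components] + mean
    return np.expm1(low_rank).clip(min=0)
```

The aggregation step raises the effective counts per profile before PCA is applied, which is the intuition the abstract gives for reducing the bias towards highly expressed genes.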

Robust principal component analysis for accurate outlier sample detection in RNA-Seq data

BMC Bioinformatics ◽ 10.1186/s12859-020-03608-0 ◽ 2020 ◽ Vol 21 (1) ◽ Cited By ~ 1

Author(s): Xiaoying Chen ◽ Bo Zhang ◽ Ting Wang ◽ Azad Bonni ◽ Guoyan Zhao

Keyword(s): Principal Component Analysis ◽ Principal Component ◽ Component Analysis ◽ RNA Seq ◽ Robust Principal Component Analysis

Structure-Aware Principal Component Analysis for Single-Cell RNA-seq Data

Journal of Computational Biology ◽ 10.1089/cmb.2018.0027 ◽ 2018 ◽ Vol 25 (12) ◽ pp. 1365-1373 ◽ Cited By ~ 6

Author(s): Snehalika Lall ◽ Debajyoti Sinha ◽ Sanghamitra Bandyopadhyay ◽ Debarka Sengupta

Keyword(s): Principal Component Analysis ◽ Single Cell ◽ Principal Component ◽ Component Analysis ◽ RNA Seq

Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

10.1101/642595 ◽ 2019 ◽ Cited By ~ 1

Author(s): Koki Tsuyuzaki ◽ Hiroyuki Sato ◽ Kenta Sato ◽ Itoshi Nikaido

Keyword(s): Principal Component Analysis ◽ Single Cell ◽ Large Scale ◽ Principal Component ◽ Component Analysis ◽ RNA Seq ◽ Large Memory ◽ Synthetic Datasets ◽ Selection Of ◽ Memory Efficient

Abstract: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but large-scale scRNA-seq datasets require long computational times and a large memory capacity. In this work, we review 21 fast and memory-efficient PCA implementations (10 algorithms) and evaluate their application using 4 real and 18 synthetic datasets. Our benchmarking showed that some PCA algorithms are faster, more memory efficient, and more accurate than others. In consideration of the differences in the computational environments of users and developers, we have also developed guidelines to assist with selection of appropriate PCA implementations.
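As one illustration of what a fast, memory-efficient PCA can look like, the sketch below implements a randomized truncated SVD with power iterations in NumPy. The abstract does not say which algorithms were benchmarked, so this is an assumed example of the general family rather than one of the reviewed implementations; the function name and the `n_oversample` and `n_iter` defaults are hypothetical.

```python
import numpy as np


def randomized_pca(x, n_components, n_oversample=10, n_iter=4, seed=0):
    """Approximate the top principal components of x (cells x genes).

    Projects the centered data onto a small random subspace, refines that
    subspace with power iterations, and runs an exact SVD on the small
    projected matrix instead of on the full data.
    """
    rng = np.random.default_rng(seed)
    xc = x - x.mean(axis=0)
    width = n_components + n_oversample
    omega = rng.standard_normal((xc.shape[1], width))
    q, _ = np.linalg.qr(xc @ omega)
    for _ in range(n_iter):
        # Re-orthonormalize at each step for numerical stability.
        q, _ = np.linalg.qr(xc.T @ q)
        q, _ = np.linalg.qr(xc @ q)
    b = q.T @ xc  # small (width x genes) matrix
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    scores = (q @ u_small[:, :n_components]) * s[:n_components]
    return scores, s[:n_components], vt[:n_components]
```

Accuracy can be checked by comparing the returned singular values against `np.linalg.svd` on a matrix small enough to decompose exactly. For simplicity this sketch centers the data densely; an implementation aimed at very large sparse matrices would need to avoid that step.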

Noise reduction and brain mapping based robust principal component analysis

2015 IEEE 12th International Conference on Networking, Sensing and Control ◽ 10.1109/icnsc.2015.7116096 ◽ 2015 ◽ Cited By ~ 1

Author(s): Arjon Turnip

Keyword(s): Principal Component Analysis ◽ Noise Reduction ◽ Brain Mapping ◽ Principal Component ◽ Component Analysis ◽ Robust Principal Component Analysis

Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis

International Journal of Molecular Sciences ◽ 10.3390/ijms21165797 ◽ 2020 ◽ Vol 21 (16) ◽ pp. 5797

Author(s): Zhenqiu Liu

Keyword(s): Principal Component Analysis ◽ Dimension Reduction ◽ Single Cell ◽ Optimal Solution ◽ Principal Component ◽ Component Analysis ◽ Biological Information ◽ RNA Seq ◽ Computationally Efficient ◽ Leibler Divergence

Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data can help us effectively extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq visualization are principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). While PCA is an unsupervised dimension reduction technique, t-SNE incorporates cluster information into pairwise probabilities, and then minimizes the Kullback–Leibler divergence. Uniform Manifold Approximation and Projection (UMAP) is another recently developed visualization method similar to t-SNE. However, one limitation of UMAP and t-SNE is that they can only capture the local structure of the data; the global structure is not faithfully preserved. In this manuscript, we propose a semisupervised principal component analysis (ssPCA) approach for scRNA-seq visualization. The proposed approach incorporates cluster labels into dimension reduction and discovers principal components that maximize both data variance and cluster dependence. ssPCA requires cluster labels as input. Therefore, it is most useful for visualizing clusters produced by scRNA-seq clustering software. Our experiments with simulated and real scRNA-seq data demonstrate that ssPCA is able to preserve both local and global structures of the data, and to uncover the transitions and progressions in the data, if they exist. In addition, ssPCA is convex and has a global optimal solution. It is also robust and computationally efficient, making it viable for scRNA-seq cluster visualization.
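The abstract does not give the ssPCA objective, so the following is only a generic sketch of how cluster labels can be blended into PCA, not the author's method. It assumes a weighted sum of the ordinary covariance term and a label-dependence term built from a one-hot cluster indicator matrix; the function name and the mixing weight `alpha` are hypothetical.

```python
import numpy as np


def label_aware_pca(x, labels, n_components=2, alpha=0.5):
    """Project x (cells x genes) onto directions that trade off variance
    against dependence on cluster labels.

    alpha = 0 reduces to ordinary PCA; larger alpha weights the label term
    more heavily. The two terms are not rescaled to a common magnitude here.
    """
    xc = x - x.mean(axis=0)
    classes, codes = np.unique(labels, return_inverse=True)
    y = np.eye(len(classes))[codes]  # one-hot cluster indicators
    yc = y - y.mean(axis=0)          # centered indicators
    cross = xc.T @ yc                # genes x clusters
    # Symmetric matrix whose top eigenvectors maximize the blended objective.
    m = (1.0 - alpha) * (xc.T @ xc) + alpha * (cross @ cross.T)
    evals, evecs = np.linalg.eigh(m)
    order = np.argsort(evals)[::-1][:n_components]
    w = evecs[:, order]
    return xc @ w, w
```

Because the projection comes from a symmetric eigenvalue problem, the top eigenvectors give a globally optimal solution of this particular blended objective, which loosely mirrors the global-optimum property the abstract claims for ssPCA.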
