Q-Mer Analysis: A Generalized Method for Analyzing RNA-Seq Data

Mapping Intimacies ◽

10.21203/rs.3.rs-914457/v1 ◽

2021 ◽

Author(s):

Tatsuma Shoji ◽

Yoshiharu Sato

Keyword(s):

Principal Component Analysis ◽

Homo Sapiens ◽

Principal Component ◽

Component Analysis ◽

Rna Seq ◽

Disease Mechanisms ◽

Number Of Genes ◽

Q 14

Abstract Background: RNA-Seq data are usually summarized by counting the number of transcript reads aligned to each gene. However, count-based methods do not take alignment information (where and how each read was mapped in the gene) into account. This information is essential to characterize samples accurately. In this study, we developed a method to summarize RNA-Seq data without losing alignment information. Results: To include alignment information, we introduce “q-mer analysis,” which summarizes RNA-Seq data with 4^q kinds of q-length oligomers. Using publicly available RNA-Seq datasets, we demonstrate that q ≥ 9 is required for capturing alignment information in Homo sapiens. It should be noted that 4^9 = 262,144 is approximately 10 times larger than the number of genes in H. sapiens (20,022 genes). Furthermore, principal component analysis showed that q-mer analysis with q = 14 linearly distinguished samples from controls, while a count-based method failed. These results indicate that alignment information is essential to characterize transcriptomics samples. Conclusions: We introduce q-mer analysis to include alignment information in RNA-Seq analysis and demonstrate its superiority over count-based methods in that q-mer analysis can distinguish case samples from controls. Combining RNA-Seq research with q-mer analysis could be useful for identifying distinguishing transcriptomic features that could provide hypotheses for disease mechanisms.
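
The idea of summarizing reads by q-length oligomers can be illustrated with a minimal Python sketch. It is only one plausible reading of the abstract, not the authors' implementation: the function names `count_qmers` and `qmer_vector`, the toy reads, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def count_qmers(reads, q):
    """Count every q-length substring over A/C/G/T across a list of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - q + 1):
            qmer = seq[i:i + q]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if set(qmer) <= set("ACGT"):
                counts[qmer] += 1
    return counts


def qmer_vector(counts, q):
    """Return a fixed-length profile with one entry per possible q-mer (4**q)."""
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=q)]


# Toy reads (hypothetical); real data would come from a FASTQ or BAM file.
reads = ["ACGTAC", "CGTNAC"]
counts = count_qmers(reads, 2)
vector = qmer_vector(counts, 2)
print(len(vector), "features for q = 2;", 4 ** 9, "features for q = 9")
```

The profile length grows as 4^q, so for larger q (such as the q = 14 used in the abstract) a sparse representation like the `Counter` itself would be more practical than the dense vector.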

Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

Bioinformatics Research and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-94968-0_32 ◽

2018 ◽

pp. 335-346

Author(s):

Krzysztof Gogolewski ◽

Maciej Sykulski ◽

Neo Christopher Chung ◽

Anna Gambin

Keyword(s):

Principal Component Analysis ◽

Noise Reduction ◽

Single Cell ◽

Principal Component ◽

Component Analysis ◽

Rna Seq ◽

Robust Principal Component Analysis

Accurate denoising of single-cell RNA-Seq data using unbiased principal component analysis

10.1101/655365 ◽

2019 ◽

Cited By ~ 11

Author(s):

Florian Wagner ◽

Dalia Barkley ◽

Itai Yanai

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Simulated Data ◽

Principal Component ◽

Cell Aggregation ◽

Component Analysis ◽

Rna Seq ◽

Highly Expressed Genes ◽

Cell Subpopulations ◽

Aggregation Step

Abstract Single-cell RNA-Seq measurements are commonly affected by high levels of technical noise, posing challenges for data analysis and visualization. A diverse array of methods has been proposed to computationally remove noise by sharing information across similar cells or genes; however, their respective accuracies have been difficult to establish. Here, we propose a simple denoising strategy based on principal component analysis (PCA). We show that while PCA performed on raw data is biased towards highly expressed genes, this bias can be mitigated with a cell aggregation step, allowing the recovery of denoised expression values for both highly and lowly expressed genes. We benchmark our resulting ENHANCE algorithm and three previously described methods on simulated data that closely mimic real datasets, showing that ENHANCE provides the best overall denoising accuracy, recovering modules of co-expressed genes and cell subpopulations. Implementations of our algorithm are available at https://github.com/yanailab/enhance.
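
The general principle of PCA-based denoising, reconstructing the data from only its leading principal components, can be sketched as follows. This is not the ENHANCE algorithm: it omits the cell aggregation step and any normalization described by the authors, and the function name `pca_denoise` and the simulated rank-1 data are assumptions for illustration only.

```python
import numpy as np


def pca_denoise(expr, n_components):
    """Reconstruct a cells-by-genes matrix from its top principal components."""
    mean = expr.mean(axis=0)
    centered = expr - mean
    # SVD of the centered matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    low_rank = (u[:, :k] * s[:k]) @ vt[:k]
    return low_rank + mean


# Simulated toy data (hypothetical): a rank-1 signal plus Gaussian noise.
rng = np.random.default_rng(0)
signal = np.outer(rng.normal(size=50), rng.normal(size=20))
noisy = signal + 0.1 * rng.normal(size=signal.shape)
denoised = pca_denoise(noisy, n_components=1)

print("error before:", np.linalg.norm(noisy - signal))
print("error after: ", np.linalg.norm(denoised - signal))
```

Keeping too few components discards signal and keeping too many retains noise, so the number of components is the key tuning choice in any approach of this kind.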

Robust principal component analysis for accurate outlier sample detection in RNA-Seq data

BMC Bioinformatics ◽

10.1186/s12859-020-03608-0 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Xiaoying Chen ◽

Bo Zhang ◽

Ting Wang ◽

Azad Bonni ◽

Guoyan Zhao

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Rna Seq ◽

Robust Principal Component Analysis

Structure-Aware Principal Component Analysis for Single-Cell RNA-seq Data

Journal of Computational Biology ◽

10.1089/cmb.2018.0027 ◽

2018 ◽

Vol 25 (12) ◽

pp. 1365-1373 ◽

Cited By ~ 6

Author(s):

Snehalika Lall ◽

Debajyoti Sinha ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Principal Component ◽

Component Analysis ◽

Rna Seq

PCAGO: An interactive web service to analyze RNA-Seq data with principal component analysis

10.1101/433078 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ruman Gerst ◽

Martin Hölzer

Keyword(s):

Principal Component Analysis ◽

Web Service ◽

Principal Components ◽

Clustering Algorithm ◽

Gene Annotation ◽

Principal Component ◽

Component Analysis ◽

Rna Seq ◽

Gene Sets ◽

Relationship Of

Abstract The initial characterization and clustering of biological samples is a critical step in the analysis of any transcriptomic study. In many studies, principal component analysis (PCA) is the clustering algorithm of choice to predict the relationship of samples or cells based solely on differential gene expression. In addition to the pure quality evaluation of the data, a PCA can also provide initial insights into the biological background of an experiment and help researchers to interpret the data and design the subsequent computational steps accordingly. However, to avoid misleading clusterings and interpretations, an appropriate selection of the underlying gene sets to build the PCA and the choice of the most fitting principal components for the visualization are crucial parts. Here, we present PCAGO, an easy-to-use and interactive web service to analyze gene quantification data derived from RNA sequencing (RNA-Seq) experiments with PCA. The tool includes features such as read-count normalization, filtering of read counts by gene annotation, and various visualization options. Additionally, PCAGO helps to select appropriate parameters such as the number of genes and principal components to create meaningful visualizations. Availability and implementation: The web service is implemented in R and freely available at [email protected]

Coronavirus Disease Predictor: An RNA-Seq based pipeline for dimension reduction and prediction of COVID-19

Journal of Physics Conference Series ◽

10.1088/1742-6596/2089/1/012025 ◽

2021 ◽

Vol 2089 (1) ◽

pp. 012025

Author(s):

Naiyar Iqbal ◽

Pradeep Kumar

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Dimension Reduction ◽

Soft Computing ◽

Principal Component ◽

Component Analysis ◽

Formal Concept ◽

Genome Wide Association Studies ◽

Rna Seq ◽

Soft Computing Techniques

Abstract SARS CoV-2, the novel coronavirus behind the COVID-19 infection, has caused destruction around the world with human life, detecting a range of complexity which has knocked medical care specialists to investigate new innovative solutions and diagnosis strategies. The soft computing-based approach has assumed a significant role in resolving complex issues, and numerous societies have been shifted to implement and convert these innovations in response to the encounters created by the COVID-19 pandemic. To perform genome-wide association studies using RNA-Seq of COVID-19 and identify gene biomarkers, classification, and prediction using soft computing techniques of Coronavirus disease studies to fight this emergency pandemic in the epidemiological domain, and disease prognosis. The RNA-Seq profiles of both healthy and COVID-19 positive patients’ samples were considered. We have proposed an integrated pipeline from bioinformatics in-silico phase for-omic profile data processing to dimension reduction using various prominent techniques such as formal concept analysis and principal component analysis followed by machine learning phase for prediction of the disease. In this experimental research, we have applied different eminent machine learning techniques to implement an effective integrated model using Classifier Subset Evaluator (CSE) followed by principal component analysis (PCA) for dimension reduction to select the highly significant features and then to do the classification and prediction of Coronavirus disease, different eminent classifiers have been applied on the selected features. In this analysis, the Hoeffding Tree model found the topmost performance classifier with a classification accuracy of 99.21% as well as sensitivity and specificity of 99% and 100% respectively.

Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

10.1101/642595 ◽

2019 ◽

Cited By ~ 1

Author(s):

Koki Tsuyuzaki ◽

Hiroyuki Sato ◽

Kenta Sato ◽

Itoshi Nikaido

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

Rna Seq ◽

Large Memory ◽

Synthetic Datasets ◽

Selection Of ◽

Memory Efficient

Abstract Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but large-scale scRNA-seq datasets require long computational times and a large memory capacity. In this work, we review 21 fast and memory-efficient PCA implementations (10 algorithms) and evaluate their application using 4 real and 18 synthetic datasets. Our benchmarking showed that some PCA algorithms are faster, more memory efficient, and more accurate than others. In consideration of the differences in the computational environments of users and developers, we have also developed guidelines to assist with selection of appropriate PCA implementations.

Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis

International Journal of Molecular Sciences ◽

10.3390/ijms21165797 ◽

2020 ◽

Vol 21 (16) ◽

pp. 5797

Author(s):

Zhenqiu Liu

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

Single Cell ◽

Optimal Solution ◽

Principal Component ◽

Component Analysis ◽

Biological Information ◽

Rna Seq ◽

Computationally Efficient ◽

Leibler Divergence

Single-cell RNA-seq (scRNA-seq) is a powerful tool for analyzing heterogeneous and functionally diverse cell populations. Visualizing scRNA-seq data can help us effectively extract meaningful biological information and identify novel cell subtypes. Currently, the most popular methods for scRNA-seq visualization are principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). While PCA is an unsupervised dimension reduction technique, t-SNE incorporates cluster information into pairwise probability, and then minimizes the Kullback–Leibler divergence. Uniform Manifold Approximation and Projection (UMAP) is another recently developed visualization method similar to t-SNE. However, one limitation of UMAP and t-SNE is that they can only capture the local structure of the data; the global structure of the data is not faithfully preserved. In this manuscript, we propose a semisupervised principal component analysis (ssPCA) approach for scRNA-seq visualization. The proposed approach incorporates cluster labels into dimension reduction and discovers principal components that maximize both data variance and cluster dependence. ssPCA must have cluster labels as its input. Therefore, it is most useful for visualizing clusters from scRNA-seq clustering software. Our experiments with simulated and real scRNA-seq data demonstrate that ssPCA is able to preserve both local and global structures of the data, and uncover the transitions and progressions in the data, if they exist. In addition, ssPCA is convex and has a global optimal solution. It is also robust and computationally efficient, making it viable for scRNA-seq cluster visualization.

A German version of the Intermittent Claudication Questionnaire (ICQ): cultural adaptation and validation

VASA ◽

10.1024/0301-1526/a000218 ◽

2012 ◽

Vol 41 (5) ◽

pp. 333-342 ◽

Cited By ~ 3

Author(s):

Kirchberger ◽

Finger ◽

Müller-Bühl

Keyword(s):

Principal Component Analysis ◽

Intermittent Claudication ◽

Completion Time ◽

Short Form ◽

Principal Component ◽

Component Analysis ◽

German Version ◽

Average Completion Time ◽

Sf 36 ◽

Related Quality

Background: The Intermittent Claudication Questionnaire (ICQ) is a short questionnaire for the assessment of health-related quality of life (HRQOL) in patients with intermittent claudication (IC). The objective of this study was to translate the ICQ into German and to investigate the psychometric properties of the German ICQ version in patients with IC. Patients and methods: The original English version was translated using a forward-backward method. The resulting German version was reviewed by the author of the original version and an experienced clinician. Finally, it was tested for clarity with 5 German patients with IC. A sample of 81 patients were administered the German ICQ. The sample consisted of 58.0 % male patients with a median age of 71 years and a median IC duration of 36 months. Test of feasibility included completeness of questionnaires, completion time, and ratings of clarity, length and relevance. Reliability was assessed through a retest in 13 patients at 14 days, and analysis of Cronbach's alpha for internal consistency. Construct validity was investigated using principal component analysis. Concurrent validity was assessed by correlating the ICQ scores with the Short Form 36 Health Survey (SF-36) as well as clinical measures. Results: The ICQ was completely filled in by 73 subjects (90.1 %) with an average completion time of 6.3 minutes. Cronbach's alpha coefficient reached 0.75. Intra-class correlation for test-retest reliability was r = 0.88. Principal component analysis resulted in a 3 factor solution. The first factor explained 51.5 % of the total variation and all items had loadings of at least 0.65 on it. The ICQ was significantly associated with the SF-36 and treadmill-walking distances whereas no association was found for resting ABPI. Conclusions: The German version of the ICQ demonstrated good feasibility, satisfactory reliability and good validity. Responsiveness should be investigated in further validation studies.
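
For readers unfamiliar with the internal-consistency measure mentioned above, Cronbach's alpha for k items is commonly computed as k/(k-1) × (1 − sum of item variances / variance of total scores). The Python sketch below applies that standard formula to made-up scores. The function name `cronbach_alpha`, the toy data, and the use of sample variances are assumptions for illustration; the abstract does not describe how the study computed the coefficient.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical responses from 4 respondents to a 3-item questionnaire.
toy_scores = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is why the coefficient is read as a measure of internal consistency.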

Review of Three-Mode Principal Component Analysis: Theory and Applications, Vol. 2.

Contemporary Psychology ◽

10.1037/022425 ◽

1984 ◽

Vol 29 (11) ◽

pp. 915-916

Author(s):

William R Koch

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Analysis Theory
