Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, of both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. The overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and some methods were biased toward reporting large gene sets. Furthermore, the methods disagreed substantially on the 20 most statistically significant gene sets, and this disagreement persisted when the comparison was expanded to the 100 most statistically significant gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement, even when the same method was used. GAGE, PAGE, and ORA were the only methods that achieved relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in how relevant the top 20 and top 100 most significant gene sets were to the known biology of the disease; GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
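The top-20/top-100 agreement comparisons described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each method's output is a mapping from gene set name to p-value, and it assumes agreement is measured with the Jaccard index of the two top-k sets (the abstract does not say which overlap measure was used). The function names and the p-values in the example are hypothetical.

```python
from itertools import combinations


def top_k(pvalues, k):
    """Return the names of the k gene sets with the smallest p-values.

    Ties are broken by gene set name so the result is deterministic.
    """
    ranked = sorted(pvalues.items(), key=lambda item: (item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_top_k_agreement(results_by_method, k=20):
    """Jaccard agreement of the top-k gene sets for every pair of methods.

    results_by_method maps a method name to {gene set name: p-value}.
    """
    tops = {method: top_k(pvals, k) for method, pvals in results_by_method.items()}
    return {
        (m1, m2): jaccard(tops[m1], tops[m2])
        for m1, m2 in combinations(sorted(tops), 2)
    }


# Hypothetical p-values from two methods on the same dataset.
results = {
    "method_A": {"set1": 0.001, "set2": 0.01, "set3": 0.2},
    "method_B": {"set1": 0.02, "set2": 0.5, "set3": 0.001},
}
print(pairwise_top_k_agreement(results, k=2))
# {('method_A', 'method_B'): 0.3333333333333333}
```

Keying the outer mapping by dataset instead of by method (same method, different datasets of the same phenotype) gives the cross-dataset reproducibility comparison in the same way.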

Comparative evaluation of gene set analysis approaches for RNA-Seq data

BMC Bioinformatics, 2014, Vol 15 (1)

DOI: 10.1186/s12859-014-0397-8

Cited By ~ 16

Author(s): Yasir Rahmatallah, Frank Emmert-Streib, Galina Glazko

Keyword(s): Comparative Evaluation, Gene Set Analysis, RNA-Seq, Gene Set

Gene set analysis controlling for length bias in RNA-seq experiments

BioData Mining, 2017, Vol 10 (1)

DOI: 10.1186/s13040-017-0125-9

Cited By ~ 4

Author(s): Xing Ren, Qiang Hu, Song Liu, Jianmin Wang, Jeffrey C. Miecznikowski

Keyword(s): Gene Set Analysis, RNA-Seq, Length Bias, Gene Set

Soft truncation thresholding for gene set analysis of RNA-seq data: Application to a vaccine study

Scientific Reports, 2013, Vol 3 (1)

DOI: 10.1038/srep02898

Cited By ~ 15

Author(s): Brooke L. Fridley, Gregory D. Jenkins, Diane E. Grill, Richard B. Kennedy, Gregory A. Poland, ...

Keyword(s): Gene Set Analysis, RNA-Seq, Gene Set, Vaccine Study, Data Application

Gene set analysis methods for the functional interpretation of non-mRNA data—Genomic range and ncRNA data

Briefings in Bioinformatics, 2019, Vol 21 (5), pp. 1495-1508

DOI: 10.1093/bib/bbz090

Cited By ~ 3

Author(s): Antonio Mora

Keyword(s): The State, Gene Set Analysis, Range Data, RNA-Seq, Functional Interpretation, Gene Set, Analysis Methods, mRNA Microarray, Network Approaches

Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it was developed mainly to analyze mRNA (microarray, RNA-Seq) data. This review provides an update on general methods and resources for GSA and then focuses on GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). Finally, it discusses the state of the GSA field for non-mRNA datasets and highlights current challenges and trends, especially the use of network approaches to address complexity.
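As a concrete reference point for the "general methods" of GSA, over-representation analysis (ORA), also one of the methods compared in the first study listed on this page, asks whether a gene set contains more selected (e.g., differentially expressed) genes than expected by chance. A minimal sketch using the one-sided hypergeometric tail follows. The function names are hypothetical, and using it on genomic range or ncRNA data would first require mapping those features to genes, a step this sketch assumes has already been done.

```python
from math import comb


def ora_pvalue(n_universe, n_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X is the number of selected genes that fall in a gene set of size n_set
    when n_selected genes are drawn without replacement from n_universe genes.
    """
    upper = min(n_set, n_selected)
    tail = sum(
        comb(n_set, x) * comb(n_universe - n_set, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic, with a single division at the end.
    return tail / comb(n_universe, n_selected)


def ora(selected_genes, gene_sets, universe):
    """ORA p-value for each gene set, with everything restricted to the universe."""
    universe = set(universe)
    selected = set(selected_genes) & universe
    results = {}
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(selected & members)
        results[name] = ora_pvalue(len(universe), len(members), len(selected), overlap)
    return results


# Hypothetical 10-gene universe and two gene sets.
universe = [f"g{i}" for i in range(10)]
gene_sets = {"setA": ["g0", "g1", "g2"], "setB": ["g7", "g8", "g9"]}
print(ora(["g0", "g1", "g2"], gene_sets, universe))
```

This plain test treats every gene as equally likely to be selected; in RNA-seq, longer genes are more readily detected as differentially expressed, which is the length bias that the BioData Mining study listed above addresses.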

Gene Set Analysis Using Spatial Statistics

Mathematics, 2021, Vol 9 (5), pp. 521

DOI: 10.3390/math9050521

Author(s): Angela L. Riffo-Campos, Guillermo Ayala, Francisco Montes

Keyword(s): Gene Expression, Differential Expression, Differential Expression Analysis, Gene Set Analysis, Point Pattern, RNA-Seq, P Values, Gene Set, Gene Differential Expression, Per Gene

Differential expression analysis studies the possible association between gene expression, measured with technologies such as DNA microarrays or RNA-Seq, and the phenotype. It can be performed marginally for each gene (differential gene expression) or on a collection of gene sets (gene set analysis). Usually, a marginal per-gene differential expression analysis is performed first to obtain a set of significant genes or marginal p-values, which are later used to study the association between phenotype and gene expression. This paper proposes using methods of spatial statistics to test for differential expression of gene sets with paired samples of RNA-Seq counts. The approach does not rely on a prior per-gene differential expression analysis. Instead, the paired counts within each sample/control pair are compared using a binomial test. Each pair yields one p-value per gene, so each gene's expression profile is transformed into a vector of p-values, which is treated as an event of a point pattern. This is the first component of a bivariate point pattern. The second component is generated by applying two different randomization distributions to the correspondence between samples and treatment. The self-contained null hypothesis of gene set analysis can then be formulated, in terms of the associated point pattern, as random labeling of this bivariate point pattern. The gene sets were defined by Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The methodology was tested on four RNA-Seq datasets from colorectal cancer (CRC) patients, and the results were compared with those obtained using the edgeR-GOseq pipeline. The proposed methodology proved consistent at both the biological and statistical levels, in particular when using the Cuzick and Edwards test with one realization of the second component and the between-pair distribution.
