Gene set analysis methods for the functional interpretation of non-mRNA data—Genomic range and ncRNA data

2019 ◽  
Vol 21 (5) ◽  
pp. 1495-1508 ◽  
Author(s):  
Antonio Mora

Abstract Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it has been mainly developed to analyze mRNA (microarray, RNA-Seq) data. The following review includes an update regarding general methods and resources for GSA and then emphasizes GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). In the end, the state of the GSA field for non-mRNA datasets is discussed, and some current challenges and trends are highlighted, especially the use of network approaches to face complexity issues.

2019 ◽  
Vol 17 (05) ◽  
pp. 1940010 ◽  
Author(s):  
Farhad Maleki ◽  
Katie L. Ovens ◽  
Daniel J. Hogan ◽  
Elham Rezaei ◽  
Alan M. Rosenberg ◽  
...  

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chengshu Xie ◽  
Shaurya Jauhari ◽  
Antonio Mora

Abstract Background Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. The following paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since its inception. "Popularity" is estimated according to each paper's citation counts, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, as well as the consolidated results from the existing benchmark studies. Results Regarding popularity, data is collected into an online open database ("GSARefDB") which allows browsing bibliographic and method-descriptive information from 503 GSA paper references; regarding performance, we introduce a repository of jupyter workflows and shiny apps for automated benchmarking of GSA methods (“GSA-BenchmarKING”). After comparing popularity versus performance, results show discrepancies between the most popular and the best performing GSA methods. Conclusions The above-mentioned results call our attention towards the nature of the tool selection procedures followed by researchers and raise doubts regarding the quality of the functional interpretation of biological datasets in current biomedical studies. Suggestions for the future of the functional interpretation field are made, including strategies for education and discussion of GSA tools, better validation and benchmarking practices, reproducibility, and functional re-analysis of previously reported data.


2015 ◽  
Vol 17 (3) ◽  
pp. 393-407 ◽  
Author(s):  
Yasir Rahmatallah ◽  
Frank Emmert-Streib ◽  
Galina Glazko

2007 ◽  
Vol 8 (1) ◽  
pp. 431 ◽  
Author(s):  
Qi Liu ◽  
Irina Dinu ◽  
Adeniyi J Adewale ◽  
John D Potter ◽  
Yutaka Yasui

2018 ◽  
Author(s):  
Farhad Maleki ◽  
Anthony J. Kusalik

AbstractGene set analysis methods are widely used to analyze data from high-throughput “omics” technologies. One drawback of these methods is their low specificity or high false positive rate. Over-representation analysis is one of the most commonly used gene set analysis methods. In this paper, we propose a systematic approach to investigate the hypothesis that gene set overlap is an underlying cause of low specificity in over-representation analysis. We quantify gene set overlap and show that it is a ubiquitous phenomenon across gene set databases. Statistical analysis indicates a strong negative correlation between gene set overlap and the specificity of over-representation analysis. We conclude that gene set overlap is an underlying cause of the low specificity. This result highlights the importance of considering gene set overlap in gene set analysis and explains the lack of specificity of methods that ignore gene set overlap. This research also establishes the direction for developing new gene set analysis methods.


2015 ◽  
Vol 26 (3) ◽  
pp. 1248-1260 ◽  
Author(s):  
Dougu Nam

Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.


2014 ◽  
Vol 15 (1) ◽  
Author(s):  
Yasir Rahmatallah ◽  
Frank Emmert-Streib ◽  
Galina Glazko

Sign in / Sign up

Export Citation Format

Share Document