Microarray-based gene set analysis: a comparison of current methods

2008 ◽  
Vol 9 (1) ◽  
Author(s):  
Sarah Song ◽  
Michael A Black

2015 ◽  
Vol 31 (18) ◽  
pp. 3069-3071 ◽  
Author(s):  
Minjae Yoo ◽  
Jimin Shin ◽  
Jihye Kim ◽  
Karen A. Ryall ◽  
Kyubum Lee ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chengshu Xie ◽  
Shaurya Jauhari ◽  
Antonio Mora

Abstract

Background: Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. This paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since GSA's inception. "Popularity" is estimated from each paper's citation count, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, together with the consolidated results of existing benchmark studies.

Results: Regarding popularity, data are collected into an open online database ("GSARefDB") that allows browsing bibliographic and method-descriptive information from 503 GSA paper references; regarding performance, we introduce a repository of Jupyter workflows and Shiny apps for automated benchmarking of GSA methods ("GSA-BenchmarKING"). Comparing popularity with performance reveals discrepancies between the most popular and the best-performing GSA methods.

Conclusions: These results call attention to the way researchers select tools and raise doubts about the quality of the functional interpretation of biological datasets in current biomedical studies. Suggestions for the future of the functional interpretation field are made, including strategies for education and discussion of GSA tools, better validation and benchmarking practices, reproducibility, and functional re-analysis of previously reported data.
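GSA methods differ widely, but the simplest class, over-representation analysis (ORA), reduces to a hypergeometric test of whether a gene set contains more selected (for example, differentially expressed) genes than expected by chance. The following minimal Python sketch is illustrative only and is not taken from GSARefDB or GSA-BenchmarKING; the function names `ora_pvalue` and `ora` and the toy gene identifiers are hypothetical.

```python
from math import comb


def ora_pvalue(universe_size: int, set_size: int, n_selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    X counts how many of the n_selected genes fall into a gene set of
    set_size members when genes are drawn without replacement from a
    universe of universe_size genes.
    """
    total = comb(universe_size, n_selected)
    largest = min(set_size, n_selected)
    tail = sum(
        comb(set_size, x) * comb(universe_size - set_size, n_selected - x)
        for x in range(overlap, largest + 1)
    )
    return tail / total


def ora(universe, selected, gene_sets):
    """Return {gene set name: ORA p-value}, most significant first."""
    universe = set(universe)
    selected = set(selected) & universe  # ignore genes outside the universe
    results = {}
    for name, genes in gene_sets.items():
        members = set(genes) & universe
        overlap = len(members & selected)
        results[name] = ora_pvalue(len(universe), len(members), len(selected), overlap)
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Toy example with hypothetical gene identifiers G0..G9.
universe = [f"G{i}" for i in range(10)]
selected = ["G0", "G1", "G2"]
gene_sets = {"set_A": ["G0", "G1", "G2"], "set_B": ["G7", "G8", "G9"]}
print(ora(universe, selected, gene_sets))
```

In practice the resulting p-values would also be corrected for multiple testing (for example with the Benjamini-Hochberg procedure), which this sketch omits.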


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 129 ◽  
Author(s):  
Michael Prummer

Differential gene expression (DGE) studies often suffer from poor interpretability of their primary results, i.e., thousands of differentially expressed genes. This has led to the introduction of gene set analysis (GSA) methods that aim at identifying interpretable global effects by grouping genes into sets of common context, such as molecular pathways, biological function or tissue localization. In practice, GSA often results in hundreds of differentially regulated gene sets. Like the genes they contain, gene sets are often regulated in a correlated fashion because they share many of their genes or describe related processes. Using this kind of neighborhood information to construct networks of gene sets makes it possible to identify highly connected sub-networks as well as poorly connected islands or singletons. We show here how topological information and other network features can be used to filter and prioritize gene sets in routine DGE studies. Community detection in combination with automatic labeling and the network representation of gene set clusters further constitute an appealing and intuitive visualization of GSA results. The RICHNET workflow described here does not require human intervention and can thus be conveniently incorporated in automated analysis pipelines.
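The idea of linking gene sets by shared genes can be sketched as follows, under stated assumptions: edges connect gene sets whose Jaccard similarity (shared genes divided by all genes in either set) reaches a hypothetical threshold of 0.3, and plain connected components stand in for the community detection used by RICHNET, which this sketch does not reproduce. Components with a single member correspond to the isolated singletons mentioned above; all names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_set_network(gene_sets, threshold=0.3):
    """Adjacency dict linking gene sets whose Jaccard similarity >= threshold."""
    adjacency = {name: set() for name in gene_sets}
    for (name_1, genes_1), (name_2, genes_2) in combinations(gene_sets.items(), 2):
        if jaccard(set(genes_1), set(genes_2)) >= threshold:
            adjacency[name_1].add(name_2)
            adjacency[name_2].add(name_1)
    return adjacency


def connected_components(adjacency):
    """Group gene sets into clusters via iterative depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    # Largest clusters first; singletons (isolated gene sets) come last.
    return sorted(components, key=len, reverse=True)


gene_sets = {
    "set_A": ["a", "b", "c", "d"],
    "set_B": ["a", "b", "c", "e"],  # Jaccard with set_A = 3/5
    "set_C": ["x", "y", "z"],       # shares no genes: a singleton
}
print(connected_components(gene_set_network(gene_sets)))
```

A real workflow would typically replace connected components with a modularity-based community detection algorithm, which can split a large, loosely connected component into tighter clusters.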


2019 ◽  
Vol 17 (05) ◽  
pp. 1940010 ◽  
Author(s):  
Farhad Maleki ◽  
Katie L. Ovens ◽  
Daniel J. Hogan ◽  
Elham Rezaei ◽  
Alan M. Rosenberg ◽  
...  

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, of both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and that some methods were biased toward reporting large gene sets. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement, even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in the relevance of the top 20 and top 100 most significant gene sets to the known biology of the disease: GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
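The abstract does not state which agreement measure was used for the top-20 and top-100 comparisons. As an assumption for illustration, the sketch below scores agreement between two methods' ranked result lists as the Jaccard index of their top-k gene sets; the method names and gene set identifiers are hypothetical.

```python
from itertools import combinations


def top_k_jaccard(ranking_a, ranking_b, k=20):
    """Jaccard index of the top-k entries of two rankings (most significant first)."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def pairwise_agreement(rankings, k=20):
    """Top-k Jaccard agreement for every pair of methods."""
    return {
        (method_1, method_2): top_k_jaccard(ranked_1, ranked_2, k)
        for (method_1, ranked_1), (method_2, ranked_2) in combinations(rankings.items(), 2)
    }


# Hypothetical ranked outputs of three methods.
rankings = {
    "method_1": ["s1", "s2", "s3", "s4"],
    "method_2": ["s2", "s1", "s5", "s6"],
    "method_3": ["s7", "s8", "s9", "s10"],
}
for k in (2, 4):
    print(k, pairwise_agreement(rankings, k))
```

Passing k=20 or k=100 gives the two cut-offs discussed in the abstract; because the score compares sets, two methods with the same top-k gene sets in a different order still score 1.0.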


2020 ◽  
Vol 2 (7) ◽  
pp. 387-395 ◽  
Author(s):  
Sheng Wang ◽  
Emily R. Flynn ◽  
Russ B. Altman

2019 ◽  
Vol 44 (9) ◽  
pp. 1562-1569 ◽  
Author(s):  
Alexandros Rammos ◽  
Lara A. Neira Gonzalez ◽  
Daniel R. Weinberger ◽  
Kevin J. Mitchell ◽  
...  
