GOing Bayesian: model-based gene set analysis of genome-scale data

The emergence of large gene expression datasets has revealed the need for improved tools to identify enriched gene categories and visualize enrichment patterns. While Gene Ontology (GO) provides a valuable tool for gene set enrichment analysis, it has several limitations. First, it is difficult to graphically compare multiple GO analyses. Second, genes from some model systems are not well represented. For example, around 30% of Caenorhabditis elegans genes are missing from analysis in commonly used databases. To allow categorization and visualization of enriched C. elegans gene sets in different types of genome-scale data, we developed WormCat, a web-based tool that uses a near-complete annotation of the C. elegans genome to identify co-expressed gene sets and a scaled heat map to visualize enrichment. We tested the performance of WormCat using a variety of published transcriptomic datasets and show that it reproduces major categories identified by GO. Importantly, we also found previously unidentified categories that are informative for interpreting phenotypes or predicting biological function. For example, we analyzed published RNA-seq data from C. elegans treated with combinations of lifespan-extending drugs where one combination paradoxically shortened lifespan. Using WormCat, we identified sterol metabolism as a category that was not enriched in the single or double combinations but emerged in a triple combination along with the lifespan shortening. Thus, WormCat identified a gene set with potential phenotypic relevance that was not uncovered with previous GO analysis. In conclusion, WormCat provides a powerful tool for the analysis and visualization of gene set enrichment in different types of C. elegans datasets.


WormCat: An Online Tool for Annotation and Visualization of Caenorhabditis elegans Genome-Scale Data

Genetics ◽

10.1534/genetics.119.302919 ◽

2019 ◽

Vol 214 (2) ◽

pp. 279-294 ◽

Cited By ~ 9

Author(s):

Amy D. Holdorf ◽

Daniel P. Higgins ◽

Anne C. Hart ◽

Peter R. Boag ◽

Gregory J. Pazour ◽

...

Keyword(s):

Caenorhabditis Elegans ◽

Gene Set Enrichment Analysis ◽

Model Systems ◽

Gene Set Enrichment ◽

Gene Set ◽

C Elegans ◽

Gene Sets ◽

Different Types ◽

Genome Scale ◽

Scale Data

The emergence of large gene expression datasets has revealed the need for improved tools to identify enriched gene categories and visualize enrichment patterns. While gene ontology (GO) provides a valuable tool for gene set enrichment analysis, it has several limitations. First, it is difficult to graph multiple GO analyses for comparison. Second, genes from some model systems are not well represented. For example, ∼30% of Caenorhabditis elegans genes are missing from the analysis in commonly used databases. To allow categorization and visualization of enriched C. elegans gene sets in different types of genome-scale data, we developed WormCat, a web-based tool that uses a near-complete annotation of the C. elegans genome to identify coexpressed gene sets and a scaled heat map to visualize enrichment. We tested the performance of WormCat using a variety of published transcriptomic datasets, and show that it reproduces major categories identified by GO. Importantly, we also found previously unidentified categories that are informative for interpreting phenotypes or predicting biological function. For example, we analyzed published RNA-seq data from C. elegans treated with combinations of lifespan-extending drugs, where one combination paradoxically shortened lifespan. Using WormCat, we identified sterol metabolism as a category that was not enriched in the single or double combinations, but emerged in a triple combination along with the lifespan shortening. Thus, WormCat identified a gene set with potential phenotypic relevance not found with previous GO analysis. In conclusion, WormCat provides a powerful tool for the analysis and visualization of gene set enrichment in different types of C. elegans datasets.


A multi-functional analyzer uses parameter constraints to improve the efficiency of model-based gene-set analysis

The Annals of Applied Statistics ◽

10.1214/14-aoas777 ◽

2015 ◽

Vol 9 (1) ◽

pp. 225-246 ◽

Cited By ~ 5

Author(s):

Zhishi Wang ◽

Qiuling He ◽

Bret Larget ◽

Michael A. Newton

Keyword(s):

Gene Set Analysis ◽

Gene Set ◽

Model Based ◽

Parameter Constraints


WTFgenes: What's The Function of these genes? Static sites for model-based gene set analysis

F1000Research ◽

10.12688/f1000research.11175.1 ◽

2017 ◽

Vol 6 ◽

pp. 423

Author(s):

Christopher J. Mungall ◽

Ian H. Holmes

Keyword(s):

Gene Ontology ◽

Hypergeometric Distribution ◽

Gene Set Analysis ◽

Web Browser ◽

Gene Set ◽

Model Based ◽

Common Technique

A common technique for interpreting experimentally-identified lists of genes is to look for enrichment of genes associated with particular ontology terms. The most common test uses the hypergeometric distribution; more recently, a model-based test was proposed. These approaches must typically be run using downloaded software, or on a server. We develop a collapsed likelihood for model-based gene set analysis and present WTFgenes, an implementation of both hypergeometric and model-based approaches, that can be published as a static site with computation run in JavaScript on the user's web browser client. Apart from hosting files, zero server resources are required: the site can (for example) be served directly from Amazon S3 or GitHub Pages. A C++11 implementation yielding identical results runs roughly twice as fast as the JavaScript version. WTFgenes is available from https://github.com/evoldoers/wtfgenes under the BSD3 license. A demonstration for the Gene Ontology is usable at https://evoldoers.github.io/wtfgo.
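The hypergeometric test named in this abstract can be illustrated with a minimal Python sketch. It is not taken from WTFgenes; the function name and the toy numbers are assumptions chosen for illustration. It computes the one-sided probability of seeing at least `hits` term-annotated genes in a gene list of size `sample`, drawn without replacement from `population` genes of which `annotated` carry the term.

```python
from math import comb


def hypergeom_enrichment_pvalue(population: int, annotated: int,
                                sample: int, hits: int) -> float:
    """One-sided enrichment p-value P(X >= hits).

    population: total number of genes considered (hypothetical background)
    annotated:  genes in the background annotated to the term
    sample:     size of the experimentally identified gene list
    hits:       genes in the list that are annotated to the term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie in [0, population]")
    max_hits = min(annotated, sample)
    total = comb(population, sample)  # all possible gene lists of this size
    # Count the lists containing k annotated genes, for every k >= hits.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(max(hits, 0), max_hits + 1)
    )
    return tail / total


# Toy numbers (assumed): 10 genes, 4 annotated, list of 3 with 2 annotated.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

In practice the test is run once per ontology term, so the resulting p-values usually need a multiple-testing correction; that step is outside this sketch.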


WTFgenes: What's The Function of these genes? Static sites for model-based gene set analysis

10.1101/114785 ◽

2017 ◽

Author(s):

Christopher J. Mungall ◽

Ian H. Holmes

Keyword(s):

Gene Ontology ◽

Hypergeometric Distribution ◽

Gene Set Analysis ◽

Web Browser ◽

Gene Set ◽

Model Based ◽

Link Type ◽

Common Technique

Abstract: A common technique for interpreting experimentally identified lists of genes is to look for enrichment of genes associated with particular ontology terms. The most common technique uses the hypergeometric distribution; more recently, a model-based approach was proposed. These approaches must typically be run using downloaded software, or on a server. We develop a collapsed likelihood for model-based gene set analysis and present WTFgenes, an implementation of both hypergeometric and model-based approaches, that can be published as a static site with computation run in JavaScript on the user's web browser client. Apart from hosting files, zero server resources are required: the site can (for example) be served directly from Amazon S3 or GitHub Pages. A C++11 implementation yielding identical results runs roughly twice as fast as the JavaScript version. WTFgenes is available from https://github.com/evoldoers/wtfgenes under the BSD3 license. A demonstration for the Gene Ontology is usable at https://evoldoers.github.io/wtfgo. Contact: Ian Holmes.
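To make the contrast with the hypergeometric test concrete, here is a rough Python sketch of the kind of generative model that model-based gene set analysis typically scores. It is an assumption-laden illustration, not the collapsed likelihood developed for WTFgenes: the false-positive rate `alpha`, false-negative rate `beta`, and term prior `prior` are held fixed at made-up values, and all names are hypothetical. A term is either active or inactive; a gene annotated to any active term is expected to appear in the observed list, subject to those error rates.

```python
from math import log


def log_joint(active_terms, term_genes, all_genes, observed,
              alpha=0.1, beta=0.25, prior=0.05):
    """Log joint probability of a term configuration and the observed gene list.

    active_terms: set of term names assumed to be "on"
    term_genes:   dict mapping each term to the set of genes annotated to it
    all_genes:    iterable of every gene in the background
    observed:     set of genes in the experimentally identified list
    alpha, beta, prior: fixed illustrative values (assumptions, not fitted)
    """
    # Genes explained by at least one active term.
    explained = set()
    for term in active_terms:
        explained |= term_genes[term]

    # Independent Bernoulli prior over which terms are active.
    n_active = len(active_terms)
    n_inactive = len(term_genes) - n_active
    total = n_active * log(prior) + n_inactive * log(1.0 - prior)

    # Each gene is observed or missed with the appropriate error rate.
    for gene in all_genes:
        if gene in explained:
            total += log(1.0 - beta) if gene in observed else log(beta)
        else:
            total += log(alpha) if gene in observed else log(1.0 - alpha)
    return total


# Toy data (assumed): term "A" explains the observed list, term "B" does not.
toy_terms = {"A": {"g1", "g2"}, "B": {"g3"}}
toy_genes = ["g1", "g2", "g3", "g4"]
toy_observed = {"g1", "g2"}
score_a = log_joint({"A"}, toy_terms, toy_genes, toy_observed)
score_b = log_joint({"B"}, toy_terms, toy_genes, toy_observed)
score_none = log_joint(set(), toy_terms, toy_genes, toy_observed)
```

Unlike the per-term hypergeometric test, this score evaluates all terms jointly, so a term whose genes are already explained by another active term gains little by being switched on. A full analysis would sample or enumerate term configurations and treat the error rates as unknowns rather than fixing them as done here.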
