GSA-Genie: a web application for gene set analysis


10.1101/125443 ◽

2017 ◽

Author(s):

Zhe Zhang ◽

Deanne Taylor

Keyword(s):

Statistical Methods ◽

Web Application ◽

Gene List ◽

Gene Set Analysis ◽

Gene Set ◽

Gene Sets ◽

Level Statistics ◽

Gene Level ◽

Amazon Web Services ◽

Selection Of

Abstract: Gene set analysis is often used to interpret results from upstream analysis through predefined gene sets that are linked to biological features such as cell cycle or tumorigenesis. Gene sets have been defined in the literature via various criteria and are archived by numerous databases. We compiled over 2.3 million gene sets from 17 sources, and made them accessible through a web application, GSA-Genie. Selected gene sets can be analyzed online using one of 16 statistical methods. These methods can be grouped into two strategies: a test of gene set over-representation within a gene list, or a comparison of a gene-level statistic between the gene set and the background. GSA-Genie operates on a Shiny web server, hosted in a cloud instance within Amazon Web Services. GSA-Genie offers a broad selection of gene sets and statistical methods compared with existing tools. GSA-Genie is freely available at http://gsagenie.awsomics.org.
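The first of the two strategies named in the abstract, testing a gene set for over-representation within a gene list, is commonly done with an upper-tail hypergeometric test. The following is a minimal sketch of that general technique, not GSA-Genie's own implementation; the function name `ora_pvalue` and the toy gene identifiers are assumptions made for illustration.

```python
from math import comb


def ora_pvalue(gene_list, gene_set, background):
    """Upper-tail hypergeometric test for over-representation.

    Hypothetical helper: returns (overlap, p_value), where p_value is the
    probability of seeing at least `overlap` gene-set members in a random
    draw of len(gene_list) genes from the background.
    """
    background = set(background)
    gene_list = set(gene_list) & background   # restrict both inputs to the background
    gene_set = set(gene_set) & background
    N = len(background)                        # population size
    K = len(gene_set)                          # "successes" in the population
    n = len(gene_list)                         # number of draws
    k = len(gene_list & gene_set)              # observed successes
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Toy example with made-up identifiers (not real data).
background = [f"g{i}" for i in range(10)]
overlap, p = ora_pvalue(["g0", "g1", "g2", "g9"], ["g0", "g1", "g2", "g3"], background)
print(overlap, round(p, 4))
```

The second strategy (comparing a gene-level statistic between the gene set and the background) would instead use a two-sample test on the statistic values; it is not shown here.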


To select relevant features for longitudinal gene expression data by extending a pathway analysis method

F1000Research ◽

10.12688/f1000research.15357.1 ◽

2018 ◽

Vol 7 ◽

pp. 1166 ◽

Cited By ~ 2

Author(s):

Suyan Tian ◽

Chi Wang ◽

Howard H. Chang

Keyword(s):

Feature Selection ◽

Expression Profiles ◽

Simulated Data ◽

Biological Information ◽

Gene Set Analysis ◽

Analysis Method ◽

Gene Set ◽

Gene Sets ◽

Selection Algorithms ◽

Selection Of

The emerging field of pathway-based feature selection that incorporates biological information conveyed by gene sets/pathways to guide the selection of relevant genes has become increasingly popular and widespread. In this study, we adapt a gene set analysis method – the significance analysis of microarray gene set reduction (SAMGSR) algorithm to carry out feature selection for longitudinal microarray data, and propose a pathway-based feature selection algorithm – the two-level SAMGSR method. By using simulated data and a real-world application, we demonstrate that a gene’s expression profiles over time can be considered as a gene set. Thus a suitable gene set analysis method can be utilized or modified to execute the selection of relevant genes for longitudinal omics data. We believe this work paves the way for more research to bridge feature selection and gene set analysis with the development of novel pathway-based feature selection algorithms.


Enhancing gene set enrichment using networks

F1000Research ◽

10.12688/f1000research.17824.2 ◽

2019 ◽

Vol 8 ◽

pp. 129 ◽

Cited By ~ 1

Author(s):

Michael Prummer

Keyword(s):

Biological Function ◽

Automated Analysis ◽

Gene Set Analysis ◽

Molecular Pathways ◽

Human Intervention ◽

Gene Set Enrichment ◽

Topological Information ◽

Gene Set ◽

Gene Sets ◽

Differential Gene

Differential gene expression (DGE) studies often suffer from poor interpretability of their primary results, i.e., thousands of differentially expressed genes. This has led to the introduction of gene set analysis (GSA) methods that aim at identifying interpretable global effects by grouping genes into sets of common context, such as molecular pathways, biological function or tissue localization. In practice, GSA often results in hundreds of differentially regulated gene sets. Similar to the genes they contain, gene sets are often regulated in a correlative fashion because they share many of their genes or they describe related processes. Using this kind of neighborhood information to construct networks of gene sets makes it possible to identify highly connected sub-networks as well as poorly connected islands or singletons. We show here how topological information and other network features can be used to filter and prioritize gene sets in routine DGE studies. Community detection in combination with automatic labeling and the network representation of gene set clusters further constitute an appealing and intuitive visualization of GSA results. The RICHNET workflow described here does not require human intervention and can thus be conveniently incorporated in automated analysis pipelines.
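The idea of linking gene sets that share many genes, then separating connected sub-networks from singletons, can be sketched as follows. This is an illustrative outline only and not the RICHNET workflow itself: the Jaccard index as the similarity measure, the `min_similarity` threshold, and the function names are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_set_components(gene_sets, min_similarity=0.3):
    """Group gene sets into connected components of a similarity network.

    An edge joins two gene sets whose Jaccard index is at least
    `min_similarity` (an assumed, tunable threshold). Components of size
    one are the "singletons" with no sufficiently similar neighbour.
    """
    names = list(gene_sets)
    neighbours = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if jaccard(gene_sets[a], gene_sets[b]) >= min_similarity:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, components = set(), []
    for start in names:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                      # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy gene sets with placeholder gene names.
toy = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g2", "g3", "g4", "g5"},
    "C": {"g10", "g11"},
}
print(gene_set_components(toy))
```

Connected components are the simplest grouping; a community-detection algorithm, as mentioned in the abstract, would subdivide large components further.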


Measuring consistency among gene set analysis methods: A systematic study

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400109 ◽

2019 ◽

Vol 17 (05) ◽

pp. 1940010 ◽

Cited By ~ 1

Author(s):

Farhad Maleki ◽

Katie L. Ovens ◽

Daniel J. Hogan ◽

Elham Rezaei ◽

Alan M. Rosenberg ◽

...

Keyword(s):

Gene Set Analysis ◽

RNA-Seq ◽

Systematic Analysis ◽

Gene Set ◽

Large Gene ◽

Analysis Methods ◽

Gene Sets ◽

Significant Gene ◽

Biological Insight ◽

Relevant Gene

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
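Agreement between the top-ranked gene sets of two methods can be quantified in several ways. The sketch below uses the Jaccard index of the two top-k lists as one plausible measure; it is not necessarily the metric used in the study, and the function names and toy rankings are hypothetical.

```python
from itertools import combinations


def top_k_overlap(ranking_a, ranking_b, k=20):
    """Jaccard index between the k most significant gene sets of two rankings.

    Each ranking is a list of gene-set names ordered from most to least
    significant. Returns 1.0 if both top-k lists are empty.
    """
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


def pairwise_agreement(rankings, k=20):
    """Top-k overlap for every pair of methods in a {method: ranking} dict."""
    return {
        (a, b): top_k_overlap(rankings[a], rankings[b], k)
        for a, b in combinations(sorted(rankings), 2)
    }


# Made-up rankings for three placeholder methods.
rankings = {
    "m1": ["s1", "s2", "s3", "s4"],
    "m2": ["s3", "s1", "s5", "s6"],
    "m3": ["s7", "s8", "s9", "s1"],
}
for pair, score in pairwise_agreement(rankings, k=2).items():
    print(pair, round(score, 3))
```

Running the same comparison with k=20 and k=100 on real method outputs would mirror the two cut-offs discussed in the abstract.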


GenSensor Suite: A Web-Based Tool for the Analysis of Gene and Protein Interactions, Pathways, and Regulation

Advances in Bioinformatics ◽

10.1155/2011/271563 ◽

2011 ◽

Vol 2011 ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Mark Gosink ◽

Sawsan Khuri ◽

Camilo Valdes ◽

Zhijie Jiang ◽

Nicholas F. Tsinoremas

Keyword(s):

Protein Interactions ◽

Single Gene ◽

Gene List ◽

Input List ◽

Web Based ◽

Common Gene ◽

Gene Set ◽

Web Tools ◽

Gene Sets

The GenSensor Suite consists of four web tools for elucidating relationships among genes and proteins. GenPath results show which biochemical, regulatory, or other gene set categories are over- or under-represented in an input list compared to a background list. All common gene sets are available for searching in GenPath, plus some specialized sets. Users can add custom background lists. GenInteract builds an interaction gene list from a single gene input and then analyzes this in GenPath. GenPubMed uses a PubMed query to identify a list of PubMed IDs, from which a gene list is extracted and queried in GenPath. GenViewer allows the user to query one gene set against another in GenPath. GenPath results are presented with relevant P- and q-values in an uncluttered, fully linked, and integrated table. Users can easily copy this table and paste it directly into a spreadsheet or document.


PhenoExam: an R package and Web application for the examination of phenotypes linked to genes and gene sets

10.1101/2021.06.29.450324 ◽

2021 ◽

Author(s):

Alejandro Cisterna García ◽

Aurora González-Vidal ◽

Daniel Ruiz Villa ◽

Jordi Ortiz Murillo ◽

Alicia Gómez-Pascual ◽

...

Keyword(s):

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Web Interface ◽

Gene Set ◽

New Genes ◽

Gene Sets ◽

Phenotype Analysis ◽

New Gene ◽

Early Onset Parkinson’s Disease

Gene set based phenotype enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) can improve the rate of genetic diagnoses, among other research purposes. To facilitate diverse phenotype analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which: (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases. PhenoExam achieves these tasks by integrating databases or resources such as the HPO, MGD, CRISPRbrain, CTD, ClinGen, CGI, OrphaNET, UniProt, PsyGeNET, and Genomics England Panel App. PhenoExam accepts both human and mouse genes as input. We developed PhenoExam to assist a variety of users, including clinicians, computational biologists and geneticists. It can be used to support the validation of new gene-to-disease discoveries, and in the detection of differential phenotypes between two gene sets (a phenotype linked to one of the gene sets but not to the other) that are useful for differential diagnosis and to improve genetic panels. We validated PhenoExam performance through simulations and its application to real cases. We demonstrate that PhenoExam is effective in distinguishing gene sets or Mendelian diseases with very similar phenotypes through projecting the disease-causing genes into their annotation-based phenotypic spaces. We also tested the tool with early onset Parkinson's disease and dystonia genes, to show phenotype-level similarities but also potentially interesting differences. More specifically, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. Therefore, PhenoExam effectively discovers links between phenotypic terms across annotation databases through effective integration.
The R package is available at https://github.com/alexcis95/PhenoExam and the Web tool is accessible at https://snca.atica.um.es/PhenoExamWeb/.


Novel Ultra-Rare Exonic Variants Identified in a Founder Population Implicate Cadherins in Schizophrenia

10.1101/2020.05.29.20115352 ◽

2020 ◽

Author(s):

Todd Lencz ◽

Jin Yu ◽

Raiyan Rashid Khan ◽

Shai Carmi ◽

Max Lam ◽

...

Keyword(s):

Rare Variant ◽

Rare Variants ◽

Gene List ◽

Ashkenazi Jewish ◽

Founder Population ◽

Total N ◽

Gene Set ◽

Complex Disorders ◽

Gene Sets ◽

Common Genetic Variants

Abstract IMPORTANCE: Schizophrenia is a serious mental illness with high heritability. While common genetic variants account for a portion of the heritability, identification of rare variants associated with the disorder has proven challenging. OBJECTIVE: To identify genes and gene sets associated with schizophrenia in a founder population (Ashkenazi Jewish), and to determine the relative power of this population for rare variant discovery. DESIGN, SETTING, AND PARTICIPANTS: Data on exonic variants were extracted from whole genome sequences drawn from 786 patients with schizophrenia and 463 healthy control subjects, all drawn from the Ashkenazi Jewish population. Variants observed in two large publicly available datasets (total n≈153,000, excluding neuropsychiatric patients) were filtered out, and novel ultra-rare variants (URVs) were compared in cases and controls. MAIN OUTCOMES AND MEASURES: The number of novel URVs and genes carrying them were compared across cases and controls. Genes in which only cases or only controls carried novel, functional URVs were examined using gene set analyses. RESULTS: Cases had a higher frequency of novel missense or loss of function (MisLoF) variants compared to controls, as well as a greater number of genes impacted by MisLoF variants. Characterizing 141 “case-only” genes (in which ≥ 3 AJ cases in our dataset had MisLoF URVs with none found in our AJ controls), we replicated prior findings of both enrichment for synaptic gene sets, as well as specific genes such as SETD1A and TRIO. Additionally, we identified cadherins as a novel gene set associated with schizophrenia including a recurrent mutation in PCDHA3. Several genes associated with autism and other neurodevelopmental disorders including CACNA1E, ASXL3, SETBP1, and WDFY3, were also identified in our case-only gene list, as was TSC2, which is linked to tuberous sclerosis. Modeling the effects of purifying selection demonstrated that deleterious rare variants are greatly over-represented in a founder population with a tight bottleneck and rapidly expanding census, resulting in enhanced power for rare variant association studies. CONCLUSIONS AND RELEVANCE: Identification of cell adhesion genes in the cadherin/protocadherin family is consistent with evidence from large-scale GWAS in schizophrenia, helps specify the synaptic abnormalities that may be central to the disorder, and suggests novel potential treatment strategies (e.g., inhibition of protein kinase C). Study of founder populations may serve as a cost-effective way to rapidly increase gene discovery in schizophrenia and other complex disorders.


GeneSetCluster: a tool for summarizing and integrating gene-set analysis results

BMC Bioinformatics ◽

10.1186/s12859-020-03784-z ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Ewoud Ewing ◽

Nuria Planell-Picola ◽

Maja Jagodic ◽

David Gomez-Cabrero

Keyword(s):

Gene Content ◽

Gene Set Analysis ◽

Gene Set ◽

Overlapping Gene ◽

Analysis Tools ◽

Novel Approach ◽

Gene Sets ◽

Distance Score ◽

Significant Gene ◽

Similar Gene

Abstract Background: Gene-set analysis tools, which make use of curated sets of molecules grouped based on their shared functions, aim to identify which gene-sets are over-represented in the set of features that have been associated with a given trait of interest. Such tools, for example Ingenuity or GSEA, are frequently used in gene-centric approaches derived from RNA-sequencing or microarrays, but they have also been adapted for interval-based analysis derived from DNA methylation or ChIP/ATAC-sequencing. Gene-set analysis tools return, as a result, a list of significant gene-sets. However, while these results are useful for the researcher in the identification of major biological insights, they may be complex to interpret because many gene-sets have largely overlapping gene contents. Additionally, in many cases the result of gene-set analysis consists of a large number of gene-sets, making it complicated to identify the major biological insights. Results: We present GeneSetCluster, a novel approach which allows clustering of identified gene-sets, from one or multiple experiments and/or tools, based on shared genes. GeneSetCluster calculates a distance score based on overlapping gene content, which is then used to cluster the gene-sets. As a result, GeneSetCluster identifies groups of gene-sets with similar gene-set definitions (i.e. gene content). These groups can help the researcher focus biological interpretation. Conclusions: GeneSetCluster is a novel approach for grouping together post gene-set analysis results based on overlapping gene content. GeneSetCluster is implemented as a package in R. The package and the vignette can be downloaded at https://github.com/TranslationalBioinformaticsUnit
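Clustering gene-sets by a distance derived from overlapping gene content can be illustrated with a small, dependency-free sketch. It uses Jaccard distance and average-linkage agglomerative clustering as stand-ins; the distance score and clustering method actually used by the GeneSetCluster R package may differ, and the names `jaccard_distance`, `cluster_gene_sets` and `max_distance` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard index; 0.0 for identical (or two empty) sets."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_gene_sets(gene_sets, max_distance=0.7):
    """Average-linkage agglomerative clustering of gene sets.

    Repeatedly merges the two closest clusters until the smallest
    average pairwise distance exceeds `max_distance` (an assumed cutoff).
    """
    clusters = [[name] for name in gene_sets]

    def linkage(c1, c2):
        # Mean Jaccard distance over all cross-cluster pairs.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(gene_sets[x], gene_sets[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j is safe
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy input; the set names and memberships are illustrative only.
toy = {
    "cell_cycle_a": {"CDK1", "CCNB1", "CDC20", "PLK1"},
    "cell_cycle_b": {"CDK1", "CCNB1", "CDC20", "BUB1"},
    "immune": {"IL6", "TNF", "CXCL8"},
}
print(cluster_gene_sets(toy))
```

This brute-force loop recomputes all linkages on every merge, which is fine for a few hundred gene-sets but would need a proper hierarchical-clustering routine for larger inputs.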


Epigenome-450K-wide methylation signatures of active cigarette smoking: The Young Finns Study

Bioscience Reports ◽

10.1042/bsr20200596 ◽

2020 ◽

Vol 40 (7) ◽

Author(s):

Pashupati P. Mishra ◽

Ismo Hänninen ◽

Emma Raitoharju ◽

Saara Marttila ◽

Binisha H. Mishra ◽

...

Keyword(s):

DNA Methylation ◽

Olfactory Receptor ◽

CpG Island ◽

Receptor Activity ◽

Gene Set Analysis ◽

Gene Set ◽

CpG Sites ◽

Gene Sets ◽

The Impact ◽

Young Finns Study

Abstract: Smoking, a major risk factor for morbidity, affects numerous regulatory systems of the human body including DNA methylation. Most of the previous studies with genome-wide methylation data are based on conventional association analysis and early threshold-based gene set analysis, which lacks the sensitivity to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5′-C-phosphate-G-3′ (CpG) sites, genes and functionally related genes (gene sets). Gene set analysis was done with mGSZ, a modern threshold-free method previously developed by us that utilizes all the genes in the experiment and their differential methylation scores. The application of such a method in a DNA methylation study is novel. Epigenome-wide methylation levels were profiled from Young Finns Study (YFS) participants’ whole blood from the 2011 follow-up using Illumina Infinium HumanMethylation450 BeadChips. We identified three novel smoking related CpG sites and replicated 57 of the previously identified ones. We found that smoking is associated with hypomethylation in shores (genomic regions 0–2 kilobases from a CpG island). We identified smoking related methylation changes in 13 gene sets with false discovery rate (FDR) ≤ 0.05, among which is olfactory receptor activity, the flagship novel finding of the present study. Overall, we extended the current knowledge by identifying: (i) three novel smoking related CpG sites, (ii) similar effects as aging on average methylation in shores, and (iii) a novel finding that the olfactory receptor activity pathway responds to tobacco smoke and toxin exposure through epigenetic mechanisms.


Silver: Forging almost Gold Standard Datasets

Genes ◽

10.3390/genes12101523 ◽

2021 ◽

Vol 12 (10) ◽

pp. 1523

Author(s):

Farhad Maleki ◽

Katie Ovens ◽

Ian McQuillan ◽

Anthony J. Kusalik

Keyword(s):

Gold Standard ◽

Best Practice ◽

Evaluation Studies ◽

A Priori ◽

Real Data ◽

Gene Set Analysis ◽

Gene Set ◽

Analysis Methods ◽

Gene Sets ◽

New Gene

Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations of which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets where enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene–gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.


Excitatory/inhibitory imbalance in autism: the role of glutamate and GABA gene-sets in symptoms and cortical brain structure

10.1101/2021.12.20.473501 ◽

2021 ◽

Author(s):

Viola Hollestein ◽

Geert Poelmans ◽

Natalie Forde ◽

Christian F Beckmann ◽

Christine Ecker ◽

...

Keyword(s):

Gene Expression ◽

Sensory Processing ◽

Brain Structure ◽

Symptom Severity ◽

Expression Profiles ◽

Brain Regions ◽

Gene Set Analysis ◽

Gene Set ◽

Gene Sets ◽

Autism Symptomatology

Background: The excitatory/inhibitory (E/I) imbalance hypothesis posits that an imbalance between excitatory (glutamatergic) and inhibitory (GABAergic) mechanisms underlies the behavioral characteristics of autism spectrum disorder (autism). However, how E/I imbalance arises and how it may differ across autism symptomatology and brain regions is not well understood. Methods: We used innovative analysis methods, combining competitive gene-set analysis and gene-expression profiles in relation to cortical thickness (CT), to investigate the relationship between genetic variance, brain structure and autism symptomatology of participants from the EU-AIMS LEAP cohort (autism=360, male/female=259/101; neurotypical control participants=279, male/female=178/101) aged 6 to 30 years. Competitive gene-set analysis investigated associations between glutamatergic and GABAergic signaling pathway gene-sets and clinical measures, and CT. Additionally, we investigated expression profiles of the genes within those sets throughout the brain and how those profiles relate to differences in CT between autistic and neurotypical control participants in the same regions. Results: The glutamate gene-set was associated with all autism symptom severity scores on the Autism Diagnostic Observation Schedule-2 (ADOS-2) and the Autism Diagnostic Interview-Revised (ADI-R) within the autistic group, while the GABA set was associated with sensory processing measures (using the SSP subscales) across all participants. Brain regions with greater gene expression of both glutamate and GABA genes showed greater differences in CT between autistic and neurotypical control participants. Conclusions: Our results suggest crucial roles for glutamate and GABA genes in autism symptomatology as well as CT, where GABA is more strongly associated with sensory processing and glutamate more with autism symptom severity.
