GSAn: an alternative to enrichment analysis for annotating gene sets

Abstract The revolution in new sequencing technologies is leading to a greatly improved understanding of the relations between genotype and phenotype. To interpret and analyze data grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by selecting the over-represented terms a priori, and they may suffer from focusing on the most studied genes, which cover only a limited share of the annotated genes within a gene set. Semantic similarity measures have shown great results in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared with enrichment analysis tools, GSAn has shown excellent results in maximizing gene coverage while minimizing the number of terms.
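The compromise between few retained terms and high gene coverage can be made concrete with a small example. The sketch below is a generic greedy set-cover heuristic, not GSAn's algorithm (GSAn relies on semantic similarity over the GO structure): it repeatedly keeps the term that annotates the most not-yet-covered genes and stops once a target coverage is reached. The term IDs, gene names, and the `target` threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_term_selection(
    annotations: Dict[str, Set[str]], genes: Set[str], target: float = 0.9
) -> Tuple[List[str], float]:
    """Greedily keep terms until `target` fraction of `genes` is covered.

    annotations maps a (hypothetical) term ID to the genes it annotates.
    Returns the retained terms and the gene coverage they achieve.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(annotations)
    while remaining and len(covered) < target * len(genes):
        # Term adding the most uncovered genes; ties go to the first term listed
        term, term_genes = max(
            remaining.items(), key=lambda kv: len((kv[1] & genes) - covered)
        )
        gain = (term_genes & genes) - covered
        if not gain:
            break  # no remaining term annotates an uncovered gene
        selected.append(term)
        covered |= gain
        del remaining[term]
    return selected, len(covered) / len(genes)


annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g3", "g4"},
    "term_C": {"g4", "g5", "g6"},
    "term_D": {"g6"},
}
genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
print(greedy_term_selection(annotations, genes, target=1.0))
```

Lowering `target` trades coverage for fewer terms, which is the same tension the abstract describes, although GSAn resolves it differently.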


Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine ◽

10.3390/jcm8101580 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1580 ◽

Cited By ~ 1

Author(s):

Kyoung Min Moon ◽

Kyueng-Whan Min ◽

Mi-Hye Kim ◽

Dong-Hoon Kim ◽

Byoung Kwan Son ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Respiratory Failure ◽

Scrub Typhus ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Acid Base ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF), and require admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict an aggressive course of SC. We evaluated 91 patients with SC and 81 non-SC patients admitted to the ICU, as well as 32 cases from a public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC, and performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used as an indicator of ARF in patients with SC and can assist with its early detection.


Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics ◽

10.1186/s12859-020-03929-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Mike Fang ◽

Brian Richardson ◽

Cheryl M. Cameron ◽

Jean-Eudes Dazard ◽

Mark J. Cameron

Keyword(s):

Drug Targets ◽

T Regulatory Cells ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Regulatory Cells ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Gastroenteropancreatic Neuroendocrine Tumor ◽

Public Datasets

Abstract Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes the biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting specific drug perturbations in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, dpGSEA detected phenotypically relevant drug targets, such as those involved in virion replication, cell cycle dysfunction, and mitochondrial dysfunction, in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering the directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug-targeting molecules.


Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2014-0077 ◽

2015 ◽

Vol 14 (3) ◽

Cited By ~ 13

Author(s):

Konstantina Charmpi ◽

Bernard Ycart

Keyword(s):

Weight Function ◽

Null Hypothesis ◽

Computing Time ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Test Statistic ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Kolmogorov Smirnov

Abstract Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data analysis. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte Carlo simulation. Here, it is proposed to subtract its asymptotic expectation from the cumulated weight function, then scale the result. Under the null hypothesis, the convergence in distribution of the new test statistic is proved using the theory of empirical processes. The limiting distribution needs to be computed only once and can then be used for many different gene sets, which results in large savings in computing time. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. A comparison between the classical GSEA test and the new procedure has been conducted using expression data from the GEO repository, tested against the MSig Database C2. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative than the classical GSEA test in many cases.
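The centering-and-scaling idea can be sketched in NumPy. This is a minimal illustration under stated assumptions, not Charmpi and Ycart's exact definition: the cumulated weight of a gene set (averaged over its n genes) is compared with its exact expectation when the set is a uniform random sample, scaled by sqrt(n), and, as an assumption of this sketch, divided by the mean weight to make it unit-free. The function names, `n_ref`, and the Monte Carlo stand-in for the limiting distribution are all hypothetical.

```python
import numpy as np


def wks_like_statistic(weights, in_set):
    """Sup-norm of the centered, scaled cumulated weight function of a gene set.

    weights: N nonnegative gene weights, listed in ranking order.
    in_set: boolean mask of length N marking the gene set (at least one gene).
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(in_set, dtype=bool)
    n, N = int(s.sum()), w.size
    # Cumulated weight of the gene set, averaged over its n genes
    observed = np.cumsum(np.where(s, w, 0.0)) / n
    # Exact expectation when the set is a uniform random sample of n genes
    expected = np.cumsum(w) / N
    # sqrt(n) scaling; dividing by the mean weight is an assumption of this sketch
    return np.sqrt(n) * np.max(np.abs(observed - expected)) / w.mean()


def reference_null(weights, n_ref=100, n_draws=2000, seed=0):
    """Monte Carlo null drawn once per ranking and reused for every gene set."""
    rng = np.random.default_rng(seed)
    N = len(weights)
    stats = np.empty(n_draws)
    for b in range(n_draws):
        mask = np.zeros(N, dtype=bool)
        mask[rng.choice(N, size=n_ref, replace=False)] = True
        stats[b] = wks_like_statistic(weights, mask)
    return stats


def p_value(stat, null_stats):
    # Add-one correction keeps the Monte Carlo p-value strictly positive
    return (1 + np.sum(null_stats >= stat)) / (1 + null_stats.size)


w = np.linspace(2.0, 0.1, 500)          # hypothetical weights, already ranked
null = reference_null(w, n_ref=50, n_draws=300)
top = np.zeros(500, dtype=bool)
top[:30] = True                          # a gene set concentrated at the top
print(p_value(wks_like_statistic(w, top), null))
```

Reusing one null for gene sets of different sizes mirrors the abstract's point that a single limiting distribution serves many gene sets; in this sketch it is only an approximation, reasonable when gene sets are much smaller than the number of genes.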


GSEA-InContext Explorer: An interactive visualization tool for putting gene set enrichment analysis results into biological context

10.1101/659847 ◽

2019 ◽

Author(s):

Rani K. Powers ◽

Anthony Sun ◽

James C. Costello

Keyword(s):

Statistical Significance ◽

Null Distribution ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Gene Set ◽

Link Type ◽

Interactive Interface ◽

Gene Sets ◽

Shiny App

Abstract Summary: GSEA-InContext Explorer is a Shiny app that allows users to perform two methods of gene set enrichment analysis (GSEA). The first, GSEAPreranked, applies the GSEA algorithm, in which statistical significance is estimated from a null distribution of enrichment scores generated for randomly permuted gene sets. The second, GSEA-InContext, incorporates a user-defined set of background experiments to define the null distribution and calculate statistical significance. GSEA-InContext Explorer allows users to build custom background sets from a compendium of over 5,700 curated experiments, run both GSEAPreranked and GSEA-InContext on their own uploaded experiment, and explore the results using an interactive interface. This tool allows researchers to visualize gene sets that are commonly enriched across experiments and to identify gene sets that are uniquely significant in their experiment, thus complementing current methods for interpreting gene set enrichment results. Availability and implementation: The code for GSEA-InContext Explorer is available at https://github.com/CostelloLab/GSEA-InContext_Explorer, and the interactive tool is at http://gsea-incontext_explorer.ngrok.io
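The GSEAPreranked procedure described above can be sketched as follows: a weighted running-sum enrichment score in the style of Subramanian et al. (2005), with a nominal p-value computed against randomly drawn gene sets of the same size. This is a minimal NumPy sketch, not the code behind GSEA-InContext Explorer; the function names, the weight exponent `p=1`, and the add-one p-value correction are assumptions. The GSEA-InContext null, built from background experiments, is not reproduced here.

```python
import numpy as np


def enrichment_score(ranked_scores, in_set, p=1.0):
    """Weighted running-sum enrichment score for a preranked gene list.

    ranked_scores: gene-level statistics sorted in decreasing order.
    in_set: boolean mask of the same length marking gene-set members.
    """
    r = np.abs(np.asarray(ranked_scores, dtype=float)) ** p
    s = np.asarray(in_set, dtype=bool)
    n_miss = s.size - s.sum()
    hit = np.cumsum(np.where(s, r, 0.0)) / r[s].sum()  # weighted hits
    miss = np.cumsum(~s) / n_miss                       # unweighted misses
    running = hit - miss
    # ES is the signed maximum deviation of the running sum from zero
    return running[np.argmax(np.abs(running))]


def permutation_pvalue(ranked_scores, in_set, n_perm=1000, seed=0):
    """Nominal p-value from randomly drawn gene sets of the same size."""
    rng = np.random.default_rng(seed)
    es = enrichment_score(ranked_scores, in_set)
    n, k = len(ranked_scores), int(np.sum(in_set))
    null = np.empty(n_perm)
    for b in range(n_perm):
        mask = np.zeros(n, dtype=bool)
        mask[rng.choice(n, size=k, replace=False)] = True
        null[b] = enrichment_score(ranked_scores, mask)
    # As in GSEA, compare only with null scores of the same sign
    if es >= 0:
        same = null[null >= 0]
        extreme = np.sum(same >= es)
    else:
        same = null[null < 0]
        extreme = np.sum(same <= es)
    # Add-one correction (an assumption here) avoids reporting p = 0
    return (1 + extreme) / (1 + same.size)


scores = np.linspace(3, -3, 100)   # hypothetical preranked statistics
top = np.zeros(100, dtype=bool)
top[:10] = True                    # gene set at the top of the ranking
print(enrichment_score(scores, top), permutation_pvalue(scores, top, n_perm=200))
```

Permuting gene sets rather than sample labels is what makes the method usable on a ranked list alone; GSEA-InContext changes where the null comes from, not how the score is computed.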


PhenoExam: an R package and Web application for the examination of phenotypes linked to genes and gene sets

10.1101/2021.06.29.450324 ◽

2021 ◽

Author(s):

Alejandro Cisterna García ◽

Aurora González-Vidal ◽

Daniel Ruiz Villa ◽

Jordi Ortiz Murillo ◽

Alicia Gómez-Pascual ◽

...

Keyword(s):

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Web Interface ◽

Gene Set ◽

New Genes ◽

Gene Sets ◽

Phenotype Analysis ◽

New Gene ◽

Early Onset Parkinson’s Disease

Gene set-based phenotype enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) can improve the rate of genetic diagnoses, among other research purposes. To facilitate diverse phenotype analyses, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which performs: (1) phenotype and disease enrichment analysis on a gene set; (2) measurement of statistically significant phenotype similarities between gene sets; and (3) detection of significant differential phenotypes or disease terms across different databases. PhenoExam achieves these tasks by integrating databases and resources such as the HPO, MGD, CRISPRbrain, CTD, ClinGen, CGI, OrphaNET, UniProt, PsyGeNET, and Genomics England Panel App. PhenoExam accepts both human and mouse genes as input. We developed PhenoExam to assist a variety of users, including clinicians, computational biologists, and geneticists. It can support the validation of new gene-to-disease discoveries and the detection of differential phenotypes between two gene sets (phenotypes linked to one gene set but not to the other), which is useful for differential diagnosis and for improving genetic panels. We validated PhenoExam's performance through simulations and its application to real cases. We demonstrate that PhenoExam is effective in distinguishing gene sets or Mendelian diseases with very similar phenotypes by projecting the disease-causing genes into their annotation-based phenotypic spaces. We also tested the tool with early-onset Parkinson's disease and dystonia genes to show phenotype-level similarities as well as potentially interesting differences. In addition, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. Thus, by integrating annotation databases, PhenoExam effectively discovers links between phenotypic terms across them.
The R package is available at https://github.com/alexcis95/PhenoExam and the Web tool is accessible at https://snca.atica.um.es/PhenoExamWeb/.
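A common way to ask whether a phenotype term is over-represented in a gene set is a one-sided hypergeometric test. The sketch below assumes that test for illustration only; PhenoExam's actual statistics may differ, and the function name and counts are hypothetical. It computes P(X >= overlap) exactly with integer binomial coefficients from the standard library.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int, set_size: int, overlap: int) -> float:
    """One-sided p-value P(X >= overlap), X ~ Hypergeometric(universe, annotated, set_size).

    universe: number of genes that could be drawn
    annotated: genes in the universe linked to the phenotype term
    set_size: genes in the query gene set
    overlap: query genes linked to the phenotype term
    """
    upper = min(annotated, set_size)
    # Sum the exact probabilities of observing `overlap` or more annotated genes
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, set_size)


# Hypothetical counts: 20,000 genes, 200 linked to a term, 50-gene set with 8 hits
print(hypergeom_enrichment_p(20000, 200, 50, 8))
```

In practice such per-term p-values would also need a multiple-testing correction across all phenotype terms tested.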


Alterations in the host transcriptome in vitro and in vivo following severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection

10.21203/rs.3.rs-37567/v1 ◽

2020 ◽

Author(s):

Xiaomei Lei ◽

Zhijun Feng ◽

Xiaojun Wang ◽

Xiaodong He

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Microarray Data ◽

Molecular Mechanisms ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets ◽

Core Genes

Abstract Background: Exploring alterations in the host transcriptome following SARS-CoV-2 infection is highly warranted, not only to help us understand the molecular mechanisms of the disease, but also to provide new prospects for screening effective antiviral drugs, finding new therapeutic targets, and evaluating the risk of systemic inflammatory response syndrome (SIRS) early. Methods: We downloaded three gene expression matrix files from the Gene Expression Omnibus (GEO) database, extracted the gene expression data of SARS-CoV-2-infected and non-infected human samples and cell line samples, and performed gene set enrichment analysis (GSEA) on each. We then integrated the GSEA results to obtain co-enriched gene sets and co-core genes across the three microarray datasets. Finally, we constructed a protein-protein interaction (PPI) network and molecular modules for the co-core genes and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the module genes to clarify their possible biological processes and underlying signaling pathways. Results: A total of 11 co-enriched gene sets were identified from the three microarray datasets. Of these, 10 gene sets were activated and involved in immune response and inflammatory reaction, and 1 gene set was suppressed and participated in the cell cycle. The analysis of molecular modules showed that 2 modules might play a vital role in the pathogenic process of SARS-CoV-2 infection. KEGG enrichment analysis showed that genes from module one were enriched in signaling pathways related to inflammation, whereas genes from module two were enriched in cell cycle and DNA replication signaling. In particular, necroptosis signaling, a recently identified type of programmed cell death distinct from apoptosis, was also detected. 
Additionally, in patients with SARS-CoV-2 infection, genes from module one showed relatively high expression, whereas genes from module two showed low expression. Conclusions: We identified two molecular modules that could be used to assess severity and predict the prognosis of patients with SARS-CoV-2 infection. In addition, these results provide a unique opportunity to explore more molecular pathways as potential new therapeutic targets in COVID-19.
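The integration step (co-enriched gene sets and co-core genes across datasets) can be illustrated with plain set operations. The sketch below rests on assumptions not stated in the abstract: each dataset's GSEA output is summarized as a mapping from gene set name to (NES, FDR, core-enrichment genes); a set counts as co-enriched when it passes the FDR cutoff in every dataset with the same NES sign; and its co-core genes are the intersection of its core genes. The 25% FDR cutoff, the function name, and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-dataset GSEA summary: gene set -> (NES, FDR, core genes)
Result = Dict[str, Tuple[float, float, Set[str]]]


def co_enriched(results: List[Result], fdr_cutoff: float = 0.25) -> Dict[str, Tuple[str, Set[str]]]:
    """Gene sets significant in every dataset with a consistent direction.

    Requires at least one dataset. Returns set name ->
    ("activated" or "suppressed", genes in the core of every dataset).
    """
    shared = set.intersection(*(set(r) for r in results))
    out = {}
    for name in sorted(shared):
        hits = [r[name] for r in results]
        if any(fdr >= fdr_cutoff for _, fdr, _ in hits):
            continue  # not significant in at least one dataset
        signs = {nes > 0 for nes, _, _ in hits}
        if len(signs) != 1:
            continue  # direction differs between datasets
        direction = "activated" if signs.pop() else "suppressed"
        core = set.intersection(*(genes for _, _, genes in hits))
        out[name] = (direction, core)
    return out


r1 = {"IFN": (2.1, 0.01, {"STAT1", "IRF7", "MX1"}), "CC": (-1.8, 0.05, {"CDK1", "CCNB1"}), "X": (1.5, 0.4, {"A"})}
r2 = {"IFN": (1.9, 0.02, {"STAT1", "MX1"}), "CC": (-1.6, 0.10, {"CDK1"}), "X": (1.6, 0.01, {"A"})}
print(co_enriched([r1, r2]))
```

The resulting co-core genes are what a downstream PPI network and module analysis would take as input.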


Gene Set Enrichment Analysis of Ki-67high CLL Clones Suggests Complex Interactions of B-Cell Receptor Signaling and Normal Cell Interactions in the Disease

Blood ◽

10.1182/blood.v118.21.2833.2833 ◽

2011 ◽

Vol 118 (21) ◽

pp. 2833-2833

Author(s):

Xiao J. Yan ◽

Daniel Kalenscher ◽

Erin Boyle ◽

Sophia Yancopoulos ◽

Rajendra N Damle ◽

...

Keyword(s):

T Cell ◽

B Cell ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Ki-67 ◽

Mutation Status ◽

Gene Set ◽

Gene Sets ◽

BCR Signaling

Abstract 2833 Introduction: In chronic lymphocytic leukemia (CLL), clonally expanded CD5+ B lymphocytes eventually overwhelm healthy immune cells, hindering normal immune function. To determine the mechanisms fueling this expansion, gene expression data were gathered by microarray analysis of cells from CLL patients. Samples were grouped based on expression of Ki-67, an indicator of proliferation. To determine mechanisms that correlate with B-cell proliferation and affect CLL B-cell biology, microarray profiles were compared using Gene Set Enrichment Analysis (GSEA) [Subramanian A, et al. PNAS 2005]. Methods: Samples were analyzed for intracellular expression of Ki-67 by flow cytometry and divided into 2 groups based on Ki-67 expression (cutoff at 5%). RNA was then purified from CD5+CD19+ CLL cells, and gene expression microarray assays were performed using Illumina HumanHT12 beadchips. GSEA was carried out using a library of signatures by Dr. Louis Staudt [Shaffer AL, et al. Immunol Rev 2006] containing 305 gene sets encompassing 13,564 genes biased towards hematopoietic signatures. Results: Of 61 cases, 14 were Ki-67high and 47 were Ki-67low. When time-to-first-treatment (TTFT) was compared between the groups, Ki-67high patients had a significantly shorter TTFT (2.76 yrs) than Ki-67low patients (23.46 yrs; P<0.0001). By GSEA, 255/285 gene sets were upregulated in the Ki-67high group, with 50 gene sets significantly enriched at a false discovery rate (FDR) <25%. For the Ki-67low group, 30/285 gene sets were upregulated, with only one significant at FDR <25%. IGHV-unmutated CLL (U-CLL) was enriched in only one gene set, termed CLLUNMUT-1, while mutated CLL (M-CLL) was enriched only in CLLMUT-1. CD38high and CD38low subsets were similarly enriched in these two gene sets, with 4 additional gene sets in the CD38high group, including MYD88UP-4 and IFN-2. 
Of the 50 significantly enriched gene sets in the Ki-67high group, 17 relate to signaling pathways, 16 to cellular differentiation, 6 to cellular processes, 4 to transcription factor targets, and the remaining 7 to cancer. The share of signaling gene sets among these is up 13% from their representation in the original Staudt library. The top 5 gene sets enriched in the Ki-67high group are: genes upregulated in U-CLL compared to M-CLL (CLLUNMUT-1), myeloid tissue compared to other tissues (MYELOID-1), T cell cytokine-induced proliferation (TCYTUP-8), BCR crosslinking of CLL B cells (CLLBCRUP-1), and BDCA4+ dendritic cells compared to other hematopoietic cells (DC-1). The total number of genes enriched in these 50 sets is 769, with 217 genes shared by two or more gene sets. Twenty genes were enriched in the CLL BCR signature, CLLBCRUP-1 [Herishanu Y, et al. Blood 2011]. Of these, WARS, IRF4, MX1, OAS1, and NAMPT are also enriched in the T cell cytokine-induced and T cell activation signatures. Only one gene set was enriched in the Ki-67low group: CLLMUT-1, upregulated in M-CLL compared to U-CLL. CD274 (PD-L1) was consistently elevated in the Ki-67low group in all patients, irrespective of IGHV mutation status. Discussion: The GSEA profiles observed in Ki-67high patients correlated with gene signatures biased towards BCR signaling, signal transduction, and hematopoietic cancer, consistent with the Ki-67high group containing more (recently) proliferating cells influenced at least in part by BCR signaling. The profiles also suggest that additional cells (T lymphocytes and dendritic cells) may be involved. Notably, these gene sets were not observed for CLL patients subgrouped by IGHV mutation status or by CD38, and these other subsets did not show as pronounced a distinction by GSEA profiling. Disclosures: No relevant conflicts of interest to declare.


Towards a gold standard for benchmarking gene set enrichment analysis

10.1101/674267 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ludwig Geistlinger ◽

Gergely Csaba ◽

Mara Santarelli ◽

Marcel Ramos ◽

Lucas Schiffer ◽

...

Keyword(s):

Ad Hoc ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Data Sets ◽

Expression Data ◽

RNA-Seq ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Enrichment Methods

Abstract Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and to biological reasoning about the relevance of the resulting enriched gene sets. However, such evaluations are typically incomplete and biased towards the goals of individual investigations. Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion: We carried out a systematic assessment of existing enrichment methods and identified the best-performing methods as well as general shortcomings in how gene set analysis is currently conducted. 
We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability: http://bioconductor.org/packages/GSEABenchmarkeR


Host transcriptome alterations in vitro and in vivo following severe acute respiratory syndrome coronavirus 2 infection

10.21203/rs.3.rs-37567/v2 ◽

2021 ◽

Author(s):

Yannian Luo ◽

Juan Xu ◽

Mingzhen Zhou ◽

Xiaomei Lei ◽

Wen Cao ◽

...

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Microarray Data ◽

Molecular Mechanisms ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets ◽

Core Genes

Abstract Background: Exploring alterations in the host transcriptome following SARS-CoV-2 infection is highly warranted, not only to help us understand the molecular mechanisms of the disease, but also to provide new prospects for screening effective antiviral drugs, finding new therapeutic targets, and evaluating the risk of systemic inflammatory response syndrome (SIRS) early. Methods: We downloaded three gene expression matrix files from the Gene Expression Omnibus (GEO) database, extracted the gene expression data of SARS-CoV-2-infected and non-infected human samples and cell line samples, and performed gene set enrichment analysis (GSEA) on each. We then integrated the GSEA results to obtain co-enriched gene sets and co-core genes across the three microarray datasets. Finally, we constructed a protein-protein interaction (PPI) network and molecular modules for the co-core genes and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the module genes to clarify their possible biological processes and underlying signaling pathways. Results: A total of 11 co-enriched gene sets were identified from the three microarray datasets. Of these, 10 gene sets were activated and involved in immune response and inflammatory reaction, and 1 gene set was suppressed and participated in the cell cycle. The analysis of molecular modules showed that 2 modules might play a vital role in the pathogenic process of SARS-CoV-2 infection. KEGG enrichment analysis showed that genes from module one were enriched in signaling pathways related to inflammation, whereas genes from module two were enriched in cell cycle and DNA replication signaling. In particular, necroptosis signaling, a recently identified type of programmed cell death distinct from apoptosis, was also detected. 
Additionally, in patients with SARS-CoV-2 infection, genes from module one showed relatively high expression, whereas genes from module two showed low expression. Conclusions: We identified two molecular modules that could be used to assess severity and predict the prognosis of patients with SARS-CoV-2 infection. In addition, these results provide a unique opportunity to explore more molecular pathways as potential new therapeutic targets in COVID-19.
