WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% suffer serious complications, such as acute respiratory failure (ARF), and require admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, as well as 32 cases from a public functional genomics data repository for gene expression analysis. We analyzed the relationships between several candidate predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammatory/apoptotic responses and airway inflammation. ABI can be used as an indicator of ARF in patients with SC and can assist with early detection.

Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics ◽

10.1186/s12859-020-03929-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Mike Fang ◽

Brian Richardson ◽

Cheryl M. Cameron ◽

Jean-Eudes Dazard ◽

Mark J. Cameron

Keyword(s):

Drug Targets ◽

T Regulatory Cells ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Regulatory Cells ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Gastroenteropancreatic Neuroendocrine Tumor ◽

Public Datasets

Abstract
Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes the biological directionality of drug-derived gene sets.
Results: We detail our dpGSEA method and show its effectiveness in detecting specific drug perturbations in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, dpGSEA detected phenotypically relevant drug targets, such as those involved in virion replication, cell cycle dysfunction, and mitochondrial dysfunction, in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting.
Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering the directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug-targeting molecules.
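
The abstract states that dpGSEA emphasizes the directionality of drug-derived gene sets but does not give its scoring rule. As a rough illustration of what a direction-aware score can look like, the Python sketch below rates a drug gene set by how strongly its genes move opposite to a disease signature, and assesses that score with a random gene-set permutation. The functions `directional_score` and `directional_test`, the +1/-1 encoding of drug-induced up/down regulation, and the toy data are hypothetical; this is not dpGSEA's algorithm (see the repository above for that).

```python
import random
from typing import Dict, Tuple


def directional_score(drug_set: Dict[str, int], disease_stat: Dict[str, float]) -> float:
    """Mean of -direction * disease statistic over drug genes found in the signature.

    Hypothetical encoding: direction is +1 if the drug induces the gene and -1 if
    it represses it. A positive score means the drug tends to push genes opposite
    to the disease, i.e. a candidate "reversal" drug.
    """
    shared = [g for g in drug_set if g in disease_stat]
    if not shared:
        raise ValueError("no drug genes found in the disease signature")
    return sum(-drug_set[g] * disease_stat[g] for g in shared) / len(shared)


def directional_test(
    drug_set: Dict[str, int],
    disease_stat: Dict[str, float],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Observed score and a one-sided permutation p-value.

    The null keeps the drug's up/down pattern but assigns it to randomly
    sampled genes from the disease signature.
    """
    rng = random.Random(seed)
    observed = directional_score(drug_set, disease_stat)
    directions = [d for g, d in drug_set.items() if g in disease_stat]
    genes = list(disease_stat)
    extreme = 0
    for _ in range(n_perm):
        sampled = rng.sample(genes, len(directions))
        null_set = dict(zip(sampled, directions))
        if directional_score(null_set, disease_stat) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up disease t-statistics and a drug that reverses genes A, B and C.
disease = {"A": 3.0, "B": 2.5, "C": -2.0, "D": 0.1, "E": -0.2, "F": 0.3}
drug = {"A": -1, "B": -1, "C": 1}
score, p = directional_test(drug, disease, n_perm=200)
print(f"score={score:.2f}, p={p:.3f}")
```

Scoring a signed mean rather than an unsigned overlap is the design choice that makes the direction matter: the same three genes with the drug's directions flipped would receive a score of -2.5.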

Abstract B1-35: Enrichr2: Next generation gene set enrichment analysis web-based tool

10.1158/1538-7445.compsysbio-b1-35 ◽

2015 ◽

Author(s):

Matthew R. Jones

Keyword(s):

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Next Generation ◽

Gene Set Enrichment ◽

Web Based ◽

Gene Set

Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2014-0077 ◽

2015 ◽

Vol 14 (3) ◽

Cited By ~ 13

Author(s):

Konstantina Charmpi ◽

Bernard Ycart

Keyword(s):

Weight Function ◽

Null Hypothesis ◽

Computing Time ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Test Statistic ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Kolmogorov Smirnov

Abstract
Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte Carlo simulation. Here, it is proposed to subtract its asymptotic expectation from the cumulated weight function and then to rescale it. Under the null hypothesis, the convergence in distribution of the new test statistic is proved using the theory of empirical processes. The limiting distribution needs to be computed only once and can then be used for many different gene sets, which results in large savings in computing time. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. A comparison between the classical GSEA test and the new procedure was conducted using expression data from the GEO repository, tested against the C2 collection of the Molecular Signatures Database (MSigDB). Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative than the classical GSEA test in many cases.
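
For orientation, the Python sketch below computes the classical GSEA running-sum enrichment score (the cumulated weight function referred to above, following Subramanian et al. 2005, with weight exponent `p`) and estimates its null distribution by Monte Carlo simulation. For brevity it permutes gene sets rather than phenotype labels, as GSEAPreranked does. The WKS centering and rescaling step itself is not implemented here, and the function names are illustrative.

```python
import random
from typing import List, Sequence, Set, Tuple


def enrichment_score(
    ranked_genes: Sequence[str],
    scores: Sequence[float],
    gene_set: Set[str],
    p: float = 1.0,
) -> float:
    """Weighted running-sum enrichment score.

    ranked_genes must be sorted by decreasing score. Hits step up by
    |score|^p / N_R, misses step down by 1 / (N - N_H); the ES is the
    running sum's maximum deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranked list")
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_r == 0:
        raise ValueError("all hit scores are zero")
    miss_step = 1.0 / (n - n_hit)
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / n_r if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es


def permutation_pvalue(
    ranked_genes: List[str],
    scores: List[float],
    gene_set: Set[str],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo p-value using random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = len(gene_set & set(ranked_genes))
    extreme = 0
    for _ in range(n_perm):
        null_es = enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, size)))
        # Count null scores at least as extreme, in the same direction as the observed ES.
        if (observed >= 0 and null_es >= observed) or (observed < 0 and null_es <= observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


genes = [f"g{i}" for i in range(1, 11)]
metric = [5.0, 4.0, 3.0, 2.0, 1.0, -1.0, -2.0, -3.0, -4.0, -5.0]
es_top, p_top = permutation_pvalue(genes, metric, {"g1", "g2"}, n_perm=500)
es_bottom = enrichment_score(genes, metric, {"g9", "g10"})
print(es_top, p_top, es_bottom)
```

Because every gene set needs its own batch of simulated scores here, the cost grows with the number of gene sets; that repeated simulation is what a precomputed limiting distribution, as proposed for the WKS test, avoids.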

GSEA-InContext Explorer: An interactive visualization tool for putting gene set enrichment analysis results into biological context

10.1101/659847 ◽

2019 ◽

Author(s):

Rani K. Powers ◽

Anthony Sun ◽

James C. Costello

Keyword(s):

Statistical Significance ◽

Null Distribution ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Gene Set ◽

Link Type ◽

Interactive Interface ◽

Gene Sets ◽

Shiny App

Abstract
Summary: GSEA-InContext Explorer is a Shiny app that allows users to perform two methods of gene set enrichment analysis (GSEA). The first, GSEAPreranked, applies the GSEA algorithm in which statistical significance is estimated from a null distribution of enrichment scores generated for randomly permuted gene sets. The second, GSEA-InContext, incorporates a user-defined set of background experiments to define the null distribution and calculate statistical significance. GSEA-InContext Explorer allows the user to build custom background sets from a compendium of over 5,700 curated experiments, run both GSEAPreranked and GSEA-InContext on their own uploaded experiment, and explore the results using an interactive interface. This tool will allow researchers to visualize gene sets that are commonly enriched across experiments and identify gene sets that are uniquely significant in their experiment, thus complementing current methods for interpreting gene set enrichment results.
Availability and implementation: The code for GSEA-InContext Explorer is available at: https://github.com/CostelloLab/GSEA-InContext_Explorer and the interactive tool is at: http://gsea-incontext_explorer.ngrok.io
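
To make the contrast between the two null distributions concrete, here is a simplified Python sketch of the background-experiment idea: the enrichment score of a gene set in the user's experiment is compared with the scores of the same gene set in each background experiment. This is an assumption-laden stand-in, not GSEA-InContext's actual permutation scheme; the functions `ks_enrichment_score` and `background_pvalue` and the toy rankings are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def ks_enrichment_score(ranking: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted running-sum enrichment score (the p = 0 case of GSEA)."""
    n = len(ranking)
    n_hit = sum(g in gene_set for g in ranking)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranking")
    hit_step, miss_step = 1.0 / n_hit, 1.0 / (n - n_hit)
    running, es = 0.0, 0.0
    for g in ranking:
        running += hit_step if g in gene_set else -miss_step
        if abs(running) > abs(es):
            es = running
    return es


def background_pvalue(
    user_ranking: Sequence[str],
    background_rankings: List[Sequence[str]],
    gene_set: Set[str],
) -> Tuple[float, float]:
    """Empirical p-value of the user's ES against the same gene set's ES
    in each background experiment (a simplified, context-based null)."""
    observed = ks_enrichment_score(user_ranking, gene_set)
    null = [ks_enrichment_score(r, gene_set) for r in background_rankings]
    if observed >= 0:
        extreme = sum(1 for x in null if x >= observed)
    else:
        extreme = sum(1 for x in null if x <= observed)
    return observed, (extreme + 1) / (len(null) + 1)


user = ["a", "b", "c", "d", "e", "f"]
gene_set = {"a", "b"}
# Gene set also at the top in every background experiment: commonly enriched.
_, p_common = background_pvalue(user, [user, ["b", "a", "c", "d", "e", "f"], user], gene_set)
# Gene set at the bottom in every background experiment: unusual for this user.
_, p_unique = background_pvalue(user, [["c", "d", "e", "f", "a", "b"]] * 3, gene_set)
print(p_common, p_unique)  # 1.0 0.25
```

The same top-ranked gene set gets p = 1.0 when it is enriched everywhere and the smallest attainable value with three backgrounds when it is not, which is the distinction between "commonly enriched" and "uniquely significant" that the tool is designed to surface.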

Gene Set Enrichment Analysis of Ki-67high CLL Clones Suggests Complex Interactions of B-Cell Receptor Signaling and Normal Cell Interactions in the Disease

Blood ◽

10.1182/blood.v118.21.2833.2833 ◽

2011 ◽

Vol 118 (21) ◽

pp. 2833-2833

Author(s):

Xiao J. Yan ◽

Daniel Kalenscher ◽

Erin Boyle ◽

Sophia Yancopoulos ◽

Rajendra N Damle ◽

...

Keyword(s):

T Cell ◽

B Cell ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Ki 67 ◽

Mutation Status ◽

Gene Set ◽

Gene Sets ◽

Bcr Signaling

Abstract 2833
Introduction: In chronic lymphocytic leukemia (CLL), clonally expanded CD5+ B lymphocytes eventually overwhelm healthy immune cells, hindering normal immune function. To determine the mechanisms fueling this expansion, gene expression data were gathered by microarray analysis of cells from CLL patients. Samples were grouped based on the expression of Ki-67, an indicator of proliferation. To determine mechanisms correlating with B-cell proliferation and affecting CLL B-cell biology, microarray profiles were compared using Gene Set Enrichment Analysis (GSEA) [Subramanian A, et al. PNAS 2005].
Methods: Samples were analyzed for intracellular expression of Ki-67 by flow cytometry and divided into 2 groups based on Ki-67 expression (cutoff at 5%). RNA was then purified from CD5+CD19+ CLL cells, and gene expression microarray assays were performed using Illumina HumanHT-12 BeadChips. GSEA was carried out using a library of signatures from Dr. Louis Staudt [Shaffer AL, et al. Immunol Rev 2006] containing 305 gene sets encompassing 13,564 genes, biased towards hematopoietic signatures.
Results: Of 61 cases, 14 were Ki-67high and 47 were Ki-67low. When time-to-first-treatment (TTFT) was compared between the groups, Ki-67high patients had a significantly shorter TTFT (2.76 yrs) than Ki-67low patients (23.46 yrs; P<0.0001). By GSEA, we determined that 255/285 gene sets were upregulated in the Ki-67high group, with 50 gene sets significantly enriched at a false discovery rate (FDR) <25%. For the Ki-67low group, 30/285 gene sets were upregulated, with only one significant at FDR <25%. IGHV-unmutated CLL (U-CLL) was enriched in only one gene set, termed CLLUNMUT-1, while mutated CLL (M-CLL) was enriched only in CLLMUT-1. CD38high and CD38low subsets were similarly enriched in these two gene sets, with 4 additional gene sets in the CD38high group, including MYD88UP-4 and IFN-2.
Of the 50 significantly enriched gene sets in the Ki-67high group, 17 relate to signaling pathways, 16 to cellular differentiation, 6 to cellular processes, 4 to transcription factor targets, and the remaining 7 to cancer. Of these, the proportion of signaling gene sets is up 13% from its representation in the original Staudt library. The top 5 gene sets enriched in the Ki-67high group are: upregulated in U-CLL compared to M-CLL (CLLUNMUT-1), myeloid tissue compared to other tissues (MYELOID-1), T cell cytokine-induced proliferation (TCYTUP-8), BCR crosslinking of CLL B cells (CLLBCRUP-1), and BDCA4+ dendritic cells compared to other hematopoietic cells (DC-1). These 50 sets contain a total of 769 enriched genes, 217 of which are shared by two or more gene sets. Twenty genes were enriched in the CLL BCR signature, CLLBCRUP-1 [Herishanu Y, et al. Blood 2011]. Of these, WARS, IRF4, MX1, OAS1, and NAMPT are also enriched in the T cell cytokine-induced and T cell activation signatures. Only one gene set was enriched in the Ki-67low group: CLLMUT-1, upregulated in M-CLL compared to U-CLL. CD274 (PD-L1) was consistently elevated in the Ki-67low group in all patients, irrespective of IGHV mutation status.
Discussion: The GSEA profiles observed in Ki-67high patients correlated with gene signatures biased towards BCR signaling, signal transduction, and hematopoietic cancer, consistent with the Ki-67high group containing more (recently) proliferating cells influenced at least in part by BCR signaling. The profiles also suggest that additional cells (T lymphocytes and dendritic cells) may be involved. Notably, these gene sets were not observed when CLL patients were subgrouped by IGHV mutation status or by CD38, and these other subsets did not show as pronounced a distinction by GSEA profiling.
Disclosures: No relevant conflicts of interest to declare.

Towards a gold standard for benchmarking gene set enrichment analysis

10.1101/674267 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ludwig Geistlinger ◽

Gergely Csaba ◽

Mara Santarelli ◽

Marcel Ramos ◽

Lucas Schiffer ◽

...

Keyword(s):

Ad Hoc ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Data Sets ◽

Expression Data ◽

Rna Seq ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Enrichment Methods

Abstract
Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations.
Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated.
Conclusion: We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods.
Availability: http://bioconductor.org/packages/GSEABenchmarkeR
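
As one way to picture criterion (iii), recovery of an a priori relevance ranking, the Python sketch below measures how much of the best achievable relevance a method's top-k gene sets capture. The metric `relevance_recovery`, the cutoff `k`, and the gene set names and relevance values are hypothetical stand-ins; the package's own relevance scoring is not reproduced here.

```python
from typing import Dict, Sequence


def relevance_recovery(method_ranking: Sequence[str], relevance: Dict[str, float], k: int) -> float:
    """Share of the best achievable top-k relevance captured by a method's top-k gene sets.

    1.0 means the method's top k gene sets carry as much a priori relevance as
    the k most relevant gene sets; gene sets missing from `relevance` count as 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    observed = sum(relevance.get(gs, 0.0) for gs in method_ranking[:k])
    optimal = sum(sorted(relevance.values(), reverse=True)[:k])
    return observed / optimal if optimal > 0 else 0.0


relevance = {"set_A": 3.0, "set_B": 2.0, "set_C": 1.0, "set_D": 0.0}
print(relevance_recovery(["set_C", "set_A", "set_D", "set_B"], relevance, k=2))  # 0.8
print(relevance_recovery(["set_A", "set_B", "set_C", "set_D"], relevance, k=2))  # 1.0
```

Normalizing by the optimum keeps scores comparable across data sets whose relevance rankings have different scales, which matters when results are aggregated over a compendium of diseases.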

Gene Expression Profiling of Age-Related Epstein-Barr Virus (EBV)-Associated B-Cell Lymphoproliferative Disorder Uncovers Alterations in Immune and Inflammatory Genes: Possible Implications for Pathogenesis

Blood ◽

10.1182/blood.v118.21.3448.3448 ◽

2011 ◽

Vol 118 (21) ◽

pp. 3448-3448

Author(s):

Harumi Kato ◽

Kazuhito Yamamoto ◽

Kennosuke Karube ◽

Miyuki Katayama ◽

Shinobu Tsuzuki ◽

...

Keyword(s):

Inflammatory Responses ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Enrichment Score ◽

P Value ◽

Toll Like Receptor ◽

Gene Set Enrichment ◽

Gene Set ◽

Age Related ◽

Gene Sets

Abstract 3448
Age-related EBV-associated B-cell lymphoproliferative disorder (AR-EBLPD) is classified as a subtype of diffuse large B-cell lymphoma (DLBCL) according to the WHO classification. However, the molecular genetic characteristics of AR-EBLPD remain largely unknown. We studied the expression profiles of 5 AR-EBLPD and 8 EBV-negative DLBCL samples using the Agilent 44K human oligonucleotide microarray. Total RNA was extracted from fresh-frozen tumor samples. Each microarray slide was converted into datasets using the Agilent Microarray Scanner and Feature Extraction software. Data were standardized with Z-scores. Differences in mRNA expression levels between the two sample groups were calculated using a two-sided t-test. A total of 1973 probes, representing 1688 genes, showed a p-value less than 0.05 with a false discovery rate (FDR) below 25%. The numbers of probes with higher expression in AR-EBLPD and in EBV-negative DLBCL were 804 (693 genes) and 1169 (995 genes), respectively. First, we selected the top 300 differentially expressed genes. Genes highly expressed in AR-EBLPD included IL6, TNFAIP3, HOPX, and SLAMF1. IL6 encodes a cytokine that functions in inflammation and the maturation of B lymphocytes, and TNFAIP3 is a negative regulator of the NF-kB pathway. HOPX and SLAMF1 have been reported as genes related to lymphocyte function or the immune system (Schwartzberg et al. Nature Immunology 2009; Hawiger et al. Nature Immunology 2011). For better characterization, we next performed Gene Ontology analysis using the WEB-based GEne SeT AnaLysis Toolkit and found that categories of external stimulus and inflammatory responses were enriched in AR-EBLPD. Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling analyses showed that the NOD-like receptor (p-value = 1.30e-06), JAK-STAT (p-value = 9.01e-06), and Toll-like receptor (p-value = 0.0002) pathways were characteristic of AR-EBLPD.
These results implied that inflammation would be prominent in AR-EBLPD cases. For validation, we next performed Gene Set Enrichment Analysis (GSEA) using all KEGG pathway gene sets in the database (186 gene sets). Dominant gene sets in AR-EBLPD included cytokine-cytokine receptor interaction [Normalized Enrichment Score (NES) = 2.66, p-value < 0.001], the NOD-like receptor pathway (NES = 2.26, p-value < 0.001), the Toll-like receptor pathway (NES = 2.14, p-value < 0.001), and the JAK-STAT pathway (NES = 1.79, p-value < 0.001). Since all of these pathways are related to the NF-kB pathway, inflammatory responses were suggested to activate the NF-kB pathway, or vice versa. For confirmation, we finally performed GSEA using NF-kB pathway gene sets, namely a gene set reported by an NIH group (Puente et al. Nature 2011) and 30 gene sets from the GSEA database, and found that the NF-kB pathway gene sets were enriched in AR-EBLPD (Figure 1). Our results suggest that inflammatory and immune-related genes are enriched in AR-EBLPD and that activation of these genes may be associated with NF-kB activation. Aberrant immune and inflammatory responses could define the clinical presentation of AR-EBLPD cases.
(Figure 1) Gene Set Enrichment Analysis of 5 AR-EBLPD and 8 EBV-negative DLBCL samples. The NF-kB signature reported by an NIH group (Puente et al. Nature 2011) was enriched in AR-EBLPD [Normalized Enrichment Score (NES) = 2.20, p-value < 0.001].
Disclosures: No relevant conflicts of interest to declare.
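
The probe-selection step above combines a raw p-value cutoff (0.05) with an FDR cutoff (25%), but the abstract does not name the FDR procedure. The Python sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice, and applies both thresholds to a list of per-probe t-test p-values. The p-values are made up and the helper names are illustrative.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_probes(pvalues: List[float], p_cut: float = 0.05, fdr_cut: float = 0.25) -> List[int]:
    """Indices of probes passing both the raw p-value and the FDR thresholds."""
    q = benjamini_hochberg(pvalues)
    return [i for i, (p, qi) in enumerate(zip(pvalues, q)) if p < p_cut and qi < fdr_cut]


pvals = [0.01, 0.04, 0.03, 0.20]  # made-up per-probe t-test p-values
print(benjamini_hochberg(pvals))
print(select_probes(pvals))  # [0, 1, 2]
```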

Gene Set Enrichment Analysis of Selenium-Deficient and High-Selenium Rat Liver Transcript Expression and Comparison With Turkey Liver Expression

Journal of Nutrition ◽

10.1093/jn/nxaa333 ◽

2020 ◽

Author(s):

Roger A Sunde

Keyword(s):

Rat Liver ◽

Enrichment Analysis ◽

Basal Diet ◽

Gene Set Enrichment Analysis ◽

Transcript Expression ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Single Transcript ◽

Se Status

Abstract
Background: Better biomarkers of selenium (Se) status and a better understanding of toxic Se biochemistry are needed to set safe dietary upper limits. In previous studies, differential expression (DE) of individual liver transcripts in rats and turkeys failed to identify a single transcript that was consistently and significantly (q < 0.05) altered by high Se.
Objectives: To evaluate the effect of Se status on rat liver transcript expression data at the level of gene sets, and to compare transcript expression in rats with that in turkeys to identify commonly regulated transcripts.
Methods: Gene set enrichment analysis (GSEA) was conducted on liver from weanling rats fed an Se-deficient basal diet (0.005 μg Se/g) supplemented with 0, 0.24 (Se-adequate), 2, or 5 μg Se/g diet as selenite for 28 d. In addition, transcript expression was compared with liver expression in turkeys fed 0, 0.4, 2, or 5 μg Se/g diet as selenite.
Results: Se deficiency significantly downregulated the rat selenoprotein gene set but also upregulated gene sets for a variety of pathways, processes, and disease states. GSEA of 2 compared with 0.24 μg Se/g found no significantly up- or downregulated gene sets, showing that 2 μg Se/g is not particularly toxic to the rat. GSEA of 5 compared with 0.24 μg Se/g transcripts, however, found 27 significantly upregulated gene sets for a wide variety of conditions. Cross-species GSEA comparison of transcript expression identified no gene sets that were significantly and consistently regulated by high Se in both rats and turkeys. In addition, comparison of individual marginally significant (unadjusted P < 0.05) DE transcripts between rats and turkeys also failed to find common transcripts.
Conclusions: The dramatic increase in significant liver transcript DE and GSEA gene sets in rats fed 5 compared with 2 μg Se/g clearly appears to be a biomarker for Se toxicity, albeit not a Se-specific one. These analyses, however, failed to identify specific transcripts, pathways, biological states, or processes directly linked with high Se status, strongly indicating that adaptation to high Se lies outside transcriptional regulation.
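
The cross-species criterion above, gene sets "significantly and consistently regulated" in both species, can be expressed as a simple filter. The Python sketch below assumes that each species' GSEA output has already been reduced to a normalized enrichment score (NES) and an FDR q-value per gene set, and that gene set names are already harmonized across rat and turkey (for example via ortholog mapping). The function name, the default cutoff of 0.05, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple

# gene set name -> (NES, FDR q-value); names assumed already harmonized across species
Results = Dict[str, Tuple[float, float]]


def consistent_gene_sets(rat: Results, turkey: Results, q_cut: float = 0.05) -> List[str]:
    """Gene sets significant in both species and regulated in the same direction."""
    common = []
    for name in sorted(rat.keys() & turkey.keys()):
        nes_rat, q_rat = rat[name]
        nes_turkey, q_turkey = turkey[name]
        same_direction = nes_rat * nes_turkey > 0
        if q_rat < q_cut and q_turkey < q_cut and same_direction:
            common.append(name)
    return common


rat = {"SET_1": (2.1, 0.01), "SET_2": (2.0, 0.01), "SET_3": (1.5, 0.20)}
turkey = {"SET_1": (1.8, 0.03), "SET_2": (-1.9, 0.02), "SET_3": (1.7, 0.01)}
print(consistent_gene_sets(rat, turkey))  # ['SET_1']
```

Requiring a matching NES sign, and not just joint significance, is what separates "consistently regulated" from gene sets that respond to high Se in opposite directions in the two species (SET_2 above).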

XGSEA: CROSS-species Gene Set Enrichment Analysis via domain adaptation

10.1101/2020.07.21.213645 ◽

2020 ◽

Author(s):

Menglan Cai ◽

Canh Hao Nguyen ◽

Hiroshi Mamitsuka ◽

Limin Li

Keyword(s):

Gene Expression ◽

Domain Adaptation ◽

Gene Knockout ◽

Enrichment Analysis ◽

Real Data ◽

Gene Set Enrichment Analysis ◽

Data Sets ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Abstract
Gene set enrichment analysis (GSEA) has been widely used to identify gene sets with a statistically significant difference between cases and controls against a large gene set. GSEA needs both phenotype labels and gene expression. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression cannot be measured under some specific conditions in humans because of the high health risk of direct experiments, such as non-approved treatments or gene knockouts, and mouse data are often used as a substitute. Thus, predicting the enrichment significance (for a phenotype) of a given gene set in one species (the target, say human) by using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), which consists of three steps: 1) running GSEA on the source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets based on the representation from step 2. We extensively validated XGSEA using four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available at https://github.com/LiminLi-xjtu/XGSEA
Author summary
Gene set enrichment analysis (GSEA) is a powerful tool for the differential analysis of gene sets given a ranked gene list. GSEA requires complete data: gene expression with phenotype labels. However, gene expression cannot be measured under some specific conditions in humans because of the high risk of direct experiments, such as non-approved treatments or gene knockouts, and mouse data are often used as a substitute. The unavailability of gene expression data thus leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (for a phenotype) of a given gene set in one species (the target, say human) is predicted by using gene expression measured under the same phenotype in another species (the source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. The results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm the reliability of XGSEA.
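
The Python sketch below walks through the shape of the three-step pipeline under heavy simplifications, so the data flow from source p-values to a predicted target p-value is visible. Step 1 is assumed to be done already (its output is `source_pvalues`). Step 2 is replaced by ortholog mapping plus Jaccard similarity, not domain adaptation. Step 3 is replaced by Nadaraya-Watson kernel regression of -log10(p), not XGSEA's regressor. All names, the bandwidth, and the toy gene sets are hypothetical.

```python
import math
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_target_pvalue(
    target_set: Set[str],
    ortholog: Dict[str, str],
    source_sets: Dict[str, Set[str]],
    source_pvalues: Dict[str, float],
    bandwidth: float = 0.1,
) -> float:
    """Nadaraya-Watson kernel regression of -log10(p) over source gene sets."""
    # Step 2 stand-in: map target genes into the source species' namespace.
    mapped = {ortholog[g] for g in target_set if g in ortholog}
    num = den = 0.0
    for name, genes in source_sets.items():
        # Step 3 stand-in: Gaussian kernel on Jaccard distance.
        distance = 1.0 - jaccard(mapped, genes)
        weight = math.exp(-(distance ** 2) / (2 * bandwidth ** 2))
        num += weight * -math.log10(source_pvalues[name])
        den += weight
    if den == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return 10 ** (-num / den)


source_sets = {"m_A": {"a1", "a2", "a3"}, "m_B": {"b1", "b2"}}
source_p = {"m_A": 1e-4, "m_B": 0.5}  # toy step-1 output
ortholog = {"h1": "a1", "h2": "a2", "h3": "a3"}
print(predict_target_pvalue({"h1", "h2", "h3"}, ortholog, source_sets, source_p))
```

A target gene set that maps exactly onto one source set inherits that set's p-value (about 1e-4 here), while partial overlaps blend the source evidence; a learned representation such as the domain adaptation in step 2 is meant to capture relations that simple gene overlap misses.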
