XGSEA: CROSS-species Gene Set Enrichment Analysis via domain adaptation

Abstract: Gene set enrichment analysis (GSEA) has been widely used to identify gene sets that differ significantly between cases and controls, tested against a large background gene set. GSEA needs both phenotype labels and gene expression data. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression cannot be measured in humans under certain conditions, such as non-approved treatments or gene knockouts, because direct experiments carry a high health risk; mouse data are then often used as a substitute. Thus, predicting the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) by using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), which has three steps: 1) running GSEA for the source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA on four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available at https://github.com/LiminLi-xjtu/XGSEA

Author summary: Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data, that is, gene expression with phenotype labels. However, gene expression cannot be measured in humans under certain conditions, such as non-approved treatments or gene knockouts, because direct experiments carry a high risk; mouse data are then often used as a substitute. The unavailability of gene expression leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) is predicted by using gene expression measured under the same phenotype in another species (the source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. Results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm its reliability.
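Step 1 of the pipeline above relies on the GSEA enrichment score. The following is a minimal sketch of a GSEA-style weighted running-sum statistic, not the XGSEA authors' implementation; the function name `enrichment_score`, its parameters, and the toy genes are assumptions made for illustration.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score (illustrative sketch).

    ranked_genes: gene identifiers, already sorted by the ranking metric.
    scores: ranking metric for each gene, in the same order.
    gene_set: set of gene identifiers to test.
    weight: exponent applied to |score| for genes inside the set.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    if not in_set.any() or in_set.all():
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    abs_scores = np.abs(np.asarray(scores, dtype=float)) ** weight
    # Steps up at genes inside the set, weighted by their score
    # (assumes at least one gene in the set has a non-zero score).
    hit_steps = np.where(in_set, abs_scores, 0.0)
    hit_steps = hit_steps / hit_steps.sum()
    # Equal-sized steps down at genes outside the set.
    miss_steps = np.where(in_set, 0.0, 1.0 / (~in_set).sum())
    running = np.cumsum(hit_steps - miss_steps)
    # The score is the signed maximum deviation of the running sum from zero.
    return running[np.argmax(np.abs(running))]

# Toy usage: a set concentrated at the top of the ranking scores close to +1.
genes = ["a", "b", "c", "d"]
metric = [4.0, 3.0, 2.0, 1.0]
print(round(enrichment_score(genes, metric, {"a", "b"}), 3))
```

A set concentrated at the bottom of the ranking yields a score close to -1 under the same definition.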


Towards a gold standard for benchmarking gene set enrichment analysis

10.1101/674267 ◽ 2019 ◽ Cited By ~ 1

Author(s): Ludwig Geistlinger ◽ Gergely Csaba ◽ Mara Santarelli ◽ Marcel Ramos ◽ Lucas Schiffer ◽ ...

Keyword(s): Ad Hoc ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Data Sets ◽ Expression Data ◽ Rna Seq ◽ Gene Set Enrichment ◽ Gene Set ◽ Gene Sets ◽ Enrichment Methods

Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion: We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability: http://bioconductor.org/packages/GSEABenchmarkeR


Application of biclustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials

Beilstein Journal of Nanotechnology ◽ 10.3762/bjnano.6.252 ◽ 2015 ◽ Vol 6 ◽ pp. 2438-2448 ◽ Cited By ~ 14

Author(s): Andrew Williams ◽ Sabina Halappanavar

Keyword(s): Gene Expression ◽ Pulmonary Fibrosis ◽ Expression Profiles ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Data Driven ◽ Gene Set Enrichment ◽ Gene Set ◽ Analysis Methods ◽ Gene Sets

Background: The presence of diverse types of nanomaterials (NMs) in commerce is growing at an exponential pace. As a result, human exposure to these materials in the environment is inevitable, necessitating rapid and reliable toxicity testing methods to accurately assess the potential hazards associated with NMs. In this study, we applied biclustering and gene set enrichment analysis methods to derive essential features of altered lung transcriptome following exposure to NMs that are associated with lung-specific diseases. Several datasets from public microarray repositories describing pulmonary diseases in mouse models following exposure to a variety of substances were examined and functionally related biclusters of genes showing similar expression profiles were identified. The identified biclusters were then used to conduct a gene set enrichment analysis on pulmonary gene expression profiles derived from mice exposed to nano-titanium dioxide (nano-TiO2), carbon black (CB) or carbon nanotubes (CNTs) to determine the disease significance of these data-driven gene sets. Results: Biclusters representing inflammation (chemokine activity), DNA binding, cell cycle, apoptosis, reactive oxygen species (ROS) and fibrosis processes were identified. All of the NM studies were significant with respect to the bicluster related to chemokine activity (DAVID; FDR p-value = 0.032). The bicluster related to pulmonary fibrosis was enriched in studies investigating toxicity induced by CNTs and CB, suggesting the potential for these materials to induce lung fibrosis. The pro-fibrogenic potential of CNTs is well established. Although CB has not been shown to induce fibrosis, it induces stronger inflammatory, oxidative stress and DNA damage responses than nano-TiO2 particles. Conclusion: The results of the analysis correctly identified all NMs to be inflammogenic and only CB and CNTs as potentially fibrogenic. In addition to identifying several previously defined, functionally relevant gene sets, the present study also identified two novel gene sets: a gene set associated with pulmonary fibrosis and a gene set associated with ROS, underlining the advantage of using a data-driven approach to identify novel, functionally related gene sets. The results can be used in future gene set enrichment analysis studies involving NMs or as features for clustering and classifying NMs of diverse properties.


Comprehensive gene expression profiling of rat lung reveals distinct acute and chronic responses to cigarette smoke inhalation

AJP Lung Cellular and Molecular Physiology ◽ 10.1152/ajplung.00105.2007 ◽ 2007 ◽ Vol 293 (5) ◽ pp. L1183-L1193 ◽ Cited By ~ 65

Author(s): Christopher S. Stevenson ◽ Cerys Docx ◽ Ruth Webster ◽ Cliff Battram ◽ Debra Hynx ◽ ...

Keyword(s): Gene Expression ◽ Stress Response ◽ Cigarette Smoke ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Smoke Inhalation ◽ Whole Body ◽ Gene Set Enrichment ◽ Gene Set ◽ Gene Sets

Chronic obstructive pulmonary disease (COPD) is a smoking-related disease that lacks effective therapies due partly to the poor understanding of disease pathogenesis. The aim of this study was to identify molecular pathways that could be responsible for the damaging consequences of smoking. To do this, we employed Gene Set Enrichment Analysis to analyze differences in global gene expression, which we then related to the pathological changes induced by cigarette smoke (CS). Sprague-Dawley rats were exposed to whole body CS for 1 day and for various periods up to 8 mo. Gene Set Enrichment Analysis of microarray data identified that metabolic processes were most significantly increased early in the response to CS. Gene sets involved in stress response and inflammation were also upregulated. CS exposure increased neutrophil chemokines, cytokines, and proteases (MMP-12) linked to the pathogenesis of COPD. After a transient acute response, the CS-exposed rats developed a distinct molecular signature after 2 wk, which was followed by the chronic phase of the response. During this phase, gene sets related to immunity and defense progressively increased and predominated at the later time points in smoke-exposed rats. Chronic CS inhalation recapitulated many of the phenotypic changes observed in COPD patients including oxidative damage to macrophages, a slowly resolving inflammation, epithelial damage, mucus hypersecretion, airway fibrosis, and emphysema. As such, it appears that metabolic pathways are central to dealing with the stress of CS exposure; however, over time, inflammation and stress response gene sets become the most significantly affected in the chronic response to CS.


Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine ◽ 10.3390/jcm8101580 ◽ 2019 ◽ Vol 8 (10) ◽ pp. 1580 ◽ Cited By ~ 1

Author(s): Kyoung Min Moon ◽ Kyueng-Whan Min ◽ Mi-Hye Kim ◽ Dong-Hoon Kim ◽ Byoung Kwan Son ◽ ...

Keyword(s): Intensive Care Unit ◽ Intensive Care ◽ Respiratory Failure ◽ Scrub Typhus ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Acid Base ◽ Gene Set Enrichment ◽ Gene Set ◽ Gene Sets

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.


Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics ◽ 10.1186/s12859-020-03929-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Mike Fang ◽ Brian Richardson ◽ Cheryl M. Cameron ◽ Jean-Eudes Dazard ◽ Mark J. Cameron

Keyword(s): Drug Targets ◽ T Regulatory Cells ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Regulatory Cells ◽ Gene Set Enrichment ◽ Gene Set ◽ Gene Sets ◽ Gastroenteropancreatic Neuroendocrine Tumor ◽ Public Datasets

Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.


Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Statistical Applications in Genetics and Molecular Biology ◽ 10.1515/sagmb-2014-0077 ◽ 2015 ◽ Vol 14 (3) ◽ Cited By ~ 13

Author(s): Konstantina Charmpi ◽ Bernard Ycart

Keyword(s): Weight Function ◽ Null Hypothesis ◽ Computing Time ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Test Statistic ◽ Gene Set Enrichment ◽ Gene Set ◽ Gene Sets ◽ Kolmogorov Smirnov

Abstract: Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract from the cumulated weight function its asymptotic expectation, then scale it. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution needs to be computed only once, and can then be used for many different gene sets. This results in large savings in computing time. The test defined in this way is called the Weighted Kolmogorov Smirnov (WKS) test. Using expression data from the GEO repository, tested against the MSig Database C2, a comparison between the classical GSEA test and the new procedure has been conducted. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative in many cases than the classical GSEA test.
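To make the computing-time argument concrete, here is a minimal sketch of the classical Monte-Carlo step that the abstract says the WKS test avoids: the null distribution is re-simulated for every gene set by permuting set membership. It does not implement the WKS statistic itself; the function names, the two-sided running-sum statistic, and the toy data are assumptions for illustration.

```python
import numpy as np

def running_sum_stat(scores, in_set):
    """Maximum absolute deviation of a weighted running sum (illustrative)."""
    scores = np.abs(np.asarray(scores, dtype=float))
    in_set = np.asarray(in_set, dtype=bool)
    # Weighted steps up at set members (assumes their scores are not all zero).
    hits = np.where(in_set, scores, 0.0)
    hits = hits / hits.sum()
    # Equal steps down at non-members.
    misses = np.where(in_set, 0.0, 1.0 / (~in_set).sum())
    return np.max(np.abs(np.cumsum(hits - misses)))

def monte_carlo_p_value(scores, in_set, n_perm=1000, seed=0):
    """Permutation p-value: shuffle set membership along the ranked list."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    observed = running_sum_stat(scores, in_set)
    null = np.array([
        running_sum_stat(scores, rng.permutation(in_set))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Toy usage: 50 ranked genes, with the set occupying the top 5 positions.
scores = np.linspace(10.0, 1.0, 50)
membership = np.zeros(50, dtype=bool)
membership[:5] = True
p = monte_carlo_p_value(scores, membership)
```

Each call costs `n_perm` evaluations of the statistic, and the cost repeats for every gene set tested, which is the overhead a single precomputed limiting distribution would remove.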


Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis

Cancer Informatics ◽ 10.4137/cin.s13882 ◽ 2014 ◽ Vol 13s1 ◽ pp. CIN.S13882 ◽ Cited By ~ 4

Author(s): Binghuang Cai ◽ Xia Jiang

Keyword(s): Gene Expression ◽ Lung Cancer ◽ Gene Expression Data ◽ Lung Squamous Cell Carcinoma ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Expression Data ◽ Gene Set Enrichment ◽ Gene Set ◽ Pathway Gene

Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.


Differential gene expression in prostate tissue according to vasectomy.

Journal of Clinical Oncology ◽ 10.1200/jco.2016.34.2_suppl.298 ◽ 2016 ◽ Vol 34 (2_suppl) ◽ pp. 298-298

Author(s): Kathryn M Wilson ◽ Travis Gerke ◽ Ericka Ebot ◽ Jennifer A Sinnott ◽ Jennifer R. Rider ◽ ...

Keyword(s): Gene Expression ◽ Prostate Cancer ◽ Cancer Diagnosis ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Prostate Tissue ◽ Gene Set Enrichment ◽ Normal Prostate ◽ Gene Sets ◽ Normal Prostate Tissue

Background: We previously found that vasectomy was associated with an increased risk of prostate cancer, and particularly, risk of lethal prostate cancer in the Health Professionals Follow-up Study (HPFS). However, the possible biological basis for this finding is unclear. In this study, we explored possible biological mechanisms by assessing differences in gene expression in the prostate tissue of men with and without a history of vasectomy prior to prostate cancer diagnosis. Methods: Within the HPFS, vasectomy data and gene expression data (20,254 genes) were available from archival tumor tissue from 263 cases, 124 of whom also had data for adjacent normal tissue. To relate expression of individual genes to vasectomy we used linear regression adjusting for age and year at diagnosis. We ran gene set enrichment analysis to identify pathways of genes associated with vasectomy. Results: Among 263 cases, 67 (25%) reported a vasectomy prior to cancer diagnosis. Mean age at diagnosis was 66 years among men without and 65 years among men with vasectomy. Median time between vasectomy and prostate cancer diagnosis was 25 years. Gene expression in tumor tissue was not associated with vasectomy status. In adjacent normal tissue, three individual genes were associated with vasectomy with Bonferroni-corrected p-values of < 0.10: RAPGEF6, OR4C3, and SLC35F4. Gene set enrichment analysis found five pathways upregulated and seven pathways downregulated in normal prostate tissue of men with vasectomy compared to those without, at an FDR < 0.05. Upregulated pathways included several immune-related gene sets and G-protein-coupled receptor gene sets. Conclusions: We identified significant differences in gene expression profiles in normal prostate tissue according to vasectomy status among men treated for prostate cancer. The fact that such differences existed several decades after vasectomy provides support for the idea that vasectomy may play a role in the etiology of prostate cancer.
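As a worked illustration of the Bonferroni correction mentioned above, the sketch below shows what a corrected p-value below 0.10 implies when 20,254 genes are tested. It assumes every gene counts as one test, which the abstract does not state explicitly; the function names and example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Raw p-value cutoff equivalent to a Bonferroni-corrected level alpha."""
    return alpha / n_tests

def bonferroni_adjust(p_values, n_tests=None):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

# With 20,254 genes, a corrected p-value below 0.10 requires a raw p-value
# below roughly 4.9e-6.
cutoff = bonferroni_threshold(0.10, 20254)

# Hypothetical raw p-values: only the first survives the correction.
adjusted = bonferroni_adjust([1e-6, 0.01], n_tests=20254)
```

The very small raw cutoff helps explain why only three individual genes passed, while the pathway-level analysis, which aggregates evidence across genes, reported more findings.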


GSEA-InContext Explorer: An interactive visualization tool for putting gene set enrichment analysis results into biological context

10.1101/659847 ◽ 2019

Author(s): Rani K. Powers ◽ Anthony Sun ◽ James C. Costello

Keyword(s): Statistical Significance ◽ Null Distribution ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Gene Set Enrichment ◽ Gene Set ◽ Link Type ◽ Interactive Interface ◽ Gene Sets ◽ Shiny App

Summary: GSEA-InContext Explorer is a Shiny app that allows users to perform two methods of gene set enrichment analysis (GSEA). The first, GSEAPreranked, applies the GSEA algorithm in which statistical significance is estimated from a null distribution of enrichment scores generated for randomly permuted gene sets. The second, GSEA-InContext, incorporates a user-defined set of background experiments to define the null distribution and calculate statistical significance. GSEA-InContext Explorer allows the user to build custom background sets from a compendium of over 5,700 curated experiments, run both GSEAPreranked and GSEA-InContext on their own uploaded experiment, and explore the results using an interactive interface. This tool will allow researchers to visualize gene sets that are commonly enriched across experiments and identify gene sets that are uniquely significant in their experiment, thus complementing current methods for interpreting gene set enrichment results. Availability and implementation: The code for GSEA-InContext Explorer is available at https://github.com/CostelloLab/GSEA-InContext_Explorer and the interactive tool is at http://gsea-incontext_explorer.ngrok.io


Alterations in the host transcriptome in vitro and in vivo following severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection

10.21203/rs.3.rs-37567/v1 ◽ 2020

Author(s): Xiaomei Lei ◽ Zhijun Feng ◽ Xiaojun Wang ◽ Xiaodong He

Keyword(s): Gene Expression ◽ Cell Cycle ◽ Microarray Data ◽ Molecular Mechanisms ◽ Enrichment Analysis ◽ Gene Set Enrichment Analysis ◽ Gene Set ◽ Gene Sets ◽ Core Genes

Background. Exploring alterations in the host transcriptome following SARS-CoV-2 infection is highly warranted, not only to help us understand the molecular mechanisms of the disease, but also to provide new perspectives for screening effective antiviral drugs, finding new therapeutic targets, and evaluating the risk of systemic inflammatory response syndrome (SIRS) early. Methods. We downloaded three gene expression matrix files from the Gene Expression Omnibus (GEO) database, extracted the gene expression data of SARS-CoV-2-infected and non-infected human samples and cell line samples, and then performed gene set enrichment analysis (GSEA) on each. Thereafter, we integrated the GSEA results and obtained the co-enriched gene sets and co-core genes across the three microarray data sets. Finally, we constructed a protein-protein interaction (PPI) network and molecular modules for the co-core genes and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for the genes from the modules to clarify their possible biological processes and underlying signaling pathways. Results. A total of 11 co-enriched gene sets were identified from the three microarray data sets. Among them, 10 gene sets were activated and involved in immune response and inflammatory reaction; one gene set was suppressed and participated in the cell cycle. The analysis of molecular modules showed that two modules might play a vital role in the pathogenic process of SARS-CoV-2 infection. The KEGG enrichment analysis showed that genes from module one were enriched in signaling pathways related to inflammation, whereas genes from module two were enriched in cell cycle and DNA replication signaling. Notably, necroptosis signaling, a newly identified type of programmed cell death that differs from apoptosis, was also found. Additionally, in patients with SARS-CoV-2 infection, genes from module one showed relatively high expression while genes from module two showed low expression. Conclusions. We identified two molecular modules that could be used to assess severity and predict the prognosis of patients with SARS-CoV-2 infection. In addition, these results provide a unique opportunity to explore more molecular pathways as potential new therapeutic targets in COVID-19.
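The integration step described in the Methods can be pictured as a set intersection. The sketch below is one plausible reading, not the authors' procedure: it assumes a gene set counts as co-enriched when it is significant in every data set with the same direction. The function name and the placeholder gene set names are hypothetical.

```python
def co_enriched(results):
    """Gene sets enriched in every data set with a consistent direction.

    results: list of dicts, one per data set, mapping a gene set name to
    "activated" or "suppressed" for the sets that passed significance.
    """
    shared = set.intersection(*(set(r) for r in results))
    return {
        name: results[0][name]
        for name in sorted(shared)
        # Keep only sets whose direction agrees across all data sets.
        if len({r[name] for r in results}) == 1
    }

# Placeholder results for three data sets (names are made up).
dataset_results = [
    {"SET_A": "activated", "SET_B": "suppressed", "SET_C": "activated"},
    {"SET_A": "activated", "SET_B": "suppressed"},
    {"SET_A": "activated", "SET_B": "activated", "SET_D": "activated"},
]
print(co_enriched(dataset_results))  # {'SET_A': 'activated'}
```

Requiring agreement on direction is a design choice; dropping that condition would also keep `SET_B`, which is significant in all three data sets but with conflicting signs.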
