NGSEA: network-based gene set enrichment analysis for interpreting gene expression phenotypes with functional gene sets

Mapping Intimacies ◽

10.1101/636498 ◽

2019 ◽

Cited By ~ 1

Author(s):

Heonjong Han ◽

Sangyoung Lee ◽

Insuk Lee

Keyword(s):

Gene Expression ◽

Enrichment Analysis ◽

Functional Gene ◽

Gene Set Enrichment Analysis ◽

Clinical Samples ◽

Functional Genes ◽

Biological Processes ◽

Gene Set Enrichment ◽

Gene Sets ◽

Anti Cancer

Abstract: Gene set enrichment analysis (GSEA) is a popular tool to identify underlying biological processes in clinical samples using their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes for differentially expressed genes (DEGs) in clinical samples. GSEA may be suboptimal for functional gene sets, however, because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets compared with another method, Connectivity Map. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We therefore expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
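The network-based scoring idea can be sketched in a few lines of Python. The abstract does not state the exact formula, so the combination rule used here (a gene's absolute expression difference plus the mean absolute difference of its network neighbors), the function names, and the toy inputs are all assumptions for illustration, not the published NGSEA implementation.

```python
from statistics import mean


def network_smoothed_scores(diff, network):
    """Score each gene by |its own difference| + mean |difference| of its neighbors.

    diff:    dict gene -> expression difference (hypothetical input format)
    network: dict gene -> iterable of neighbor genes (hypothetical input format)
    The additive combination is an assumption, not taken from the abstract.
    """
    scores = {}
    for gene, value in diff.items():
        neighbors = [n for n in network.get(gene, ()) if n in diff]
        neighbor_term = mean(abs(diff[n]) for n in neighbors) if neighbors else 0.0
        scores[gene] = abs(value) + neighbor_term
    return scores


def enrichment_score(scores, gene_set):
    """GSEA-style weighted running sum over genes ranked by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    hit_total = sum(scores[g] for g in hits)
    if not hits or n_miss == 0 or hit_total == 0:
        return 0.0
    running = best = 0.0
    for gene in ranked:
        if gene in gene_set:
            running += scores[gene] / hit_total  # step up, weighted by score
        else:
            running -= 1.0 / n_miss              # uniform step down
        if abs(running) > abs(best):
            best = running                       # keep the largest deviation
    return best
```

In this sketch a gene with a small expression change of its own can still rank highly when its neighbors change strongly, which is the behavior the abstract motivates for functional gene sets.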

Differential gene expression in prostate tissue according to vasectomy.

Journal of Clinical Oncology ◽

10.1200/jco.2016.34.2_suppl.298 ◽

2016 ◽

Vol 34 (2_suppl) ◽

pp. 298-298

Author(s):

Kathryn M Wilson ◽

Travis Gerke ◽

Ericka Ebot ◽

Jennifer A Sinnott ◽

Jennifer R. Rider ◽

...

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Cancer Diagnosis ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Prostate Tissue ◽

Gene Set Enrichment ◽

Normal Prostate ◽

Gene Sets ◽

Normal Prostate Tissue

Background: We previously found that vasectomy was associated with an increased risk of prostate cancer, and particularly, risk of lethal prostate cancer in the Health Professionals Follow-up Study (HPFS). However, the possible biological basis for this finding is unclear. In this study, we explored possible biological mechanisms by assessing differences in gene expression in the prostate tissue of men with and without a history of vasectomy prior to prostate cancer diagnosis. Methods: Within the HPFS, vasectomy data and gene expression data (20,254 genes) were available from archival tumor tissue from 263 cases, 124 of whom also had data for adjacent normal tissue. To relate expression of individual genes to vasectomy we used linear regression adjusting for age and year at diagnosis. We ran gene set enrichment analysis to identify pathways of genes associated with vasectomy. Results: Among 263 cases, 67 (25%) reported a vasectomy prior to cancer diagnosis. Mean age at diagnosis was 66 years among men without and 65 years among men with vasectomy. Median time between vasectomy and prostate cancer diagnosis was 25 years. Gene expression in tumor tissue was not associated with vasectomy status. In adjacent normal tissue, three individual genes were associated with vasectomy with Bonferroni-corrected p-values of < 0.10: RAPGEF6, OR4C3, and SLC35F4. Gene set enrichment analysis found five pathways upregulated and seven pathways downregulated in men with vasectomy compared to those without in normal prostate tissue with an FDR < 0.05. Upregulated pathways included several immune-related gene sets and G-protein-coupled receptor gene sets. Conclusions: We identified significant differences in gene expression profiles in normal prostate tissue according to vasectomy status among men treated for prostate cancer. The fact that such differences existed several decades after vasectomy provides support for the idea that vasectomy may play a role in the etiology of prostate cancer.
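To make the Bonferroni threshold concrete, the short Python sketch below multiplies each raw p-value by the number of tests (20,254 genes, as stated in the abstract) and caps the result at 1. The gene labels and raw p-values are hypothetical placeholders, not results from the study.

```python
def bonferroni(p_values, n_tests):
    """Bonferroni correction: multiply each raw p-value by the number of tests, cap at 1."""
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


# Hypothetical raw p-values; only the number of tests comes from the abstract.
N_GENES = 20254
raw = {"GENE_A": 4e-6, "GENE_B": 1e-2}
adjusted = bonferroni(raw, N_GENES)
passing = [name for name, p in adjusted.items() if p < 0.10]
```

Under this correction, a gene needs a raw p-value below roughly 0.10 / 20,254 (about 4.9e-6) to pass the 0.10 threshold.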

Fast gene set enrichment analysis

10.1101/060012 ◽

2016 ◽

Cited By ~ 218

Author(s):

Gennady Korotkevich ◽

Vladimir Sukhov ◽

Alexey Sergushichev

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Polynomial Algorithm ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Biological Processes ◽

Expression Data ◽

Gene Set Enrichment ◽

P Values ◽

Gene Set

Abstract: Preranked gene set enrichment analysis (GSEA) is a widely used method for interpretation of gene expression data in terms of biological processes. Here we present the FGSEA method, which is able to estimate arbitrarily low GSEA P-values with higher accuracy and much faster than other implementations. We also present a polynomial algorithm to calculate GSEA P-values exactly, which we use to practically confirm the accuracy of the method.
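For context, the baseline that fast methods improve on can be sketched as a naive permutation test for preranked GSEA. This is not the FGSEA algorithm; it is a plain empirical p-value under assumed conventions (a two-sided comparison of absolute enrichment scores, random gene sets of the same size), with function names chosen for illustration.

```python
import random


def enrichment_score(ranked_genes, stats, gene_set, weight=1.0):
    """Weighted running-sum enrichment score over a preranked gene list."""
    hits = [g for g in ranked_genes if g in gene_set]
    n_miss = len(ranked_genes) - len(hits)
    hit_total = sum(abs(stats[g]) ** weight for g in hits)
    if not hits or n_miss == 0 or hit_total == 0:
        return 0.0
    running = best = 0.0
    for gene in ranked_genes:
        if gene in gene_set:
            running += abs(stats[gene]) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, stats, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size (naive baseline)."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, stats, gene_set))
    set_size = sum(1 for g in ranked_genes if g in gene_set)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, set_size))
        if abs(enrichment_score(ranked_genes, stats, random_set)) >= observed:
            extreme += 1
    # Add-one estimate: the result can never fall below 1 / (n_perm + 1).
    return (extreme + 1) / (n_perm + 1)
```

Because the smallest value this estimate can return is 1 / (n_perm + 1), resolving very low p-values this way requires a correspondingly large number of permutations, which is the cost the abstract's method is meant to avoid.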

XGSEA: CROSS-species Gene Set Enrichment Analysis via domain adaptation

10.1101/2020.07.21.213645 ◽

2020 ◽

Author(s):

Menglan Cai ◽

Canh Hao Nguyen ◽

Hiroshi Mamitsuka ◽

Limin Li

Keyword(s):

Gene Expression ◽

Domain Adaptation ◽

Gene Knockout ◽

Enrichment Analysis ◽

Real Data ◽

Gene Set Enrichment Analysis ◽

Data Sets ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Abstract: Gene set enrichment analysis (GSEA) has been widely used to identify gene sets with statistically significant differences between cases and controls against a large gene set. GSEA needs both phenotype labels and gene expression data. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression cannot be measured under specific conditions in humans, such as non-approved treatment or gene knockout, because of the high health risk of direct experiments, and mouse data are often substituted. Thus, predicting the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) by using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), with three steps: 1) running GSEA for a source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA by using four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study of identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available through https://github.com/LiminLi-xjtu/XGSEA

Author summary: Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data: gene expression with phenotype labels. However, gene expression cannot be measured under specific conditions in humans, such as non-approved treatment or gene knockout, because of the high risk of direct experiments, and mouse data are often substituted. The unavailability of gene expression data thus leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) is predicted by using gene expression measured under the same phenotype in another species (the source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. The results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm the reliability of XGSEA.

A method for downstream analysis of gene set enrichment results facilitates the biological interpretation of vaccine efficacy studies

10.1101/043158 ◽

2016 ◽

Author(s):

Yan Tan ◽

Jernej Godec ◽

Felix Wu ◽

Pablo Tamayo ◽

Jill P. Mesirov ◽

...

Keyword(s):

Transcriptional Response ◽

Leading Edge ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Biological Processes ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Biological Interpretation ◽

Downstream Analysis

Abstract: Gene set enrichment analysis (GSEA) is a widely employed method for analyzing gene expression profiles. The approach uses annotated sets of genes, identifies those that are coordinately up- or down-regulated in a biological comparison of interest, and thereby elucidates underlying biological processes relevant to the comparison. As the number of gene sets available in various collections for enrichment analysis has grown, the resulting lists of significant differentially regulated gene sets may also become larger, leading to the need for additional downstream analysis of GSEA results. Here we present a method that allows the rapid identification of a small number of co-regulated groups of genes, “leading edge metagenes” (LEMs), from high scoring sets in GSEA results. LEMs are sub-signatures that are common to multiple gene sets and that “explain” their enrichment specific to the experimental dataset of interest. We show that LEMs contain more refined lists of context-dependent and biologically meaningful genes than the parental gene sets. LEM analysis of the human vaccine response using a large database of immune signatures identified core biological processes induced by five different vaccines in datasets from human peripheral blood mononuclear cells (PBMC). Further study of these biological processes over time following vaccination showed that at day 3 post-vaccination, vaccines derived from viruses or viral subunits exhibit patterns of biological processes that are distinct from protein conjugate vaccines; however, by day 7 these differences were less pronounced. This suggests that the immune response to diverse vaccines eventually converges to a common transcriptional response. LEM analysis can significantly reduce the dimensionality of enriched gene sets, improve the identification of core biological processes active in a comparison of interest, and simplify the biological interpretation of GSEA results.

Author Summary: Genome-wide expression profiling is a widely used tool to identify biological mechanisms in a comparison of interest. One analytic method, gene set enrichment analysis (GSEA), uses annotated sets of genes and identifies those that are coordinately up- or down-regulated in a biological comparison of interest. This approach capitalizes on the fact that alterations in biological processes often cause the coordinated change of a large number of genes. However, as the number of gene sets available in various collections for enrichment analysis has grown, the resulting lists of significant differentially regulated gene sets may also become larger, leading to the need for additional downstream analysis of GSEA results. Here we present a method that allows the identification of a small number of co-regulated groups of genes, “leading edge metagenes” (LEMs), from high scoring sets in GSEA results. We show that LEMs contain more refined lists of context-dependent biologically meaningful genes than the parental gene sets and demonstrate the utility of this approach in analyzing the transcriptional response to vaccination. LEM analysis can significantly reduce the dimensionality of enriched gene sets, improve the identification of core biological processes active in a comparison of interest, and facilitate the biological interpretation of GSEA results.

Application of biclustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials

Beilstein Journal of Nanotechnology ◽

10.3762/bjnano.6.252 ◽

2015 ◽

Vol 6 ◽

pp. 2438-2448 ◽

Cited By ~ 14

Author(s):

Andrew Williams ◽

Sabina Halappanavar

Keyword(s):

Gene Expression ◽

Pulmonary Fibrosis ◽

Expression Profiles ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Data Driven ◽

Gene Set Enrichment ◽

Gene Set ◽

Analysis Methods ◽

Gene Sets

Background: The presence of diverse types of nanomaterials (NMs) in commerce is growing at an exponential pace. As a result, human exposure to these materials in the environment is inevitable, necessitating rapid and reliable toxicity testing methods to accurately assess the potential hazards associated with NMs. In this study, we applied biclustering and gene set enrichment analysis methods to derive essential features of altered lung transcriptome following exposure to NMs that are associated with lung-specific diseases. Several datasets from public microarray repositories describing pulmonary diseases in mouse models following exposure to a variety of substances were examined and functionally related biclusters of genes showing similar expression profiles were identified. The identified biclusters were then used to conduct a gene set enrichment analysis on pulmonary gene expression profiles derived from mice exposed to nano-titanium dioxide (nano-TiO2), carbon black (CB) or carbon nanotubes (CNTs) to determine the disease significance of these data-driven gene sets. Results: Biclusters representing inflammation (chemokine activity), DNA binding, cell cycle, apoptosis, reactive oxygen species (ROS) and fibrosis processes were identified. All of the NM studies were significant with respect to the bicluster related to chemokine activity (DAVID; FDR p-value = 0.032). The bicluster related to pulmonary fibrosis was enriched in studies investigating toxicity induced by CNTs and CB, suggesting the potential for these materials to induce lung fibrosis. The pro-fibrogenic potential of CNTs is well established. Although CB has not been shown to induce fibrosis, it induces stronger inflammatory, oxidative stress and DNA damage responses than nano-TiO2 particles. Conclusion: The results of the analysis correctly identified all NMs to be inflammogenic and only CB and CNTs as potentially fibrogenic. In addition to identifying several previously defined, functionally relevant gene sets, the present study also identified two novel gene sets: a gene set associated with pulmonary fibrosis and a gene set associated with ROS, underlining the advantage of using a data-driven approach to identify novel, functionally related gene sets. The results can be used in future gene set enrichment analysis studies involving NMs or as features for clustering and classifying NMs of diverse properties.

Comprehensive gene expression profiling of rat lung reveals distinct acute and chronic responses to cigarette smoke inhalation

AJP Lung Cellular and Molecular Physiology ◽

10.1152/ajplung.00105.2007 ◽

2007 ◽

Vol 293 (5) ◽

pp. L1183-L1193 ◽

Cited By ~ 65

Author(s):

Christopher S. Stevenson ◽

Cerys Docx ◽

Ruth Webster ◽

Cliff Battram ◽

Debra Hynx ◽

...

Keyword(s):

Gene Expression ◽

Stress Response ◽

Cigarette Smoke ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Smoke Inhalation ◽

Whole Body ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Chronic obstructive pulmonary disease (COPD) is a smoking-related disease that lacks effective therapies due partly to the poor understanding of disease pathogenesis. The aim of this study was to identify molecular pathways that could be responsible for the damaging consequences of smoking. To do this, we employed Gene Set Enrichment Analysis to analyze differences in global gene expression, which we then related to the pathological changes induced by cigarette smoke (CS). Sprague-Dawley rats were exposed to whole body CS for 1 day and for various periods up to 8 mo. Gene Set Enrichment Analysis of microarray data identified that metabolic processes were most significantly increased early in the response to CS. Gene sets involved in stress response and inflammation were also upregulated. CS exposure increased neutrophil chemokines, cytokines, and proteases (MMP-12) linked to the pathogenesis of COPD. After a transient acute response, the CS-exposed rats developed a distinct molecular signature after 2 wk, which was followed by the chronic phase of the response. During this phase, gene sets related to immunity and defense progressively increased and predominated at the later time points in smoke-exposed rats. Chronic CS inhalation recapitulated many of the phenotypic changes observed in COPD patients including oxidative damage to macrophages, a slowly resolving inflammation, epithelial damage, mucus hypersecretion, airway fibrosis, and emphysema. As such, it appears that metabolic pathways are central to dealing with the stress of CS exposure; however, over time, inflammation and stress response gene sets become the most significantly affected in the chronic response to CS.

Utilizing Cancer - Functional Gene Set - Compound Networks to Identify Putative Drugs for Breast Cancer

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1574888x13666180105125347 ◽

2018 ◽

Vol 21 (2) ◽

pp. 74-83

Author(s):

Tzu-Hung Hsiao ◽

Yu-Chiao Chiu ◽

Yu-Heng Chen ◽

Yu-Ching Hsu ◽

Hung-I Harry Chen ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Cancer Therapy ◽

Cancer Treatment ◽

Cancer Survival ◽

Expression Profiles ◽

Functional Gene ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

Gene Sets

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can speed up the development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Using breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after the treatment. Conclusion: Our method can effectively identify compounds with a potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.

Higher Acid-Base Imbalance Associated with Respiratory Failure Could Decrease the Survival of Patients with Scrub Typhus during Intensive Care Unit Stay: A Gene Set Enrichment Analysis

Journal of Clinical Medicine ◽

10.3390/jcm8101580 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1580 ◽

Cited By ~ 1

Author(s):

Kyoung Min Moon ◽

Kyueng-Whan Min ◽

Mi-Hye Kim ◽

Dong-Hoon Kim ◽

Byoung Kwan Son ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Respiratory Failure ◽

Scrub Typhus ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Acid Base ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.

Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

BMC Bioinformatics ◽

10.1186/s12859-020-03929-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Mike Fang ◽

Brian Richardson ◽

Cheryl M. Cameron ◽

Jean-Eudes Dazard ◽

Mark J. Cameron

Keyword(s):

Drug Targets ◽

T Regulatory Cells ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Regulatory Cells ◽

Gene Set Enrichment ◽

Gene Set ◽

Gene Sets ◽

Gastroenteropancreatic Neuroendocrine Tumor ◽

Public Datasets

Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.

A Starvation-Based 9-mRNA Signature Correlates With Prognosis in Patients With Hepatocellular Carcinoma

Frontiers in Oncology ◽

10.3389/fonc.2021.716757 ◽

2021 ◽

Vol 11 ◽

Author(s):

Dengliang Lei ◽

Yue Chen ◽

Yang Zhou ◽

Gangli Hu ◽

Fang Luo

Keyword(s):

Hepatocellular Carcinoma ◽

High Risk ◽

Liver Cancer ◽

Cancer Patients ◽

Enrichment Analysis ◽

Clinical Information ◽

Cancer Genome ◽

Gene Set Enrichment Analysis ◽

Biological Processes ◽

Gene Set Enrichment

Background: Hepatocellular carcinoma (HCC) is one of the world’s most prevalent and lethal cancers. Notably, the microenvironment of tumor starvation is closely related to cancer malignancy. Our study constructed a signature of starvation-related genes to predict the prognosis of liver cancer patients. Methods: The mRNA expression matrix and corresponding clinical information of HCC patients were obtained from the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). Gene set enrichment analysis (GSEA) was used to identify genes in the hunger metabolism gene set that differ between liver cancer and adjacent tissues. GSEA was also used to identify biological differences between high- and low-risk samples. Univariate and multivariate analyses were used to construct prognostic models for hunger-related genes. Kaplan-Meier (KM) and receiver-operating characteristic (ROC) analyses were used to assess the model accuracy. The model and relevant clinical information were used to construct a nomogram, protein expression was detected by western blot (WB), and transwell assay was used to evaluate the invasive and metastatic ability of cells. Results: First, we used univariate analysis to identify 35 prognostic genes, which were further demonstrated to be associated with starvation metabolism through Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). We then used multivariate analysis to build a model with nine genes. Finally, we divided the sample into low- and high-risk groups according to the median of the risk score. KM analysis showed that the prognosis of high- and low-risk samples is significantly different, and the prognosis of high-risk samples is worse. The prognostic accuracy of the 9-mRNA signature was also tested in the validation data set. GSEA identified typical pathways and biological processes related to the 9-mRNA signature, including cell cycle, hypoxia, the p53 pathway, and the PI3K/AKT/mTOR pathway. As evidenced by WB, EIF2S1 expression was increased after starvation. Overall, EIF2S1 plays an important role in the invasion and metastasis of liver cancer. Conclusions: The 9-mRNA model can serve as an accurate signature to predict the prognosis of liver cancer patients. However, its mechanism of action warrants further investigation.
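The median split described in the Results can be sketched in Python as follows. The abstract does not give the risk score formula, so a linear combination of expression values and model coefficients is assumed here, as is common for multi-gene prognostic signatures. The gene labels, coefficients, and expression values are hypothetical placeholders, not the nine genes of the study.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Risk score per sample as a weighted sum of gene expression (assumed formula)."""
    return {
        sample: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for sample, values in expression.items()
    }


def split_by_median(scores):
    """Label samples above the median risk score as high risk, the rest as low risk."""
    cutoff = median(scores.values())
    return {sample: ("high" if score > cutoff else "low") for sample, score in scores.items()}


# Hypothetical two-gene model and four samples, for illustration only.
coefficients = {"G1": 0.5, "G2": -0.25}
expression = {
    "s1": {"G1": 2.0, "G2": 0.0},
    "s2": {"G1": 0.0, "G2": 4.0},
    "s3": {"G1": 4.0, "G2": 0.0},
    "s4": {"G1": 1.0, "G2": 2.0},
}
groups = split_by_median(risk_scores(expression, coefficients))
```

The resulting group labels are what a Kaplan-Meier comparison of high- and low-risk samples would take as input.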