BAGSE: a Bayesian hierarchical model approach for gene set enrichment analysis

2019 ◽  
Vol 36 (6) ◽  
pp. 1689-1695 ◽  
Author(s):  
Abhay Hukku ◽  
Corbin Quick ◽  
Francesca Luca ◽  
Roger Pique-Regi ◽  
Xiaoquan Wen

Abstract Motivation Gene set enrichment analysis has been shown to be effective in identifying relevant biological pathways underlying complex diseases. Existing approaches lack the ability to quantify enrichment levels accurately, which prevents the enrichment information from being further utilized in both upstream and downstream analyses. A modernized and rigorous approach for gene set enrichment analysis that emphasizes both hypothesis testing and enrichment estimation is much needed. Results We propose a novel computational method, Bayesian Analysis of Gene Set Enrichment (BAGSE), for gene set enrichment analysis. BAGSE is built on a Bayesian hierarchical model and fully accounts for the uncertainty embedded in the association evidence of individual genes. We adopt an empirical Bayes inference framework to fit the proposed hierarchical model by implementing an efficient EM algorithm. Through simulation studies, we illustrate that BAGSE yields accurate enrichment quantification while achieving power similar to that of state-of-the-art methods. Further simulation studies show that BAGSE can effectively utilize the enrichment information to improve the power in gene discovery. Finally, we demonstrate the application of BAGSE in analyzing real data from a differential expression experiment and a transcriptome-wide association study. Our results indicate that the proposed statistical framework is effective in aiding the discovery of potentially causal pathways and gene networks. Availability and implementation BAGSE is implemented using the C++ programming language and is freely available from https://github.com/xqwen/bagse/. Simulated and real data used in this paper are also available at the GitHub repository for reproducibility purposes. Supplementary information Supplementary data are available at Bioinformatics online.
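
The empirical Bayes idea in this abstract (per-gene association evidence, a prior that depends on gene set membership, and an EM fit) can be illustrated with a minimal sketch. This is not the BAGSE implementation: the two-group model, the fixed effect-size variance `w2`, the single binary annotation, and all function names are assumptions made for illustration only.

```python
import math

def bayes_factor(z, w2=4.0):
    """Bayes factor for z ~ N(0, 1 + w2) (associated) vs z ~ N(0, 1) (null).
    w2 is an assumed, fixed prior effect variance on the z-score scale."""
    return math.exp(0.5 * z * z * w2 / (1.0 + w2)) / math.sqrt(1.0 + w2)

def posterior(bf, prior):
    """Posterior probability of association given a Bayes factor and a prior."""
    return prior * bf / (prior * bf + 1.0 - prior)

def em_enrichment(z_scores, in_set, n_iter=500, tol=1e-8):
    """Estimate prior association probabilities outside (index 0) and inside
    (index 1) a gene set by EM, and report the log odds ratio between them.
    Both categories are assumed to be non-empty."""
    bfs = [bayes_factor(z) for z in z_scores]
    pi = [0.1, 0.1]  # starting values, chosen arbitrarily
    for _ in range(n_iter):
        # E-step: posterior association probability of each gene
        post = [posterior(bf, pi[d]) for bf, d in zip(bfs, in_set)]
        # M-step: with one binary annotation the update is a category mean
        new_pi = []
        for k in (0, 1):
            vals = [p for p, d in zip(post, in_set) if d == k]
            new_pi.append(min(max(sum(vals) / len(vals), 1e-6), 1.0 - 1e-6))
        delta = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if delta < tol:
            break
    # Enrichment parameter: log odds ratio of association, in-set vs out-of-set
    alpha1 = math.log(pi[1] / (1.0 - pi[1])) - math.log(pi[0] / (1.0 - pi[0]))
    # Per-gene posteriors under the fitted priors
    post = [posterior(bf, pi[d]) for bf, d in zip(bfs, in_set)]
    return pi, alpha1, post
```

In this sketch a positive `alpha1` indicates that genes in the set are more likely to be associated than genes outside it, and the returned posteriors show how an estimated enrichment level feeds back into gene-level inference.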

2019 ◽  
Author(s):  
Abhay Hukku ◽  
Corbin Quick ◽  
Francesca Luca ◽  
Roger Pique-Regi ◽  
Xiaoquan Wen

Abstract Gene set enrichment analysis has been shown to be effective in identifying relevant biological pathways underlying complex diseases. Existing approaches lack the ability to quantify enrichment levels accurately, which prevents the enrichment information from being further utilized in both upstream and downstream analyses. A modernized and rigorous approach for gene set enrichment analysis that emphasizes both hypothesis testing and enrichment estimation is much needed. We propose a novel computational method, Bayesian Analysis of Gene Set Enrichment (BAGSE), for gene set enrichment analysis. BAGSE is built on a Bayesian hierarchical model and fully accounts for the uncertainty embedded in the association evidence of individual genes. We adopt an empirical Bayes inference framework to fit the proposed hierarchical model by implementing an efficient EM algorithm. Through simulation studies, we illustrate that BAGSE yields accurate enrichment quantification while achieving power similar to that of state-of-the-art methods. Further simulation studies show that BAGSE can effectively utilize the enrichment information to improve the power in gene discovery. Finally, we demonstrate the application of BAGSE in analyzing real data from a differential expression experiment and a Transcriptome-wide Association Study (TWAS). Our results indicate that the proposed statistical framework is effective in aiding the discovery of potentially causal pathways and gene networks. BAGSE is implemented using the C++ programming language and is freely available from https://github.com/xqwen/bagse/. Simulated and real data used in this paper are also available at the GitHub repository for reproducibility purposes.


2019 ◽  
Vol 35 (18) ◽  
pp. 3514-3516 ◽  
Author(s):  
Danyue Dong ◽  
Yuan Tian ◽  
Shijie C Zheng ◽  
Andrew E Teschendorff

Abstract Motivation The biological interpretation of differentially methylated sites derived from Epigenome-Wide-Association Studies (EWAS) remains a significant challenge. Gene Set Enrichment Analysis (GSEA) is a general tool to aid biological interpretation, yet its correct and unbiased implementation in the EWAS context is difficult due to the differential probe representation of Illumina Infinium DNA methylation beadchips. Results We present a novel GSEA method, called ebGSEA, which ranks genes, not CpGs, according to the overall level of differential methylation, as assessed using all the probes mapping to the given gene. Applied on simulated and real EWAS data, we show how ebGSEA may exhibit higher sensitivity and specificity than the current state-of-the-art, whilst also avoiding differential probe representation bias. Thus, ebGSEA will be a useful additional tool to aid the interpretation of EWAS data. Availability and implementation ebGSEA is available from https://github.com/aet21/ebGSEA, and has been incorporated into the ChAMP Bioconductor package (https://www.bioconductor.org). Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (13) ◽  
pp. 2258-2266 ◽  
Author(s):  
Van Du T Tran ◽  
Sébastien Moretti ◽  
Alix T Coste ◽  
Sara Amorim-Vaz ◽  
Dominique Sanglard ◽  
...  

Abstract Motivation Genome-scale metabolic networks and transcriptomic data represent complementary sources of knowledge about an organism’s metabolism, yet their integration to achieve biological insight remains challenging. Results We investigate here condition-specific series of metabolic sub-networks constructed by successively removing genes from a comprehensive network. The optimal order of gene removal is deduced from transcriptomic data. The sub-networks are evaluated via a fitness function, which estimates their degree of alteration. We then consider how a gene set, i.e. a group of genes contributing to a common biological function, is depleted in different series of sub-networks to detect the difference between experimental conditions. The method, named metaboGSE, is validated on public data for Yarrowia lipolytica and mouse. It is shown to produce GO terms of higher specificity compared to popular gene set enrichment methods like GSEA or topGO. Availability and implementation The metaboGSE R package is available at https://CRAN.R-project.org/package=metaboGSE. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Menglan Cai ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka ◽  
Limin Li

Abstract Gene set enrichment analysis (GSEA) has been widely used to identify gene sets with a statistically significant difference between cases and controls against a large gene set. GSEA needs both phenotype labels and gene expression. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression cannot be measured under certain conditions in humans, owing to the high health risk of direct experiments such as non-approved treatment or gene knockout, so human data are often substituted by mouse data. Thus, predicting the enrichment significance (on a phenotype) of a given gene set of one species (target, say human) by using gene expression measured under the same phenotype in another species (source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), which has three steps: 1) running GSEA for a source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA on four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study of identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available through https://github.com/LiminLi-xjtu/XGSEA
Author summary Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data, that is, gene expression with phenotype labels. However, gene expression cannot be measured under certain conditions in humans, owing to the high risk of direct experiments such as non-approved treatment or gene knockout, so human data are often substituted by mouse data. The unavailability of gene expression thus leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (on a phenotype) of a given gene set of one species (target, say human) is predicted by using gene expression measured under the same phenotype in another species (source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. The results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm the reliability of XGSEA.


Author(s):  
James H Joly ◽  
William E Lowry ◽  
Nicholas A Graham

Abstract Motivation Gene Set Enrichment Analysis (GSEA) is an algorithm widely used to identify statistically enriched gene sets in transcriptomic data. However, GSEA cannot examine the enrichment of two gene sets or pathways relative to one another. Here we present Differential Gene Set Enrichment Analysis (DGSEA), an adaptation of GSEA that quantifies the relative enrichment of two gene sets. Results After validating the method using synthetic data, we demonstrate that DGSEA accurately captures the hypoxia-induced coordinated upregulation of glycolysis and downregulation of oxidative phosphorylation. We also show that DGSEA is more predictive than GSEA of the metabolic state of cancer cell lines, including lactate secretion and intracellular concentrations of lactate and AMP. Finally, we demonstrate the application of DGSEA to generate hypotheses about differential metabolic pathway activity in cellular senescence. Together, these data demonstrate that DGSEA is a novel tool to examine the relative enrichment of gene sets in transcriptomic data. Availability and implementation DGSEA software and tutorials are available at https://jamesjoly.github.io/DGSEA/. Supplementary information Supplementary data are available at Bioinformatics online.
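
To make the notion of relative enrichment concrete, the sketch below computes the standard GSEA weighted running-sum enrichment score for a ranked gene list and then takes the difference between the scores of two gene sets. The difference is only one plausible way to express relative enrichment and is an assumption here; the abstract does not specify the DGSEA statistic, and the function names are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score.
    ranked_genes and scores must be in the same (ranked) order."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / norm   # step up, weighted by the score
        else:
            running -= 1.0 / n_miss         # step down, uniform penalty
        if abs(running) > abs(best):
            best = running                  # keep the maximum deviation from 0
    return best

def differential_score(ranked_genes, scores, set_a, set_b, p=1.0):
    """Assumed relative-enrichment summary: ES(set_a) minus ES(set_b)."""
    return (enrichment_score(ranked_genes, scores, set_a, p)
            - enrichment_score(ranked_genes, scores, set_b, p))
```

A large positive value in this sketch means the first set sits near the top of the ranking while the second sits near the bottom; significance would still need a permutation-based null, which is omitted here.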


2019 ◽  
Vol 8 (10) ◽  
pp. 1580 ◽  
Author(s):  
Kyoung Min Moon ◽  
Kyueng-Whan Min ◽  
Mi-Hye Kim ◽  
Dong-Hoon Kim ◽  
Byoung Kwan Son ◽  
...  

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated to gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mike Fang ◽  
Brian Richardson ◽  
Cheryl M. Cameron ◽  
Jean-Eudes Dazard ◽  
Mark J. Cameron

Abstract Background In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.


2011 ◽  
Vol 10 (4) ◽  
pp. 3856-3887 ◽  
Author(s):  
Q.Y. Ning ◽  
J.Z. Wu ◽  
N. Zang ◽  
J. Liang ◽  
Y.L. Hu ◽  
...  

2021 ◽  
Author(s):  
Chuan-Qi Xu ◽  
Kui-Sheng Yang ◽  
Shu-Xian Zhao ◽  
Jian Lv

Abstract Objective: Pancreatic cancer (PC) is one of the most malignant tumors. Cytosolic DNA sensing has been found to play an essential role in tumors. In this study, a cytosolic DNA sensing-related genes (CDSRGs) signature was constructed and the potential mechanisms were also discussed. Methods: The RNA expression and clinical data of PC were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Subsequently, univariate (UCR) and multivariate Cox regression (MCR) analyses were conducted to establish a prognostic model in the TCGA patients, which was verified in GEO patients. Cancer immune infiltrates were investigated via single sample gene set enrichment analysis (ssGSEA) and Tumor Immune Estimation Resource (TIMER). Finally, Gene Set Enrichment Analysis (GSEA) was used to investigate the related signaling pathways. Results: A prognostic model comprising four genes (POLR2E, IL18, MAVS, and FADD) was established. The survival rate of patients in the low-risk group was significantly higher than that of patients in the high-risk group. In addition, the CDSRGs risk score was shown to be an independent prognostic factor in PC. Immune infiltrates and drug sensitivity are associated with POLR2E, IL18, MAVS, and FADD expression. Conclusions: In summary, we present and validate a CDSRGs risk model that is an independent prognostic factor and indicates the immune characteristics of PC. This prognostic model may facilitate personalized treatment and monitoring.

