Introduction to Statistical Methods for Analyzing Large Data Sets: Gene-Set Enrichment Analysis

2011 ◽  
Vol 4 (190) ◽  
pp. tr4-tr4 ◽  
Author(s):  
N. R. Clark ◽  
A. Ma'ayan

2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13305 ◽  
Author(s):  
Jianping Hua ◽  
Michael L. Bittner ◽  
Edward R. Dougherty

Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most existing comparison studies focus on whether GSA methods produce accurate P-values; however, practitioners are often more concerned with whether the methods rank gene sets correctly. Ranking performance is closely tied to two critical goals of GSA methods: revealing biological themes and ensuring reproducibility, especially in small-sample studies. We have conducted a comprehensive simulation study of the ranking performance of seven representative GSA methods. We overcome the limited availability of real data sets by creating hybrid data models from existing large data sets. To build a data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set, and multiple smaller data sets can be generated by resampling the original data set. This approach lets us generate a large batch of data sets on which to check the ranking performance of GSA methods. Our simulation study reveals that, for the proposed data model, Q2-type GSA methods generally perform better than other GSA methods and the global test gives the most robust results. The properties of a data set play a critical role in performance: for data sets with highly connected genes, the performance of all GSA methods suffers significantly.
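The hybrid-data-model construction described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes phenotype labels are obtained by splitting samples at the median expression of the chosen master gene, and that smaller data sets are drawn by class-balanced sampling without replacement. The function names (`make_labels`, `resample`, `subset`) and the toy matrix are hypothetical.

```python
import random
import statistics


def make_labels(expr, master_gene):
    """Hypothetical labeling rule: samples whose master-gene expression is above
    the median get label 1, all other samples get label 0.

    expr: dict mapping gene name -> list of expression values (one per sample).
    """
    values = expr[master_gene]
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]


def resample(labels, n_per_class, rng):
    """Draw a smaller, class-balanced data set: pick n_per_class sample indices
    from each label class without replacement."""
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    chosen = rng.sample(idx0, n_per_class) + rng.sample(idx1, n_per_class)
    return sorted(chosen)


def subset(expr, indices):
    """Restrict every gene's expression vector to the chosen sample indices."""
    return {g: [vals[i] for i in indices] for g, vals in expr.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy expression matrix: 3 genes x 8 samples (made-up numbers).
    expr = {
        "MASTER": [5.0, 1.0, 7.0, 2.0, 8.0, 3.0, 6.0, 4.0],
        "G1": [1.1, 0.9, 1.3, 1.0, 1.4, 0.8, 1.2, 1.0],
        "G2": [2.0, 2.1, 1.9, 2.2, 2.0, 2.1, 1.8, 2.0],
    }
    labels = make_labels(expr, "MASTER")
    print(labels)  # median of MASTER is 4.5 -> [1, 0, 1, 0, 1, 0, 1, 0]
    idx = resample(labels, 2, rng)
    small = subset(expr, idx)
    print(idx, [labels[i] for i in idx])
```

Calling `resample` repeatedly with different seeds yields many small data sets from one source, which is the property the simulation relies on; how the ground-truth gene sets are derived from the master gene is not shown here.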


2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Abstract Background Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted.
We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability http://bioconductor.org/packages/GSEABenchmarkeR
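As a rough illustration of what "recovery of the a priori defined relevance rankings" can mean in practice, the sketch below scores a method's gene-set ranking against precompiled relevance scores using normalized discounted cumulative gain (nDCG). nDCG is our choice of metric for illustration and is not necessarily the score GSEABenchmarkeR computes; the gene-set IDs and relevance values are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: the relevance at rank i (1-based) is divided by log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(method_ranking, relevance, k=None):
    """Score a method's gene-set ranking (most significant first) against
    a priori relevance scores; 1.0 means the ideal ordering was recovered.

    method_ranking: list of gene-set IDs ordered by the enrichment method.
    relevance: dict gene-set ID -> relevance score (IDs not in the dict count as 0).
    k: optionally evaluate only the top-k positions.
    """
    ranked = [relevance.get(gs, 0.0) for gs in method_ranking]
    ideal = sorted(relevance.values(), reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance scores for four gene sets (IDs are placeholders).
    relevance = {"GS_A": 3.0, "GS_B": 2.0, "GS_C": 1.0, "GS_D": 0.0}
    print(ndcg(["GS_A", "GS_B", "GS_C", "GS_D"], relevance))  # 1.0 (ideal order)
    print(ndcg(["GS_D", "GS_C", "GS_B", "GS_A"], relevance))  # well below 1.0
```

A log-discounted metric rewards methods that place highly relevant gene sets near the top, which matches how practitioners read enrichment results; other rank-weighted scores would serve the same purpose.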


2020 ◽  
Author(s):  
Menglan Cai ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka ◽  
Limin Li

Abstract Gene set enrichment analysis (GSEA) has been widely used to identify, from a large collection of gene sets, those that differ significantly between cases and controls. GSEA needs both phenotype labels and gene expression. However, gene expression is measured far more often in model organisms than in minor species. More importantly, human gene expression often cannot be measured under specific conditions, such as non-approved treatments or gene knockouts, because direct experiments carry high health risks; mouse data are then often used as a substitute. Thus predicting the enrichment significance (on a phenotype) of a given gene set of one species (target, say human) by using gene expression measured under the same phenotype in another species (source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), which consists of three steps: 1) running GSEA for a source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA by using four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study of identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available through https://github.com/LiminLi-xjtu/XGSEA
Author summary Gene set enrichment analysis (GSEA) is a powerful tool for the differential analysis of gene sets given a ranked gene list. GSEA requires complete data: gene expression with phenotype labels. However, human gene expression often cannot be measured under specific conditions, such as non-approved treatments or gene knockouts, because direct experiments carry high risks, so mouse data are often used as a substitute. This lack of gene expression data leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (on a phenotype) of a given gene set of one species (target, say human) is predicted by using gene expression measured under the same phenotype in another species (source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, which consists of three steps: 1) GSEA; 2) domain adaptation; and 3) regression. Results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm its reliability.
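Step 1 of XGSEA runs standard GSEA on the source species. For readers unfamiliar with what GSEA computes, the sketch below implements the weighted running-sum (Kolmogorov-Smirnov-style) enrichment score for one gene set on a pre-ranked gene list. It is a minimal sketch: it omits the permutation-based p-values and score normalization that full GSEA adds, and it does not cover XGSEA's domain-adaptation or regression steps. The function name `enrichment_score` is our own.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score (ES) for one gene set.

    ranked_genes: gene IDs sorted from most up- to most down-regulated.
    scores: ranking metric for each gene (e.g. a t-statistic), same order as ranked_genes.
    gene_set: collection of gene IDs to test.
    p: weight exponent; p=0 gives the unweighted Kolmogorov-Smirnov statistic.
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set)
    in_set = [g in members for g in ranked_genes]
    n = len(ranked_genes)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |score|**p; misses step down uniformly.
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, in_set) if h)
    if hit_weight == 0:
        raise ValueError("all gene-set members have zero score; use p=0")
    miss_step = 1.0 / (n - n_hits)
    running = 0.0
    es = 0.0
    for s, h in zip(scores, in_set):
        if h:
            running += abs(s) ** p / hit_weight
        else:
            running -= miss_step
        # Keep the value of largest magnitude, preserving its sign.
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]
    scores = [2.0, 1.0, -1.0, -2.0]
    print(enrichment_score(genes, scores, {"A", "B"}, p=0))  # 1.0 (set at the top)
    print(enrichment_score(genes, scores, {"C", "D"}, p=0))  # -1.0 (set at the bottom)
```

A positive ES indicates that the set concentrates at the top of the ranked list and a negative ES that it concentrates at the bottom; the permutation step that turns ES into a p-value is what XGSEA's step 1 would additionally need.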


BMC Genomics ◽  
2014 ◽  
Vol 15 (Suppl 1) ◽  
pp. S6 ◽  
Author(s):  
Yinglei Lai ◽  
Fanni Zhang ◽  
Tapan K Nayak ◽  
Reza Modarres ◽  
Norman H Lee ◽  
...  

2019 ◽  
Vol 8 (10) ◽  
pp. 1580 ◽  
Author(s):  
Kyoung Min Moon ◽  
Kyueng-Whan Min ◽  
Mi-Hye Kim ◽  
Dong-Hoon Kim ◽  
Byoung Kwan Son ◽  
...  

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from a public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mike Fang ◽  
Brian Richardson ◽  
Cheryl M. Cameron ◽  
Jean-Eudes Dazard ◽  
Mark J. Cameron

Abstract Background In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes the biological directionality of drug-derived gene sets. Results We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets, such as those involved in virion replication, cell cycle dysfunction, and mitochondrial dysfunction, in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering the directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.
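The abstract does not spell out dpGSEA's statistic. To illustrate what "considering directionality of gene modulation" can mean, here is a minimal, hypothetical direction-aware score with a permutation p-value. It is not dpGSEA's algorithm: the reversal criterion (a drug is interesting when it moves genes opposite to the disease changes), the mean-based score, and the names `reversal_score` and `permutation_pvalue` are all our assumptions.

```python
import random


def reversal_score(disease_stat, drug_dirs):
    """Mean of -direction * disease statistic over the drug's genes.

    Positive values mean the drug tends to push genes opposite to the disease
    (genes up in disease are down-regulated by the drug, and vice versa).

    disease_stat: dict gene -> signed differential-expression statistic.
    drug_dirs: dict gene -> +1 (drug up-regulates) or -1 (drug down-regulates).
    """
    shared = [g for g in drug_dirs if g in disease_stat]
    if not shared:
        raise ValueError("no overlap between drug gene set and disease signature")
    return sum(-drug_dirs[g] * disease_stat[g] for g in shared) / len(shared)


def permutation_pvalue(disease_stat, drug_dirs, n_perm=1000, seed=0):
    """One-sided p-value: how often a random gene set of the same size, carrying
    the same direction labels, scores at least as high as the observed set."""
    rng = random.Random(seed)
    observed = reversal_score(disease_stat, drug_dirs)
    shared_dirs = [d for g, d in drug_dirs.items() if g in disease_stat]
    genes = list(disease_stat)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(shared_dirs))
        fake = dict(zip(sample, shared_dirs))
        if reversal_score(disease_stat, fake) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up disease statistics and a hypothetical drug gene set.
    disease = {"A": 3.0, "B": 2.0, "C": -2.0, "D": -3.0, "E": 0.1, "F": -0.1}
    drug = {"A": -1, "D": 1}  # lowers the most up-regulated gene, raises the most down-regulated one
    print(reversal_score(disease, drug))
    print(permutation_pvalue(disease, drug))
```

Keeping the direction labels attached during permutation tests whether this particular pattern of up and down effects opposes the disease, rather than whether the genes merely change.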


2011 ◽  
Vol 10 (4) ◽  
pp. 3856-3887 ◽  
Author(s):  
Q.Y. Ning ◽  
J.Z. Wu ◽  
N. Zang ◽  
J. Liang ◽  
Y.L. Hu ◽  
...  

2021 ◽  
Author(s):  
Chuan-Qi Xu ◽  
Kui-Sheng Yang ◽  
Shu-Xian Zhao ◽  
Jian Lv

Abstract Objective: Pancreatic cancer (PC) is one of the most malignant tumors. Cytosolic DNA sensing has been found to play an essential role in tumors. In this study, a signature of cytosolic DNA sensing-related genes (CDSRGs) was constructed and its potential mechanisms were discussed. Methods: The RNA expression and clinical data of PC were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Univariate (UCR) and multivariate Cox regression (MCR) analyses were then conducted to establish a prognostic model in the TCGA patients, which was validated in the GEO patients. Cancer immune infiltrates were investigated via single-sample gene set enrichment analysis (ssGSEA) and Tumor Immune Estimation Resource (TIMER). Finally, Gene Set Enrichment Analysis (GSEA) was used to investigate the related signaling pathways. Results: A prognostic model comprising four genes (POLR2E, IL18, MAVS, and FADD) was established. The survival rate of patients in the low-risk group was significantly higher than that of patients in the high-risk group. In addition, the CDSRGs risk score was shown to be an independent prognostic factor in PC. Immune infiltrates and drug sensitivity are associated with POLR2E, IL18, MAVS, and FADD expression. Conclusions: In summary, we present and validate a CDSRGs risk model that is an independent prognostic factor and indicates the immune characteristics of PC. This prognostic model may facilitate personalized treatment and monitoring.
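In Cox-regression signatures of this kind, a patient's risk score is usually the model's linear predictor (the sum of each signature gene's coefficient times its expression), and patients are then split into high- and low-risk groups. The sketch below shows that computation for the four genes named above. The coefficients are placeholders, since the fitted values are not given in the abstract, and the median cutoff is a common convention that we assume here; `risk_score` and `split_by_median` are illustrative names.

```python
import statistics

# Hypothetical Cox regression coefficients for the four signature genes.
# Placeholder values for illustration only; not the published model.
COEFFS = {"POLR2E": 0.5, "IL18": 0.3, "MAVS": -0.4, "FADD": 0.2}


def risk_score(expression, coeffs=COEFFS):
    """Linear predictor of a Cox model: sum of coefficient * expression over signature genes.

    expression: dict gene -> (normalized) expression value for one patient.
    """
    return sum(coeffs[g] * expression[g] for g in coeffs)


def split_by_median(patients, coeffs=COEFFS):
    """Assign each patient to 'high' or 'low' risk, using the median risk score
    across patients as the cutoff (scores equal to the median count as 'low').

    patients: dict patient ID -> expression dict.
    """
    scores = {pid: risk_score(expr, coeffs) for pid, expr in patients.items()}
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    genes = list(COEFFS)
    # Made-up expression profiles for three patients.
    patients = {
        "p1": {g: 2.0 for g in genes},
        "p2": {g: 0.0 for g in genes},
        "p3": {g: 1.0 for g in genes},
    }
    print(split_by_median(patients))
```

Survival of the two groups would then be compared (for example with Kaplan-Meier curves and a log-rank test), and the score entered into a multivariate Cox model alongside clinical covariates to check that it is an independent prognostic factor.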


2021 ◽  
Vol 4 (5) ◽  
pp. e201900332
Author(s):  
Elena A Afanasyeva ◽  
Moritz Gartlgruber ◽  
Tatsiana Ryl ◽  
Bieke Decaesteker ◽  
Geertrui Denecker ◽  
...  

The migrational propensity of neuroblastoma (NB) is affected by cell identity, but the mechanisms behind the divergence remain unknown. Using RNAi and time-lapse imaging, we show that ADRN-type NB cells exhibit RAC1- and kalirin-dependent nucleokinetic (NUC) migration that relies on several integral components of neuronal migration. Inhibition of NUC migration by RAC1 and kalirin-GEF1 inhibitors occurs without hampering cell proliferation and ADRN identity. Using three clinically relevant expression dichotomies, we reveal that most of the up-regulated mRNAs in RAC1- and kalirin–GEF1–suppressed ADRN-type NB cells are associated with low-risk characteristics. The computational analysis shows that, in a context of overall gene set poverty, the upregulomes in RAC1- and kalirin–GEF1–suppressed ADRN-type cells are a batch of AU-rich element–containing mRNAs, which suggests a link between NUC migration and mRNA stability. A gene set enrichment analysis–based search for vulnerabilities reveals prospective weak points in RAC1- and kalirin–GEF1–suppressed ADRN-type NB cells, including activities of H3K27- and DNA methyltransferases. Altogether, these data support the introduction of NUC inhibitors into cancer treatment research.

