Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis

2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13882 ◽  
Author(s):  
Binghuang Cai ◽  
Xia Jiang

Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.

2016 ◽  
Author(s):  
Gennady Korotkevich ◽  
Vladimir Sukhov ◽  
Alexey Sergushichev

Preranked gene set enrichment analysis (GSEA) is a widely used method for interpreting gene expression data in terms of biological processes. Here we present FGSEA, a method that estimates arbitrarily low GSEA P-values more accurately and much faster than other implementations. We also present a polynomial algorithm that calculates GSEA P-values exactly, which we use to confirm the accuracy of the method in practice.
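To make concrete what a preranked GSEA P-value measures, the following is a minimal sketch of the standard running-sum enrichment score together with a naive gene-set permutation estimate of its P-value. It is not the FGSEA algorithm and not the exact polynomial algorithm mentioned above; the function names, the two-sided comparison on absolute scores, and the `(count + 1) / (n_perm + 1)` correction are assumptions chosen for illustration.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for a gene list sorted by ranking metric."""
    gene_set = set(gene_set)
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    # Hits step up in proportion to |score|**weight; misses step down uniformly.
    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, hit in zip(hit_weights, in_set):
        running += w / total_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


def permutation_p_value(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Naive P-value: compare the observed score with random same-size sets."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = len(set(gene_set) & set(ranked_genes))
    extreme = 0
    for _ in range(n_perm):
        random_set = rng.sample(ranked_genes, size)
        if abs(enrichment_score(ranked_genes, scores, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction, so the estimate can never be exactly zero.
    return (extreme + 1) / (n_perm + 1)


genes = [f"g{i}" for i in range(1, 11)]
metric = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
es_top = enrichment_score(genes, metric, {"g1", "g2", "g3"})
p_top = permutation_p_value(genes, metric, {"g1", "g2", "g3"}, n_perm=200)
```

With this naive scheme the smallest reportable P-value is `1 / (n_perm + 1)`, so resolving very small P-values requires a correspondingly large number of permutations. That cost is the limitation the abstract refers to when it speaks of estimating arbitrarily low P-values.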


PPAR Research ◽  
2011 ◽  
Vol 2011 ◽  
pp. 1-10 ◽  
Author(s):  
Kan He ◽  
Qishan Wang ◽  
Yumei Yang ◽  
Minghui Wang ◽  
Yuchun Pan

Gene expression profiling of PPARα has been used in several studies, but fewer studies have gone further to identify, genome-wide, the tissue-specific pathways or genes involved in PPARα activation. Here, we applied gene set enrichment analysis to two PPARα-related microarray datasets, from mouse liver and intestine respectively. We suggest that the regulatory mechanism of PPARα activation by WY14643 is more complicated in mouse small intestine than in liver because more pathways are involved. Several pathways were cancer-related, such as pancreatic cancer and small cell lung cancer, which indicates that PPARα may have an important role in preventing cancer development. Twelve PPARα-dependent pathways and four PPARα-independent pathways were identified as common to both liver and intestine of mice. Most of them were metabolism-related: fatty acid metabolism, tryptophan metabolism, and pyruvate metabolism were subject to PPARα regulation, whereas gluconeogenesis and propanoate metabolism were independent of it. Keratan sulfate biosynthesis, the regulation of actin cytoskeleton pathway, and the pathways associated with prostate cancer and small cell lung cancer were not identified as hepatic PPARα-independent pathways but as WY14643-dependent ones in the intestinal study. We also provide some novel hepatic tissue-specific marker genes.


2013 ◽  
Vol 39 (1) ◽  
pp. 77-88 ◽  
Author(s):  
Xiaocong Fang ◽  
Michael Netzer ◽  
Christian Baumgartner ◽  
Chunxue Bai ◽  
Xiangdong Wang

PLoS ONE ◽  
2014 ◽  
Vol 9 (9) ◽  
pp. e107629 ◽  
Author(s):  
Pui Shan Wong ◽  
Michihiro Tanaka ◽  
Yoshihiko Sunaga ◽  
Masayoshi Tanaka ◽  
Takeaki Taniguchi ◽  
...  

2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations.
Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated.
Conclusion: We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods.
Availability: http://bioconductor.org/packages/GSEABenchmarkeR


2021 ◽  
Vol 11 ◽  
Author(s):  
Junyu Huo ◽  
Liqun Wu ◽  
Yunjin Zang

Background: The high mutation rate of TP53 in hepatocellular carcinoma (HCC) makes it an attractive potential therapeutic target. However, the mechanism by which TP53 mutation affects the prognosis of HCC is not fully understood.
Material and Approach: This study downloaded gene expression profiles and related clinical information from The Cancer Genome Atlas (TCGA) database and the International Cancer Genome Consortium (ICGC) database. We used Gene Set Enrichment Analysis (GSEA) to determine the difference in gene expression patterns between HCC samples with wild-type TP53 (n=258) and mutant TP53 (n=116) in the TCGA cohort. We screened prognosis-related genes by univariate Cox regression analysis and Kaplan–Meier (KM) survival analysis. We constructed a six-gene prognostic signature in the TCGA training group (n=184) by Lasso and multivariate Cox regression analysis. To assess the predictive capability and applicability of the signature in HCC, we conducted internal validation, external validation, integrated analysis and subgroup analysis.
Results: A prognostic signature consisting of six genes (EIF2S1, SEC61A1, CDC42EP2, SRM, GRM8, and TBCD) showed good performance in predicting the prognosis of HCC. The area under the curve (AUC) values of the ROC curves for 1-, 2-, and 3-year survival were all greater than 0.7 in each independent cohort (internal testing cohort, n = 181; TCGA cohort, n = 365; ICGC cohort, n = 229; whole cohort, n = 594; subgroup, n = 9). Importantly, by gene set variation analysis (GSVA) and the single sample gene set enrichment analysis (ssGSEA) method, we found three possible causes that may lead to poor prognosis of HCC: high proliferative activity, low metabolic activity and immunosuppression.
Conclusion: Our study provides a reliable method for the prognostic risk assessment of HCC and has great potential for clinical translation.
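A gene signature fitted by multivariate Cox regression is commonly applied as a linear risk score, that is, a weighted sum of the signature genes' expression values. The sketch below illustrates that general idea only. The abstract reports neither coefficients nor a cutoff rule, so the coefficient values are hypothetical placeholders and the median split is an assumption, not the authors' procedure.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only; the abstract does not
# report the fitted Cox coefficients for these six genes.
COEFFICIENTS = {
    "EIF2S1": 0.30,
    "SEC61A1": 0.25,
    "CDC42EP2": 0.20,
    "SRM": 0.15,
    "GRM8": 0.10,
    "TBCD": 0.05,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Weighted sum of expression values over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients=COEFFICIENTS):
    """Label each patient 'high' or 'low' risk using a median cutoff (assumed)."""
    scores = {pid: risk_score(expr, coefficients)
              for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: "high" if score > cutoff else "low"
            for pid, score in scores.items()}


# Toy expression values, invented purely to exercise the functions.
patients = {
    "p1": {gene: 1.0 for gene in COEFFICIENTS},
    "p2": {gene: 2.0 for gene in COEFFICIENTS},
    "p3": {gene: 3.0 for gene in COEFFICIENTS},
}
groups = stratify(patients)
```

In a real analysis the two groups would then be compared with Kaplan–Meier curves, and the score itself evaluated with time-dependent ROC curves, as the abstract describes.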


2020 ◽  
Author(s):  
Menglan Cai ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka ◽  
Limin Li

Gene set enrichment analysis (GSEA) has been widely used to identify gene sets that differ significantly between cases and controls against a large gene set. GSEA needs both phenotype labels and gene expression data. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression cannot be measured under certain conditions in humans because of the high health risk of direct experiments, such as non-approved treatments or gene knockouts, and human measurements are then often substituted by mouse measurements. Thus, predicting the enrichment significance (for a phenotype) of a given gene set of one species (the target, say human) from gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), which has three steps: 1) running GSEA for the source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA on four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available at https://github.com/LiminLi-xjtu/XGSEA
Author summary: Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data, that is, gene expression with phenotype labels. However, gene expression cannot be measured under certain conditions in humans because of the high risk of direct experiments, such as non-approved treatments or gene knockouts, and human measurements are then often substituted by mouse measurements. The unavailability of gene expression thus leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (for a phenotype) of a given gene set of one species (the target, say human) is predicted from gene expression measured under the same phenotype in another species (the source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, which has three steps: 1) GSEA; 2) domain adaptation; and 3) regression. The results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm its reliability.

