Easy and efficient ensemble gene set testing with EGSEA

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2010 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Charity W. Law ◽  
Luyi Tian ◽  
Julie M. Sheridan ◽  
Milica Ng ◽  
...  

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
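
The idea of a consensus ranking across several gene set tests can be illustrated with a small sketch. This is not EGSEA's implementation (EGSEA is an R/Bioconductor package). It is a minimal Python illustration that assumes each method returns one p-value per gene set and that the consensus is taken as the mean rank across methods. The method names, gene set names and p-values below are hypothetical.

```python
from statistics import mean


def rank_by_p_value(p_values):
    """Rank gene sets by ascending p-value (1 = most significant).

    Tied p-values share the average of the ranks they occupy.
    """
    ordered = sorted(p_values, key=p_values.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of gene sets tied with position i.
        while j + 1 < len(ordered) and p_values[ordered[j + 1]] == p_values[ordered[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared_rank
        i = j + 1
    return ranks


def consensus_ranking(results):
    """Combine per-method p-values into a mean-rank ordering.

    results maps method name -> {gene set name -> p-value}.
    Only gene sets tested by every method are ranked.
    """
    per_method = [rank_by_p_value(p) for p in results.values()]
    common = set.intersection(*(set(r) for r in per_method))
    mean_rank = {g: mean(r[g] for r in per_method) for g in common}
    # Sort by mean rank, breaking ties alphabetically for a stable order.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical p-values from three hypothetical methods.
example = {
    "method_a": {"CELL_CYCLE": 0.001, "APOPTOSIS": 0.04, "GLYCOLYSIS": 0.20},
    "method_b": {"CELL_CYCLE": 0.010, "APOPTOSIS": 0.005, "GLYCOLYSIS": 0.50},
    "method_c": {"CELL_CYCLE": 0.002, "APOPTOSIS": 0.03, "GLYCOLYSIS": 0.03},
}
ranking = consensus_ranking(example)
for gene_set, avg_rank in ranking:
    print(f"{gene_set}\t{avg_rank:.2f}")
```

Averaging ranks rather than raw p-values keeps methods with very differently scaled p-values from dominating the ordering; other combination rules (for example median rank or combined p-values) would slot into `consensus_ranking` in the same way.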

2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Elisa C Maruko ◽  
Hao Xu ◽  
Sushma Kaul ◽  
Brian J Capaldo ◽  
Nathalie Pamir ◽  
...  

Atherosclerosis is a disease of both lipids and inflammatory immune cells. More specifically, elevated plasma levels of low-density lipoproteins (LDL) lead to migration of circulating monocytes into the artery wall. Lipid-loaded monocytes subsequently proliferate in the arterial walls, becoming macrophage foam cells, a hallmark of atherosclerotic lesions. A proposed mechanism for the protective effects of high-density lipoprotein (HDL) is apolipoprotein A-I (apo A-I) acting as a mediator of cholesterol efflux and subsequent foam cell regression. To better understand the biological changes stimulated by apo A-I treatment, differential expression analysis of microarray data was performed on spleen cells from apo A-I treated mice. LDL receptor null (LDLr -/-) and LDL receptor and apo A-I null (LDLr -/-, apoA-I -/-) mice were fed a western diet consisting of 0.2% cholesterol and 42% of calories as fat for 12 weeks. After 6 weeks of diet, a subset of mice of each genotype was subcutaneously injected with 200 micrograms of apo A-I 3 times a week for the remaining 6 weeks. The control group mice were subcutaneously injected with 200 micrograms of saline or BSA. Spleen cell RNA was isolated, purified, and analyzed for differential expression using Illumina BeadArray microarray technology. Individual gene expression analysis for LDLr -/-, apoA-I -/- apo A-I treated mice showed 281 significantly differentially expressed genes compared to BSA treated mice. LDLr -/- A-I treated mice had 1502. Of the significant genes, 189 intersected across both genotypes. LDLr -/-, apoA-I -/- A-I mice showed 73 up-regulated and 116 down-regulated genes. Similarly, LDLr -/- A-I mice had 71 up-regulated and 118 down-regulated genes. One-directional Gene Set Enrichment Analysis (GSEA) of LDLr -/-, apoA-I -/- A-I mice revealed 49 significant pathways, while a total of 63 were found for LDLr -/-. Of these pathways, 21 were up-regulated and 13 were down-regulated in both genotypes. Eight of the top 10 most significant up-regulated pathways in both genotypes were immune cell related. Their functions involve receptor, adhesion, and chemokine signaling. Overall, preliminary analysis suggests that A-I treatment induces similar gene expression changes across different genotypes.


2018 ◽  
Author(s):  
Paul F. Harrison ◽  
Andrew D. Pattison ◽  
David R. Powell ◽  
Traude H. Beilharz

Abstract Background: A differential gene expression analysis may produce a set of significantly differentially expressed genes that is too large to easily investigate, so that a means of ranking genes by their biological interest level is desirable. The life sciences have grappled with the abuse of p-values to rank genes for this purpose. As an alternative, a lower confidence bound on the magnitude of Log Fold Change (LFC) could be used to rank genes, but it has been unclear how to reconcile this with the need to perform False Discovery Rate (FDR) correction. The TREAT test of McCarthy and Smyth is a step in this direction, finding genes significantly exceeding a specified LFC threshold. Here we describe the use of test inversion on TREAT to present genes ranked by a confidence bound on the LFC, while still controlling FDR. Results: Testing the Topconfects R package with simulated gene expression data shows the method outperforming current statistical approaches across a wide range of experiment sizes in the identification of genes with largest LFCs. Applying the method to a TCGA breast cancer dataset shows that the method ranks some genes with large LFC higher than traditional ranking by p-value would. Importantly, these two ranking methods lead to a different biological emphasis, in terms both of specific highly ranked genes and of gene-set enrichment. Conclusions: The choice of ranking method in differential expression analysis can affect the biological interpretation. The common default of ranking by p-value is implicitly a ranking by an effect size in which each gene is standardized to its own variability, rather than one comparing genes on a common scale, and this may not be appropriate. The Topconfects approach of presenting genes ranked by confident LFC effect size is a variation on the TREAT method with improved usability, removing the need to fine-tune a threshold parameter and removing the temptation to abuse p-values as a de-facto effect size.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.


2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13882 ◽  
Author(s):  
Binghuang Cai ◽  
Xia Jiang

Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.


2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29470 ◽  
Author(s):  
Mikhail G. Dozmorov ◽  
Nicolas Dominguez ◽  
Krista Bean ◽  
Susan R. Macwana ◽  
Virginia Roberts ◽  
...  

Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by complex interplay among immune cell types. SLE activity is experimentally assessed by several blood tests, including gene expression profiling of heterogeneous populations of cells in peripheral blood. To better understand the contribution of different cell types to SLE pathogenesis, we applied two methods for cell-type-specific differential expression analysis, csSAM and DSection, to identify cell-type-specific gene expression differences in heterogeneous gene expression measures obtained using RNA-seq technology. We identified B-cell-, monocyte-, and neutrophil-specific gene expression differences. Immunoglobulin-coding gene expression was altered in B-cells, while a ribosomal signature was prominent in monocytes. In contrast, genes differentially expressed in the heterogeneous mixture of cells did not show any functional enrichment. Our results identify antigen binding and structural constituents of ribosomes as functions altered by B-cell- and monocyte-specific gene expression differences, respectively. Finally, these results position both csSAM and DSection as viable techniques for cell-type-specific differential expression analysis, which may help uncover pathogenic, cell-type-specific processes in SLE.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Xinheng Liu ◽  
Yongxian Rong ◽  
Donglin Huang ◽  
Zhijie Liang ◽  
Xiaolin Yi ◽  
...  

Severe burns are acute wounds caused by local heat exposure, resulting in life-threatening systemic effects and poor survival. However, the specific molecular mechanisms remain unclear. First, we downloaded gene expression data related to severe burns from the GEO database (GSE19743, GSE37069, and GSE77791). Then, a gene expression analysis was performed to identify differentially expressed genes (DEGs) and construct a protein-protein interaction (PPI) network. The molecular mechanism was identified by enrichment analysis and Gene Set Enrichment Analysis. In addition, STEM software was used to screen for genes persistently expressed during the response to severe burns, and receiver operating characteristic (ROC) curves were used to identify key DEGs. A total of 2631 upregulated and 3451 downregulated DEGs were identified. PPI network analysis clustered these DEGs into 13 modules. Importantly, module genes were mostly related to immune responses and metabolism. In addition, we identified genes persistently altered during the response to severe burns corresponding to survival and death status. Among the PPI network genes with a high area under the ROC curve, CCL5 and LCK were identified as key DEGs, which may affect the prognosis of burn patients. Gene set variation analysis showed that the immune response was inhibited and several types of immune cells were decreased, while the metabolic response was enhanced. The results showed that persistent gene expression changes occur in response to severe burns, which may underlie chronic alterations in physiological pathways. Identifying the key altered genes may reveal potential therapeutic targets for mitigating the effects of severe burns.
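
Screening genes by area under the ROC curve, as described above, can be sketched briefly. The abstract does not say which software computed the AUC, so the following is only an illustrative Python sketch. It assumes a binary outcome (1 versus 0) and uses the pairwise-comparison form of the AUC, which equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The expression values and labels are made up.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one numeric value per sample (e.g. expression of one gene).
    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative sample count as half a win.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical expression values for one gene across six samples.
expression = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
outcome = [1, 1, 1, 0, 0, 0]
auc = roc_auc(expression, outcome)
print(f"AUC = {auc:.3f}")
```

An AUC near 0.5 means the gene does not separate the two groups, while values near 0 or 1 indicate strong separation in one direction or the other.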


2021 ◽  
Author(s):  
Hua Huang ◽  
Shanshan Xu ◽  
Youran Li ◽  
Yunfei Gu ◽  
Lijiang Ji

Abstract Background: Colorectal cancer (CRC), a common malignancy, ranks third among the causes of cancer-associated mortality. A growing number of studies suggest that long non-coding RNAs (lncRNAs) can serve as prognostic biomarkers for CRC, but the significance of hypoxic lncRNAs in predicting CRC prognosis remains unclear. Methods: Gene expression profiles for CRC cases were obtained from The Cancer Genome Atlas (TCGA) and used to estimate a hypoxia score with a single-sample gene set enrichment analysis (ssGSEA) algorithm. Overall survival (OS) of the high- and low-hypoxia score groups was analyzed with Kaplan–Meier (KM) plots. To identify differentially expressed lncRNAs (DELs) between the two hypoxia score groups, this study carried out differential expression analysis, and the results were then integrated with the DELs between controls and CRC patients to generate the hypoxia-related lncRNAs for CRC. Prognostic lncRNAs were screened by univariate Cox regression and later used to construct a prognosis nomogram for CRC with the least absolute shrinkage and selection operator (LASSO) algorithm. The accuracy and specificity of the constructed prognostic signature were assessed through receiver operating characteristic (ROC) analysis, and the signature was also validated in the internal testing set. Gene set enrichment analysis (GSEA) was performed to explore potential biological functions associated with the prognostic signature. Finally, the ceRNA network of the prognostic lncRNAs was constructed. Results: Among 2299 hypoxia-related lncRNAs of CRC in total, LINC00327, LINC00163, LINC00174, SYNPR-AS1, and MIR31HG were identified as prognostic lncRNAs by univariate Cox regression and adopted for constructing the prognosis signature for CRC. ROC analysis showed the predictive power and accuracy of the prognostic signature. Additionally, the GSEA revealed that ECM-receptor interaction, the PI3K-Akt pathway, phagosome, and the Hippo pathway were mostly associated with the high-risk group. A total of 352 miRNA-mRNA pairs and 177 lncRNA-miRNA pairs were predicted. Conclusion: We identified 5 hypoxia-related lncRNAs to establish an accurate prognostic signature for CRC, providing important prognostic markers and therapeutic targets.


2020 ◽  
Author(s):  
Xiaomei Lei ◽  
Zhijun Feng ◽  
Xiaojun Wang ◽  
Xiaodong He

Abstract Background. Exploring alterations in the host transcriptome following SARS-CoV-2 infection is highly warranted, not only to help us understand the molecular mechanisms of the disease, but also to provide new perspectives for screening effective antiviral drugs, finding new therapeutic targets, and evaluating the risk of systemic inflammatory response syndrome (SIRS) early. Methods. We downloaded three gene expression matrix files from the Gene Expression Omnibus (GEO) database, extracted the gene expression data for SARS-CoV-2 infected and non-infected human samples and different cell line samples, and then performed gene set enrichment analysis (GSEA) on each. Thereafter, we integrated the GSEA results and obtained co-enriched gene sets and co-core genes across the three microarray datasets. Finally, we constructed a protein-protein interaction (PPI) network and molecular modules for the co-core genes and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for the genes from the modules to clarify their possible biological processes and underlying signaling pathways. Results. A total of 11 co-enriched gene sets were identified from the three microarray datasets. Among them, 10 gene sets were activated and involved in the immune response and inflammatory reaction, and 1 gene set was suppressed and participated in the cell cycle. The analysis of molecular modules showed that 2 modules might play a vital role in the pathogenic process of SARS-CoV-2 infection. The KEGG enrichment analysis showed that genes from module one were enriched in signaling pathways related to inflammation, whereas genes from module two were enriched in cell cycle and DNA replication signaling. In particular, necroptosis signaling, a newly identified type of programmed cell death that differs from apoptosis, was also identified in our findings. Additionally, in patients with SARS-CoV-2 infection, genes from module one showed relatively high expression while genes from module two showed low expression. Conclusions. We identified two molecular modules that may be used to assess severity and predict the prognosis of patients with SARS-CoV-2 infection. In addition, these results provide a unique opportunity to explore more molecular pathways as new potential therapeutic targets in COVID-19.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1444
Author(s):  
Charity W. Law ◽  
Kathleen Zeglinski ◽  
Xueyi Dong ◽  
Monther Alhamdoosh ◽  
Gordon K. Smyth ◽  
...  

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, there are two crucial steps in the analysis process that can be a stumbling block for many: the set up of an appropriate model via design matrices and the set up of comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue for design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
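
The roles of a design matrix and a contrast can be shown with a toy example. The article itself works in R with a limma-style pipeline; the sketch below is a plain Python illustration under simplifying assumptions, not that pipeline. It assumes a single explanatory variable with three groups, a "group means" parameterisation (one indicator column per group, no intercept), and ordinary least squares for one gene. The group labels and expression values are hypothetical.

```python
def design_matrix(groups, levels):
    """One indicator column per level: a group-means design with no intercept."""
    return [[1.0 if g == level else 0.0 for level in levels] for g in groups]


def fit_indicator_design(design, y):
    """Least-squares coefficients for an indicator (one-hot) design.

    For this kind of design X'X is diagonal, so the normal equations
    reduce to (X'y)_j / (X'X)_jj, i.e. the mean of y within each group.
    This shortcut is NOT valid for general design matrices.
    """
    n_coef = len(design[0])
    xtx_diag = [sum(row[j] * row[j] for row in design) for j in range(n_coef)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(n_coef)]
    return [b / a for a, b in zip(xtx_diag, xty)]


def apply_contrast(coefficients, contrast):
    """Linear combination of coefficients defined by one contrast vector."""
    return sum(c * b for c, b in zip(contrast, coefficients))


# Hypothetical log-expression values for one gene in six samples.
levels = ["A", "B", "C"]
groups = ["A", "A", "B", "B", "C", "C"]
log_expression = [5.0, 7.0, 9.0, 11.0, 2.0, 4.0]

X = design_matrix(groups, levels)
coefs = fit_indicator_design(X, log_expression)

# Contrast "B - A": weights line up with the columns of the design matrix.
b_vs_a = [-1.0, 1.0, 0.0]
log_fold_change = apply_contrast(coefs, b_vs_a)
print(coefs, log_fold_change)
```

With this parameterisation each coefficient is a group mean, so a comparison between groups has to be requested explicitly through a contrast; a design with an intercept would instead encode some comparisons directly in its coefficients.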

