Gene set enrichment analysis for genome-wide DNA methylation data

10.1101/2020.08.24.265702 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jovana Maksimovic ◽

Alicia Oshlack ◽

Belinda Phipson

Keyword(s):

DNA Methylation ◽

Enrichment Analysis ◽

R Package ◽

Gene Set Enrichment Analysis ◽

Methylation Array ◽

Gene Set ◽

Genome Wide ◽

Genome Methylation ◽

Unbiased Gene ◽

Gene Set Testing

DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalisation, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate that GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.
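As background for the gene set testing mentioned above, the following is a minimal sketch of a standard over-representation test based on the hypergeometric distribution. It is not GOmeth or GOregion, and it does not include any bias adjustment; the function name and the toy numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, universe: int, set_size: int, n_selected: int) -> float:
    """Return P(X >= overlap) for X ~ Hypergeometric.

    universe   : total number of genes considered
    set_size   : number of genes in the gene set
    n_selected : number of genes called significant
    overlap    : significant genes that fall in the gene set
    """
    upper = min(set_size, n_selected)
    total = comb(universe, n_selected)
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    favourable = sum(
        comb(set_size, i) * comb(universe - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Toy example (hypothetical numbers): 20 genes, 5 in the set, 4 significant, 3 overlapping.
p_value = hypergeom_upper_tail(overlap=3, universe=20, set_size=5, n_selected=4)
print(f"over-representation p-value: {p_value:.4f}")
```

A plain test of this kind treats every gene as equally likely to be selected, which is the assumption that an unbiased method for array data would need to relax.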

Genome-wide association analysis of hippocampal volume identifies enrichment of neurogenesis-related pathways

Scientific Reports ◽

10.1038/s41598-019-50507-3 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 4

Author(s):

Emrin Horgusluoglu-Moloch ◽

Shannon L. Risacher ◽

Paul K. Crane ◽

Derrek Hibar ◽

...

Keyword(s):

Association Analysis ◽

Adult Neurogenesis ◽

Enrichment Analysis ◽

Hippocampal Volume ◽

Imaging Genetics ◽

Gene Set Enrichment Analysis ◽

Genome Wide Association ◽

Gene Set Enrichment ◽

Gene Set ◽

Genome Wide

Adult neurogenesis occurs in the dentate gyrus of the hippocampus and contributes to sustaining the hippocampal formation. To investigate whether neurogenesis-related pathways are associated with hippocampal volume, we performed gene-set enrichment analysis using summary statistics from a large-scale genome-wide association study (N = 13,163) of hippocampal volume from the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Consortium and two-year hippocampal volume changes from baseline in cognitively normal individuals from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. Gene-set enrichment analysis of hippocampal volume identified 44 significantly enriched biological pathways (FDR-corrected p-value < 0.05), of which 38 were related to neurogenesis processes, including neurogenesis, generation of new neurons, neuronal development, and neuronal migration and differentiation. For genes highly represented in the significantly enriched neurogenesis-related pathways, gene-based association analysis identified TESC, ACVR1, MSRB3, and DPP4 as significantly associated with hippocampal volume. Furthermore, co-expression network-based functional analysis of gene expression data in the hippocampal subfields CA1 and CA3 from 32 normal controls showed that distinct co-expression modules were mostly enriched in neurogenesis-related pathways. Our results suggest that neurogenesis-related pathways may be enriched for hippocampal volume and that hippocampal volume may serve as a potential phenotype for the investigation of human adult neurogenesis.
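The abstract reports pathways at an FDR-corrected p-value below 0.05 but does not name the correction procedure. Assuming a Benjamini-Hochberg adjustment (an assumption, not something the abstract states), the step could be sketched as follows; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical pathway p-values, for illustration only.
pathway_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(pathway_p)
significant = [i for i, q in enumerate(adjusted_p) if q < 0.05]
print(adjusted_p, significant)
```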

GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btn516 ◽

2008 ◽

Vol 24 (23) ◽

pp. 2784-2785 ◽

Cited By ~ 119

Author(s):

Marit Holden ◽

Shiwei Deng ◽

Leszek Wojnowski ◽

Bettina Kulle

Keyword(s):

Association Studies ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Gene Set Enrichment ◽

Gene Set ◽

SNP Data ◽

Genome Wide

Easy and efficient ensemble gene set testing with EGSEA

F1000Research ◽

10.12688/f1000research.12544.1 ◽

2017 ◽

Vol 6 ◽

pp. 2010 ◽

Cited By ~ 17

Author(s):

Monther Alhamdoosh ◽

Charity W. Law ◽

Luyi Tian ◽

Julie M. Sheridan ◽

Milica Ng ◽

...

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set ◽

P Gene ◽

Wide Range ◽

Gene Set Testing

Gene set enrichment analysis is a popular approach for prioritising the biological processes perturbed in genomic datasets. The Bioconductor project hosts over 80 software packages capable of gene set analysis. Most of these packages search for enriched signatures amongst differentially regulated genes to reveal higher level biological themes that may be missed when focusing only on evidence from individual genes. With so many different methods on offer, choosing the best algorithm and visualization approach can be challenging. The EGSEA package solves this problem by combining results from up to 12 prominent gene set testing algorithms to obtain a consensus ranking of biologically relevant results. This workflow demonstrates how EGSEA can extend limma-based differential expression analyses for RNA-seq and microarray data using experiments that profile 3 distinct cell populations important for studying the origins of breast cancer. Following data normalization and set-up of an appropriate linear model for differential expression analysis, EGSEA builds gene signature specific indexes that link a wide range of mouse or human gene set collections obtained from MSigDB, GeneSetDB and KEGG to the gene expression data being investigated. EGSEA is then configured and the ensemble enrichment analysis run, returning an object that can be queried using several S4 methods for ranking gene sets and visualizing results via heatmaps, KEGG pathway views, GO graphs, scatter plots and bar plots. Finally, an HTML report that combines these displays can fast-track the sharing of results with collaborators, and thus expedite downstream biological validation. EGSEA is simple to use and can be easily integrated with existing gene expression analysis pipelines for both human and mouse data.
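To make the idea of a consensus ranking concrete, here is a minimal sketch that averages the rank each method assigns to a gene set. This is an illustrative stand-in, not EGSEA's implementation: the abstract does not describe how EGSEA combines results, and the method and gene set names below are hypothetical.

```python
def consensus_ranking(rankings):
    """Order gene sets by their mean rank across methods.

    rankings: dict mapping a method name to a list of gene set names,
              best first. Every list is assumed to contain the same sets.
    """
    rank_totals = {}
    for ordered_sets in rankings.values():
        for position, gene_set in enumerate(ordered_sets, start=1):
            rank_totals[gene_set] = rank_totals.get(gene_set, 0) + position
    n_methods = len(rankings)
    mean_rank = {name: total / n_methods for name, total in rank_totals.items()}
    # Ties on mean rank are broken alphabetically so the output is deterministic.
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))


# Hypothetical per-method rankings, for illustration only.
example = {
    "method_a": ["set1", "set2", "set3"],
    "method_b": ["set2", "set1", "set3"],
    "method_c": ["set2", "set3", "set1"],
}
print(consensus_ranking(example))
```

Mean rank is only one possible aggregation rule; it is used here because it is easy to verify by hand.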

Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies

Frontiers in Genetics ◽

10.3389/fgene.2021.767358 ◽

2021 ◽

Vol 12 ◽

Author(s):

Michal Marczyk ◽

Agnieszka Macioszek ◽

Joanna Tobiasz ◽

Joanna Polanska ◽

Joanna Zyla

Keyword(s):

Association Studies ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Genome Wide Association ◽

Gene Set Analysis ◽

Genome Wide Association Studies ◽

Gene Set Enrichment ◽

Gene Set ◽

Genome Wide ◽

The Impact

A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, there are other methods of integration that could be applied here. Also, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of different integration methods and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs with phenotype was calculated using a modified McNemar’s test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. GSA results for GWAS were compared with those obtained from gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we identified CERNO, and MAGENTA with Stouffer integration, as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was used with LD correction, sensitivity and reproducibility were also better. Using any integration method was beneficial in comparison with the minimum p-value method in specific combinations. The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others in selecting the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis, and we encourage the use of these techniques.
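The three gene-level integration rules named in the abstract can be sketched in their textbook form as follows. This is a minimal illustration assuming independent SNP p-values strictly between 0 and 1; it does not include the LD correction the authors evaluate, and the function names and example values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # equals X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(X; 2k) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def stouffer_combine(pvalues):
    """Stouffer's method: sum of z-scores divided by sqrt(k) (unweighted)."""
    normal = NormalDist()
    z = sum(normal.inv_cdf(1.0 - p) for p in pvalues) / sqrt(len(pvalues))
    return 1.0 - normal.cdf(z)


def minimum_p(pvalues):
    """Minimum p-value rule: keep only the most significant SNP."""
    return min(pvalues)


# Hypothetical SNP p-values mapped to one gene, for illustration only.
snp_p = [0.05, 0.05]
print(fisher_combine(snp_p), stouffer_combine(snp_p), minimum_p(snp_p))
```

Both Fisher and Stouffer pool evidence across all SNPs in a gene, whereas the minimum rule discards everything except the single strongest signal.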

Dysregulation of post-transcriptional modification by copy number variable microRNAs in schizophrenia with enhanced glycation stress

Translational Psychiatry ◽

10.1038/s41398-021-01460-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Akane Yoshikawa ◽

Itaru Kushima ◽

Mitsuhiro Miyashita ◽

Kazuya Toriumi ◽

Kazuhiro Suzuki ◽

...

Keyword(s):

Oxidative Stress ◽

Copy Number ◽

miRNA Target ◽

Target Genes ◽

Enrichment Analysis ◽

Synaptic Function ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Gene Set ◽

Genome Wide

Previously, we identified a subpopulation of schizophrenia (SCZ) showing increased levels of plasma pentosidine, a marker of glycation and oxidative stress. However, its causative genetic factors remain largely unknown. Recently, it has been suggested that dysregulated posttranscriptional modification by copy number variable microRNAs (CNV-miRNAs) may contribute to the etiology of SCZ. Here, an integrative genome-wide CNV-miRNA analysis was performed to investigate the etiology of SCZ with accumulated plasma pentosidine (PEN-SCZ). The number of CNV-miRNAs and the gene ontology (GO) in the context of miRNAs within CNVs were compared between PEN-SCZ and non-PEN-SCZ groups. Gene set enrichment analysis of miRNA target genes was further performed to evaluate the pathways affected in PEN-SCZ. We show that miRNAs were significantly enriched within CNVs in the PEN-SCZ versus non-PEN-SCZ groups (p = 0.032). Of note, as per GO analysis, the dysregulated neurodevelopmental events in the two groups may have different origins. Additionally, gene set enrichment analysis of miRNA target genes revealed that miRNAs involved in glycation/oxidative stress and synaptic neurotransmission, especially glutamate/GABA receptor signaling, were possibly affected in PEN-SCZ. To the best of our knowledge, this is the first genome-wide CNV-miRNA study suggesting a role for CNV-miRNAs in the etiology of PEN-SCZ, through effects on genes related to glycation/oxidative stress and synaptic function. Our findings provide supportive evidence that glycation/oxidative stress, possibly caused by genetic defects related to posttranscriptional modification, may lead to synaptic dysfunction. Therefore, targeting miRNAs may be a promising approach for the treatment of PEN-SCZ.

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.0506580102 ◽

2005 ◽

Vol 102 (43) ◽

pp. 15545-15550 ◽

Cited By ~ 18155

Author(s):

A. Subramanian ◽

P. Tamayo ◽

V. K. Mootha ◽

S. Mukherjee ◽

B. L. Ebert ◽

...

Keyword(s):

Expression Profiles ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Gene Set ◽

Knowledge Based ◽

Genome Wide ◽

Genome Wide Expression

Faculty Opinions recommendation of Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1030064.356415 ◽

2006 ◽

Author(s):

Andrew Emili

Keyword(s):

Expression Profiles ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Gene Set ◽

Knowledge Based ◽

Genome Wide ◽

Genome Wide Expression

Statistical power of gene-set enrichment analysis is a function of gene set correlation structure

10.1101/186288 ◽

2017 ◽

Author(s):

David M. Swanson

Keyword(s):

Statistical Power ◽

Sequence Data ◽

Enrichment Analysis ◽

Analytical Framework ◽

Gene Set Enrichment Analysis ◽

Correlation Structure ◽

Gene Set ◽

Type 1 Error ◽

Gene Set Testing ◽

Supplementary Material

Motivation: We describe why statistical power for both self-contained and competitive gene-set tests is a function of the correlation structure of co-expressed genes, and why this characteristic is undesirable for gene-set analyses. Variable statistical power as a function of gene correlation structure has not been observed or studied previously. The observation is important in part because gene-set testing methodology is well-developed, yet this fundamental feature of many of its tests is unknown and has the potential to change the interpretation of past gene-set test results and guide future implementations, including those using sequence data. Type 1 error inflation is also amenable to study in our statistical framework; while it has been well-studied and described previously for both self-contained and competitive tests, it has less often been done in an analytical framework. Our observations apply to four commonly-used gene-set testing approaches for microarrays, including CAMERA, ROAST, SAFE, and GAGE, and a recently proposed one for RNAseq, MAST.

Results: We characterize situations in which power is especially small relative to effect sizes of genes in a set for both competitive and self-contained gene-set tests. We propose three alternative tests, one of which replicates the properties of permutation-based self-contained tests, but avoids the need for even recently proposed, rotation-based approximations to permutations. The two other proposed tests have the unique property that statistical power is not a function of co-expression correlation in the gene-set and are therefore the preferred methodology. We compare our proposed tests to leading gene-set tests and apply them to an already-published study of smoking exposure on pregnant women.

Contact: [email protected]

Supplementary Material: Online supplementary material includes additional simulation results supporting the relationship between the “mixed” and “directional” gene-set tests of ROAST and closed-form implementations of them.
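One way to see why gene correlation matters for a gene-set statistic is the standard variance formula for the mean of equally correlated, unit-variance statistics: Var = (1 + (n - 1) * rho) / n. The sketch below evaluates it; it is a general textbook identity used for illustration, not the analytical framework of the paper, and the function name and numbers are hypothetical.

```python
def variance_of_mean_statistic(n_genes: int, rho: float) -> float:
    """Variance of the mean of n unit-variance gene statistics
    that share a common pairwise correlation rho."""
    return (1.0 + (n_genes - 1) * rho) / n_genes


# Hypothetical gene set of 50 genes, for illustration only.
independent = variance_of_mean_statistic(50, 0.0)
correlated = variance_of_mean_statistic(50, 0.1)
print(independent, correlated, correlated / independent)
```

Even a modest common correlation inflates the variance of the set-level mean several-fold relative to the independent case, so a test calibrated under independence would behave differently on correlated gene sets.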

Faculty Opinions recommendation of Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1030064.793476750 ◽

2013 ◽

Author(s):

Laetitia Davidovic

Keyword(s):

Expression Profiles ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Gene Set Enrichment ◽

Gene Set ◽

Knowledge Based ◽

Genome Wide ◽

Genome Wide Expression
