Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies

A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which may map to the region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, other integration methods could be applied here. In addition, the nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of different integration methods and of linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). The association of SNPs with the phenotype was calculated using a modified McNemar's test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. GSA results on GWAS data were compared with those obtained from gene expression data. Comparing all evaluation metrics across GSA algorithms, integration methods, and LD correction, we highlighted CERNO and MAGENTA with Stouffer integration as the most efficient. Applying LD correction increased the prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was used with LD correction, sensitivity and reproducibility were also better. In specific combinations, using either integration method was beneficial compared with the minimum p-value method.
The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated the performance of different approaches to GSA in GWAS to help others select the most effective combinations. We showed that LD correction and Stouffer integration can increase the performance of enrichment analysis, and we encourage the use of these techniques.
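
The gene-level integration step compared above (Fisher, Stouffer, and minimum p-value) can be sketched in dependency-free Python. This is an illustration, not the authors' code. It assumes one-sided SNP p-values strictly between 0 and 1. The abstract does not say how its LD correction works, so the optional `ld` argument is an assumption: a SNP-by-SNP correlation matrix that widens the Stouffer denominator, which is one common way to account for correlated SNPs. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom.

    For df = 2k the tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!,
    which keeps this sketch free of SciPy (adequate for moderate statistics).
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_gene_p(snp_pvalues):
    """Fisher: -2 * sum(ln p_i) ~ chi-square(2k) when the k SNP tests are independent."""
    stat = -2.0 * sum(math.log(p) for p in snp_pvalues)
    return chi2_sf_even_df(stat, 2 * len(snp_pvalues))


def stouffer_gene_p(snp_pvalues, ld=None):
    """Stouffer: Z = sum(z_i) / sqrt(sum_ij r_ij), with z_i = Phi^-1(1 - p_i).

    ld=None treats SNPs as independent (denominator sqrt(k)). An LD
    correlation matrix r (assumed input) inflates the denominator for
    correlated SNPs so that redundant signals are not double-counted.
    """
    z = [_STD_NORMAL.inv_cdf(1.0 - p) for p in snp_pvalues]
    k = len(z)
    if ld is None:
        variance = float(k)
    else:
        variance = sum(ld[i][j] for i in range(k) for j in range(k))
    return 1.0 - _STD_NORMAL.cdf(sum(z) / math.sqrt(variance))


def min_p_gene_p(snp_pvalues, sidak=False):
    """Minimum p-value: keep the best SNP, optionally Sidak-adjusted for the k SNPs tested."""
    p_min = min(snp_pvalues)
    return 1.0 - (1.0 - p_min) ** len(snp_pvalues) if sidak else p_min


# Hypothetical SNP p-values for one gene and an assumed LD (correlation) matrix.
snp_p = [0.01, 0.04, 0.30]
ld = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(fisher_gene_p(snp_p), stouffer_gene_p(snp_p), stouffer_gene_p(snp_p, ld), min_p_gene_p(snp_p))
```

With strongly correlated SNPs, the LD-aware Stouffer p-value is larger (less significant) than the independent version, because the shared signal is no longer counted twice. Fisher's method has an analogous correlation-aware variant (Brown's method), which is not shown here.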

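The abstract does not describe how its McNemar's test was modified, so the sketch below shows only the standard McNemar test for matched pairs, with an optional continuity correction. Encoding each matched normal/tumour pair as two booleans (for example, "carries the minor allele") is an illustrative assumption, as is the function name.

```python
import math


def mcnemar_test(pairs, continuity=True):
    """Standard McNemar test for matched binary observations.

    pairs: iterable of (normal, tumour) booleans for one SNP, one pair per patient.
    Only discordant pairs contribute: b counts (True, False), c counts (False, True).
    Returns (statistic, p_value); under H0 the statistic is ~ chi-square with 1 df.
    """
    b = sum(1 for normal, tumour in pairs if normal and not tumour)
    c = sum(1 for normal, tumour in pairs if tumour and not normal)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence against H0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' correction, clamped so ties give 0
    stat = diff * diff / (b + c)
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical matched pairs: (normal carries allele, tumour carries allele).
pairs = [(True, False)] * 10 + [(False, True)] * 2 + [(True, True)] * 5
print(mcnemar_test(pairs))
```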

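Two of the GSA algorithms listed above have compact closed-form statistics. The sketch below assumes CERNO in its usual form, F = -2 Σ ln(R_i / N) over the ranks R_i of the set's genes in a list of N genes, referred to a chi-square distribution with 2k degrees of freedom. It also assumes ORA is a one-sided hypergeometric test. Gene identifiers and function names are hypothetical. The `chi2_sf_even_df` helper repeats the closed-form chi-square tail so that the block runs on its own.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df), df even: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def cerno_p(ranked_genes, gene_set):
    """CERNO: F = -2 * sum(ln(R_i / N)) over set members' ranks (1 = most significant)."""
    n_total = len(ranked_genes)
    ranks = [r for r, gene in enumerate(ranked_genes, start=1) if gene in gene_set]
    if not ranks:
        return 1.0  # no overlap with the ranked list
    stat = -2.0 * sum(math.log(r / n_total) for r in ranks)
    return chi2_sf_even_df(stat, 2 * len(ranks))


def ora_p(universe_size, set_size, n_selected, overlap):
    """ORA: one-sided hypergeometric P(X >= overlap) for selected genes falling in the set."""
    denominator = math.comb(universe_size, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        math.comb(set_size, i) * math.comb(universe_size - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical ranked list of 10 genes (best first) and a 2-gene set.
ranked = [f"g{i}" for i in range(1, 11)]
print(cerno_p(ranked, {"g1", "g3"}))
print(ora_p(universe_size=10, set_size=3, n_selected=3, overlap=2))
```

CERNO uses the full ranking, so a set whose members sit near the top scores well even without a significance cut-off. ORA only sees which genes passed a threshold.
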
GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies

Bioinformatics, 2008, Vol 24 (23), pp. 2784-2785. DOI: 10.1093/bioinformatics/btn516. Cited by ~119.

Author(s): Marit Holden, Shiwei Deng, Leszek Wojnowski, Bettina Kulle

Keyword(s): Association Studies; Enrichment Analysis; Gene Set Enrichment Analysis; Genome Wide Association; Genome Wide Association Studies; Gene Set Enrichment; Gene Set; SNP Data; Genome Wide


Association Signals Unveiled by a Comprehensive Gene Set Enrichment Analysis of Dental Caries Genome-Wide Association Studies

PLoS ONE, 2013, Vol 8 (8), pp. e72653. DOI: 10.1371/journal.pone.0072653. Cited by ~11.

Author(s): Quan Wang, Peilin Jia, Karen T. Cuenco, Zhen Zeng, Eleanor Feingold, ...

Keyword(s): Dental Caries; Association Studies; Enrichment Analysis; Gene Set Enrichment Analysis; Genome Wide Association; Genome Wide Association Studies; Gene Set Enrichment; Gene Set; Genome Wide


Genome-wide association analysis of hippocampal volume identifies enrichment of neurogenesis-related pathways

Scientific Reports, 2019, Vol 9 (1). DOI: 10.1038/s41598-019-50507-3. Cited by ~4.

Author(s): Emrin Horgusluoglu-Moloch, Shannon L. Risacher, Paul K. Crane, Derrek Hibar, ...

Keyword(s): Association Analysis; Adult Neurogenesis; Enrichment Analysis; Hippocampal Volume; Imaging Genetics; Gene Set Enrichment Analysis; Genome Wide Association; Gene Set Enrichment; Gene Set; Genome Wide

Abstract
Adult neurogenesis occurs in the dentate gyrus of the hippocampus during adulthood and contributes to sustaining the hippocampal formation. To investigate whether neurogenesis-related pathways are associated with hippocampal volume, we performed gene-set enrichment analysis using summary statistics from a large-scale genome-wide association study (N = 13,163) of hippocampal volume from the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Consortium, and of two-year hippocampal volume changes from baseline in cognitively normal individuals from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Gene-set enrichment analysis of hippocampal volume identified 44 significantly enriched biological pathways (FDR-corrected p-value < 0.05), of which 38 were related to neurogenesis-related processes, including neurogenesis, generation of new neurons, neuronal development, and neuronal migration and differentiation. Among genes highly represented in the significantly enriched neurogenesis-related pathways, gene-based association analysis identified TESC, ACVR1, MSRB3, and DPP4 as significantly associated with hippocampal volume. Furthermore, co-expression network-based functional analysis of gene expression data from the hippocampal subfields CA1 and CA3 of 32 normal controls showed that distinct co-expression modules were mostly enriched in neurogenesis-related pathways. Our results suggest that neurogenesis-related pathways may be enriched for hippocampal volume and that hippocampal volume may serve as a potential phenotype for investigating human adult neurogenesis.
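
The abstract reports "FDR-corrected" pathway p-values without naming the procedure. Assuming the common Benjamini-Hochberg step-up method, the adjustment can be sketched as follows. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Sort ascending, scale each p by m / rank, then enforce monotonicity by
    taking a running minimum from the largest rank downwards (capped at 1).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway p-values; keep pathways with adjusted p < 0.05.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([i for i, q in enumerate(adj) if q < 0.05])
```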


Identification of Critical Core Genes of Sarcoma Based on Centrality Analysis of Networks Nodes

Journal of Medical Imaging and Health Informatics, 2020, Vol 10 (7), pp. 1776-1784. DOI: 10.1166/jmihi.2020.3080.

Author(s): Shudong Wang, Jixiao Wang, Xinzeng Wang, Yuanyuan Zhang, Tao Yi

Keyword(s): Association Studies; Meta Analysis; Complex Diseases; Enrichment Analysis; Gene Interaction; Core Gene; Genome Wide Association; Genome Wide Association Studies; Gene Set; Genome Wide

Genome-wide association studies (GWAS) are powerful tools for identifying pathogenic genes of complex diseases and revealing the genetic structure of diseases. However, due to gene-to-gene interactions, only part of the hereditary factors can be revealed. Meta-analysis based on GWAS can integrate gene expression data at multiple levels and reveal complex relationships between genes. Therefore, we used meta-analysis to integrate GWAS data on sarcoma, establish complex networks, and examine their significant genes. First, we established gene interaction networks from data on different sarcoma subtypes and analyzed the node centralities of genes. Second, we calculated a significance score for each gene using the Staged Significant Gene Network Algorithm (SSGNA). We then obtained the critical gene set HYC of sarcoma by ranking the scores and further screened it by combining Gene Ontology enrichment analysis and protein network analysis. Finally, the critical core gene set Hcore, containing 47 genes, was obtained and validated by GEPIA analysis. Our method shows some generalizability to the study of complex diseases with prior knowledge and is a useful supplement to genome-wide association studies.
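
The abstract does not specify the SSGNA scoring, so the sketch below illustrates only the earlier step of computing node centralities. It uses two generic measures, degree and closeness, on an unweighted, undirected gene interaction network stored as a dictionary of neighbour sets. The data structure, gene names, and function names are assumptions for illustration and do not reproduce the authors' method.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of shortest-path distances, via BFS from each node.

    Distances are computed within each node's connected component; no extra
    scaling for disconnected graphs is applied.
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical undirected gene interaction network (symmetric neighbour sets).
network = {
    "GENE_A": {"GENE_B", "GENE_C", "GENE_D"},
    "GENE_B": {"GENE_A", "GENE_C"},
    "GENE_C": {"GENE_A", "GENE_B"},
    "GENE_D": {"GENE_A"},
}
print(degree_centrality(network))
print(closeness_centrality(network))
```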


Gene Set Analysis and Network Analysis for Genome-Wide Association Studies

Cold Spring Harbor Protocols, 2011, Vol 2011 (9), pp. pdb.top065581. DOI: 10.1101/pdb.top065581. Cited by ~14.

Author(s): I. Pedroso, G. Breen

Keyword(s): Network Analysis; Association Studies; Genome Wide Association; Gene Set Analysis; Genome Wide Association Studies; Gene Set; Genome Wide


Assessing gene length biases in gene set analysis of Genome-Wide Association Studies

International Journal of Computational Biology and Drug Design, 2010, Vol 3 (4), p. 297. DOI: 10.1504/ijcbdd.2010.038394. Cited by ~8.

Author(s): Peilin Jia, Jian Tian, Zhongming Zhao

Keyword(s): Association Studies; Genome Wide Association; Gene Set Analysis; Genome Wide Association Studies; Gene Length; Gene Set; Genome Wide


Identifying insomnia-related chemicals through integrative analysis of genome-wide association studies and chemical–genes interaction information

SLEEP, 2020, Vol 43 (9). DOI: 10.1093/sleep/zsaa042.

Author(s): Om Prakash Kafle, Shiqiang Cheng, Mei Ma, Ping Li, Bolun Cheng, ...

Keyword(s): Association Studies; Meta Analysis; Enrichment Analysis; Modern Society; Environmental Chemicals; Gene Set Enrichment Analysis; Genome Wide Association; Genome Wide Association Studies; Genome Wide; Study Results

Abstract
Study Objectives: Insomnia is a common sleep disorder and a major issue in modern society. We provide new clues about the association between environmental chemicals and insomnia.
Methods: Three genome-wide association study (GWAS) summary datasets of insomnia (n = 113,006, n = 1,331,010, and n = 453,379) were derived from the UK Biobank, 23andMe, and deCODE, respectively. The chemical–gene interaction dataset was downloaded from the Comparative Toxicogenomics Database. First, we conducted a meta-analysis of the three insomnia datasets using the METAL software. Using the meta-analysis results, we performed transcriptome-wide association studies to calculate expression association test statistics for insomnia. Chemical-related gene set enrichment analysis (GSEA) was then used to explore the association between chemicals and insomnia.
Results: For the GWAS meta-analysis dataset of insomnia, GSEA identified 42 chemicals associated with insomnia in brain tissue (p < 0.05). We detected five important chemicals that are directly associated with insomnia: pinosylvin (p = 0.0128), bromobenzene (p = 0.0134), clonidine (p = 0.0372), gabapentin (p = 0.0372), and melatonin (p = 0.0404).
Conclusion: Our results provide new clues about the roles of environmental chemicals in the development of insomnia.
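
The abstract does not state which METAL weighting scheme was used for the meta-analysis step. The sketch below assumes the sample-size-weighted Z-score scheme, which is METAL's default; METAL also offers inverse-variance weighting. It assumes each study supplies a Z-score already aligned to the same effect allele. The function name is hypothetical, and allele alignment, genomic control, and heterogeneity statistics are omitted.

```python
import math


def sample_size_weighted_meta(studies):
    """Combine per-study signed Z-scores with weights w_i = sqrt(n_i).

    studies: iterable of (z, n); each z must be signed relative to the same
    effect allele. Z_meta = sum(w_i * z_i) / sqrt(sum(w_i ** 2)); the
    two-sided p-value is erfc(|Z_meta| / sqrt(2)).
    """
    studies = list(studies)
    numerator = sum(math.sqrt(n) * z for z, n in studies)
    denominator = math.sqrt(sum(n for _, n in studies))  # sum(w_i^2) == sum(n_i)
    z_meta = numerator / denominator
    return z_meta, math.erfc(abs(z_meta) / math.sqrt(2.0))


# Sample sizes as reported above; the Z-scores are hypothetical placeholders.
print(sample_size_weighted_meta([(3.1, 113_006), (4.2, 1_331_010), (2.7, 453_379)]))
```

Larger studies dominate the combined Z-score because their weights scale with the square root of sample size. Effects of opposite sign in different studies partially cancel.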


Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies

Bioinformatics, 2012, Vol 28 (15), pp. 2084-2085. DOI: 10.1093/bioinformatics/bts315. Cited by ~65.

Author(s): R. Kofler, C. Schlotterer

Keyword(s): Association Studies; Genome Wide Association; Genome Wide Association Studies; Gene Set Enrichment; Gene Set; Genome Wide


Gene Set Analysis of SNP Data from Genome-wide Association Studies

Bioinformatics in Aquaculture, 2017, pp. 434-459. DOI: 10.1002/9781118782392.ch24.

Author(s): Shikai Liu, Peng Zeng, Zhanjiang Liu

Keyword(s): Association Studies; Genome Wide Association; Gene Set Analysis; Genome Wide Association Studies; Gene Set; SNP Data; Genome Wide


Integrating Genome-Wide Association and eQTLs Studies Identifies the Genes and Gene Sets Associated with Diabetes

BioMed Research International, 2017, Vol 2017, pp. 1-4. DOI: 10.1155/2017/1758636. Cited by ~2.

Author(s): Xiao Liang, Awen He, Wenyu Wang, Li Liu, Yanan Du, ...

Keyword(s): Fasting Glucose; Association Studies; Integrative Analysis; Gene Set Enrichment Analysis; Genome Wide Association; Genome Wide Association Studies; Gene Set; Gene Sets; Genome Wide; Summary Data

Aim. To identify novel candidate genes and gene sets for diabetes.
Methods. We performed an integrative analysis of genome-wide association study (GWAS) and expression quantitative trait loci (eQTL) data for diabetes. Summary data were derived from a large-scale GWAS of diabetes involving 58,070 individuals in total. The eQTL dataset included 923,021 cis-eQTLs for 14,329 genes and 4,732 trans-eQTLs for 2,612 genes. The integrative analysis of GWAS and eQTL data was conducted by summary data-based Mendelian randomization (SMR). To identify gene sets associated with diabetes, the SMR single-gene analysis results were further subjected to gene set enrichment analysis (GSEA). A total of 13,311 annotated gene sets were analyzed in this study.
Results. SMR analysis identified 6 genes significantly associated with fasting glucose, including C11ORF10 (p value = 6.04 × 10⁻⁸), MRPL33 (p value = 1.24 × 10⁻⁷), and FADS1 (p value = 2.39 × 10⁻⁷). Gene set analysis identified HUANG_FOXA2_TARGETS_UP as associated with fasting glucose (false discovery rate = 0.047).
Conclusion. Our study provides novel clues for clarifying the genetic mechanism of diabetes. It also illustrates the good performance of the SMR approach and extends it to gene set association analysis for complex diseases.
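
The core of SMR is a test statistic that combines the GWAS and eQTL Z-scores of a gene's top cis-eQTL SNP: T = z_GWAS² z_eQTL² / (z_GWAS² + z_eQTL²), which is approximately chi-square with 1 degree of freedom. The sketch below is minimal. It omits the accompanying HEIDI heterogeneity test and any SNP selection logic, and its function name is hypothetical.

```python
import math


def smr_test(z_gwas, z_eqtl):
    """Approximate SMR statistic for one gene's top cis-eQTL SNP.

    T = z_gwas^2 * z_eqtl^2 / (z_gwas^2 + z_eqtl^2), referred to chi-square(1);
    the p-value uses P(chi2_1 > T) = erfc(sqrt(T / 2)).
    """
    a, b = z_gwas ** 2, z_eqtl ** 2
    if a + b == 0:
        return 0.0, 1.0  # no signal in either study
    t = a * b / (a + b)
    return t, math.erfc(math.sqrt(t / 2.0))


# Hypothetical Z-scores: a strong eQTL and a moderate GWAS signal at the same SNP.
print(smr_test(z_gwas=5.5, z_eqtl=20.0))
```

Because T never exceeds the smaller of the two squared Z-scores, a gene reaches significance only when the same SNP carries both a strong eQTL signal and a strong GWAS signal.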
