The Impact of Incomplete Linkage Disequilibrium and Genetic Model Choice on the Analysis and Interpretation of Genome-wide Association Studies

A diverse population (429 members) of canola (Brassica napus L.), consisting primarily of winter biotypes, was assembled and used in genome-wide association studies. Genotyping-by-sequencing analysis of the population identified and mapped 290,972 high-quality markers, with 18.5 to 82.4% missing markers per line (average 36.8%). After interpolation, 251,575 high-quality markers remained. After removing markers with low minor allele counts (retaining only those with count > 5), 190,375 markers were left. The average distance between these markers is 4,463 bases, with a median of 69 and a range from 1 to 281,248 bases. Heterozygosity in the imputed population ranges from 0.9 to 11.0%, with an average of 5.4%. The filtered and imputed dataset was used to determine population structure and kinship, which indicated that the population had minimal structure, with a best K value of 2–3. These results also indicated that the majority of the population has substantial sequence from a single population, with sub-clusters of, and admixtures with, a very small number of other populations. Chromosomal linkage disequilibrium decay ranged from ~7 kb for chromosome A01 to ~68 kb for chromosome C01. Local linkage disequilibrium decay rates, determined for all 500 kb windows with a 10 kb sliding step, varied widely, indicating numerous crossover hotspots within this population, and provide a resource for determining the likely limits of linkage disequilibrium around any given marker within which to identify candidate genes. This population and the resources provided here should serve as helpful tools for investigating genetics in winter canola.
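
The marker filtering and spacing statistics described in this abstract can be illustrated with a minimal sketch. It assumes biallelic markers stored as alternate-allele dosages (0, 1, 2, or None for missing); the function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from statistics import mean, median


def minor_allele_count(genotypes):
    """Count copies of the less common allele at one biallelic marker.

    genotypes: per-line alternate-allele dosages (0, 1 or 2), None if missing.
    """
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt  # diploid: two alleles per called line
    return min(alt, ref)


def filter_by_mac(markers, min_count=5):
    """Keep markers whose minor allele count is strictly greater than min_count."""
    return {
        name: calls
        for name, calls in markers.items()
        if minor_allele_count(calls) > min_count
    }


def spacing_summary(positions):
    """Summarise distances (in bases) between adjacent markers on one chromosome."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return {
        "mean": mean(gaps),
        "median": median(gaps),
        "min": min(gaps),
        "max": max(gaps),
    }


# Hypothetical toy data, for illustration only.
toy_markers = {
    "m1": [0, 0, 1, 2, 2, 2, None, 0],  # 7 alt and 7 ref alleles: kept
    "m2": [0, 0, 0, 0, 1, 0, 0, 0],     # 1 alt allele only: dropped
}
kept = filter_by_mac(toy_markers)
summary = spacing_summary([100, 101, 170, 5000])
```

A large gap between the mean and the median spacing, as reported in the abstract, is what this kind of summary shows when most markers are tightly clustered and a few are separated by long stretches.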

The Impact of Improved Microarray Coverage and Larger Sample Sizes on Future Genome-Wide Association Studies

Genetic Epidemiology ◽

10.1002/gepi.21724 ◽

2013 ◽

Vol 37 (4) ◽

pp. 383-392 ◽

Cited By ~ 16

Author(s):

Karla J. Lindquist ◽

Eric Jorgenson ◽

Thomas J. Hoffmann ◽

John S. Witte

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Sample Sizes ◽

Genome Wide ◽

The Impact ◽

Larger Sample

A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies

BMC Bioinformatics ◽

10.1186/1471-2105-12-16 ◽

2011 ◽

Vol 12 (1) ◽

Cited By ~ 26

Author(s):

Raphaël Mourad ◽

Christine Sinoquet ◽

Philippe Leray

Keyword(s):

Linkage Disequilibrium ◽

Dimensionality Reduction ◽

Bayesian Network ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Hierarchical Bayesian ◽

Network Approach ◽

Genome Wide ◽

Data Dimensionality Reduction

Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies

Frontiers in Genetics ◽

10.3389/fgene.2021.767358 ◽

2021 ◽

Vol 12 ◽

Author(s):

Michal Marczyk ◽

Agnieszka Macioszek ◽

Joanna Tobiasz ◽

Joanna Polanska ◽

Joanna Zyla

Keyword(s):

Association Studies ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Genome Wide Association ◽

Gene Set Analysis ◽

Genome Wide Association Studies ◽

Gene Set Enrichment ◽

Gene Set ◽

Genome Wide ◽

The Impact

A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which lie in the region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, other methods of integration could be applied. In addition, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of different integration methods and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs with phenotype was calculated using a modified McNemar’s test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. Results of GSA on GWAS data were compared with those of GSA performed on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we highlighted CERNO and MAGENTA with Stouffer as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was used with LD correction, sensitivity and reproducibility were also better. Using any integration method was beneficial in comparison with the minimum p-value method in specific combinations. The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others in selecting the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis, and we encourage the use of these techniques.
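
The three gene-level integration strategies named in this abstract (Fisher, Stouffer, and minimum p-value) can be sketched in their textbook form as follows. This is an illustrative sketch, not the authors' implementation: it assumes independent SNP p-values strictly between 0 and 1, so it deliberately omits the LD correction the study evaluates, and the function names and example p-values are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!, used here.
    """
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, exp(-half_stat) * total)


def stouffer_combine(pvalues):
    """Stouffer's method: sum the z-scores, rescale by sqrt(k), convert back."""
    z = sum(_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues) / sqrt(len(pvalues))
    return 1.0 - _STD_NORMAL.cdf(z)


def min_p(pvalues):
    """Minimum p-value method: keep only the most significant SNP per gene."""
    return min(pvalues)


# Hypothetical p-values for three SNPs mapped to one gene.
snp_pvalues = [0.04, 0.20, 0.03]
gene_level = {
    "fisher": fisher_combine(snp_pvalues),
    "stouffer": stouffer_combine(snp_pvalues),
    "min_p": min_p(snp_pvalues),
}
```

Because SNPs within one gene are frequently in LD, the independence assumption behind both combined tests is violated in practice, which is the motivation for the correction discussed in the abstract.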

Host Genome-Wide Association Study of Infant Susceptibility to Shigella-Associated Diarrhea

Infection and Immunity ◽

10.1128/iai.00012-21 ◽

2021 ◽

Vol 89 (6) ◽

Author(s):

Dylan Duchen ◽

Rashidul Haque ◽

Laura Chen ◽

Genevieve Wojcik ◽

Poonum Korpe ◽

...

Keyword(s):

Genome Wide Association Study ◽

Diarrheal Disease ◽

Genetic Model ◽

Association Studies ◽

Host Cells ◽

Genome Wide Association ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Host Genetic

ABSTRACT Shigella is a leading cause of moderate-to-severe diarrhea globally and the causative agent of shigellosis and bacillary dysentery. It is associated with 80 to 165 million cases of diarrhea and >13% of diarrheal deaths; in many regions, Shigella exposure is ubiquitous while infection is heterogeneous. To characterize host-genetic susceptibility to Shigella-associated diarrhea, we performed two independent genome-wide association studies (GWAS) including Bangladeshi infants from the PROVIDE and CBC birth cohorts in Dhaka, Bangladesh. Cases were infants with Shigella-associated diarrhea (n = 143) and controls were infants with no Shigella-associated diarrhea in the first 13 months of life (n = 446). Shigella-associated diarrhea was identified via quantitative PCR (qPCR) threshold cycle (CT) distributions for the ipaH gene, carried by all four Shigella species and enteroinvasive Escherichia coli. Host GWAS were performed under an additive genetic model. A joint analysis identified protective loci on chromosomes 11 (rs582240, within the KRT18P59 pseudogene; P = 6.40 × 10⁻⁸; odds ratio [OR], 0.43) and 8 (rs12550437, within the lincRNA RP11-115J16.1; P = 1.49 × 10⁻⁷; OR, 0.48). Conditional analyses identified two previously suggestive loci, a protective locus on chromosome 7 (rs10266841, within the 3′ untranslated region [UTR] of CYTH3; P(conditional) = 1.48 × 10⁻⁷; OR, 0.44) and a risk-associated locus on chromosome 10 (rs2801847, an intronic variant within MPP7; P(conditional) = 8.37 × 10⁻⁸; OR, 5.51). These loci have all been indirectly linked to bacterial type 3 secretion system (T3SS) activity, its components, and bacterial effectors delivered into host cells. Host genetic factors that may affect bacterial T3SS activity and are associated with the host response to Shigella-associated diarrhea may provide insight into vaccine and drug development efforts for Shigella-associated diarrheal disease.
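
To make the terms "additive genetic model" and "odds ratio" concrete, the sketch below codes each genotype as the number of copies of an effect allele and computes a simple allelic odds ratio from a 2 × 2 table of allele counts. This is only an approximation of the per-allele odds ratio that an additive logistic regression would estimate, and it is not the study's analysis; the function names and genotypes are hypothetical.

```python
def additive_dosage(genotype, effect_allele):
    """Copies (0, 1 or 2) of effect_allele in a diploid genotype such as 'AG'."""
    return sum(1 for allele in genotype if allele == effect_allele)


def allelic_odds_ratio(case_dosages, control_dosages):
    """Odds of carrying the effect allele in cases relative to controls.

    Uses allele counts (two alleles per individual). Raises ZeroDivisionError
    if any cell needed for the denominator is empty.
    """
    case_effect = sum(case_dosages)
    case_other = 2 * len(case_dosages) - case_effect
    control_effect = sum(control_dosages)
    control_other = 2 * len(control_dosages) - control_effect
    return (case_effect * control_other) / (case_other * control_effect)


# Hypothetical genotypes, for illustration only; 'G' is the effect allele.
cases = [additive_dosage(g, "G") for g in ["AA", "AG", "AA", "AG"]]
controls = [additive_dosage(g, "G") for g in ["AG", "GG", "AG", "AA"]]
odds_ratio = allelic_odds_ratio(cases, controls)
```

An odds ratio below 1, as in this toy example, means the effect allele is less common among cases, which is how the protective loci in the abstract are read; a value above 1 marks a risk-associated allele.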

Faculty Opinions recommendation of Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1032179.497533 ◽

2006 ◽

Author(s):

Tony Long

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Faculty Opinions recommendation of Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1032179.373886 ◽

2006 ◽

Author(s):

Karin Schmitt

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Improving the detection of pathways in genome-wide association studies by combined effects of SNPs from Linkage Disequilibrium blocks

Scientific Reports ◽

10.1038/s41598-017-03826-2 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 4

Author(s):

Huiying Zhao ◽

Dale R. Nyholt ◽

Yuanhao Yang ◽

Jihua Wang ◽

Yuedong Yang

Keyword(s):

Linkage Disequilibrium ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Combined Effects ◽

Genome Wide

Fine-scale population structure in the UK Biobank: implications for genome-wide association studies

Human Molecular Genetics ◽

10.1093/hmg/ddaa157 ◽

2020 ◽

Vol 29 (16) ◽

pp. 2803-2811

Author(s):

James P Cook ◽

Anubha Mahajan ◽

Andrew P Morris

Keyword(s):

Population Structure ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Fine Scale ◽

UK Biobank ◽

Genome Wide ◽

Scale Population ◽

The UK ◽

The Impact

Abstract The UK Biobank is a prospective study of more than 500 000 participants, which has aggregated data from questionnaires, physical measures, biomarkers, imaging and follow-up for a wide range of health-related outcomes, together with genome-wide genotyping supplemented with high-density imputation. Previous studies have highlighted fine-scale population structure in the UK on a North-West to South-East cline, but the impact of unmeasured geographical confounding on genome-wide association studies (GWAS) of complex human traits in the UK Biobank has not been investigated. We considered 368 325 white British individuals from the UK Biobank and performed GWAS of their birth location. We demonstrate that widely used approaches to adjust for population structure, including principal component analysis and mixed modelling with a random effect for a genetic relationship matrix, cannot fully account for the fine-scale geographical confounding in the UK Biobank. We observe significant genetic correlation of birth location with a range of lifestyle-related traits, including body-mass index and fat mass, hypertension and lung function, even after adjustment for population structure. Variants driving associations with birth location are also strongly associated with many of these lifestyle-related traits after correction for population structure, indicating that there could be environmental factors that are confounded with geography that have not been adequately accounted for. Our findings highlight the need for caution in the interpretation of lifestyle-related trait GWAS in UK Biobank, particularly in loci demonstrating strong residual association with birth location.
