Classic genome-wide association methods are unlikely to identify causal variants in strongly clonal microbial populations.

Mapping Intimacies ◽

10.1101/2021.06.30.450606 ◽

2021 ◽

Author(s):

Peter E Chen ◽

B. Jesse Shapiro

Keyword(s):

False Positive ◽

Mixed Model ◽

Association Studies ◽

Random Effect ◽

Genome Wide Association ◽

Relationship Matrix ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Counting Methods ◽

Causal Variants

Since the advent of genome-wide association studies (GWAS) in human genomes, an increasing sophistication of methods has been developed for more robust association detection. Currently, the backbone of human GWAS approaches is allele-counting-based methods where the signal of association is derived from alleles that are identical-by-state. Borrowing this approach from human GWAS, allele-counting-based methods have been popularized in microbial GWAS, notably the generalized linear model using either dimension reduction for fixed covariates and/or a genetic relationship matrix as a random effect in a mixed model to control for population stratification. In this work, we show how the effects of linkage disequilibrium (LD) can potentially obscure true-positive genotype-phenotype associations (i.e., genetic variants causally associated with the phenotype of interest) and also lead to unacceptably high rates of false-positive associations when applying these classical approaches to GWAS in weakly recombining microbial genomes. We developed a GWAS method called POUTINE (https://github.com/Peter-Two-Point-O/POUTINE), which relies on homoplastic mutation to both clarify the source of putative causal variants and reduce likely false-positive associations compared to traditional allele counting methods. Using datasets of M. tuberculosis genomes and antibiotic-resistance phenotypes, we show that LD can in fact render all association signals from allele counting methods to be fully indistinguishable from hundreds to thousands of sites scattered across an entire genome. These classic GWAS methods thus fail to pinpoint likely causal genotype-phenotype associations and separate them from background noise, even after applying methods to correct for population structure. We therefore urge caution when utilizing classical approaches, particularly in populations that are strongly clonal.
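The linkage problem described in this abstract can be illustrated with a minimal sketch. It is not the POUTINE algorithm; the eight-isolate genotypes, the site names, and the choice of a 2x2 chi-square statistic as the allele-counting score are all assumptions made for illustration. It shows that two sites in perfect LD receive exactly the same allele-counting score, so the score alone cannot separate a causal site from a hitchhiking one.

```python
def r_squared(a, b):
    """Squared correlation (LD r^2) between two biallelic sites coded 0/1."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0


def allele_count_score(site, phenotype):
    """Chi-square statistic of the 2x2 allele-by-phenotype count table."""
    n = len(site)
    n11 = sum(1 for s, p in zip(site, phenotype) if s == 1 and p == 1)
    n10 = sum(1 for s, p in zip(site, phenotype) if s == 1 and p == 0)
    n01 = sum(1 for s, p in zip(site, phenotype) if s == 0 and p == 1)
    n00 = sum(1 for s, p in zip(site, phenotype) if s == 0 and p == 0)
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


# Hypothetical toy data: 8 isolates, 1 = resistant / derived allele.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
sites = {
    "causal":     [1, 1, 1, 1, 0, 0, 0, 0],  # arose once, on the resistant clade
    "hitchhiker": [1, 1, 1, 1, 0, 0, 0, 0],  # same branch, no effect on phenotype
    "unlinked":   [1, 0, 1, 0, 1, 0, 1, 0],  # independent of clade and phenotype
}

scores = {name: allele_count_score(g, phenotype) for name, g in sites.items()}
ld_with_causal = {name: r_squared(sites["causal"], g) for name, g in sites.items()}

for name in sites:
    print(name, "score =", scores[name], "r2 with causal =", ld_with_causal[name])
```

In a strongly clonal population, many sites share a branch of the genealogy in this way, which is why the abstract argues for using repeated, independent (homoplastic) mutations as the association signal instead.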


Editing GWAS: experimental approaches to dissect and exploit disease-associated genetic variation

Genome Medicine ◽

10.1186/s13073-021-00857-3 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Shuquan Rao ◽

Yao Yao ◽

Daniel E. Bauer

Keyword(s):

Genome Editing ◽

Genetic Variants ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Functional Studies ◽

Functional Genetics ◽

Genome Wide ◽

Causal Variants ◽

Experimental Approaches

Abstract Genome-wide association studies (GWAS) have uncovered thousands of genetic variants that influence risk for human diseases and traits. Yet understanding the mechanisms by which these genetic variants, mainly noncoding, have an impact on associated diseases and traits remains a significant hurdle. In this review, we discuss emerging experimental approaches that are being applied for functional studies of causal variants and translational advances from GWAS findings to disease prevention and treatment. We highlight the use of genome editing technologies in GWAS functional studies to modify genomic sequences, with proof-of-principle examples. We discuss the challenges in interrogating causal variants, points for consideration in experimental design and interpretation of GWAS locus mechanisms, and the potential for novel therapeutic opportunities. With the accumulation of knowledge of functional genetics, therapeutic genome editing based on GWAS discoveries will become increasingly feasible.


CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research ◽

10.1093/nar/gkz1026 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jianhua Wang ◽

Dandan Huang ◽

Yao Zhou ◽

Hongcheng Yao ◽

Huanhuan Liu ◽

...

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Credible Sets ◽

Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
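To make the idea of a credible set concrete, here is a minimal sketch. It assumes a single causal variant per locus and per-variant posterior probabilities that sum to 1; the variant names and probabilities are invented, and the function is not taken from CAUSALdb or from any of its fine-mapping tools. Variants are ranked by posterior probability and added until the cumulative probability reaches the requested coverage.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest top-ranked set of variants whose posterior
    probabilities sum to at least `coverage`, plus the probability reached."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for five variants at one locus.
locus_posteriors = {
    "variant_a": 0.60,
    "variant_b": 0.25,
    "variant_c": 0.08,
    "variant_d": 0.04,
    "variant_e": 0.03,
}

cs_variants, cs_probability = credible_set(locus_posteriors, coverage=0.95)
print(cs_variants, round(cs_probability, 2))
```

A smaller credible set means the locus is better resolved; a locus in which many variants are in strong LD typically yields a large set with similar probabilities.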


Identification of Novel Genome-Wide Associations for Oral Inflammatory Traits

10.21203/rs.3.rs-104530/v1 ◽

2020 ◽

Author(s):

Yanjiao Jin ◽

Jie Yang ◽

Shuyue Zhang ◽

Jin Li ◽

Songlin Wang

Keyword(s):

Candidate Genes ◽

Immune Regulation ◽

Functional Annotation ◽

Inflammatory Diseases ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Protein Protein Interaction ◽

Genome Wide ◽

Causal Variants

Abstract Background: Oral diseases impact the majority of the world’s population. The following traits are common in oral inflammatory diseases: mouth ulcers, painful gums, bleeding gums, loose teeth, and toothache. Despite the prevalence of genome-wide association studies, the associations between these traits and common genomic variants, and whether pleiotropic loci are shared by some of these traits remain poorly understood. Methods: In this work, we conducted multi-trait joint analyses based on the summary statistics of genome-wide association studies of these five oral inflammatory traits from the UK Biobank, each of which is comprised of over 10,000 cases and over 300,000 controls. We estimated the genetic correlations between the five traits. We conducted fine-mapping and functional annotation based on multi-omics data to better understand the biological functions of the potential causal variants at each locus. To identify the pathways in which the candidate genes were mainly involved, we applied gene-set enrichment analysis, and further performed protein-protein interaction (PPI) analyses. Results: We identified 39 association signals that surpassed genome-wide significance, including three that were shared between two or more oral inflammatory traits, consistent with a strong correlation. Among these genome-wide significant loci, two were novel for both painful gums and toothache. We performed fine-mapping and identified causal variants at each novel locus. Further functional annotation based on multi-omics data suggested IL10 and IL12A/TRIM59 as potential candidate genes at the novel pleiotropic loci, respectively. Subsequent analyses of pathway enrichment and protein-protein interaction networks suggested the involvement of candidate genes at genome-wide significant loci in immune regulation. Conclusions: Our results highlighted the importance of immune regulation in the pathogenesis of oral inflammatory diseases. Some common immune-related pleiotropic loci or genetic variants are shared by multiple oral inflammatory traits. These findings will be beneficial for risk prediction, prevention, and therapy of oral inflammatory diseases.


GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

10.1101/783100 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jan A. Freudenthal ◽

Markus J. Ankenbrand ◽

Dominik G. Grimm ◽

Arthur Korte

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Large Datasets ◽

Genome Wide Association ◽

Small Data ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Non Gaussian

Abstract Motivation: Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses has grown remarkably during the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account and thus removes the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the possibility to compute permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease the time of the analyses, especially for large datasets. Results: We were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data. Availability: GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT-License.
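The permutation-based threshold mentioned in this abstract can be sketched as follows. This is not GWAS-Flow code and does not use TensorFlow or a linear mixed model; as a simplifying assumption it scores each marker by the absolute Pearson correlation with the phenotype, and the simulated genotypes, phenotype, and function names are hypothetical. For each permutation the phenotype is shuffled, the maximum score over all markers is recorded, and the threshold is taken from the upper tail of those maxima.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(markers, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Score threshold exceeded by the genome-wide maximum in roughly
    a fraction `alpha` of phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        maxima.append(max(abs_corr(m, shuffled) for m in markers))
    maxima.sort()
    k = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return maxima[k]


# Hypothetical simulated data: 20 markers (0/1/2 coding) for 30 individuals,
# and a skewed (non-Gaussian) phenotype.
data_rng = random.Random(42)
toy_markers = [[data_rng.choice([0, 1, 2]) for _ in range(30)] for _ in range(20)]
toy_phenotype = [data_rng.expovariate(1.0) for _ in range(30)]

threshold = permutation_threshold(toy_markers, toy_phenotype)
observed = [abs_corr(m, toy_phenotype) for m in toy_markers]
significant = [i for i, score in enumerate(observed) if score > threshold]
print("threshold:", threshold, "markers above threshold:", significant)
```

Because the phenotype values themselves are shuffled, the null distribution inherits the phenotype's shape, which is the property the abstract contrasts with a Bonferroni correction applied to a transformed phenotype.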


Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

Methods ◽

10.1016/j.ymeth.2018.04.021 ◽

2018 ◽

Vol 145 ◽

pp. 2-9 ◽

Cited By ~ 1

Author(s):

Haohan Wang ◽

Bryon Aragam ◽

Eric P. Xing

Keyword(s):

Variable Selection ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Heterogeneous Datasets


Genome-Wide Association Studies Reveal Susceptibility Loci for Digital Dermatitis in Holstein Cattle

Animals ◽

10.3390/ani10112009 ◽

2020 ◽

Vol 10 (11) ◽

pp. 2009

Author(s):

Ellen Lai ◽

Alexa L. Danner ◽

Thomas R. Famula ◽

Anita M. Oberbauer

Keyword(s):

Predictive Value ◽

Mixed Model ◽

Linear Mixed Model ◽

Bos Taurus ◽

Association Studies ◽

Bayesian Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Digital Dermatitis ◽

Genome Wide

Digital dermatitis (DD) causes lameness in dairy cattle. To detect the quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of FW per hoof trimming record) and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess the SNP predictive value. The LMM and RF analyses identified QTL regions containing candidate genes on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and BTA7 and 20 for the quantitative phenotype that related to epidermal integrity, immune function, and wound healing. Although larger sample sizes are necessary to reaffirm these small effect loci amidst a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with a high predictive value.


Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2017.8217687 ◽

2017 ◽

Cited By ~ 9

Author(s):

Haohan Wang ◽

Bryon Aragam ◽

Eric P. Xing

Keyword(s):

Variable Selection ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Heterogeneous Datasets


Correction: Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies

PLoS Genetics ◽

10.1371/journal.pgen.1005957 ◽

2016 ◽

Vol 12 (3) ◽

pp. e1005957 ◽

Cited By ~ 4

Author(s):

Keyword(s):

Association Studies ◽

Random Effect ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Random Effect Models


Genome-wide association and genomic selection in animal breeding

This article is one of a selection of papers from the conference “Exploiting Genome-wide Association in Oilseed Brassicas: a model for genetic improvement of major OECD crops for sustainable farming”.

Genome ◽

10.1139/g10-076 ◽

2010 ◽

Vol 53 (11) ◽

pp. 876-883 ◽

Cited By ~ 135

Author(s):

Ben Hayes ◽

Mike Goddard

Keyword(s):

Genomic Selection ◽

Complex Traits ◽

Association Studies ◽

Genome Wide Association ◽

Relationship Matrix ◽

Genome Wide Association Studies ◽

Simple Method ◽

Breeding Values ◽

Genome Wide ◽

A Genome

Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of an alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted to be the sum of the effects of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
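Two ideas from this abstract lend themselves to a short sketch: a breeding value computed as the sum of SNP effects, and a genomic relationship matrix built from dense marker genotypes. The abstract does not give formulas, so the scaling used here (centering each SNP by twice its allele frequency and dividing by 2 Σ p(1 − p), a commonly used convention) is an assumption, and the genotypes and SNP effects are invented for illustration.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele at each SNP (genotypes coded 0/1/2)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]


def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum(p * (1 - p))), with Z the centered genotypes.
    The scaling convention is an assumption, not taken from the abstract."""
    p = allele_frequencies(genotypes)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    z = [[g - 2 * pj for g, pj in zip(ind, p)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def gebv(genotype, snp_effects):
    """Genomic estimated breeding value as the sum of SNP effects."""
    return sum(g * e for g, e in zip(genotype, snp_effects))


# Hypothetical toy data: 4 animals genotyped at 3 SNPs, with made-up effects.
toy_genotypes = [
    [2, 1, 0],
    [1, 1, 1],
    [0, 2, 1],
    [1, 0, 2],
]
toy_effects = [0.5, -0.2, 0.1]

G = genomic_relationship_matrix(toy_genotypes)
breeding_values = [gebv(ind, toy_effects) for ind in toy_genotypes]

for row in G:
    print([round(v, 3) for v in row])
print([round(v, 3) for v in breeding_values])
```

In practice the SNP effects would be estimated in a reference population, or the relationship matrix would enter a mixed model directly; the sketch only shows how the two quantities are formed from genotypes.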
