Hogwash: three methods for genome-wide association studies in bacteria

Bacterial genome-wide association studies (bGWAS) capture associations between genomic variation and phenotypic variation. Convergence-based bGWAS methods identify genomic mutations that occur independently multiple times on the phylogenetic tree in the presence of phenotypic variation more often than is expected by chance. This work introduces hogwash, an open-source R package that implements three algorithms for convergence-based bGWAS. Hogwash also contains two burden testing approaches for gene- or pathway-level analysis, which improve power and increase detection of convergence for related but weakly penetrant genotypes. To identify optimal use cases, we applied hogwash to data simulated with a variety of phylogenetic signals and convergence distributions. These simulated data are publicly available and contain the relevant metadata regarding convergence and phylogenetic signal for each phenotype and genotype. Hogwash is available for download from GitHub.
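
As a rough illustration of the burden-testing idea in the abstract above, the following Python sketch collapses variant-level presence/absence calls into one call per gene or pathway, so that several rare variants in the same group count as a single genotype. It is not hogwash's implementation: hogwash is an R package, and the abstract does not describe how its two approaches work. The function name `collapse_by_group` and the toy variant and gene identifiers are hypothetical.

```python
def collapse_by_group(variant_calls, variant_to_group):
    """Collapse per-variant 0/1 calls into per-group 0/1 calls.

    variant_calls: dict mapping variant id -> list of 0/1 calls, one per sample.
    variant_to_group: dict mapping variant id -> group id (e.g. a gene).
    A sample is called 1 for a group if it carries any variant in that group.
    """
    grouped = {}
    for variant, calls in variant_calls.items():
        group = variant_to_group[variant]
        if group not in grouped:
            grouped[group] = list(calls)
        else:
            # Logical OR across variants that belong to the same group.
            grouped[group] = [a | b for a, b in zip(grouped[group], calls)]
    return grouped


# Hypothetical toy data: three variants, four samples, two genes.
variant_calls = {
    "v1": [1, 0, 0, 0],
    "v2": [0, 1, 0, 0],
    "v3": [0, 0, 0, 1],
}
variant_to_group = {"v1": "geneA", "v2": "geneA", "v3": "geneB"}

gene_calls = collapse_by_group(variant_calls, variant_to_group)
print(gene_calls)  # {'geneA': [1, 1, 0, 0], 'geneB': [0, 0, 0, 1]}
```

In this toy case neither `v1` nor `v2` is shared by more than one sample, but the collapsed `geneA` genotype is present in two samples, which is the kind of pooling that can help weakly penetrant variants reach detectable frequency.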

10.1101/2020.04.19.048421 ◽

2020 ◽

Cited By ~ 1

Author(s):

Katie Saund ◽

Evan S Snitkin

Keyword(s):

Phenotypic Variation ◽

Phylogenetic Signal ◽

Association Studies ◽

Bacterial Genome ◽

Simulated Data ◽

R Package ◽

Genomic Variation ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Summary: Increasing sample size is not the only strategy to improve discovery in genome-wide association studies (GWASs); we propose an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects, and direct effects. The approach not only increases power but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation: The bGWAS package is freely available under a GPL-2 license and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information: Supplementary data are available at Bioinformatics online.

A comment on two-locus epistatic interaction models for genome-wide association studies

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015710043 ◽

2015 ◽

Vol 13 (06) ◽

pp. 1571004

Author(s):

Kyung-Ah Sohn ◽

Kyubum Wee

Keyword(s):

Predictive Power ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Disease Models ◽

Genome Wide Association Studies ◽

Epistatic Interactions ◽

Detection Algorithms ◽

Genome Wide ◽

Setting Parameters

Detection of epistatic interactions in genome-wide association studies is a computationally hard problem. Many detection algorithms have been proposed and will continue to be. Most of those algorithms measure their predictive power by running on simulated data many times under various disease models. However, we find that there have been subtle differences in interpreting the meaning of existing disease models among the previous studies on detection of epistatic interactions. We elucidate those differences and suggest that future studies on epistatic interactions in GWAS state explicitly which versions/interpretations are employed. We also provide a way to facilitate setting parameters of disease models.

Novel method to estimate the phenotypic variation explained by genome-wide association studies reveals large fraction of the missing heritability

Genetic Epidemiology ◽

10.1002/gepi.20582 ◽

2011 ◽

Vol 35 (5) ◽

pp. 341-349 ◽

Cited By ~ 21

Author(s):

Zoltán Kutalik ◽

John Whittaker ◽

Dawn Waterworth ◽

Jacques S. Beckmann ◽

Sven Bergmann ◽

...

Keyword(s):

Phenotypic Variation ◽

Association Studies ◽

Large Fraction ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Missing Heritability ◽

Genome Wide ◽

Novel Method ◽

Variation Explained

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

10.1101/016857 ◽

2015 ◽

Author(s):

Hon-Cheong SO ◽

Pak C. SHAM

Keyword(s):

Error Estimates ◽

Standard Error ◽

Association Studies ◽

Parametric Bootstrap ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Key Questions

Genome-wide association studies (GWAS) have become increasingly popular, and a key question is how much heritability can be explained by all variants in a GWAS. We previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation have not been available, which limits the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE. We found that the delete-d jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
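
For readers unfamiliar with the delete-d jackknife named in the abstract above, here is a minimal Python sketch of the general technique. It is not code from SumVg: the estimator used here is the plain sample mean as a stand-in for the heritability estimator, the function name `delete_d_jackknife_se` and the toy values are hypothetical, and every size-d subset is enumerated, which is only feasible for very small inputs (in practice a random sample of subsets would normally be drawn instead).

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def delete_d_jackknife_se(values, estimator, d):
    """Standard error of `estimator` by the delete-d jackknife.

    Each replicate drops d observations and re-applies the estimator to the
    remaining n - d. The variance estimate is
        (n - d) / d * mean((theta_s - theta_bar) ** 2)
    taken over all replicates s.
    """
    n = len(values)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(values)")
    replicates = []
    for dropped in combinations(range(n), d):
        dropped = set(dropped)
        kept = [v for i, v in enumerate(values) if i not in dropped]
        replicates.append(estimator(kept))
    centre = mean(replicates)
    spread = sum((r - centre) ** 2 for r in replicates) / len(replicates)
    return sqrt((n - d) / d * spread)


# Hypothetical toy data; the sample mean stands in for the real estimator.
toy = [0.8, 1.9, 2.4, 3.1, 4.6, 5.0]
se = delete_d_jackknife_se(toy, mean, d=2)
print(se)
```

For the sample mean with all subsets enumerated, this reproduces the textbook standard error (sample standard deviation divided by the square root of n), which makes a convenient sanity check before swapping in a more complex estimator.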

CVRMS: Cross-validated Rank-based Marker Selection for Genome-wide Prediction of Low Heritability

10.1101/756130 ◽

2019 ◽

Author(s):

Seongmun Jeong ◽

Jae-Yoon Kim ◽

Namshin Kim

Keyword(s):

Genetic Information ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Marker Selection ◽

Genome Wide ◽

Precise Prediction ◽

Human Animal ◽

Selection For

CVRMS is an R package designed to extract marker subsets from repeated rank-based marker datasets generated from genome-wide association studies or marker effects for genome-wide prediction (https://github.com/lovemun/CVRMS). CVRMS provides an optimized genome-wide biomarker set with the best predictability of phenotype by implementing ridge regression using genetic information. Applying our method to human, animal, and plant datasets spanning a wide range of heritability (zero to one), we selected hundreds to thousands of biomarkers for precise prediction.

Mixed logistic regression in genome-wide association studies

BMC Bioinformatics ◽

10.1186/s12859-020-03862-2 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jacqueline Milet ◽

David Courtin ◽

André Garcia ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Cox Model ◽

Association Studies ◽

Scale Up ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Background: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with case-control status analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated data and real genomic data from a recent GWAS in two geographically close populations in West Africa. Results: We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot that improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Conclusion: The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. a mixed Cox model for survival analysis).
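
To make the stratified QQ-plot idea above concrete, the following Python sketch computes the points of a QQ-plot separately for each stratum of variants: sorted observed p-values are paired with the quantiles expected under a uniform null, both on the -log10 scale. This is a generic construction and not milorGWAS code; the abstract does not say how strata are defined, so the stratum labels, the function names `qq_points` and `stratified_qq`, and the toy p-values are hypothetical. Plotting is left out.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs for one QQ curve.

    The k-th smallest of m p-values is paired with the uniform quantile
    (k - 0.5) / m. All p-values must be strictly positive.
    """
    ordered = sorted(p_values)
    m = len(ordered)
    return [
        (-log10((rank - 0.5) / m), -log10(p))
        for rank, p in enumerate(ordered, start=1)
    ]


def stratified_qq(p_by_stratum):
    """Compute one QQ curve per stratum, skipping empty strata."""
    return {
        stratum: qq_points(p_values)
        for stratum, p_values in p_by_stratum.items()
        if p_values
    }


# Hypothetical toy input: p-values grouped by an assumed variant stratum.
curves = stratified_qq({"stratum_1": [0.1, 0.01], "stratum_2": [0.5]})
for stratum, points in curves.items():
    print(stratum, points)
```

A curve that rises above the diagonal in one stratum but not in another would point to inflation confined to that group of variants, which a single pooled QQ-plot can hide.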

EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa245 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4957-4959

Author(s):

David B Blumenthal ◽

Lorenzo Viola ◽

Markus List ◽

Jan Baumbach ◽

Paolo Tieri ◽

...

Keyword(s):

Arbitrary Order ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Supplementary Data ◽

Single Nucleotide ◽

Genome Wide

Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited: they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns, and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.
