Mixed logistic regression in genome-wide association studies

Abstract

Background: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with the case-control status analyzed as a quantitative phenotype. In 2016, Chen et al. proved that this method is inappropriate in some situations and proposed GMMAT, a score test for mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties, and those of other methods (MLM, logistic regression), are evaluated using both simulated data and real genomic data from a recent GWAS in two geographically close populations in West Africa.

Results: We show that, when disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. Our methods estimate the variants’ effects well, with a moderate bias when effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample.

Conclusion: The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. the mixed Cox model for survival analysis).
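The stratified QQ-plot itself is defined in the paper and not reproduced here. As a loosely related illustration only, the Python sketch below computes the genomic inflation factor λ and QQ-plot coordinates separately within groups of SNPs. Here λ is the median observed 1-df χ² statistic divided by its null median, which is about 0.455. How SNPs are assigned to strata is left as a user-supplied label. That is our assumption, not necessarily the authors' stratification rule. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)
# Null median of a 1-df chi-square: square of the N(0, 1) 75th percentile (~0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def pvalue_to_chi2(p):
    """Two-sided p-value (0 < p <= 1) -> equivalent 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(pvalues):
    """Genomic inflation factor: median observed chi-square over its null median."""
    return median(pvalue_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


def stratified_inflation(pvalues, strata):
    """Lambda computed separately within each stratum label (labels are user-defined)."""
    groups = defaultdict(list)
    for p, label in zip(pvalues, strata):
        groups[label].append(p)
    return {label: inflation_factor(ps) for label, ps in groups.items()}


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for one stratum's QQ-plot, smallest p first."""
    n = len(pvalues)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(sorted(pvalues))]
```

A λ close to 1 in every stratum is what a well-calibrated test should give. A stratum with λ well above or below 1 points to inflation or deflation that a single genome-wide λ can average away.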

Mixed Logistic Regression in Genome-Wide Association Studies

2020. DOI: 10.1101/2020.01.17.910109

Author(s): Jacqueline Milet, Hervé Perdry

Keyword(s): Logistic Regression, Linear Models, Association Studies, Score Test, R Package, Genome Wide Association, Supplementary Information, Genome Wide Association Studies, Mixed Linear Models, Genome Wide

Abstract

Motivation: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with the case-control status analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for mixed logistic regression (MLR). However, this test does not allow the variants’ effects to be estimated.

Results: We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets and compared with those of other methods (MLM, logistic regression). MLR performs best in all circumstances. Our methods estimate the variants’ effects well, with a moderate bias when effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample.

Availability: All methods are implemented in the R package milorGWAS, available at https://github.com/genostats/

Supplementary information: Supplementary data are available at Bioinformatics online.

GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies

BMC Bioinformatics, Vol 14 (1), 2013. DOI: 10.1186/1471-2105-14-166

Cited By ~ 20

Author(s): Karolina Sikorska, Emmanuel Lesaffre, Patrick FJ Groenen, Paul HC Eilers

Keyword(s): Logistic Regression, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics, Vol 36 (15), pp. 4374-4376, 2020. DOI: 10.1093/bioinformatics/btaa549

Author(s): Ninon Mounier, Zoltán Kutalik

Keyword(s): Mendelian Randomization, Causal Effect, Association Studies, R Package, Genome Wide Association, Supplementary Information, Genome Wide Association Studies, Biological Mechanisms, Genome Wide, Related Risk

Abstract

Summary: Increasing sample size is not the only strategy to improve discovery in genome-wide association studies (GWASs). Here we propose an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects from GWASs of related risk factors and from their causal effect estimates on the focal trait, obtained by multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects and direct effects. The approach not only increases power but also has the potential to dissect direct and indirect biological mechanisms.

Availability and implementation: The bGWAS package is freely available under a GPL-2 license and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS.

Supplementary information: Supplementary data are available at Bioinformatics online.
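The following Python sketch gives a generic illustration of how prior and observed effects can be combined, using a conjugate normal model on the Z-score scale. The model is our assumption and may differ from bGWAS's exact formulation:

- The observed Z-score is N(μ, 1).
- Under H1, the true effect μ follows the prior N(μ_prior, σ_prior²).
- Under H0, μ = 0.

The Bayes factor is the ratio of the two marginal densities of the observed Z-score. The posterior effect is a precision-weighted average of data and prior. The direct effect is taken here as the observed effect minus the prior effect. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def bayes_factor(z_obs, prior_mean, prior_sd):
    """BF of H1 (mu ~ N(prior_mean, prior_sd^2)) versus H0 (mu = 0), with z ~ N(mu, 1)."""
    marginal_h1 = NormalDist(prior_mean, math.sqrt(1.0 + prior_sd ** 2)).pdf(z_obs)
    marginal_h0 = NormalDist(0.0, 1.0).pdf(z_obs)
    return marginal_h1 / marginal_h0


def posterior_effect(z_obs, prior_mean, prior_sd):
    """Conjugate normal update: precision-weighted average of observation and prior."""
    prec_data, prec_prior = 1.0, 1.0 / prior_sd ** 2
    post_var = 1.0 / (prec_data + prec_prior)
    post_mean = post_var * (prec_data * z_obs + prec_prior * prior_mean)
    return post_mean, math.sqrt(post_var)


def direct_effect(z_obs, prior_mean):
    """Part of the observed effect not predicted by the prior (risk-factor-based) effect."""
    return z_obs - prior_mean
```

The design choice that matters is σ_prior. A small σ_prior pulls the posterior strongly toward the prior effect, while a large σ_prior lets the observed Z-score dominate.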

SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates

2015. DOI: 10.1101/016857

Author(s): Hon-Cheong SO, Pak C. SHAM

Keyword(s): Error Estimates, Standard Error, Association Studies, Parametric Bootstrap, R Package, Genome Wide Association, Genome Wide Association Studies, Summary Statistics, Genome Wide, Key Questions

Genome-wide association studies (GWAS) have become increasingly popular, and a key question is how much heritability can be explained by all variants in a GWAS. We previously proposed an approach to answer this question based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, no method for estimating the standard error (SE) was available, which limited the interpretation of the results. In this study, we developed resampling-based approaches to estimate the SE and implemented them in an R package. We found that the delete-d jackknife and the parametric bootstrap provide good estimates of the SE. Methods to compute the total heritability explained and its SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
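The delete-d jackknife named above is a generic resampling scheme. The Python sketch below uses a placeholder statistic rather than SumVg's heritability estimator, and its function name is hypothetical. The procedure works in three steps:

1. Remove every subset of d observations, or, for large n, a random sample of such subsets.
2. Recompute the statistic θ on what remains.
3. Estimate the variance as v = (n - d)/(d·N) Σ_s (θ_s - θ̄)², where N is the number of subsets used.

```python
import math
import random
from itertools import combinations
from statistics import fmean


def delete_d_jackknife_se(data, statistic, d, max_subsets=None, seed=0):
    """Delete-d jackknife standard error of `statistic` (a function of a list).

    If max_subsets is None, all C(n, d) subsets are removed in turn;
    otherwise `max_subsets` random subsets are drawn (an approximation).
    """
    n = len(data)
    if not 1 <= d < n:
        raise ValueError("d must satisfy 1 <= d < n")
    if max_subsets is None:
        removed_sets = combinations(range(n), d)
    else:
        rng = random.Random(seed)
        removed_sets = (rng.sample(range(n), d) for _ in range(max_subsets))
    estimates = []
    for removed in removed_sets:
        drop = set(removed)
        kept = [x for i, x in enumerate(data) if i not in drop]
        estimates.append(statistic(kept))
    theta_bar = fmean(estimates)
    ss = sum((t - theta_bar) ** 2 for t in estimates)
    return math.sqrt((n - d) / (d * len(estimates)) * ss)
```

For the sample mean, enumerating all subsets reproduces the classical s/√n exactly for any d. This makes a convenient sanity check before applying the function to a non-linear statistic.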

CVRMS: Cross-validated Rank-based Marker Selection for Genome-wide Prediction of Low Heritability

2019. DOI: 10.1101/756130

Author(s): Seongmun Jeong, Jae-Yoon Kim, Namshin Kim

Keyword(s): Genetic Information, Association Studies, R Package, Genome Wide Association, Genome Wide Association Studies, Marker Selection, Genome Wide, Precise Prediction, Human Animal, Selection For

Abstract

CVRMS is an R package designed to extract marker subsets from repeated rank-based marker datasets generated from genome-wide association studies or from marker effects for genome-wide prediction (https://github.com/lovemun/CVRMS). Using ridge regression on genetic information, CVRMS provides an optimized genome-wide biomarker set with the best phenotype predictability. Applying our method to human, animal and plant datasets with heritabilities ranging from zero to one, we selected hundreds to thousands of biomarkers for precise prediction.
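The abstract names ridge regression as the prediction engine. The Python sketch below shows only that ingredient, under stated assumptions, and is not CVRMS's actual procedure. It assumes markers have already been ranked (for example by GWAS p-values) and keeps the top k. It then obtains ridge coefficients by solving (X'X + λI)β = X'y on centred genotypes and phenotypes. The code is plain Python so it stays self-contained, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def select_markers(X, marker_ranks, k):
    """Keep the k columns of X with the best (smallest) rank; ranks are user-supplied."""
    keep = sorted(range(len(marker_ranks)), key=lambda j: marker_ranks[j])[:k]
    return [[row[j] for j in keep] for row in X]


def ridge_fit(X, y, lam):
    """Ridge coefficients (X'X + lam*I)^-1 X'y for column-centred X and centred y."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve_linear(xtx, xty)
```

A cross-validated scheme would call `ridge_fit` on training folds for several values of k and keep the k with the best out-of-fold prediction. That outer loop is omitted here.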

MultiMeta: an R package for meta-analyzing multi-phenotype genome-wide association studies: Table 1.

Bioinformatics, Vol 31 (16), pp. 2754-2756, 2015. DOI: 10.1093/bioinformatics/btv222

Cited By ~ 6

Author(s): D. Vuckovic, P. Gasparini, N. Soranzo, V. Iotchkova

Keyword(s): Association Studies, R Package, Genome Wide Association, Genome Wide Association Studies, Genome Wide

BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters

Bioinformatics, Vol 29 (1), pp. 135-136, 2012. DOI: 10.1093/bioinformatics/bts647

Cited By ~ 29

Author(s): Hailiang Huang, Sandeep Tata, Robert J. Prill

Keyword(s): Association Studies, R Package, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Hadoop Clusters

Neighbor GWAS: incorporating neighbor genotypic identity into genome-wide association studies of field herbivory

Heredity, 2021. DOI: 10.1038/s41437-020-00401-w

Author(s): Yasuhiro Sato, Eiji Yamamoto, Kentaro K. Shimizu, Atsushi J. Nagano

Keyword(s): Complex Traits, Association Studies, Field Studies, R Package, Genome Wide Association, Effective Range, Individual Plant, Genome Wide Association Studies, Genome Wide, Neighbor Effects

Abstract

An increasing number of field studies have shown that the phenotype of an individual plant depends not only on its own genotype but also on those of neighboring plants. However, this fact is not taken into consideration in genome-wide association studies (GWAS). Based on the Ising model of ferromagnetism, we incorporated neighbor genotypic identity into a regression model, named "Neighbor GWAS". Our simulations showed that the effective range of neighbor effects could be estimated from an observed phenotype as the spatial scale at which the proportion of phenotypic variation explained (PVE) by neighbor effects peaked. The spatial scale of the first nearest neighbors gave the maximum power to detect the causal variants responsible for neighbor effects, unless their effective range was too broad. However, if the effective range of the neighbor effects was broad and minor allele frequencies were low, the self and neighbor effects were collinear. To suppress false-positive detection of neighbor effects, the fixed effect and variance components involved in the neighbor effects should be tested against a standard GWAS model. We applied Neighbor GWAS to field herbivory data from 199 accessions of Arabidopsis thaliana and found that neighbor effects explained 8% more of the PVE of the observed damage than standard GWAS. The Neighbor GWAS method provides a novel tool that could facilitate the analysis of complex traits in spatially structured environments and is available as an R package on CRAN (https://cran.r-project.org/package=rNeighborGWAS).
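The Python sketch below assumes an Ising-style coding. Genotypes are coded as +1/-1, and the neighbor term for plant i averages x_i·x_j over the plants j within a chosen spatial scale. The term is +1 when all neighbors share i's genotype and -1 when all differ. The sketch computes that covariate from plant coordinates. The Euclidean distance cutoff, the averaging over neighbors and the function name are our assumptions, not necessarily the exact rNeighborGWAS definition. The resulting covariate would then enter a regression of the phenotype alongside the plant's own genotype.

```python
import math


def neighbor_covariate(positions, genotypes, scale):
    """Mean of x_i * x_j over plants j within `scale` of plant i (Ising-style coding).

    positions: list of (x, y) coordinates; genotypes: list of +1/-1 codes.
    Returns one value per plant in [-1, 1]; 0.0 if a plant has no neighbors.
    """
    covariate = []
    for i, (pos_i, x_i) in enumerate(zip(positions, genotypes)):
        products = [x_i * x_j
                    for j, (pos_j, x_j) in enumerate(zip(positions, genotypes))
                    if j != i and math.dist(pos_i, pos_j) <= scale]
        covariate.append(sum(products) / len(products) if products else 0.0)
    return covariate
```

On a regular grid, `scale=1.0` keeps only the four orthogonal first nearest neighbors. Increasing `scale` widens the neighborhood, which mirrors the effective-range question discussed in the abstract.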

MultiMeta: an R package for meta-analysing multi-phenotype genome-wide association studies

2015. DOI: 10.1101/013920

Author(s): Dragana Vuckovic, Paolo Gasparini, Nicole Soranzo, Valentina Iotchkova

Keyword(s): Multivariate Analysis, Association Studies, Meta Analysis, R Package, Genome Wide Association, Genome Wide Association Studies, New Methods, Genome Wide, Inverse Variance

Summary: As new methods for multivariate analysis of genome-wide association studies (GWAS) become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance-based method for meta-analysis, generalized to an n-dimensional setting.

Availability: The R package MultiMeta can be downloaded from CRAN.
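The inverse-variance method generalized to n dimensions can be written as β̂ = (Σ_k V_k⁻¹)⁻¹ Σ_k V_k⁻¹ β_k, with covariance (Σ_k V_k⁻¹)⁻¹. Here β_k is the vector of effects of one variant on n phenotypes in cohort k, and V_k is its covariance matrix. The Python sketch below implements this textbook fixed-effect estimator. It is not MultiMeta's code, and the function names are hypothetical. The returned Wald statistic β̂'V̂⁻¹β̂ is χ²-distributed with n degrees of freedom under the null.

```python
def mat_inv(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mat_vec(a, v):
    """Matrix-vector product for lists of lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def inverse_variance_meta(betas, covs):
    """Fixed-effect meta-analysis of n-dimensional effect vectors.

    betas: per-study effect vectors; covs: their n x n covariance matrices.
    Returns (pooled effect vector, pooled covariance, Wald chi-square with n df).
    """
    n = len(betas[0])
    w_sum = [[0.0] * n for _ in range(n)]  # running sum of V_k^-1
    wb_sum = [0.0] * n                     # running sum of V_k^-1 beta_k
    for beta, cov in zip(betas, covs):
        w = mat_inv(cov)
        w_sum = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(w_sum, w)]
        wb_sum = [x + y for x, y in zip(wb_sum, mat_vec(w, beta))]
    pooled_cov = mat_inv(w_sum)
    pooled_beta = mat_vec(pooled_cov, wb_sum)
    # beta' V^-1 beta, using V^-1 beta = w_sum @ pooled_beta = wb_sum
    wald = sum(b * x for b, x in zip(pooled_beta, wb_sum))
    return pooled_beta, pooled_cov, wald
```

With n = 1, this reduces to the familiar univariate weighting of each study by 1/SE².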

rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated tool for Genome-Wide Association Study

2020. DOI: 10.1101/2020.08.20.258491

Cited By ~ 2

Author(s): Lilin Yin, Haohao Zhang, Zhenshuang Tang, Jingya Xu, Dong Yin, ...

Keyword(s): Genome Wide Association Study, Association Studies, Matrix Multiplication, R Package, Test Methods, Genome Wide Association, Genome Wide Association Studies, Efficient Design, Genome Wide, Memory Efficient

Abstract

With the development of high-throughput sequencing technologies, both the sample size and the number of SNPs in genome-wide association studies (GWAS) are increasing rapidly, making the associated computation more challenging than ever. Here we present rMVP, a memory-efficient, visualization-enhanced and parallel-accelerated R package that addresses the need for improved GWAS computation. rMVP can:

(1) effectively process large GWAS data;
(2) rapidly evaluate population structure;
(3) efficiently estimate variance components with the EMMAX, FaST-LMM and HE regression algorithms;
(4) run parallel-accelerated association tests of markers using the GLM, MLM and FarmCPU methods;
(5) compute quickly thanks to a globally efficient design of the GWAS process; and
(6) generate various visualizations of GWAS-related information.

Accelerated by a block matrix multiplication strategy and multiple threads, the association test methods in rMVP are approximately 5 to 20 times faster than PLINK, GEMMA and FarmCPU_pkg. rMVP is freely available at https://github.com/xiaolei-lab/rMVP.
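The block strategy mentioned above can be illustrated with a covariate-free marginal regression scan. Markers are processed in blocks, and within a block every marker's slope and t-statistic are computed against the same centred phenotype. In compiled code, each block reduces to one matrix product, which is where multithreaded linear algebra gives the speed-up. In the plain-Python sketch below, blocking only shows the data layout. This is our own illustration, not rMVP's implementation, and the function name is hypothetical. Covariates, which rMVP's GLM and MLM handle, are omitted.

```python
import math


def marginal_scan(genotypes, y, block_size=1000):
    """Per-marker simple linear regression y ~ 1 + g_j, processed in blocks of markers.

    genotypes: list of markers, each a list of dosages (one per individual).
    Returns a list of (slope, t_statistic) tuples, one per marker.
    """
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]  # centred phenotype, shared by all markers
    syy = sum(v * v for v in yc)
    results = []
    for start in range(0, len(genotypes), block_size):
        block = genotypes[start:start + block_size]
        # In compiled code, the cross-products below would be one block^T @ yc product.
        for g in block:
            g_mean = sum(g) / n
            gc = [v - g_mean for v in g]
            sxx = sum(v * v for v in gc)
            if sxx == 0.0:
                results.append((math.nan, math.nan))  # monomorphic marker
                continue
            sxy = sum(a * b for a, b in zip(gc, yc))
            slope = sxy / sxx
            rss = max(syy - slope * sxy, 0.0)  # residual sum of squares
            se = math.sqrt(rss / (n - 2) / sxx)
            t = slope / se if se > 0 else math.copysign(math.inf, slope)
            results.append((slope, t))
    return results
```

Because each marker is handled independently, results do not depend on `block_size`. The block size only trades memory for the size of each matrix product.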
