Mixed logistic regression in genome-wide association studies

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jacqueline Milet ◽  
David Courtin ◽  
André Garcia ◽  
Hervé Perdry

Abstract Background Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa. Results We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs the best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Conclusion The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. mixed Cox model for survival analysis).
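The idea of diagnosing p-value inflation or deflation separately per stratum can be illustrated with a minimal Python sketch. It is not the milorGWAS implementation: the abstract does not say how variants are assigned to strata, so the `labels` argument is a hypothetical grouping, and the function names are invented for illustration. For each group the sketch computes QQ-plot coordinates and the usual genomic inflation factor (median observed 1-df chi-square statistic divided by the median of the 1-df chi-square distribution).

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Genomic inflation factor for p-values in (0, 1]."""
    nd = NormalDist()
    # Convert each two-sided p-value back to a 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def qq_points(p_values):
    """(expected, observed) pairs on the -log10 scale for a QQ-plot."""
    observed = sorted(p_values)
    n = len(observed)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(observed)]


def stratified_qq(p_values, labels):
    """Group p-values by a (hypothetical) variant label and summarise each group."""
    groups = {}
    for p, label in zip(p_values, labels):
        groups.setdefault(label, []).append(p)
    return {label: {"lambda": inflation_factor(ps), "qq": qq_points(ps)}
            for label, ps in groups.items()}
```

An inflation factor near 1 in every group suggests well-calibrated tests; a value well above 1 in only one group points to inflation that a single pooled QQ-plot could hide.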

2020 ◽  
Author(s):  
Jacqueline Milet ◽  
Hervé Perdry

Abstract Motivation Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow an estimation of the variants’ effects. Results We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets, and compared with other methods (MLM, logistic regression). MLR performs the best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Availability All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
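How a prior effect and an observed effect might combine into a Bayes factor, a posterior effect and a direct effect can be sketched in Python. This is an illustration under assumptions the abstract does not state, not the bGWAS API: effects are taken on the z-score scale with unit observation variance, the prior is normal with mean `prior_mean` and variance `prior_var`, and the direct effect is taken to be the observed effect minus the prior mean. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bayes_factor(z_obs, prior_mean, prior_var):
    """Marginal likelihood under the informative prior over that under the null.

    Assumed model: z_obs ~ N(effect, 1), effect ~ N(prior_mean, prior_var),
    so marginally z_obs ~ N(prior_mean, 1 + prior_var); the null is N(0, 1).
    """
    return normal_pdf(z_obs, prior_mean, 1 + prior_var) / normal_pdf(z_obs, 0.0, 1.0)


def posterior_effect(z_obs, prior_mean, prior_var):
    """Posterior mean and variance from the normal-normal conjugate update."""
    weight = prior_var / (prior_var + 1)  # weight given to the observation
    mean = weight * z_obs + (1 - weight) * prior_mean
    return mean, weight  # posterior variance equals prior_var / (prior_var + 1)


def direct_effect(z_obs, prior_mean):
    """Part of the observed effect not explained by the prior (assumed definition)."""
    return z_obs - prior_mean
```

With a vague prior (large `prior_var`) the posterior effect approaches the observed effect; with a sharp prior it stays close to the prior mean, which is where the power gain would come from.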


2015 ◽  
Author(s):  
Hon-Cheong SO ◽  
Pak C. SHAM

Genome-wide association studies (GWAS) have become increasingly popular, and one of the key questions is how much heritability can be explained by all variants in a GWAS. We have previously proposed an approach to answer this question, based on recovering the "true" z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not yet available, which limits the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
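The delete-d jackknife mentioned above can be sketched in Python. This is a generic version of the technique, not the SumVg code: the `estimator` argument stands in for the heritability estimator, which is not described here, and the function name is hypothetical. The variance formula is the standard one for the delete-d jackknife, (n - d) / (d * S) times the sum of squared deviations of the S subset estimates from their mean.

```python
import itertools
import math
import random


def delete_d_jackknife_se(data, estimator, d, n_subsets=None, seed=0):
    """Standard error of estimator(data) by the delete-d jackknife.

    If n_subsets is None, every subset of d deleted observations is used;
    otherwise n_subsets deletion sets are drawn at random (an approximation).
    """
    n = len(data)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(data)")
    if n_subsets is None:
        deleted_sets = list(itertools.combinations(range(n), d))
    else:
        rng = random.Random(seed)
        deleted_sets = [tuple(rng.sample(range(n), d)) for _ in range(n_subsets)]

    estimates = []
    for deleted in deleted_sets:
        drop = set(deleted)
        kept = [x for i, x in enumerate(data) if i not in drop]
        estimates.append(estimator(kept))

    centre = sum(estimates) / len(estimates)
    sum_sq = sum((e - centre) ** 2 for e in estimates)
    return math.sqrt((n - d) / (d * len(estimates)) * sum_sq)
```

For the sample mean with all subsets enumerated, this reproduces the textbook standard error s / sqrt(n), which is a convenient sanity check before plugging in a more complex estimator.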


2019 ◽  
Author(s):  
Seongmun Jeong ◽  
Jae-Yoon Kim ◽  
Namshin Kim

Abstract CVRMS is an R package designed to extract marker subsets from repeated rank-based marker datasets generated from genome-wide association studies or marker effects for genome-wide prediction (https://github.com/lovemun/CVRMS). CVRMS provides an optimized genome-wide biomarker set with the best predictability of phenotype by implementing ridge regression using genetic information. Applying our method to human, animal, and plant datasets with a wide range of heritability (zero to one), we selected hundreds to thousands of biomarkers for precise prediction.


2015 ◽  
Vol 31 (16) ◽  
pp. 2754-2756 ◽  
Author(s):  
D. Vuckovic ◽  
P. Gasparini ◽  
N. Soranzo ◽  
V. Iotchkova

Heredity ◽  
2021 ◽  
Author(s):  
Yasuhiro Sato ◽  
Eiji Yamamoto ◽  
Kentaro K. Shimizu ◽  
Atsushi J. Nagano

Abstract An increasing number of field studies have shown that the phenotype of an individual plant depends not only on its genotype but also on those of neighboring plants; however, this fact is not taken into consideration in genome-wide association studies (GWAS). Based on the Ising model of ferromagnetism, we incorporated neighbor genotypic identity into a regression model, named “Neighbor GWAS”. Our simulations showed that the effective range of neighbor effects could be estimated using an observed phenotype when the proportion of phenotypic variation explained (PVE) by neighbor effects peaked. The spatial scale of the first nearest neighbors gave the maximum power to detect the causal variants responsible for neighbor effects, unless their effective range was too broad. However, if the effective range of the neighbor effects was broad and minor allele frequencies were low, there was collinearity between the self and neighbor effects. To suppress the false positive detection of neighbor effects, the fixed effect and variance components involved in the neighbor effects should be tested in comparison with a standard GWAS model. We applied neighbor GWAS to field herbivory data from 199 accessions of Arabidopsis thaliana and found that neighbor effects explained 8% more of the PVE of the observed damage than standard GWAS. The neighbor GWAS method provides a novel tool that could facilitate the analysis of complex traits in spatially structured environments and is available as an R package at CRAN (https://cran.r-project.org/package=rNeighborGWAS).
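One way to picture an Ising-style "neighbor genotypic identity" term is the Python sketch below. It is an assumption-laden illustration rather than the rNeighborGWAS implementation: genotypes at a single marker are coded as -1 or +1, positions are planar coordinates, and the covariate for a plant is the mean product of its own genotype with those of all plants within a chosen distance `scale`. The function name and the exact normalisation are hypothetical.

```python
import math


def neighbor_covariate(genotypes, positions, scale):
    """Mean genotypic identity between each plant and its neighbors.

    genotypes: list of -1/+1 codes at one marker (assumed coding).
    positions: list of (x, y) coordinates, one per plant.
    scale: maximum distance at which another plant counts as a neighbor.
    Returns values in [-1, 1]: +1 if all neighbors share the focal
    genotype, -1 if all differ, 0.0 if the plant has no neighbor.
    """
    covariate = []
    for i, (g_i, (x_i, y_i)) in enumerate(zip(genotypes, positions)):
        total, count = 0, 0
        for j, (g_j, (x_j, y_j)) in enumerate(zip(genotypes, positions)):
            if i == j:
                continue
            if math.hypot(x_i - x_j, y_i - y_j) <= scale:
                total += g_i * g_j  # +1 for identical, -1 for different
                count += 1
        covariate.append(total / count if count else 0.0)
    return covariate
```

Such a covariate could be added next to the plant's own genotype in a regression, and recomputing it for several values of `scale` mirrors the idea of searching for the effective range of neighbor effects.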


2015 ◽  
Author(s):  
Dragana Vuckovic ◽  
Paolo Gasparini ◽  
Nicole Soranzo ◽  
Valentina Iotchkova

Summary: As new methods for multivariate analysis of Genome Wide Association Studies (GWAS) become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance based method for meta-analysis, generalized to an n-dimensional setting. Availability: The R package MultiMeta can be downloaded from CRAN. Contact: [email protected]
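The inverse-variance method in an n-dimensional setting can be sketched in plain Python. This is the standard fixed-effect multivariate formula, assumed here to be what the summary refers to, and it is not the MultiMeta code: each cohort contributes an effect vector and its covariance matrix, the pooled covariance is the inverse of the summed precision matrices, and the pooled effect is that covariance applied to the precision-weighted sum of effects. Function names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_vec(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def multivariate_inverse_variance(betas, covariances):
    """Pool per-cohort effect vectors using their covariance matrices."""
    k = len(betas[0])
    precision_sum = [[0.0] * k for _ in range(k)]
    weighted_sum = [0.0] * k
    for beta, cov in zip(betas, covariances):
        precision = invert(cov)
        weighted = mat_vec(precision, beta)
        for i in range(k):
            weighted_sum[i] += weighted[i]
            for j in range(k):
                precision_sum[i][j] += precision[i][j]
    pooled_cov = invert(precision_sum)
    pooled_beta = mat_vec(pooled_cov, weighted_sum)
    return pooled_beta, pooled_cov
```

With a single trait (k = 1) this reduces to the familiar univariate inverse-variance weighted average.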


Author(s):  
Lilin Yin ◽  
Haohao Zhang ◽  
Zhenshuang Tang ◽  
Jingya Xu ◽  
Dong Yin ◽  
...  

Abstract Along with the development of high-throughput sequencing technologies, both sample size and number of SNPs are increasing rapidly in Genome-Wide Association Studies (GWAS), and the associated computation is more challenging than ever. Here we present a Memory-efficient, Visualization-enhanced, and Parallel-accelerated R package called “rMVP” to address the need for improved GWAS computation. rMVP can: (1) effectively process large GWAS data; (2) rapidly evaluate population structure; (3) efficiently estimate variance components by EMMAX, FaST-LMM, and HE regression algorithms; (4) implement parallel-accelerated association tests of markers using GLM, MLM, and FarmCPU methods; (5) compute fast with a globally efficient design in the GWAS processes; and (6) generate various visualizations of GWAS-related information. Accelerated by a block matrix multiplication strategy and multiple threads, the association test methods embedded in rMVP are approximately 5-20 times faster than PLINK, GEMMA, and FarmCPU_pkg. rMVP is freely available at https://github.com/xiaolei-lab/rMVP.

