Efficient Genomic Control for Mixed Model Associations in Large-scale Population

2021 ◽  
Author(s):  
Zhiyu Hao ◽  
Jin Gao ◽  
Yuxin Song ◽  
Runqing Yang ◽  
Di Liu

Abstract Among linear mixed model-based association methods, GRAMMAR has the lowest computational complexity for association tests, but it produces a high false-negative rate because its test statistics are deflated under complex population structure. Here, we present Optim-GRAMMAR, a GRAMMAR method optimized through efficient genomic control, which estimates the phenotype residuals by regulating genomic heritability downward in the genomic best linear unbiased prediction. Even when fewer sampled markers are used to evaluate the genomic relationship matrices and genomic controls, Optim-GRAMMAR retains statistical power similar to that of exact mixed model association analysis, which makes it an extremely efficient approach for handling large-scale data. Moreover, joint association analysis significantly improved statistical power over existing methods.
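As a rough illustration of how a lower assumed heritability changes GBLUP residuals, the sketch below builds a genomic relationship matrix from standardized allele counts and returns the residuals for a given heritability. It is a minimal sketch, not the Optim-GRAMMAR procedure itself: the function names (`genomic_relationship`, `solve`, `gblup_residuals`) are hypothetical, the only fixed effect is the phenotype mean (handled by centering), and the abstract does not say how the downward-regulated heritability is chosen, so `h2` is simply passed in.

```python
from statistics import mean, pstdev


def genomic_relationship(genotypes):
    """Build a GRM from an n x m table of allele counts (0/1/2)."""
    n = len(genotypes)
    standardized = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        mu, sd = mean(column), pstdev(column)
        if sd > 0:  # monomorphic markers carry no information; skip them
            standardized.append([(x - mu) / sd for x in column])
    m = len(standardized)
    if m == 0:
        raise ValueError("no polymorphic markers")
    return [[sum(c[i] * c[k] for c in standardized) / m for k in range(n)]
            for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gblup_residuals(phenotypes, grm, h2):
    """Residuals y - mean - g_hat, where g_hat is the GBLUP under heritability h2.

    Assumption: V = h2 * G + (1 - h2) * I on a unit-variance scale, so the
    residuals simplify to (1 - h2) * V^{-1} (y - mean).
    """
    if not 0.0 <= h2 < 1.0:
        raise ValueError("h2 must lie in [0, 1)")
    n = len(phenotypes)
    y_bar = mean(phenotypes)
    centered = [y - y_bar for y in phenotypes]
    v = [[h2 * grm[i][k] + ((1.0 - h2) if i == k else 0.0) for k in range(n)]
         for i in range(n)]
    alpha = solve(v, centered)
    return [(1.0 - h2) * a for a in alpha]
```

Under these assumptions, lowering `h2` shrinks the predicted polygenic effects, so more of the phenotypic signal stays in the residuals that are later tested against each marker. That is one plausible reading of why regulating heritability downward counteracts the deflation of GRAMMAR test statistics.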

2021 ◽  
Author(s):  
Runqing Yang ◽  
Jin Gao ◽  
Yuxin Song ◽  
Zhiyu Hao ◽  
Pao Xu

Abstract A highly efficient genome-wide association method, GRAMMAR-Lambda, is proposed that applies simple genomic control to the test statistics deflated by GRAMMAR, producing statistical power as high as that of the exact mixed model association method. Using simulated and real phenotypes, we show that at moderate or higher genomic heritability, polygenic effects can be estimated using a small number of randomly selected markers, which greatly simplifies genome-wide association analysis, with computational complexity approximating that of the naïve method in large-scale complex populations. After a test at once, joint association analysis offers a significant increase in statistical power over existing methods.
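The genomic control step mentioned here can be sketched with the standard median-based inflation factor. This is a minimal sketch that assumes 1-df chi-square association statistics; the function names are hypothetical, and the abstract does not specify exactly how GRAMMAR-Lambda computes its factor.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, rescaled statistics) using the median-based factor."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    if lam <= 0:
        raise ValueError("median test statistic must be positive")
    return lam, [x / lam for x in chi2_stats]


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))
```

When the statistics are deflated, as described for GRAMMAR, the estimated lambda falls below 1, so dividing by it scales the statistics up rather than down.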


2021 ◽  
Author(s):  
Runqing Yang ◽  
Jun Bao ◽  
Runqing Yang ◽  
Yuxin Song ◽  
Zhiyu Hao ◽  
...  

Abstract Generalized linear mixed models are computationally intensive and biased in mapping quantitative trait nucleotides for binary diseases. In genomic logit regression, we treat genomic breeding values estimated in advance as a known predictor and then correct the deflated association test statistics using genomic control, thereby extending GRAMMAR-Lambda to the analysis of binary diseases in a complex structured population. Because there is no need to estimate genomic heritability, and genomic breeding values can be estimated from a small number of sampled markers, generalized mixed-model association analysis is greatly simplified for handling large-scale data. With almost perfect genomic control, joint analysis of the candidate quantitative trait nucleotides chosen by multiple testing offered a significant improvement in statistical power.


2021 ◽  
Author(s):  
Runqing Yang ◽  
Yuxin Song ◽  
Zhiyu Hao ◽  
Zhonghua Liu

Abstract In genome-wide association analysis for complex diseases, we partitioned the genomic generalized linear mixed model (GLMM) into two hierarchies: the GLMM for genomic breeding values (GBVs) and a generalized linear regression of the GBVs on the tested marker effects. In the first hierarchy, the GBVs were predicted by solving the genomic best linear unbiased prediction for the GLMM, and in the second hierarchy, association tests were performed using the generalized least squares (GLS) method. The resulting Hi-GLMM method showed advantages over existing methods in both genomic control for complex population structure and statistical power to detect quantitative trait nucleotides (QTNs), especially when the GBVs were estimated precisely and joint association analysis was applied to QTN candidates obtained from a test at once.


2021 ◽  
Author(s):  
Runqing Yang ◽  
Di Liu ◽  
Zhiyu Hao ◽  
Yuxin Song ◽  
Runqing Yang ◽  
...  

Abstract We partitioned the genomic mixed model into two hierarchies: first estimating genomic breeding values (GBVs) using the genomic best linear unbiased prediction, and then statistically inferring the association of the GBVs with each SNP using generalized least squares. The genome-wide hierarchical mixed model association study (named Hi-LMM) can effectively correct for confounders, with polygenic effects treated as residuals in association tests, preventing the potential false-negative errors produced by GRAMMAR or EMMAX. Hi-LMM achieves the same statistical power as the exact FaST-LMM with the same computing efficiency as EMMAX. When the GBVs have been estimated precisely, Hi-LMM outperforms existing methods in statistical power, especially through joint association analysis.


2020 ◽  
Author(s):  
Patrick Sin-Chan ◽  
Nehal Gosalia ◽  
Chuan Gao ◽  
Cristopher V. Van Hout ◽  
Bin Ye ◽  
...  

SUMMARY Aging is characterized by degeneration in cellular and organismal functions leading to increased disease susceptibility and death. Although our understanding of aging biology in model systems has increased dramatically, large-scale sequencing studies to understand human aging are only now beginning. We applied exome sequencing and association analyses (ExWAS) to identify age-related variants in 58,470 participants of the DiscovEHR cohort. Linear mixed model regression analyses of age at last encounter revealed variants in genes known to be linked with clonal hematopoiesis of indeterminate potential, which are associated with myelodysplastic syndromes, as top signals in our analysis, suggestive of age-related somatic mutation accumulation in hematopoietic cells despite patients lacking clinical diagnoses. In addition to APOE, we identified rare DISP2 rs183775254 (p = 7.40 × 10⁻¹⁰) and ZYG11A rs74227999 (p = 2.50 × 10⁻⁸) variants that were negatively associated with age in both sexes combined and in females, respectively, and that were replicated with directional consistency in two independent cohorts. Epigenetic mapping showed that these variants are located within cell-type-specific enhancers, suggestive of important transcriptional regulatory functions. To discover variants associated with extreme age, we performed exome sequencing on persons of Ashkenazi Jewish descent ascertained for extensive lifespans. Case-control analyses compared 525 Ashkenazi Jewish cases (males ≥ 92 years, females ≥ 95 years) with 482 controls. Our results showed that variants in APOE (rs429358, rs6857) and TMTC2 (rs7976168) passed the Bonferroni-adjusted p-value threshold, as did several nominally associated population-specific variants. Collectively, our Age-ExWAS, the largest performed to date, confirmed known variants and identified previously unreported candidate variants associated with human age.


2017 ◽  
Author(s):  
Wei Zhou ◽  
Jonas B. Nielsen ◽  
Lars G. Fritsche ◽  
Rounak Dey ◽  
Maiken E. Gabrielsen ◽  
...  

Abstract In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly – producing large type I error rates – in the analysis of phenotypes with unbalanced case-control ratios. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It uses state-of-the-art optimization strategies to reduce the computational time and memory cost of the generalized mixed model. The computational cost depends linearly on sample size, so the method is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 white British European-ancestry samples for >1400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
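To illustrate why a saddlepoint approximation helps with unbalanced case-control ratios, the sketch below approximates the upper tail of a score statistic S = Σ gᵢ(Yᵢ − μᵢ) with independent Bernoulli outcomes, using the Barndorff-Nielsen tail formula. It is a simplified sketch and not SAIGE's implementation: it ignores the mixed-model variance adjustment, omits any lattice continuity correction, and the function names and bracketing limits are assumptions made for illustration.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def spa_upper_tail(g, mu, s):
    """Approximate P(S >= s) for S = sum g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    offset = sum(gi * m for gi, m in zip(g, mu))

    def k0(t):  # cumulant generating function of S
        return sum(math.log(1 - m + m * math.exp(gi * t))
                   for gi, m in zip(g, mu)) - t * offset

    def k1(t):  # first derivative of the CGF
        return sum(gi * m * math.exp(gi * t) / (1 - m + m * math.exp(gi * t))
                   for gi, m in zip(g, mu)) - offset

    def k2(t):  # second derivative of the CGF
        return sum(gi * gi * m * (1 - m) * math.exp(gi * t)
                   / (1 - m + m * math.exp(gi * t)) ** 2
                   for gi, m in zip(g, mu))

    # Bracket and bisect the saddlepoint equation K'(t) = s (K' is increasing).
    lo, hi = -1.0, 1.0
    while k1(lo) > s:
        lo *= 2
        if lo < -50:
            raise ValueError("s is at or below the lower support boundary")
    while k1(hi) < s:
        hi *= 2
        if hi > 50:
            raise ValueError("s is at or above the upper support boundary")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    # Near the mean the tail formula is numerically singular; use the normal tail.
    if abs(t) < 1e-6:
        return normal_sf(s / math.sqrt(k2(0.0)))

    w = math.copysign(math.sqrt(2.0 * (t * s - k0(t))), t)
    v = t * math.sqrt(k2(t))
    return normal_sf(w + math.log(v / w) / w)
```

With rare cases (small μᵢ) the distribution of S is right-skewed, so the normal approximation tends to understate the upper tail; a saddlepoint-based tail uses the whole cumulant generating function rather than only the variance.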


2020 ◽  
Vol 648 ◽  
pp. 207-219 ◽  
Author(s):  
D Stalder ◽  
FM van Beest ◽  
S Sveegaard ◽  
R Dietz ◽  
J Teilmann ◽  
...  

The harbour porpoise Phocoena phocoena is a small marine predator with a high conservation status in Europe and the USA. To protect the species effectively, it is crucial to understand its movement patterns and how the distribution of intensively used foraging areas can be predicted from environmental conditions. Here, we investigated the influence of both static and dynamic environmental conditions on large-scale harbour porpoise movements in the North Sea. We used long-term movement data from 57 individuals tracked during 1999-2017 in a state-space model to estimate the underlying behavioural states, i.e. whether animals used area-restricted or directed movements. Subsequently, we assessed whether the probability of using area-restricted movements was related to environmental conditions using a generalized linear mixed model. Harbour porpoises were more likely to use area-restricted movements in areas with low salinity levels, relatively high chlorophyll a concentrations and low current velocity, and in areas with steep bottom slopes, suggesting that such areas are important foraging grounds for porpoises. Our study identifies environmental parameters of relevance for predicting harbour porpoise foraging hot spots over space and time in a dynamic system. The study illustrates how movement patterns and data on environmental conditions can be combined, which is valuable to the conservation of marine mammals.


2021 ◽  
Author(s):  
Runqing Yang ◽  
Yuxin Song ◽  
Li Jiang ◽  
Zhiyu Hao ◽  
Runqing Yang

Abstract Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMMs) to genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs) estimated in advance as a known predictor in genomic logit regression, and then controlled polygenic effects by regulating genomic heritability downward. Using simulations and case analyses, we showed that, in the optimized GRAMMAR, polygenic effects and genomic controls could be evaluated using fewer sampled markers, which greatly simplified GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.

