Multivariate linear mixed model enhanced the power of identifying genome-wide association to poplar tree heights in a randomized complete block design

Author(s):  
Yuhua Chen ◽  
Hainan Wu ◽  
Wenguo Yang ◽  
Wei Zhao ◽  
Chunfa Tong

Abstract Advances in high-throughput sequencing technologies have made it fast and inexpensive to extract tens of thousands of single nucleotide polymorphisms (SNPs) across many individuals, making it possible to perform genome-wide association studies (GWAS) of quantitative traits in outbred forest trees. Applying traditional breeding experiments in GWAS is valuable for identifying genomic variants associated with ecologically and economically important traits in Populus. Here, we report a GWAS of tree height measured at multiple time points in a randomized complete block design (RCBD), which was established with clones from an F1 hybrid population of Populus deltoides and Populus simonii. A total of 22,670 SNPs across 172 clones in the RCBD were obtained with restriction site-associated DNA sequencing (RADseq) technology. A multivariate linear mixed model incorporating the pedigree relationship matrix of individuals was applied to test the association of each SNP with tree height over 8 time points. Consequently, 41 SNPs were found to be significantly associated with tree height under the p-value threshold determined by Bonferroni correction at a significance level of 0.01. These SNPs were distributed on all but 2 chromosomes (Chr02 and Chr18); each explained between 0.26% and 2.64% of the phenotypic variance, amounting to 63.68% in total. Comparisons with previous mapping studies of poplar height and the candidate genes of the detected SNPs were also investigated. We therefore demonstrated that applying the multivariate linear mixed model to longitudinal phenotypic data from a traditional breeding experimental design facilitated the identification of far more genome-wide variants for tree height in poplar. The significant SNPs identified in this study would enhance understanding of the molecular mechanisms underlying growth traits and would accelerate marker-assisted breeding programs in Populus.
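As a worked illustration of the Bonferroni step mentioned in the abstract, the sketch below divides the significance level by the number of tests. It assumes the number of tests equals the 22,670 SNPs reported; the abstract does not state the exact count used, and the function name `bonferroni_threshold` is hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value cutoff under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: one test per SNP, using the SNP count from the abstract.
n_snps = 22670
alpha = 0.01

threshold = bonferroni_threshold(alpha, n_snps)
neg_log10_threshold = -math.log10(threshold)

print(f"p-value threshold: {threshold:.3e}")
print(f"-log10(threshold): {neg_log10_threshold:.2f}")
```

Under this assumption a SNP would need a p-value below roughly 4.4e-7 to be declared significant.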

2019 ◽  
Author(s):  
Jan A. Freudenthal ◽  
Markus J. Ankenbrand ◽  
Dominik G. Grimm ◽  
Arthur Korte

Abstract Motivation Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses has grown remarkably over the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account and thus removes the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the computation of permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease the time of the analyses, especially for large datasets. Results We were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data. Availability GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT License.
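The idea of a permutation-based significance threshold can be sketched as follows. This is a minimal illustration, not GWAS-Flow's implementation: it assumes a plain marker-wise correlation test with no mixed model or kinship correction, uses simulated data, and all function names are hypothetical. The phenotype is shuffled repeatedly, the strongest marker association is recorded for each shuffle, and the threshold is taken as the upper quantile of those null maxima.

```python
import numpy as np


def max_abs_correlation(genotypes: np.ndarray, phenotype: np.ndarray) -> float:
    """Largest absolute Pearson correlation between any marker and the phenotype."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    r = (g.T @ y) / denom
    return float(np.abs(r).max())


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Family-wise threshold on |r| estimated from phenotype permutations."""
    rng = np.random.default_rng(seed)
    null_max = np.array([
        # Shuffling the phenotype breaks any marker-trait association
        # while keeping the phenotype's own (possibly non-Gaussian) distribution.
        max_abs_correlation(genotypes, rng.permutation(phenotype))
        for _ in range(n_perm)
    ])
    return float(np.quantile(null_max, 1 - alpha))


# Simulated example data (assumption): 100 individuals, 50 markers coded 0/1/2,
# and a skewed phenotype unrelated to any marker.
data_rng = np.random.default_rng(1)
genotypes = data_rng.integers(0, 3, size=(100, 50)).astype(float)
phenotype = data_rng.exponential(size=100)

threshold = permutation_threshold(genotypes, phenotype)
print(f"permutation threshold on |r|: {threshold:.3f}")
```

A marker whose observed absolute correlation exceeds `threshold` would be called significant at the chosen family-wise level. Because the null distribution is built from the observed phenotype values, no normalizing transformation is needed.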


Animals ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. 2009
Author(s):  
Ellen Lai ◽  
Alexa L. Danner ◽  
Thomas R. Famula ◽  
Anita M. Oberbauer

Digital dermatitis (DD) causes lameness in dairy cattle. To detect the quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of FW per hoof trimming record) and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess the SNP predictive value. The LMM and RF analyses identified QTL regions containing candidate genes on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and BTA7 and 20 for the quantitative phenotype that related to epidermal integrity, immune function, and wound healing. Although larger sample sizes are necessary to reaffirm these small effect loci amidst a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with a high predictive value.


PLoS ONE ◽  
2011 ◽  
Vol 6 (11) ◽  
pp. e27061 ◽  
Author(s):  
Goutam Sahana ◽  
Thomas Mailund ◽  
Mogens Sandø Lund ◽  
Bernt Guldbrandtsen

2017 ◽  
Author(s):  
Haohan Wang ◽  
Bryon Aragam ◽  
Eric P. Xing

Abstract A fundamental challenge in modern datasets of ever-increasing dimensionality is variable selection, which has attracted renewed interest due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.


2019 ◽  
Vol 35 (23) ◽  
pp. 4879-4885 ◽  
Author(s):  
Chao Ning ◽  
Dan Wang ◽  
Lei Zhou ◽  
Julong Wei ◽  
Yuanxin Liu ◽  
...  

Abstract Motivation Current dynamic phenotyping systems introduce time as an extra dimension to genome-wide association studies (GWAS), which helps to explore the mechanism of dynamic genetic control for complex longitudinal traits. However, existing methods for longitudinal GWAS either ignore the covariance among observations at different time points or encounter computational efficiency issues. Results We herein developed efficient genome-wide multivariate association algorithms for longitudinal data. In contrast to existing univariate linear mixed model analyses, the proposed method has improved statistical power for association detection and improved computational speed. In addition, the new method can analyze unbalanced longitudinal data with thousands of individuals and more than ten thousand records within a few hours. The corresponding time for balanced longitudinal data is just a few minutes. Availability and implementation A software package implementing the efficient algorithm, named GMA (https://github.com/chaoning/GMA), is freely available for interested users in relevant fields. Supplementary information Supplementary data are available at Bioinformatics online.

