Flexible multivariate linear mixed models for structured multiple traits

Many genetic studies collect structured multivariate traits that have rich information about the traits encoded in trait covariates, in addition to the genotype and covariate information on individuals. Examples of such data include gene-environment studies where the same genotype/clone is measured in multiple environments, and longitudinal studies where a measurement is taken at multiple time points. We present a flexible multivariate linear mixed model (fMulti-LMM) suitable for genetic analysis of structured multivariate traits. Our model can incorporate low- and high-dimensional trait covariates to test the genetic association across structured multiple traits while capturing the correlations due to individual-to-individual similarity measured by genome-wide markers and trait-to-trait similarity measured by trait covariates.
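
The two sources of correlation described above can be illustrated with a small numerical sketch. It assumes one common parameterization of multivariate mixed models, in which the covariance of the stacked trait matrix is a Kronecker product of a trait-to-trait kernel and an individual-to-individual kernel plus independent noise. This is not necessarily the exact specification used in fMulti-LMM; the helper `linear_kernel`, the simulated data, and the variance values `sigma_g2` and `sigma_e2` are all hypothetical.

```python
import numpy as np

def linear_kernel(features):
    """Centered linear similarity matrix, scaled so its mean diagonal is 1."""
    centered = features - features.mean(axis=0)
    kernel = centered @ centered.T
    return kernel / np.mean(np.diag(kernel))

rng = np.random.default_rng(0)
n_individuals, n_markers = 6, 50
n_traits, n_trait_covariates = 4, 2

# Simulated inputs (assumptions for illustration only).
markers = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
trait_covariates = rng.normal(size=(n_traits, n_trait_covariates))

# Individual-to-individual similarity from genome-wide markers.
k_individuals = linear_kernel(markers)
# Trait-to-trait similarity from trait covariates.
k_traits = linear_kernel(trait_covariates)

# Covariance of vec(Y), where Y is individuals x traits and vec stacks columns:
# a structured genetic part plus independent residual noise.
sigma_g2, sigma_e2 = 0.7, 0.3  # hypothetical variance components
covariance = (
    sigma_g2 * np.kron(k_traits, k_individuals)
    + sigma_e2 * np.eye(n_traits * n_individuals)
)
```

In this sketch the block of `covariance` for traits j and k equals `k_traits[j, k] * k_individuals`, so two measurements are more correlated when both the individuals and the traits are similar.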

Meta-Analysis of Effect Sizes Reported at Multiple Time Points Using General Linear Mixed Model

PLoS ONE, 2016, Vol 11 (10), pp. e0164898. DOI: 10.1371/journal.pone.0164898. Cited By ~ 10

Author(s): Alfred Musekiwa, Samuel O. M. Manda, Henry G. Mwambi, Ding-Geng Chen

Keyword(s): Mixed Model, Linear Mixed Model, Meta Analysis, Effect Sizes, General Linear, Multiple Time, Time Points, Multiple Time Points, General Linear Mixed Model

GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

2019. DOI: 10.1101/783100. Cited By ~ 2

Author(s): Jan A. Freudenthal, Markus J. Ankenbrand, Dominik G. Grimm, Arthur Korte

Keyword(s): Complex Traits, Mixed Model, Linear Mixed Model, Association Studies, Large Datasets, Genome Wide Association, Small Data, Genome Wide Association Studies, Genome Wide, Non Gaussian

Abstract. Motivation: Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses has grown remarkably during the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account and thus removes the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the computation of permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease the time of the analyses, especially for large datasets. Results: We were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data. Availability: GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT License.
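
The idea of a permutation-based significance threshold can be sketched in a few lines. The example below is not GWAS-Flow and does not fit a linear mixed model: as a simplifying assumption it uses the absolute marker-phenotype correlation as the test statistic, shuffles the phenotype to break any true association, and takes the (1 - alpha) quantile of the per-permutation maximum as the genome-wide threshold. All function names and the simulated data are hypothetical.

```python
import numpy as np

def abs_marker_correlations(genotypes, phenotype):
    """Absolute Pearson correlation between each marker column and the phenotype."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return np.abs(g.T @ y) / denom

def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum statistic over permuted phenotypes."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling keeps the phenotype distribution but removes true associations.
        permuted = rng.permutation(phenotype)
        max_stats[i] = abs_marker_correlations(genotypes, permuted).max()
    return np.quantile(max_stats, 1.0 - alpha)

# Simulated data (assumption): 200 individuals, 500 markers coded 0/1/2,
# one causal marker (index 10), and skewed, non-Gaussian noise.
rng = np.random.default_rng(1)
n_individuals, n_markers = 200, 500
genotypes = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
phenotype = 1.0 * genotypes[:, 10] + rng.exponential(1.0, size=n_individuals)

threshold = permutation_threshold(genotypes, phenotype)
observed = abs_marker_correlations(genotypes, phenotype)
significant = np.flatnonzero(observed > threshold)
```

Because the null distribution is built from the observed phenotype itself, no transformation of the skewed phenotype is needed. The cost is that every permutation repeats the full genome scan, which is the computational burden the abstract refers to.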

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

Methods, 2018, Vol 145, pp. 2-9. DOI: 10.1016/j.ymeth.2018.04.021. Cited By ~ 1

Author(s): Haohan Wang, Bryon Aragam, Eric P. Xing

Keyword(s): Variable Selection, Mixed Model, Linear Mixed Model, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Heterogeneous Datasets

A linear mixed model framework for gene-based gene-environment interaction tests in twin studies

Genetic Epidemiology, 2018, Vol 42 (7), pp. 648-663. DOI: 10.1002/gepi.22150. Cited By ~ 2

Author(s): Brandon J. Coombes, Saonli Basu, Matt McGue

Keyword(s): Mixed Model, Linear Mixed Model, Twin Studies, Environment Interaction, Model Framework, Gene Environment Interaction, Gene Environment

Genome-Wide Association Studies Reveal Susceptibility Loci for Digital Dermatitis in Holstein Cattle

Animals, 2020, Vol 10 (11), pp. 2009. DOI: 10.3390/ani10112009

Author(s): Ellen Lai, Alexa L. Danner, Thomas R. Famula, Anita M. Oberbauer

Keyword(s): Predictive Value, Mixed Model, Linear Mixed Model, Bos Taurus, Association Studies, Bayesian Regression, Genome Wide Association, Genome Wide Association Studies, Digital Dermatitis, Genome Wide

Digital dermatitis (DD), also known as foot warts (FW), causes lameness in dairy cattle. To detect the quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of FW per hoof trimming record) and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess the SNP predictive value. The LMM and RF analyses identified QTL regions containing candidate genes related to epidermal integrity, immune function, and wound healing on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and on BTA7 and BTA20 for the quantitative phenotype. Although larger sample sizes are necessary to reaffirm these small-effect loci amidst a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with a high predictive value.

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017. DOI: 10.1109/bibm.2017.8217687. Cited By ~ 9

Author(s): Haohan Wang, Bryon Aragam, Eric P. Xing

Keyword(s): Variable Selection, Mixed Model, Linear Mixed Model, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Heterogeneous Datasets

Local Genealogies in a Linear Mixed Model for Genome-Wide Association Mapping in Complex Pedigreed Populations

PLoS ONE, 2011, Vol 6 (11), pp. e27061. DOI: 10.1371/journal.pone.0027061. Cited By ~ 2

Author(s): Goutam Sahana, Thomas Mailund, Mogens Sandø Lund, Bernt Guldbrandtsen

Keyword(s): Association Mapping, Mixed Model, Linear Mixed Model, Genome Wide Association, Genome Wide

Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies

2017. DOI: 10.1101/228106. Cited By ~ 2

Author(s): Haohan Wang, Bryon Aragam, Eric P. Xing

Keyword(s): Population Structure, Variable Selection, Mixed Model, Linear Mixed Model, Association Studies, Genome Wide Association, Low Rank, Genome Wide Association Studies, Unified Framework, Genome Wide

Abstract. A fundamental and important challenge in modern datasets of ever-increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.

Multivariate genome-wide association study of leaf shape in a Populus deltoides and P. simonii F1 pedigree

PLoS ONE, 2021, Vol 16 (10), pp. e0259278. DOI: 10.1371/journal.pone.0259278

Author(s): Wenguo Yang, Dan Yao, Hainan Wu, Wei Zhao, Yuhua Chen, ...

Keyword(s): Leaf Morphology, Mixed Model, Linear Mixed Model, Populus Deltoides, Leaf Shape, Leaf Traits, Leaf Length, Moderate Number, Genome Wide, Poplar Leaf

Leaf morphology exhibits tremendous diversity between and within species, and is likely related to adaptation to environmental factors. Most poplar species are of great economic and ecological value, and their leaf morphology can be a good predictor of wood productivity and environmental adaptation. It is important to understand the genetic mechanism behind variation in leaf shape. Although some initial efforts have been made to identify quantitative trait loci (QTLs) for poplar leaf traits, more effort needs to be expended to unravel the polygenic architecture of the complex traits of leaf shape. Here, we performed a genome-wide association analysis (GWAS) of poplar leaf shape traits in a randomized complete block design with clones from F1 hybrids of Populus deltoides and Populus simonii. Based on a multivariate linear mixed model, a total of 35 SNPs were identified as significantly associated with the multiple traits formed by a moderate number of regular polar radii between the leaf centroid and its edge points, which together can represent the leaf shape. In contrast, applying the univariate linear mixed model to single leaf traits led to genomic inflation; thus, no significant SNPs were detected for leaf length, measures of leaf width, leaf area, or the ratio of leaf length to leaf width under genomic control. Investigation of the candidate genes showed that most flanking regions of the significant leaf shape-associated SNPs harbored genes that were related to leaf growth and development and to the regulation of leaf morphology. The combined use of the traditional experimental design and the multivariate linear mixed model could greatly improve the power in GWAS because the multiple trait data from a large number of individuals with replicates of clones were incorporated into the statistical model. The results of this study will enhance the understanding of the genetic mechanism of leaf shape variation in Populus. In addition, a moderate number of regular leaf polar radii can largely represent the leaf shape and can be used for GWAS of such a complicated trait in Populus, instead of the higher-dimensional regular radius data that were previously considered to represent leaf shape well.
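
Genomic inflation and genomic control, as mentioned in this abstract, can be made concrete with a short sketch. It assumes 1-degree-of-freedom chi-square association statistics and uses the standard definition of the inflation factor as the ratio of the observed median statistic to the median expected under the null. The function names and the simulated statistics are hypothetical and are not taken from the study.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median statistic to its expected null median."""
    return float(np.median(chi2_stats) / CHI2_1DF_MEDIAN)

def genomic_control(chi2_stats):
    """Deflate the statistics by the inflation factor (never inflate them)."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return chi2_stats / lam

# Simulated null statistics inflated by a factor of 1.5 (assumption).
rng = np.random.default_rng(0)
inflated_stats = 1.5 * rng.standard_normal(100_000) ** 2

lambda_before = genomic_inflation_factor(inflated_stats)
lambda_after = genomic_inflation_factor(genomic_control(inflated_stats))
```

In this simulation `lambda_before` is close to 1.5 and `lambda_after` is 1 by construction. Dividing every statistic by the inflation factor also shrinks true signals, which is one reason a strongly inflated univariate analysis can end up with no significant SNPs after correction.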

Efficient multivariate analysis algorithms for longitudinal genome-wide association studies

Bioinformatics, 2019, Vol 35 (23), pp. 4879-4885. DOI: 10.1093/bioinformatics/btz304. Cited By ~ 4

Author(s): Chao Ning, Dan Wang, Lei Zhou, Julong Wei, Yuanxin Liu, ...

Keyword(s): Longitudinal Data, Software Package, Mixed Model, Linear Mixed Model, Association Studies, Genome Wide Association, Supplementary Information, Genome Wide Association Studies, Genome Wide, Computational Speed

Abstract. Motivation: Current dynamic phenotyping systems introduce time as an extra dimension to genome-wide association studies (GWAS), which helps to explore the mechanism of dynamic genetic control for complex longitudinal traits. However, existing methods for longitudinal GWAS either ignore the covariance among observations at different time points or encounter computational efficiency issues. Results: We herein developed efficient genome-wide multivariate association algorithms for longitudinal data. In contrast to existing univariate linear mixed model analyses, the proposed method has improved statistical power for association detection and improved computational speed. In addition, the new method can analyze unbalanced longitudinal data with thousands of individuals and more than ten thousand records within a few hours. The corresponding time for balanced longitudinal data is just a few minutes. Availability and implementation: A software package implementing the efficient algorithm, named GMA (https://github.com/chaoning/GMA), is freely available to interested users in relevant fields. Supplementary information: Supplementary data are available at Bioinformatics online.
