scholarly journals Integrative functional linear model for genome-wide association studies with multiple traits

Biostatistics ◽  
2020 ◽  
Author(s):  
Yang Li ◽  
Fan Wang ◽  
Mengyun Wu ◽  
Shuangge Ma

Summary In recent biomedical research, genome-wide association studies (GWAS) have demonstrated great success in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most of the existing GWAS are still limited because they analyze each trait separately without considering their correlations and suffer from a lack of sufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous challenges to statistical methods, in both theoretical and practical aspects. In this article, we innovatively propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. It effectively accommodates the high dimensionality of SNPs and correlations among multiple traits to facilitate information borrowing. Our extensive simulation studies demonstrate the satisfactory performance of the proposed method in the identification and estimation of disease-associated genetic variants, compared to four alternatives. The analysis of type 2 diabetes data leads to biologically meaningful findings with good prediction accuracy and selection stability.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guomin Zhang ◽  
Rongsheng Wang ◽  
Juntao Ma ◽  
Hongru Gao ◽  
Lingwei Deng ◽  
...  

Abstract Background Heilongjiang Province is a high-quality japonica rice cultivation area in China. One in ten bowls of Chinese rice is produced here. Increasing yield is one of the main aims of rice production in this area. However, yield is a complex quantitative trait composed of many factors. The purpose of this study was to determine how many genetic loci are associated with yield-related traits. Genome-wide association studies (GWAS) were performed on 450 accessions collected from northeast Asia, including Russia, Korea, Japan and Heilongjiang Province of China. These accessions consist of elite varieties and landraces introduced into Heilongjiang Province decade ago. Results After resequencing of the 450 accessions, 189,019 single nucleotide polymorphisms (SNPs) were used for association studies by two different models, a general linear model (GLM) and a mixed linear model (MLM), examining four traits: days to heading (DH), plant height (PH), panicle weight (PW) and tiller number (TI). Over 25 SNPs were found to be associated with each trait. Among them, 22 SNPs were selected to identify candidate genes, and 2, 8, 1 and 11 SNPs were found to be located in 3′ UTR region, intron region, coding region and intergenic region, respectively. Conclusions All SNPs detected in this research may become candidates for further fine mapping and may be used in the molecular breeding of high-latitude rice.


2010 ◽  
Vol 42 (4) ◽  
pp. 355-360 ◽  
Author(s):  
Zhiwu Zhang ◽  
Elhan Ersoz ◽  
Chao-Qiang Lai ◽  
Rory J Todhunter ◽  
Hemant K Tiwari ◽  
...  

2019 ◽  
Author(s):  
Michael C. Turchin ◽  
Matthew Stephens

AbstractGenome-wide association studies (GWAS) have now been conducted for hundreds of phenotypes of relevance to human health. Many such GWAS involve multiple closely-related phenotypes collected on the same samples. However, the vast majority of these GWAS have been analyzed using simple univariate analyses, which consider one phenotype at a time. This is de-spite the fact that, at least in simulation experiments, multivariate analyses have been shown to be more powerful at detecting associations. Here, we conduct multivariate association analyses on 13 different publicly-available GWAS datasets that involve multiple closely-related phenotypes. These data include large studies of anthropometric traits (GIANT), plasma lipid traits (GlobalLipids), and red blood cell traits (HaemgenRBC). Our analyses identify many new associations (433 in total across the 13 studies), many of which replicate when follow-up samples are available. Overall, our results demonstrate that multivariate analyses can help make more effective use of data from both existing and future GWAS.1Author SummaryGenome-wide association studies (GWAS) have become a common and powerful tool for identifying significant correlations between markers of genetic variation and physical traits of interest. Often these studies are conducted by comparing genetic variation against single traits one at a time (‘univariate’); however, it has previously been shown that it is possible to increase your power to detect significant associations by comparing genetic variation against multiple traits simultaneously (‘multivariate’). Despite this apparent increase in power though, researchers still rarely conduct multivariate GWAS, even when studies have multiple traits readily available. Here, we reanalyze 13 previously published GWAS using a multivariate method and find >400 additional associations. Our method makes use of univariate GWAS summary statistics and is available as a software package, thus making it accessible to other researchers interested in conducting the same analyses. We also show, using studies that have multiple releases, that our new associations have high rates of replication. Overall, we argue multivariate approaches in GWAS should no longer be overlooked and how, often, there is low-hanging fruit in the form of new associations by running these methods on data already collected.


2018 ◽  
Author(s):  
Ping Zeng ◽  
Xinjie Hao ◽  
Xiang Zhou

AbstractMotivationGenome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these identified loci are associated with multiple traits – a phenomena known as pleiotropy. Identification of pleiotropic associations can help characterize the genetic relationship among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires the development of statistical methods that can jointly model multiple traits with genome-wide SNPs together.ResultsWe develop a joint modeling method, which we refer to as the integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association pattern using mixture modeling, and has the potential to reveal causal relationship between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power, and, with a sparsity-inducing penalty, is capable of selecting informative annotations from a large, potentially noninformative set. To enable scalable inference of iMAP to association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. With simulations and comparisons to existing methods, we illustrate the benefits of iMAP both in terms of high association mapping power and in terms of accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated from the Roadmap Project. iMAP is freely available at www.xzlab.org/software.html.


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Pieter W. M. Bonnemaijer ◽  
◽  
Elisabeth M. van Leeuwen ◽  
Adriana I. Iglesias ◽  
Puya Gharahkhani ◽  
...  

AbstractA new avenue of mining published genome-wide association studies includes the joint analysis of related traits. The power of this approach depends on the genetic correlation of traits, which reflects the number of pleiotropic loci, i.e. genetic loci influencing multiple traits. Here, we applied new meta-analyses of optic nerve head (ONH) related traits implicated in primary open-angle glaucoma (POAG); intraocular pressure and central corneal thickness using Haplotype reference consortium imputations. We performed a multi-trait analysis of ONH parameters cup area, disc area and vertical cup-disc ratio. We uncover new variants; rs11158547 in PPP1R36-PLEKHG3 and rs1028727 near SERPINE3 at genome-wide significance that replicate in independent Asian cohorts imputed to 1000 Genomes. At this point, validation of these variants in POAG cohorts is hampered by the high degree of heterogeneity. Our results show that multi-trait analysis is a valid approach to identify novel pleiotropic variants for ONH.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Shi-Bo Wang ◽  
Jian-Ying Feng ◽  
Wen-Long Ren ◽  
Bo Huang ◽  
Ling Zhou ◽  
...  

F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 200 ◽  
Author(s):  
Diego Fabregat-Traver ◽  
Sodbo Zh. Sharapov ◽  
Caroline Hayward ◽  
Igor Rudan ◽  
Harry Campbell ◽  
...  

To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the ’omics’ context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for arbitrary number of traits, and demonstrate that for the analysis of single-trait and multiple-trait scenarios different computational algorithms are optimal. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations,increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNUGPL v. 3 license as part of the GenABEL project for statistical genomics at http: //www.genabel.org/packages/OmicABEL.


2016 ◽  
Author(s):  
Sunyoung Shin ◽  
Sündüz Keleş

AbstractAlthough genome-wide association studies (GWAS) have been successful at finding thousands of disease-associated genetic variants (GVs), identifying causal variants and elucidating the mechanisms by which genotypes influence phenotypes are critical open questions. A key challenge is that a large percentage of disease-associated GVs are potential regulatory variants located in noncoding regions, making them difficult to interpret. Recent research efforts focus on going beyond annotating GVs by integrating functional annotation data with GWAS to prioritize GVs. However, applicability of these approaches is challenged by high dimensionality and heterogeneity of functional annotation data. Furthermore, existing methods often assume global associations of GVs with annotation data. This strong assumption is susceptible to violations for GVs involved in many complex diseases. To address these issues, we develop a general regression framework, named Annotation Regression for GWAS (ARoG). ARoG is based on finite mixture of linear regression models where GWAS association measures are viewed as responses and functional annotations as predictors. This mixture framework addresses heterogeneity of effects of GVs by grouping them into clusters and high dimensionality of the functional annotations by enabling annotation selection within each cluster. ARoG further employs permutation testing to evaluate the significance of selected annotations. Computational experiments indicate that ARoG can discover distinct associations between disease risk and functional annotations. Application of ARoG to autism and schizophrenia data from Psychiatric Genomics Consortium led to identification of GVs that significantly affect interactions of several transcription factors with DNA as potential mechanisms contributing to these disorders.


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Baolin Wu ◽  
James S. Pankow

Multiple correlated traits are often collected in genetic studies. By jointly analyzing multiple traits, we can increase power by aggregating multiple weak effects and reveal additional insights into the genetic architecture of complex human diseases. In this article, we propose a multivariate linear regression-based method to test the joint association of multiple quantitative traits. It is flexible to accommodate any covariates, has very accurate control of type I errors, and offers very competitive performance. We also discuss fast and accurate significance p value computation especially for genome-wide association studies with small-to-medium sample sizes. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with application to genome-wide association analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) study. We found some very interesting associations with diabetes traits which have not been reported before. We implemented the proposed methods in a publicly available R package.


Sign in / Sign up

Export Citation Format

Share Document