Association analysis of rare and common variants with multiple traits based on variable reduction method

2018 ◽  
Vol 100 ◽  
Author(s):  
LILI CHEN ◽  
YONG WANG ◽  
YAJING ZHOU

Summary. Pleiotropy, the effect of one variant on multiple traits, is widespread in complex diseases. Joint analysis of multiple traits can improve statistical power to detect genetic variants and uncover the underlying genetic mechanisms. Currently, many existing methods target either a single common variant or only rare variants, yet increasing evidence shows that complex diseases are caused by both common and rare variants. Here we propose a region-based method, based on a variable reduction method (abbreviated as MULVR), to test the association between multiple traits and both rare and common variants. Because the MULVR method may lose power in the presence of noise traits, we further propose the MULVR-O method, which uses MULVR to jointly analyse the optimal number of traits associated with the genetic variants and thereby guards against the effect of noise traits. Extensive simulation studies show that our proposed method (MULVR-O) can be applied not only to multiple quantitative traits but also to qualitative traits, and is more powerful than several other comparison methods in most scenarios. An application to two genes (SHBG and CHRM3) and two phenotypes (systolic and diastolic blood pressure) from the GAW19 dataset illustrates that our proposed methods (MULVR and MULVR-O) are feasible and efficient region-based methods.

2018 ◽  
Author(s):  
Zhenchuan Wang ◽  
Qiuying Sha ◽  
Kui Zhang ◽  
Shuanglin Zhang

Abstract. Joint analysis of multiple traits has recently become popular because it can increase statistical power to detect genetic variants, and there is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single common variant. However, the variant-by-variant methods used in common variant association studies may not be optimal for rare variant association studies, owing to allelic heterogeneity and the extreme rarity of individual variants. In this article, we developed a statistical method, testing an optimally weighted combination of variants with multiple traits (TOWmuT), to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.
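To make the idea of an optimally weighted combination of variants concrete, here is a minimal single-trait sketch in the spirit of the single-TOW building block, not TOWmuT's multi-trait statistic, which the abstract does not spell out. The weighting rule w_j = cov(x_j, y)/var(x_j) and the helper names `tow_statistic` and `permutation_pvalue` are assumptions for illustration. Because each weight carries the sign of its variant's effect, protective and deleterious variants do not cancel each other out.

```python
import numpy as np


def tow_statistic(X, y):
    """Squared score statistic for a weighted combination of variants (sketch).

    X: (n_individuals, n_variants) genotypes coded 0/1/2.
    y: (n_individuals,) trait values.
    Assumed weights: w_j = cov(x_j, y) / var(x_j).
    """
    Xc = X - X.mean(axis=0)             # center each variant
    yc = y - y.mean()                   # center the trait
    denom = (Xc ** 2).sum(axis=0)
    denom[denom == 0] = np.inf          # monomorphic variants get zero weight
    w = (Xc.T @ yc) / denom             # signed weights follow effect direction
    s = Xc @ w                          # combined genotype score per individual
    ss = s @ s
    if ss == 0:
        return 0.0
    return float((s @ yc) ** 2 / ss)


def permutation_pvalue(X, y, n_perm=999, seed=0):
    """Permutation p-value: shuffling the trait breaks genotype-trait links."""
    rng = np.random.default_rng(seed)
    t_obs = tow_statistic(X, y)
    exceed = sum(tow_statistic(X, rng.permutation(y)) >= t_obs
                 for _ in range(n_perm))
    return t_obs, (exceed + 1) / (n_perm + 1)
```

The weights are re-estimated inside every permutation because they depend on the trait; reusing the observed weights would ignore the data-driven weighting and make the null distribution anticonservative.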


PLoS Genetics ◽  
2021 ◽  
Vol 17 (11) ◽  
pp. e1009849
Author(s):  
Yuhan Xie ◽  
Mo Li ◽  
Weilai Dong ◽  
Wei Jiang ◽  
Hongyu Zhao

Recent studies of de novo mutations (DNMs) have demonstrated that multiple early-onset diseases share risk genes. Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, few methods can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study jointly analyzing data from congenital heart disease (CHD) and autism. Our method identified 23 genes for CHD from joint analysis, including 12 novel genes, substantially more than single-trait analysis, leading to novel insights into CHD etiology.
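The EM idea can be illustrated with a deliberately simplified, single-trait sketch: gene-level DNM counts are modeled as a two-component Poisson mixture (background genes vs. risk genes), and EM returns each gene's posterior probability of belonging to the risk component. This is not M-DATA itself, which jointly models two diseases and functional annotations; the mixture model, the function name `em_poisson_mixture`, and the starting values are assumptions.

```python
import numpy as np


def em_poisson_mixture(counts, pi=0.1, lam0=0.5, lam1=3.0, n_iter=500, tol=1e-10):
    """EM for counts ~ pi * Poisson(lam1) + (1 - pi) * Poisson(lam0) (sketch).

    Returns (pi, lam0, lam1, r), where r[g] is the posterior probability
    that gene g belongs to the elevated-rate ("risk") component.
    """
    counts = np.asarray(counts, dtype=float)
    for _ in range(n_iter):
        # E-step: log-likelihoods up to the shared -log(k!) term, which cancels
        log1 = np.log(pi) + counts * np.log(lam1) - lam1
        log0 = np.log(1 - pi) + counts * np.log(lam0) - lam0
        r = 1.0 / (1.0 + np.exp(np.clip(log0 - log1, -700, 700)))
        # M-step: closed-form updates for the mixing weight and both rates
        new_pi = r.mean()
        new_lam1 = (r * counts).sum() / r.sum()
        new_lam0 = ((1 - r) * counts).sum() / (1 - r).sum()
        change = abs(new_pi - pi) + abs(new_lam1 - lam1) + abs(new_lam0 - lam0)
        pi, lam1, lam0 = new_pi, new_lam1, new_lam0
        if change < tol:
            break
    return pi, lam0, lam1, r
```

Genes with a high posterior r are candidate risk genes; a multi-trait extension would additionally let the prior probability for one disease depend on the other disease's gene status.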


2020 ◽  
Author(s):  
Craig Smail ◽  
Nicole M. Ferraro ◽  
Matthew G. Durrant ◽  
Abhiram S. Rao ◽  
Matthew Aguirre ◽  
...  

Summary. Polygenic risk scores (PRS) aim to quantify the contribution of multiple genetic loci to an individual’s likelihood of developing a complex trait or disease. However, existing PRS estimate genetic liability using common genetic variants, excluding the impact of rare variants. We identified rare, large-effect variants in individuals with outlier gene expression from the GTEx project and then assessed their impact on PRS predictions in the UK Biobank (UKB). We observed large deviations from the PRS-predicted phenotypes for carriers of multiple outlier rare variants; for example, individuals classified as “low-risk” but in the top 1% of outlier rare variant burden had a 6-fold higher rate of severe obesity. We replicated these findings using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) biobank and the Million Veteran Program, and demonstrated that PRS across multiple traits will benefit significantly from the inclusion of rare genetic variants.
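A PRS is typically a weighted sum of effect-allele dosages. The sketch below computes one and flags individuals who have a low PRS but a high outlier rare-variant burden, the discordant group described above. The function names, the median cutoff used for "low-risk", and summarizing burden as a per-person count are hypothetical choices, not the study's pipeline.

```python
import numpy as np


def polygenic_risk_score(dosages, betas):
    """PRS_i = sum_j beta_j * dosage_ij, with dosage = effect-allele count (0/1/2)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(betas, dtype=float)


def flag_discordant(prs, rare_burden, prs_quantile=0.5, burden_quantile=0.99):
    """Flag individuals below a PRS cutoff but in the top tail of rare-variant burden.

    rare_burden: per-individual count of carried outlier rare variants
    (how those variants are identified is outside this sketch).
    """
    low_prs = prs < np.quantile(prs, prs_quantile)
    high_burden = rare_burden >= np.quantile(rare_burden, burden_quantile)
    return low_prs & high_burden
```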


2019 ◽  
Author(s):  
Jianjun Zhang ◽  
Qiuying Sha ◽  
Guanfu Liu ◽  
Xuexia Wang

Abstract. There is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases, for which multiple correlated traits are often measured. Joint analysis of multiple traits could increase statistical power by aggregating multiple weak effects. Existing methods for multiple-trait association tests usually study each trait separately and then combine the univariate test statistics or their p-values to identify disease-associated genetic variants. However, ignoring the correlation between phenotypes may cause power loss. Additionally, the genetic variants in one gene (including common and rare variants) are often viewed as a whole that affects the underlying disease, since the basic functional unit of inheritance is a gene rather than a single genetic variant. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigations, whereas many existing methods for multiple-trait association tests focus only on testing a single common variant rather than a gene. In this article, we propose a statistical method, Testing an Optimally Weighted Combination of Multiple traits (TOW-CM), to test the association between multiple traits and multiple variants in a genomic region (a gene or pathway). We investigate the performance of the proposed method through extensive simulation studies, which show that it has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. In addition, we illustrate the usefulness of TOW-CM by analyzing whole-genome genotyping data from the COPDGene study.
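One standard way to weight multiple traits optimally for a single genotype score rests on a known identity: the largest squared correlation that any linear combination of traits Y can reach with a genotype score x equals the R² from regressing x on Y (a reverse regression). The sketch below uses that identity; whether TOW-CM derives its weights exactly this way, and how it summarizes a whole region into one score x, are not stated in the abstract, so treat both as assumptions. The name `optimal_trait_combination` is hypothetical.

```python
import numpy as np


def optimal_trait_combination(Y, x):
    """Weights for combining K traits to maximize correlation with genotype x.

    Y: (n, K) trait matrix; x: (n,) genotype score (e.g., allele count).
    Returns (weights, r2); n * r2 can serve as a score-type statistic.
    """
    Yc = Y - Y.mean(axis=0)
    xc = x - x.mean()
    # Reverse regression of genotype on traits: w = (Yc'Yc)^-1 Yc'xc
    w, *_ = np.linalg.lstsq(Yc, xc, rcond=None)
    fitted = Yc @ w
    r2 = 1.0 - ((xc - fitted) ** 2).sum() / (xc ** 2).sum()
    return w, float(r2)
```

Because the weights account for the trait covariance through (Y'Y)^-1, correlated traits are not double-counted, which is the concern the abstract raises about ignoring correlation between phenotypes.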


2021 ◽  
Author(s):  
Yuhan Xie ◽  
Mo Li ◽  
Weilai Dong ◽  
Wei Jiang ◽  
Hongyu Zhao

Recent studies of de novo mutations (DNMs) have demonstrated that multiple early-onset diseases share risk genes. Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, few methods can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study jointly analyzing data from congenital heart disease (CHD) and autism. Our method identified 23 genes from joint analysis, including 12 novel genes, substantially more than single-trait analysis, leading to novel insights into CHD etiology.


2021 ◽  
Vol 12 ◽  
Author(s):  
Liwan Fu ◽  
Yuquan Wang ◽  
Tingting Li ◽  
Yue-Qing Hu

As a pivotal research tool, genome-wide association studies have successfully identified numerous genetic variants underlying distinct diseases. However, these variants explain only a small proportion of the phenotypic variation for certain diseases, suggesting that more genetic signals remain to be detected. One reason may be that one-phenotype, one-variant association studies are not efficient at detecting variants with weak effects. It is increasingly recognized that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues to their underlying biological mechanisms. We therefore propose a Weighted Combination of multiple phenotypes following Hierarchical Clustering method (WCHC) for simultaneously analyzing multiple phenotypes in association studies. A series of simulations shows that WCHC is either the most powerful method or comparable with the most powerful competitor in most of the simulation scenarios. Additionally, we evaluated the performance of WCHC by applying it to obesity-related phenotypes from the Atherosclerosis Risk in Communities study, and several associated variants are reported.
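The clustering step can be sketched as average-linkage agglomerative clustering of phenotype columns with distance 1 - |r|, where r is the Pearson correlation between two phenotypes. The distance, the linkage, the fixed number of clusters, and the name `cluster_phenotypes` are assumptions; WCHC's subsequent weighting of the clustered phenotypes is not shown.

```python
import numpy as np


def cluster_phenotypes(Y, n_clusters):
    """Average-linkage agglomerative clustering of phenotype columns (sketch).

    Distance between phenotypes k and l is 1 - |corr(k, l)|, so strongly
    (positively or negatively) correlated phenotypes merge first.
    """
    dist = 1.0 - np.abs(np.corrcoef(Y, rowvar=False))
    clusters = [[k] for k in range(Y.shape[1])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean pairwise distance between the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)
```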


2021 ◽  
Author(s):  
Peter Dornbos ◽  
Ryan Koesterer ◽  
Andrew Ruttenburg ◽  
Joanne B Cole ◽  
Aaron Leong ◽  
...  

Polygenic scores (PS), constructed from the combined effects of many genetic variants, have been shown to predict risk or guide treatment strategies for certain common diseases. As most PS to date are based on common variants, the benefit of adding rare variation to PS remains largely unknown and is methodologically challenging to assess. We developed and validated a novel method for constructing a rare variant PS and applied it to a previously identified clinical scenario in which genetic variants modify the hemoglobin A1C (HbA1C) threshold recommended for type 2 diabetes (T2D) diagnosis. The resulting rare variant PS is highly polygenic (21,293 variants across 144 genes), depends on ultra-rare variants (72.7% of variants observed in <3 people), and identifies significantly more undiagnosed T2D cases than expected by chance (OR=2.71, p=1.51×10^-6). A model combining the rare variant PS with a previously published common variant PS is expected to identify 4.9M misdiagnosed T2D cases in the USA, nearly 1.5-fold more than the common variant PS alone. These results provide a method for constructing complex phenotype PS from rare variants and suggest that rare variants will augment common variants in precision medicine approaches for common disease.
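Enrichment results such as an odds ratio (OR) summarize a 2×2 table of flagged vs. unflagged individuals by case status. A minimal sketch of an OR with a Wald 95% confidence interval follows; any counts passed to it are hypothetical, and the abstract does not state which test produced the reported p-value. The name `odds_ratio_wald` is illustrative.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table (sketch).

    a: flagged cases, b: flagged non-cases,
    c: unflagged cases, d: unflagged non-cases (all counts > 0).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

With very small cell counts, as can happen for ultra-rare variants, an exact test is usually preferred over the Wald approximation.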


2020 ◽  
Author(s):  
Kodi Taraszka ◽  
Noah Zaitlen ◽  
Eleazar Eskin

Abstract. We introduce the pleiotropic association test (PAT) for joint analysis of multiple traits using GWAS summary statistics. The method uses the decomposition of phenotypic covariation into genetic and environmental components to construct a likelihood ratio test statistic for each genetic variant. Although PAT does not directly identify which trait(s) drive the association, a per-trait interpretation of the omnibus p-value is provided through an extension of the meta-analysis framework, m-values. In simulations, we show that PAT controls the false positive rate, increases statistical power, and is robust to misspecification of the genetic effect model. Additionally, simulations comparing PAT to two multi-trait methods, HIPO and MTAG, show that PAT yields a 43.0% increase in the number of omnibus associations over the other methods. When these associations are interpreted at the per-trait level using m-values, PAT has 52.2% more per-trait interpretations with a 0.57% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT identifies 22,095 novel associated variants. Through the m-values interpretation framework, the total number of per-trait associations is almost tripled for two traits and nearly doubled for another trait relative to the original single-trait GWAS.
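The core of a covariance-based likelihood ratio can be sketched as comparing a variant's vector of per-trait z-scores under a null model with covariance Σ_E against an alternative with covariance Σ_E + Σ_G. This is a simplified illustration under that assumed model, not PAT's published statistic or its m-value extension; `gaussian_log_lr` and the covariance inputs are hypothetical.

```python
import numpy as np


def gaussian_log_lr(z, sigma_e, sigma_g):
    """log N(z; 0, sigma_e + sigma_g) - log N(z; 0, sigma_e) (sketch).

    z: per-trait GWAS z-scores for one variant; sigma_e and sigma_g are
    assumed environmental and genetic covariance matrices.
    """
    z = np.asarray(z, dtype=float)

    def log_mvn(cov):
        # log density of a zero-mean multivariate normal at z
        _, logdet = np.linalg.slogdet(cov)
        quad = z @ np.linalg.solve(cov, z)
        return -0.5 * (quad + logdet + len(z) * np.log(2 * np.pi))

    sigma_e = np.asarray(sigma_e, dtype=float)
    sigma_g = np.asarray(sigma_g, dtype=float)
    return float(log_mvn(sigma_e + sigma_g) - log_mvn(sigma_e))
```

A z-vector near zero favors the null (negative log ratio), while large z-scores aligned with the genetic covariance favor the alternative.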


2021 ◽  
Vol 12 ◽  
Author(s):  
Fengrong Liu ◽  
Ziyang Zhou ◽  
Mingzhi Cai ◽  
Yangjun Wen ◽  
Jin Zhang

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. Compared with analyzing a single phenotype at a time, joint analysis of multiple phenotypes can improve statistical power by taking into account information shared across phenotypes. However, most established joint methods ignore the differing levels of correlation among phenotypes and instead analyze all phenotypes simultaneously in a single genetic model. As a result, they may fail to capture the genetic structure of the phenotypes and lose statistical power. In this study, we develop a novel method, the agglomerative nesting clustering algorithm for phenotypic dimension reduction analysis (AGNEP), to jointly analyze multiple phenotypes in GWAS. First, AGNEP uses an agglomerative nesting clustering algorithm to group correlated phenotypes, and then applies principal component analysis (PCA) to generate representative phenotypes for each group. Finally, multivariate analysis is used to test associations between genetic variants and the representative phenotypes rather than all phenotypes. We perform three simulation experiments with various genetic structures and a real-data analysis of 19 Arabidopsis phenotypes. Compared with established methods, AGNEP performs better in terms of statistical power, computing time, and the number of detected quantitative trait nucleotides (QTNs). The analysis of the Arabidopsis dataset further illustrates the efficiency of AGNEP for detecting QTNs, which are confirmed by The Arabidopsis Information Resource gene bank.
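AGNEP's dimension-reduction step can be sketched as taking the first principal component of each standardized phenotype group produced by the clustering step. Standardizing the columns, keeping only one component per group, and the name `representative_phenotypes` are assumptions; the subsequent multivariate test is not shown.

```python
import numpy as np


def representative_phenotypes(Y, groups):
    """First principal component score of each phenotype group (sketch).

    Y: (n, K) phenotype matrix; groups: list of column-index lists,
    e.g. the output of a clustering step.
    """
    reps = []
    for g in groups:
        sub = Y[:, g]
        sub = (sub - sub.mean(axis=0)) / sub.std(axis=0)  # standardize columns
        # rows of vt are the principal axes (PCA loadings) of the standardized data
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        reps.append(sub @ vt[0])
    return np.column_stack(reps)
```

The multivariate test then runs on len(groups) representative columns instead of all K phenotypes, which is where the reduction in dimension and computing time comes from.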


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2139 ◽  
Author(s):  
Leiming Sun ◽  
Chan Wang ◽  
Yue-Qing Hu

Background. Genome-wide association studies have succeeded in detecting novel common variants associated with complex diseases. With the rapid development of next-generation sequencing technology, large amounts of sequencing data are being generated, offering great opportunities to identify rare variants that could explain a larger proportion of the missing heritability. Many effective and powerful methods have been proposed, but they are usually limited to continuous, dichotomous, or ordinal traits. Traits with nominal categorical features are common in complex diseases, especially in mental disorders, which motivates incorporating the characteristics of categorical traits into association studies with rare and common variants. Methods. We construct two simple and intuitive nonparametric tests, MIT and aMIT, based on mutual information, for detecting association between genetic variants in a gene or region and a categorical trait. MIT and aMIT gauge the difference among the distributions of rare and common variants across a region given each value of the categorical trait. If there is little association between the variants and the categorical trait, MIT and aMIT are approximately zero; the larger the difference in distributions, the larger the values of MIT and aMIT. Therefore, MIT and aMIT have the potential to detect functional variants. Results. We checked the validity of the proposed statistics and compared them with existing ones through extensive simulation studies covering varied numbers of rare causal, rare non-causal, common causal, and common non-causal variants, deleterious and protective effects, various minor allele frequencies, and different levels of linkage disequilibrium. The results show that our methods have higher statistical power than conventional ones, including the likelihood-based score test, in most cases in which: (1) there are multiple genetic variants in a gene or region; (2) both protective and deleterious variants are present; (3) both rare and common variants exist; and (4) more than half of the variants are neutral. The proposed tests were applied to data from the Collaborative Studies on Genetics of Alcoholism and performed competently. Discussion. As a complement to existing methods that mainly focus on quantitative traits, this study provides the nonparametric tests MIT and aMIT for detecting variants associated with a categorical trait. We further plan to investigate the association between rare variants and multiple categorical traits.
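Mutual information between a genotype summary and a nominal trait is zero when the two are independent and grows with dependence, which is the property the MIT and aMIT tests build on. The sketch below computes plain empirical mutual information for one discrete genotype summary; it is not the published MIT or aMIT statistic, and collapsing a region into a single variable (e.g., carrier status) is an assumption. The name `mutual_information` is illustrative.

```python
import numpy as np


def mutual_information(g, t):
    """Empirical mutual information (in nats) between two discrete variables.

    g: per-individual genotype summary (e.g., 0/1 carrier status);
    t: categorical trait label (nominal, unordered).
    """
    _, gi = np.unique(g, return_inverse=True)
    _, ti = np.unique(t, return_inverse=True)
    joint = np.zeros((gi.max() + 1, ti.max() + 1))
    np.add.at(joint, (gi, ti), 1)        # contingency table of counts
    joint /= joint.sum()                 # joint probabilities
    pg = joint.sum(axis=1, keepdims=True)
    pt = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                       # 0 * log(0) is taken as 0
    return float((joint[nz] * np.log(joint[nz] / (pg @ pt)[nz])).sum())
```

Because the trait enters only through category labels, no ordering or scoring of categories is needed, which suits nominal traits; significance would still have to come from, for example, permutation.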

