M-DATA: A statistical approach to jointly analyzing de novo mutations for multiple traits

PLoS Genetics ◽  
2021 ◽  
Vol 17 (11) ◽  
pp. e1009849
Author(s):  
Yuhan Xie ◽  
Mo Li ◽  
Weilai Dong ◽  
Wei Jiang ◽  
Hongyu Zhao

Recent studies have demonstrated that multiple early-onset diseases have shared risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method was able to identify 23 genes for CHD from joint analysis, including 12 novel genes, which is substantially more than single-trait analysis, leading to novel insights into CHD etiology.
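The Expectation-Maximization idea described in the abstract can be illustrated with a deliberately simplified sketch. This is not the M-DATA model: it ignores functional annotations, assumes each gene's DNM count is Poisson, and treats the per-gene baseline expected counts (`base1`, `base2`) and the fold enrichments for risk genes (`gamma1`, `gamma2`) as known inputs. All function and parameter names are hypothetical. Each gene is assigned one of four latent states (risk for neither trait, trait 1 only, trait 2 only, or both), and EM estimates only the four mixing proportions. The proportion of the shared state gives a rough sense of how strongly the two traits overlap, and the per-gene posteriors play the role of association probabilities.

```python
import math

# Latent states per gene: (risk gene for trait 1?, risk gene for trait 2?)
STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function (requires lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def e_step(y1, y2, base1, base2, gamma1, gamma2, pi):
    """Posterior probability of each latent state for every gene."""
    resp = []
    for k1, k2, b1, b2 in zip(y1, y2, base1, base2):
        logw = []
        for s, (r1, r2) in enumerate(STATES):
            # Assumption: risk genes have their baseline rate scaled by gamma.
            lam1 = b1 * (gamma1 if r1 else 1.0)
            lam2 = b2 * (gamma2 if r2 else 1.0)
            logw.append(
                math.log(pi[s])
                + poisson_logpmf(k1, lam1)
                + poisson_logpmf(k2, lam2)
            )
        top = max(logw)  # subtract the maximum for numerical stability
        w = [math.exp(v - top) for v in logw]
        total = sum(w)
        resp.append([v / total for v in w])
    return resp


def em_two_trait(y1, y2, base1, base2, gamma1, gamma2, n_iter=200):
    """Estimate the four mixing proportions by EM; return (pi, posteriors)."""
    pi = [0.25] * 4
    for _ in range(n_iter):
        resp = e_step(y1, y2, base1, base2, gamma1, gamma2, pi)
        # M-step: average responsibilities, floored so log(pi) stays finite.
        raw = [max(sum(r[s] for r in resp) / len(resp), 1e-12) for s in range(4)]
        total = sum(raw)
        pi = [p / total for p in raw]
    return pi, e_step(y1, y2, base1, base2, gamma1, gamma2, pi)


def trait_posteriors(resp):
    """Per-gene posterior probability of association with trait 1 and trait 2."""
    return [(r[1] + r[3], r[2] + r[3]) for r in resp]


# Toy counts for eight genes (made up purely for illustration).
y1 = [0, 0, 0, 5, 6, 0, 0, 5]
y2 = [0, 0, 0, 4, 0, 5, 0, 6]
base = [0.2] * 8
pi, resp = em_two_trait(y1, y2, base, base, gamma1=20.0, gamma2=20.0)
for gene, (p1, p2) in enumerate(trait_posteriors(resp)):
    print(gene, round(p1, 3), round(p2, 3))
```

In this toy setting a gene with many DNMs in one trait and none in the other still receives a small posterior for the second trait through the shared state, which is the mechanism by which joint analysis can borrow information across traits.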

2021 ◽  
Author(s):  
Yuhan Xie ◽  
Mo Li ◽  
Weilai Dong ◽  
Wei Jiang ◽  
Hongyu Zhao

Recent studies have demonstrated that multiple early-onset diseases have shared risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method was able to identify 23 genes from joint analysis, including 12 novel genes, which is substantially more than single-trait analysis, leading to novel insights into CHD etiology.


2018 ◽  
Author(s):  
Hoang T. Nguyen ◽  
Amanda Dobbyn ◽  
Joseph Buxbaum ◽  
Dalila Pinto ◽  
Shaun M Purcell ◽  
...  

Abstract Joint analysis of multiple traits can result in the identification of associations not found through the analysis of each trait in isolation. In addition, approaches that consider multiple traits can aid in the characterization of shared genetic etiology among those traits. In recent years, parent-offspring trio studies have reported an enrichment of de novo mutations (DNMs) in neuropsychiatric disorders. The analysis of DNM data in the context of neuropsychiatric disorders has implicated multiple putatively causal genes, and a number of reported genes are shared across disorders. However, a joint analysis method designed to integrate de novo mutation data from multiple studies has yet to be implemented. Here we introduce multiple-trait TADA (mTADA), which jointly analyzes two traits using DNMs from non-overlapping family samples. mTADA uses two single-trait analysis data sets to estimate the proportion of overlapping risk genes, and reports genes shared between and specific to the relevant disorders. We applied mTADA to >13,000 trios for six disorders: schizophrenia (SCZ), autism spectrum disorder (ASD), developmental disorders (DD), intellectual disability (ID), epilepsy (EPI), and congenital heart disease (CHD). We report the proportion of overlapping risk genes and the specific risk genes shared for each pair of disorders. A total of 153 genes were found to be shared in at least one pair of disorders. The largest percentages of shared risk genes were observed for pairs of DD, ID, ASD, and CHD (>20%), whereas SCZ, CHD, and EPI did not show strong overlaps in risk gene sets between them. Furthermore, mTADA identified additional SCZ, EPI and CHD risk genes through integration with DD de novo mutation data. For CHD, using DD information, 31 risk genes with posterior probabilities > 0.8 were identified, and 20 of these 31 genes were not in the list of known CHD genes.
We find evidence that most significant CHD risk genes are strongly expressed in prenatal stages of the human genes. Finally, we validated our findings for CHD and EPI in independent cohorts comprising 1241 CHD trios, 226 CHD singletons and 197 EPI trios. Multiple novel risk genes identified by mTADA also had de novo mutations in these independent data sets. The joint analysis method introduced here, mTADA, is able to identify risk genes shared by two traits as well as additional risk genes not found through single-trait analysis only. A number of risk genes reported by mTADA are identified only through joint analysis, specifically when ASD, DD, or ID are one of the two traits examined. This suggests that novel genes for the trait or a new trait might converge to a core gene list of the three traits.


2018 ◽  
Vol 100 ◽  
Author(s):  
LILI CHEN ◽  
YONG WANG ◽  
YAJING ZHOU

Summary Pleiotropy, the effect of one variant on multiple traits, is widespread in complex diseases. Joint analysis of multiple traits can improve statistical power to detect genetic variants and uncover the underlying genetic mechanism. Currently, many existing methods target either a single common variant or only rare variants. Increasing evidence shows that complex diseases are caused by both common and rare variants. Here we propose a region-based method, based on a variable reduction method (abbreviated as MULVR), to test the association of both rare and common variants with multiple traits. However, in the presence of noise traits, the MULVR method may lose power, so we propose the MULVR-O method, which jointly analyses the optimal number of traits associated with genetic variants by the MULVR method, to guard against the effect of noise traits. Extensive simulation studies show that our proposed method (MULVR-O) is applicable not only to multiple quantitative traits but also to qualitative traits, and is more powerful than several other comparison methods in most scenarios. An application to two genes (SHBG and CHRM3) and two phenotypes (systolic blood pressure and diastolic blood pressure) from the GAW19 dataset illustrates that our proposed methods (MULVR and MULVR-O) are feasible and efficient as region-based methods.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Tan-Hoang Nguyen ◽  
Amanda Dobbyn ◽  
Ruth C. Brown ◽  
Brien P. Riley ◽  
Joseph D. Buxbaum ◽  
...  

Author(s):  
Johnathon M Shook ◽  
Jiaoping Zhang ◽  
Sarah E Jones ◽  
Arti Singh ◽  
Brian W Diers ◽  
...  

Abstract We report a meta-Genome Wide Association Study involving 73 published studies in soybean (Glycine max L. [Merr.]) covering 17,556 unique accessions, with improved statistical power for robust detection of loci associated with a broad range of traits. De novo GWAS and meta-analysis were conducted for composition traits including fatty acid and amino acid composition traits, disease resistance traits, and agronomic traits including seed yield, plant height, stem lodging, seed weight, seed mottling, seed quality, flowering timing, and pod shattering. To examine differences in detectability and test statistical power between single- and multi-environment GWAS, comparisons of meta-GWAS results to those from the constituent experiments were performed. Using meta-GWAS analysis and the analysis of individual studies, we report 483 peaks at 393 unique loci. Using stringent criteria to detect significant marker-trait associations, 59 candidate genes were identified, including 17 agronomic traits loci, 19 for seed-related traits, and 33 for disease reaction traits. This study identified potentially valuable candidate genes that affect multiple traits. The success in narrowing down the genomic region for some loci through overlapping mapping results of multiple studies is a promising avenue for community-based studies and plant breeding applications.


2018 ◽  
Author(s):  
Zhenchuan Wang ◽  
Qiuying Sha ◽  
Kui Zhang ◽  
Shuanglin Zhang

Abstract Joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single common variant. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants. In this article, we developed a statistical method by testing an optimally weighted combination of variants with multiple traits (TOWmuT) to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.


2021 ◽  
Author(s):  
Brennan H Baker ◽  
Shaoyi Zhang ◽  
Jeremy M Simon ◽  
Sarah M McLarnan ◽  
Wendy K Chung ◽  
...  

De novo mutations contribute to a large proportion of sporadic psychiatric and developmental disorders, yet the potential role of environmental carcinogens as drivers of causal de novo mutations in neurodevelopmental disorders is poorly studied. We demonstrate that several mutagens, including polycyclic aromatic hydrocarbons (PAHs), disproportionately mutate genes related to neurodevelopmental disorders including autism spectrum disorders (ASD), schizophrenia, and attention deficit hyperactivity disorder (ADHD). Genes for other diseases, including amyotrophic lateral sclerosis (ALS), Alzheimer's disease, congenital heart disease, orofacial clefts, and coronary artery disease, were generally not mutated more than expected. Our findings support a new paradigm of neurodevelopmental disease etiology driven by a contribution of environmentally induced rather than random mutations.


2020 ◽  
Author(s):  
Kodi Taraszka ◽  
Noah Zaitlen ◽  
Eleazar Eskin

Abstract We introduce the pleiotropic association test (PAT) for joint analysis of multiple traits using GWAS summary statistics. The method utilizes the decomposition of phenotypic covariation into genetic and environmental components to create a likelihood ratio test statistic for each genetic variant. Though PAT does not directly interpret which trait(s) drive the association, a per trait interpretation of the omnibus p-value is provided through an extension to the meta-analysis framework, m-values. In simulations, we show PAT controls the false positive rate, increases statistical power, and is robust to model misspecifications of genetic effect. Additionally, simulations comparing PAT to two multi-trait methods, HIPO and MTAG, show PAT having a 43.0% increase in the number of omnibus associations over the other methods. When these associations are interpreted on a per trait level using m-values, PAT has 52.2% more per trait interpretations with a 0.57% false positive assignment rate. When analyzing four traits from the UK Biobank, PAT identifies 22,095 novel associated variants. Through the m-values interpretation framework, the number of total per trait associations is almost tripled for two traits and nearly doubled for another trait relative to the original single trait GWAS.

