mTADA is a framework for identifying risk genes from de novo mutations in multiple traits

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Tan-Hoang Nguyen ◽  
Amanda Dobbyn ◽  
Ruth C. Brown ◽  
Brien P. Riley ◽  
Joseph D. Buxbaum ◽  
...  
2018 ◽  
Author(s):  
Hoang T. Nguyen ◽  
Amanda Dobbyn ◽  
Joseph Buxbaum ◽  
Dalila Pinto ◽  
Shaun M Purcell ◽  
...  

Abstract Joint analysis of multiple traits can result in the identification of associations not found through the analysis of each trait in isolation. In addition, approaches that consider multiple traits can aid in the characterization of shared genetic etiology among those traits. In recent years, parent-offspring trio studies have reported an enrichment of de novo mutations (DNMs) in neuropsychiatric disorders. The analysis of DNM data in the context of neuropsychiatric disorders has implicated multiple putatively causal genes, and a number of reported genes are shared across disorders. However, a joint analysis method designed to integrate de novo mutation data from multiple studies has yet to be implemented. Here we introduce multiple-trait TADA (mTADA), which jointly analyzes two traits using DNMs from non-overlapping family samples. mTADA uses two single-trait analysis data sets to estimate the proportion of overlapping risk genes, and reports genes shared between and specific to the relevant disorders. We applied mTADA to >13,000 trios for six disorders: schizophrenia (SCZ), autism spectrum disorder (ASD), developmental disorders (DD), intellectual disability (ID), epilepsy (EPI), and congenital heart disease (CHD). We report the proportion of overlapping risk genes and the specific risk genes shared for each pair of disorders. A total of 153 genes were found to be shared in at least one pair of disorders. The largest percentages of shared risk genes were observed for pairs of DD, ID, ASD, and CHD (>20%), whereas SCZ, CHD, and EPI did not show strong overlaps in risk gene sets between them. Furthermore, mTADA identified additional SCZ, EPI, and CHD risk genes through integration with DD de novo mutation data. For CHD, using DD information, 31 risk genes with posterior probabilities > 0.8 were identified, and 20 of these 31 genes were not in the list of known CHD genes.
We find evidence that most significant CHD risk genes are strongly expressed in prenatal stages of the human heart. Finally, we validated our findings for CHD and EPI in independent cohorts comprising 1,241 CHD trios, 226 CHD singletons, and 197 EPI trios. Multiple novel risk genes identified by mTADA also had de novo mutations in these independent data sets. The joint analysis method introduced here, mTADA, is able to identify risk genes shared by two traits as well as additional risk genes not found through single-trait analysis alone. A number of risk genes reported by mTADA are identified only through joint analysis, specifically when ASD, DD, or ID is one of the two traits examined. This suggests that novel genes for a given trait, or for a new trait, might converge on a core gene list shared by these three traits.
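
To make the joint idea concrete, here is a minimal sketch, not mTADA's implementation, of how a two-trait mixture can place each gene in one of four classes (risk gene for neither trait, trait 1 only, trait 2 only, or both) and turn DNM counts into posterior probabilities. It assumes Poisson-distributed counts, a fixed relative risk `gamma` per trait, and known class proportions `pi`; the published method uses a richer model and estimates its parameters from data. The function names, mutation rate, and all numbers are hypothetical.

```python
import math

CLASSES = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (risk for trait 1?, risk for trait 2?)


def poisson_logpmf(k: int, lam: float) -> float:
    """Log of the Poisson probability of k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def class_posteriors(counts, n_trios, mu, gamma, pi):
    """Posterior probability of each gene class under a simplified two-trait mixture.

    counts, n_trios, gamma: (trait1, trait2) pairs; mu: per-haplotype mutation rate
    of the gene; pi: prior proportions for CLASSES, summing to 1 (all assumed known).
    """
    log_joint = []
    for (z1, z2), prior in zip(CLASSES, pi):
        ll = math.log(prior)
        for t, z in enumerate((z1, z2)):
            # Expected DNMs = 2 haplotypes x trios x rate, scaled by gamma for a risk gene
            lam = 2 * n_trios[t] * mu * (gamma[t] if z else 1.0)
            ll += poisson_logpmf(counts[t], lam)
        log_joint.append(ll)
    top = max(log_joint)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(x - top) for x in log_joint]
    total = sum(weights)
    return {c: w / total for c, w in zip(CLASSES, weights)}


# Hypothetical gene: 3 DNMs in 2,000 trios of trait 1, 5 DNMs in 5,000 trios of trait 2
post = class_posteriors(counts=(3, 5), n_trios=(2000, 5000), mu=1e-5,
                        gamma=(20.0, 20.0), pi=(0.90, 0.03, 0.03, 0.04))
pp_trait1 = post[(1, 0)] + post[(1, 1)]  # marginal probability of being a trait-1 risk gene
pp_shared = post[(1, 1)]                 # probability of being a risk gene for both traits
print(f"PP(trait 1) = {pp_trait1:.3f}, PP(shared) = {pp_shared:.3f}")
```

A gene would then be called a risk gene for a trait when its marginal posterior passes a cutoff such as the 0.8 quoted above. When the shared class is more common than independence would imply, strong evidence in one trait raises the posterior for the other.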


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Tianyun Wang ◽  
Kendra Hoekzema ◽  
Davide Vecchio ◽  
Huidan Wu ◽  
...  

Abstract Most genes associated with neurodevelopmental disorders (NDDs) were identified through an excess of de novo mutations (DNMs), but their significance in case–control mutation burden analysis has not been established. Here, we sequence 63 genes in 16,294 NDD cases and an additional 62 genes in 6,211 NDD cases. By combining these with published data, we assess a total of 125 genes in over 16,000 NDD cases and compare the mutation burden to nonpsychiatric controls from ExAC. We identify 48 genes (25 newly reported) showing a significant burden of ultra-rare (MAF < 0.01%) gene-disruptive mutations (FDR 5%), six of which reach family-wise error rate (FWER) significance (p < 1.25E−06). Among these 125 targeted genes, we also reevaluate DNM excess in 17,426 NDD trios, including 6,499 new autism trios. We identify 90 genes enriched for DNMs (FDR 5%; e.g., GABRG2 and UIMC1), of which 61 reach FWER significance (p < 3.64E−07; e.g., CASZ1). In addition to doubling the number of patients for many NDD risk genes, we present phenotype–genotype correlations for seven risk genes (CTCF, HNRNPU, KCNQ3, ZBTB18, TCF12, SPEN, and LEO1) based on this large-scale targeted sequencing effort.
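
The abstract reports genes at two stringency levels: a 5% false discovery rate (FDR) and a stricter family-wise error rate (FWER) cutoff. The Benjamini-Hochberg step-up procedure is one standard way to control FDR, but the abstract does not state which FDR method was used, so the following is an illustrative sketch only. The p-values are made up, and the FWER cutoff is simply the value quoted above, passed in as a parameter rather than derived.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose sorted p-value satisfies p_(k) <= k * alpha / m
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


def passes_threshold(pvalues, threshold):
    """Indices whose p-value falls below a fixed family-wise cutoff."""
    return [i for i, p in enumerate(pvalues) if p < threshold]


# Hypothetical per-gene burden p-values
pvals = [1e-8, 2e-4, 0.003, 0.02, 0.04, 0.3, 0.6, 0.9]
fdr_hits = benjamini_hochberg(pvals, alpha=0.05)
fwer_hits = passes_threshold(pvals, threshold=1.25e-6)  # FWER cutoff quoted in the abstract
print(fdr_hits, fwer_hits)
```

With eight tests, BH at 5% admits four genes while only the smallest p-value clears the fixed cutoff, which is why an FDR list is typically longer than an FWER list.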


2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Hui Guo ◽  
Tianyun Wang ◽  
Huidan Wu ◽  
Min Long ◽  
Bradley P. Coe ◽  
...  

PLoS Genetics ◽  
2021 ◽  
Vol 17 (11) ◽  
pp. e1009849
Author(s):  
Yuhan Xie ◽  
Mo Li ◽  
Weilai Dong ◽  
Wei Jiang ◽  
Hongyu Zhao

Recent studies have demonstrated that multiple early-onset diseases have shared risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method was able to identify 23 genes for CHD from joint analysis, including 12 novel genes, substantially more than single-trait analysis identified, leading to novel insights into CHD disease etiology.
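
The abstract describes an Expectation-Maximization (EM) algorithm that infers how strongly two diseases share risk genes. The sketch below is a deliberately simplified stand-in, not M-DATA itself: it ignores functional annotations, treats each gene's mutation rate and the per-disease relative risks as known, models DNM counts as Poisson, and uses EM only to estimate the proportions of genes in four classes (risk for neither disease, only the first, only the second, or both). All names and values are hypothetical.

```python
import math

CLASSES = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (risk for disease 1?, risk for disease 2?)


def poisson_logpmf(k, lam):
    """Log Poisson probability of k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_class_proportions(counts1, counts2, mu, n1, n2, gamma1, gamma2, n_iter=200):
    """Estimate proportions of genes in CLASSES by EM, with rates and relative risks fixed."""
    # Per-gene log-likelihood under each class; these do not change across iterations
    loglik = []
    for c1, c2, m in zip(counts1, counts2, mu):
        row = []
        for z1, z2 in CLASSES:
            lam1 = 2 * n1 * m * (gamma1 if z1 else 1.0)
            lam2 = 2 * n2 * m * (gamma2 if z2 else 1.0)
            row.append(poisson_logpmf(c1, lam1) + poisson_logpmf(c2, lam2))
        loglik.append(row)

    pi = [0.25] * 4  # uniform starting proportions
    for _ in range(n_iter):
        # E-step: posterior class membership of each gene given current proportions
        expected_members = [0.0] * 4
        for row in loglik:
            logs = [math.log(max(p, 1e-300)) + ll for p, ll in zip(pi, row)]
            top = max(logs)
            weights = [math.exp(x - top) for x in logs]
            total = sum(weights)
            for j in range(4):
                expected_members[j] += weights[j] / total
        # M-step: new proportions are the average posterior memberships
        pi = [e / len(loglik) for e in expected_members]
    return pi


# Hypothetical data: 100 genes with equal mutation rates, 5,000 trios per disease
counts1 = [0] * 80 + [3] * 5 + [0] * 5 + [3] * 10
counts2 = [0] * 80 + [0] * 5 + [3] * 5 + [3] * 10
pi = em_class_proportions(counts1, counts2, [1e-5] * 100,
                          n1=5000, n2=5000, gamma1=20.0, gamma2=20.0)
odds_ratio = pi[0] * pi[3] / (pi[1] * pi[2])
print([round(p, 3) for p in pi], round(odds_ratio, 1))
```

The odds ratio `pi[0] * pi[3] / (pi[1] * pi[2])` equals 1 when risk-gene membership in the two diseases is independent and exceeds 1 when the diseases share more risk genes than chance would predict, which is one simple way to summarize a "degree of association".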


2021 ◽  
Author(s):  
Hanmin Guo ◽  
Lin Hou ◽  
Yu Shi ◽  
Sheng Chih Jin ◽  
Xue Zeng ◽  
...  

Abstract Exome sequencing on tens of thousands of parent-proband trios has identified numerous deleterious de novo mutations (DNMs) and implicated risk genes for many disorders. Recent studies have suggested that shared genes and pathways are enriched for DNMs across multiple disorders. However, existing analytic strategies focus only on genes that reach statistical significance for multiple disorders and require large trio samples in each study. As a result, these methods are not able to characterize the full landscape of genetic sharing due to polygenicity and incomplete penetrance. In this work, we introduce EncoreDNM, a novel statistical framework to quantify shared genetic effects between two disorders characterized by concordant enrichment of DNMs in the exome. EncoreDNM makes use of exome-wide, summary-level DNM data, including genes that do not reach statistical significance in single-disorder analysis, to evaluate the overall and annotation-partitioned genetic sharing between two disorders. Applying EncoreDNM to DNM data of nine disorders, we identified abundant pairwise enrichment correlations, especially in genes intolerant to pathogenic mutations and genes highly expressed in fetal tissues. These results suggest that EncoreDNM improves current analytic approaches and may have broad applications in DNM studies.
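
EncoreDNM is a dedicated statistical framework whose estimator is not spelled out in this abstract. As a contrast, the crude descriptive statistic below captures the intuition of "concordant enrichment": compute each gene's log ratio of observed to expected DNMs in each disorder, then correlate those ratios across genes. This is a hypothetical sketch, not EncoreDNM; the pseudocount and the input numbers are assumptions. Because sparse DNM counts are noisy, such a raw correlation tends to be attenuated toward zero, which is one motivation for a model-based approach.

```python
import math


def log_enrichment(counts, expected, pseudo=0.5):
    """Per-gene log(observed / expected) DNM ratio; the pseudocount keeps zero counts finite."""
    return [math.log((c + pseudo) / (e + pseudo)) for c, e in zip(counts, expected)]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_enrichment_correlation(counts1, expected1, counts2, expected2):
    """Correlation of per-gene enrichment across two disorders (a crude descriptive proxy)."""
    return pearson(log_enrichment(counts1, expected1), log_enrichment(counts2, expected2))


# Hypothetical observed and expected DNM counts for six genes in two disorders
r = naive_enrichment_correlation([0, 1, 4, 0, 2, 6], [0.5, 0.8, 0.6, 0.4, 0.7, 0.9],
                                 [1, 0, 5, 0, 3, 7], [0.6, 0.9, 0.7, 0.5, 0.8, 1.0])
print(round(r, 3))
```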


2021 ◽  
Author(s):  
Yuhan Xie ◽  
Mo Li ◽  
Weilai Dong ◽  
Wei Jiang ◽  
Hongyu Zhao

Recent studies have demonstrated that multiple early-onset diseases have shared risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method was able to identify 23 genes from joint analysis, including 12 novel genes, substantially more than single-trait analysis identified, leading to novel insights into CHD disease etiology.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dan He ◽  
Cong Fan ◽  
Mengling Qi ◽  
Yuedong Yang ◽  
David N. Cooper ◽  
...  

Abstract Schizophrenia (SCZ) is a polygenic disease with a heritability approaching 80%. Over 100 SCZ-related loci have so far been identified by genome-wide association studies (GWAS). However, the risk genes associated with these loci often remain unknown. We present a new risk gene predictor, rGAT-omics, that integrates multi-omics data under a Bayesian framework by combining the Hotelling and Box–Cox transformations. The Bayesian framework was constructed using gene ontology, tissue-specific protein–protein networks, and multi-omics data, including genes differentially expressed between SCZ cases and controls, the distance from genes to the index single-nucleotide polymorphisms (SNPs), and de novo mutations. Applying rGAT-omics to the 108 loci identified by a recent GWAS of SCZ predicted 103 high-risk genes (HRGs) that explain a high proportion of SCZ heritability (enrichment = 43.44, $p = 9.30 \times 10^{-9}$). HRGs were significantly enriched ($p_{\mathrm{adj}} = 5.35 \times 10^{-7}$) in genes associated with neurological activities, and were more likely than background genes to be expressed in brain tissues and SCZ-associated cell types. The predicted HRGs included 16 novel genes that were neither present in any existing database of SCZ-associated genes nor previously predicted to be SCZ risk genes by any other method. More importantly, 13 of these 16 genes were not the nearest genes to the index SNP markers and would therefore have been difficult to identify as risk genes by conventional approaches, while ten of the 16 genes are associated with neurological functions that make them prime candidates for pathological involvement in SCZ. Therefore, rGAT-omics has revealed novel insights into the molecular mechanisms underlying SCZ and could provide potential clues to future therapies.
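
The abstract says rGAT-omics combines the Hotelling and Box–Cox transformations but not how. For reference, the standard Box–Cox transformation maps a strictly positive value $y$ to $(y^\lambda - 1)/\lambda$, or to $\log y$ when $\lambda = 0$, with $\lambda$ usually chosen by maximizing a normal profile log-likelihood. The sketch below shows only that generic step, using a simple grid search over $\lambda$; it is not the rGAT-omics implementation, and the input values are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal.

    Requires values that are not all equal (otherwise the variance is zero).
    """
    n = len(values)
    y = box_cox(values, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    # Normal log-likelihood term plus the log-Jacobian of the transformation
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_box_cox(values, grid=None):
    """Choose lambda by maximizing the profile log-likelihood over a grid."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # lambda from -2.00 to 2.00
    best = max(grid, key=lambda lam: box_cox_loglik(values, lam))
    return best, box_cox(values, best)


# Hypothetical right-skewed scores whose logarithms are symmetric
scores = [math.exp(z) for z in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
lam, transformed = fit_box_cox(scores)
print(lam)
```

For these log-symmetric inputs the search should settle on $\lambda$ near 0, that is, close to a plain log transform.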


2018 ◽  
Vol 102 (6) ◽  
pp. 1031-1047 ◽  
Author(s):  
Yuwen Liu ◽  
Yanyu Liang ◽  
A. Ercument Cicek ◽  
Zhongshan Li ◽  
Jinchen Li ◽  
...  

2021 ◽  
Author(s):  
Tianyun Wang ◽  
Chang Kim ◽  
Trygve E. Bakken ◽  
Madelyn A. Gillentine ◽  
Barbara Henning ◽  
...  

Abstract Most genetic studies consider autism spectrum disorder (ASD) and developmental disorder (DD) separately despite overwhelming comorbidity and shared genetic etiology. Here we analyzed de novo mutations (DNMs) from 15,560 ASD trios (6,557 of them new) and 31,052 DD trios, both independently and combined as broader neurodevelopmental disorders (NDD), using three models. We identify 615 candidate genes (FDR 5%, 189 potentially novel) by one or more models, including 138 reaching exome-wide significance (p < 3.64e-07) in all models. We find no evidence for ASD-specific genes, in contrast to 18 genes significantly enriched for DD. Fifty-three genes show a particular mutational bias, including enrichment for missense (n=41) or truncating DNMs (n=12). We find 22 genes with evidence of sex bias, including five X chromosome genes that also show significant female burden (DDX3X, MECP2, SMC1A, WDR45, and HDAC8). Based on single-nucleus transcriptomic data, NDD risk genes group into five functional networks associated with different brain developmental lineages, which provides important insights into disease subtypes and future functional studies.
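
The exome-wide threshold quoted above (p < 3.64e-07) applies to per-gene tests of DNM excess. A common way to test a single gene is to compare its observed DNM count with a Poisson expectation of $2 \times N \times \mu$, where $N$ is the number of trios and $\mu$ the gene's per-haplotype mutation rate. The abstract does not say which three models were used, so the sketch below is only this generic one-sided Poisson test. The mutation rate and observed count are hypothetical, and X-chromosome genes (relevant to the sex-bias results) would need a sex-aware expectation.

```python
import math


def poisson_upper_tail(k: int, lam: float, rel_tol: float = 1e-17) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing tail terms directly.

    Summing the upper tail, rather than computing 1 - CDF, keeps very small
    p-values accurate.
    """
    total = 0.0
    x = max(k, 0)
    while True:
        term = math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))
        total += term
        # Stop once past the mean, where terms shrink, and the latest term is negligible
        if x > lam and term <= rel_tol * total:
            break
        x += 1
    return min(total, 1.0)


def dnm_excess_pvalue(observed: int, n_trios: int, mu: float) -> float:
    """One-sided Poisson test for an excess of de novo mutations in one autosomal gene."""
    expected = 2 * n_trios * mu  # two haplotypes per proband
    return poisson_upper_tail(observed, expected)


EXOME_WIDE = 3.64e-7  # threshold quoted in the abstract

# Hypothetical gene: 12 DNMs among 46,612 trios (15,560 ASD + 31,052 DD), mu = 1e-5
p = dnm_excess_pvalue(observed=12, n_trios=15_560 + 31_052, mu=1e-5)
print(f"p = {p:.2e}, exome-wide significant: {p < EXOME_WIDE}")
```

Testing missense and truncating DNMs separately, each against its own class-specific rate, is one way such an approach could be extended toward the mutational-bias comparisons described above.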

