Predicting functional effect of missense variants using graph attention neural networks

2021
Author(s):
Haicang Zhang
Michelle S. Xu
Wendy K. Chung
Yufeng Shen

Abstract
Accurate prediction of damaging missense variants is critically important for interpreting genome sequences. While many methods have been developed, their performance has been limited. Recent progress in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to significantly improve computational predictions. Here we describe gMVP, a new method based on graph attention neural networks. Its main component is a graph with nodes capturing predictive features of amino acids and edges weighted by coevolution strength, which enables effective pooling of information from the local protein sequence context and from functionally correlated distal positions. Evaluated on deep mutational scan data, gMVP outperforms published methods in identifying damaging variants in TP53, PTEN, BRCA1, and MSH2. Additionally, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from those in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.
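The abstract describes pooling residue features over a graph whose edges are weighted by coevolution strength, but it does not give gMVP's layer definition. As a generic illustration only, the sketch below pools neighbour features for one residue with single-head attention, assuming a scaled dot-product logit plus the log of the coevolution weight; the function name and this scoring rule are hypothetical and are not gMVP's published architecture.

```python
import math
from typing import List, Sequence


def coevolution_attention_pool(
    features: Sequence[Sequence[float]],
    coevolution: Sequence[Sequence[float]],
    target: int,
) -> List[float]:
    """Pool neighbour features into one vector for residue `target`.

    Hypothetical single-head attention (not gMVP's published layer):
    the logit for neighbour j is the scaled dot product of the two feature
    vectors plus log(coevolution weight). Positions whose coevolution
    weight with `target` is zero are not treated as neighbours.
    """
    dim = len(features[target])
    logits = {}
    for j, weight in enumerate(coevolution[target]):
        if weight <= 0:
            continue  # no edge between `target` and residue j
        dot = sum(a * b for a, b in zip(features[target], features[j]))
        logits[j] = dot / math.sqrt(dim) + math.log(weight)
    if not logits:
        return list(features[target])  # isolated residue: keep its own features

    # Numerically stable softmax over the neighbours' logits.
    peak = max(logits.values())
    exps = {j: math.exp(v - peak) for j, v in logits.items()}
    total = sum(exps.values())

    pooled = [0.0] * dim
    for j, e in exps.items():
        alpha = e / total  # attention weight of neighbour j (weights sum to 1)
        for k in range(dim):
            pooled[k] += alpha * features[j][k]
    return pooled
```

Because log(weight) enters the logit additively, a residue that coevolves three times as strongly receives three times the attention when the feature terms are equal. For example, with features [[0, 0], [1, 0], [0, 1]] and residue 0 coevolving equally with residues 1 and 2, the pooled vector is [0.5, 0.5].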

2019
Author(s):
Elliott Rees
Jun Han
Joanne Morgan
Noa Carrera
Valentina Escott-Price
...  

Abstract
Schizophrenia is a highly polygenic disorder with important contributions coming from both common and rare risk alleles, the latter including CNVs and rare coding variants (RCVs), sometimes occurring as de novo variants (DNVs). We performed DNV analysis in whole-exome sequencing data obtained from a new sample of 613 schizophrenia trios and combined this with published data for a total of 3,444 trios. Loss-of-function (LoF) DNVs were significantly enriched among 3,488 LoF-intolerant genes in our new trio data (rate ratio (RR) (95% CI) = 2.23 (1.31, 3.79); p = 2.2 × 10⁻³), supporting previous findings. In the full dataset, genes associated with neurodevelopmental disorders (NDD; n = 160) were significantly enriched for LoF DNVs (RR (95% CI) = 3.32 (2.0, 5.21); p = 7.4 × 10⁻⁶). Within this set of NDD genes, SLC6A1, encoding a gamma-aminobutyric acid transporter, was associated with missense-damaging DNVs (p = 5.2 × 10⁻⁵). Using data from a subset of 1,122 trios for which we had genome-wide common variant data, schizophrenia polygenic risk was significantly over-transmitted to probands (p = 2.6 × 10⁻⁶⁰), as was bipolar disorder common variant polygenic risk (p = 5.7 × 10⁻¹⁷). We defined carriers of candidate schizophrenia-related DNVs as those with LoF or deletion DNVs in LoF-intolerant or NDD genes. These individuals had significantly less over-transmission of common risk alleles than non-carriers (p = 3.5 × 10⁻⁴), providing robust support for the hypothesis that this set of DNVs is enriched for those related to schizophrenia.
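The rate ratios above are reported with 95% confidence intervals, but the abstract does not say how they were computed; de novo studies often compare observed counts with expectations from a mutation-rate model. As a generic illustration only, the sketch below computes a rate ratio between two groups with a Wald interval on the log scale, assuming Poisson-distributed counts; the function name and the example counts are hypothetical and are not the authors' method or data.

```python
import math
from typing import Tuple


def rate_ratio_ci(
    events_a: int,
    exposure_a: float,
    events_b: int,
    exposure_b: float,
    z: float = 1.96,  # 1.96 gives an approximate 95% interval
) -> Tuple[float, float, float]:
    """Rate ratio of group A vs. group B with a Wald CI on the log scale.

    Assumes Poisson counts, so SE(log RR) = sqrt(1/a + 1/b).
    Both event counts must be positive for the interval to exist.
    """
    if events_a <= 0 or events_b <= 0:
        raise ValueError("Wald interval needs non-zero counts in both groups")
    rr = (events_a / exposure_a) / (events_b / exposure_b)
    se = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper
```

With 10 events over an exposure of 100 in one group and 5 over 100 in the other, it returns a ratio of 2.0 with an interval of roughly 0.68 to 5.85; the width of the interval reflects how few events enter each count.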


Author(s):  
Lot Snijders Blok
Arianna Vino
Joery den Hoed
Hunter R. Underhill
Danielle Monteil
...  

Abstract
Purpose: Heterozygous pathogenic variants in various FOXP genes cause specific developmental disorders. The phenotype associated with heterozygous variants in FOXP4 has not been previously described. Methods: We assembled a cohort of eight individuals with heterozygous and mostly de novo variants in FOXP4: seven individuals with six different missense variants and one individual with a frameshift variant. We collected clinical data to delineate the phenotypic spectrum, and used in silico analyses and functional cell-based assays to assess the pathogenicity of the variants. Results: We collected clinical data for six individuals: five individuals with a missense variant in the forkhead box DNA-binding domain of FOXP4, and one individual with a truncating variant. Overlapping features included speech and language delays, growth abnormalities, congenital diaphragmatic hernia, cervical spine abnormalities, and ptosis. Luciferase assays showed loss-of-function effects for all these variants, and aberrant subcellular localization patterns were seen in a subset. The remaining two missense variants were located outside the functional domains of FOXP4 and showed transcriptional repressor capacities and localization patterns similar to the wild-type protein. Conclusion: Collectively, our findings show that heterozygous loss-of-function variants in FOXP4 are associated with an autosomal dominant neurodevelopmental disorder with speech/language delays, growth defects, and variable congenital abnormalities.


2022
Vol 14
Author(s):
Li Shu
Neng Xiao
Jiong Qin
Qi Tian
Yanghui Zhang
...  

Objective: To demonstrate that the microtubule-associated serine/threonine kinase 3 (MAST3) gene is associated with neurodevelopmental disorders (NDD) and to characterize the genotype-phenotype correlation. Methods: Trio exome sequencing (trio ES) was performed on four NDD trios. Bioinformatic analysis was conducted based on large-scale genome sequencing data and human brain transcriptomic data. Further in vivo zebrafish studies were performed. Results: In our study, we identified four de novo MAST3 variants (NM_015016.1: c.302C>T:p.Ser101Phe; c.311C>T:p.Ser104Leu; c.1543G>A:p.Gly515Ser; and c.1547T>C:p.Leu516Pro), one in each of four patients with developmental and epileptic encephalopathy (DEE). Clinical heterogeneity was observed between patients carrying variants in the domain of unknown function (DUF) and those carrying variants in the serine/threonine kinase (STK) domain. Using published large-scale exome sequencing data, we found higher CADD scores for missense variants in the DUF domain in the NDD cohort than in the gnomAD database. In addition, we observed an excess of missense variants in the DUF domain when comparing an autism spectrum disorder (ASD) cohort with gnomAD, and similarly an excess of missense variants in the STK domain when comparing a DEE cohort with gnomAD. Based on BrainSpan datasets, we showed that MAST3 expression was significantly upregulated in ASD- and DEE-related brain regions and was functionally linked with DEE genes. In the zebrafish model, abnormal morphology of the central nervous system was observed in mast3a/b crispants. Conclusion: Our results support the possibility that MAST3 is a novel gene associated with NDD, which could expand the genetic spectrum of NDD. The genotype-phenotype correlation may contribute to future genetic counseling.
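The four MAST3 variants are written in HGVS notation, pairing a coding-DNA substitution with its protein consequence. The sketch below is a hypothetical helper, not a full HGVS parser: it handles only the simple single-nucleotide form used here, and checks that the cDNA position falls in the stated codon, which is ceil(n / 3) for coding position n. All four variants listed above pass that check.

```python
import re
from typing import NamedTuple

# Standard three-letter to one-letter amino-acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Only simple single-nucleotide coding substitutions are handled.
CDNA_SUB = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")
PROTEIN_SUB = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


class MissenseVariant(NamedTuple):
    cdna_pos: int
    ref_base: str
    alt_base: str
    codon: int
    ref_aa: str
    alt_aa: str


def parse_missense(hgvs: str) -> MissenseVariant:
    """Parse e.g. 'c.302C>T:p.Ser101Phe' (spaces around '>' are tolerated)."""
    cdna_part, sep, protein_part = hgvs.replace(" ", "").partition(":")
    c = CDNA_SUB.fullmatch(cdna_part)
    p = PROTEIN_SUB.fullmatch(protein_part)
    if not sep or c is None or p is None:
        raise ValueError(f"unsupported notation: {hgvs!r}")
    try:
        ref_aa, alt_aa = AA3_TO_1[p.group(1)], AA3_TO_1[p.group(3)]
    except KeyError as exc:
        raise ValueError(f"unknown amino-acid code in {hgvs!r}") from exc
    if ref_aa == alt_aa:
        raise ValueError("synonymous change, not a missense variant")
    cdna_pos, codon = int(c.group(1)), int(p.group(2))
    # Coding position n lies in codon ceil(n / 3); check the two halves agree.
    if (cdna_pos + 2) // 3 != codon:
        raise ValueError("cDNA position does not fall in the stated codon")
    return MissenseVariant(cdna_pos, c.group(2), c.group(3), codon, ref_aa, alt_aa)
```

For instance, `parse_missense("c.302C>T:p.Ser101Phe")` yields codon 101 with a Ser-to-Phe (S to F) change, since position 302 is the second base of codon 101.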


2017
Vol 3 (6)
pp. e200
Author(s):
Ralph D. Hector
Vera M. Kalscheuer
Friederike Hennig
Helen Leonard
Jenny Downs
...  

Objective: To provide new insights into the interpretation of genetic variants in a rare neurologic disorder, CDKL5 deficiency, in the contexts of population sequencing data and an updated characterization of the CDKL5 gene. Methods: We analyzed all known potentially pathogenic CDKL5 variants by combining data from large-scale population sequencing studies with CDKL5 variants from new and all available clinical cohorts, and combined this with computational methods to predict pathogenicity. Results: The study identified several variants that can be reclassified as benign or likely benign. With the addition of novel CDKL5 variants, we confirm that pathogenic missense variants cluster in the catalytic domain of CDKL5 and reclassify a purported missense variant as having a splicing consequence. We provide further evidence that missense variants in the final 3 exons are likely to be benign and not important to disease pathology. We also describe benign splicing and nonsense variants within these exons, suggesting that isoform hCDKL5_5 is likely to have little or no neurologic significance. We also use the available data to make a preliminary estimate of the minimum incidence of CDKL5 deficiency. Conclusions: These findings have implications for genetic diagnosis, providing evidence for the reclassification of specific variants previously thought to result in CDKL5 deficiency. Together, these analyses support the view that the predominant brain isoform in humans (hCDKL5_1) is crucial for normal neurodevelopment and that the catalytic domain is the primary functional domain.


2019
Author(s):
Joseph Park
Nathan Katz
Xinyuan Zhang
Anastasia M Lucas
Anurag Verma
...  

Abstract
Background: By coupling large-scale DNA sequencing with electronic health records (EHR), “genome-first” approaches can enhance our understanding of the contribution of rare genetic variants to disease. Aggregating rare, loss-of-function variants in a candidate gene into a “gene burden” to test for association with EHR phenotypes can identify both known and novel clinical implications for the gene in human disease. However, this methodology has not yet been applied on both an exome-wide and phenome-wide scale, and the clinical ontologies of rare loss-of-function variants in many genes have yet to be described. Methods: We leveraged whole exome sequencing (WES) data in participants (N = 11,451) in the Penn Medicine Biobank (PMBB) to address on an exome-wide scale the association of a burden of rare loss-of-function variants in each gene with diverse EHR phenotypes using a phenome-wide association study (PheWAS) approach. For discovery, we collapsed rare (minor allele frequency (MAF) ≤ 0.1%) predicted loss-of-function (pLOF) variants (i.e., frameshift insertions/deletions, gain/loss of stop codon, or splice site disruption) per gene to perform a gene burden PheWAS. Subsequent evaluation of the significant gene burden associations was done by collapsing rare (MAF ≤ 0.1%) missense variants with Rare Exonic Variant Ensemble Learner (REVEL) scores ≥ 0.5 into corresponding yet distinct gene burdens, as well as by interrogating individual low-frequency to common (MAF > 0.1%) pLOF variants and missense variants with REVEL ≥ 0.5. We replicated our findings using the UK Biobank’s (UKBB) whole exome sequence dataset (N = 49,960). Results: From the pLOF-based discovery phase, we identified 106 gene burdens with phenotype associations at p < 10⁻⁶ from our exome-by-phenome-wide association studies. Positive-control associations included TTN (cardiomyopathy, p = 7.83E-13), MYBPC3 (hypertrophic cardiomyopathy, p = 3.48E-15), CFTR (cystic fibrosis, p = 1.05E-15), CYP2D6 (adverse effects due to opiates/narcotics, p = 1.50E-09), and BRCA2 (breast cancer, p = 1.36E-07). Of the 106 genes, 12 gene-phenotype relationships were also detected by REVEL-informed missense-based gene burdens and 19 by single-variant analyses, demonstrating the robustness of these gene-phenotype relationships. Three genes showed evidence of association using both additional methods (BRCA1, CFTR, TGM6), leading to a total of 28 robust gene-phenotype associations within PMBB. Furthermore, replication studies in UKBB validated 30 of the 106 gene burden associations, of which 12 demonstrated robustness in PMBB. Conclusion: Our study presents 12 exome-by-phenome-wide robust gene-phenotype associations, which include three proof-of-concept associations and nine novel findings. We show the value of aggregating rare pLOF variants into gene burdens on an exome-wide scale for unbiased association with EHR phenotypes to identify novel clinical ontologies of human genes. Furthermore, we show the significance of evaluating gene burden associations through complementary, yet non-overlapping genetic association studies from the same dataset. Our results suggest that this approach, applied to even larger cohorts of individuals with WES or whole-genome sequencing data linked to EHR phenotype data, will yield many new insights into the relationship between genetic variation and disease phenotypes.
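The collapsing step described in the Methods, turning rare qualifying variants in each gene into a per-participant carrier indicator, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PMBB pipeline: the consequence labels, the `VariantCall` record and the function name are hypothetical, and the downstream per-phenotype association test is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set

# Hypothetical consequence labels for the pLOF classes listed in the abstract.
PLOF_CONSEQUENCES = {"frameshift", "stop_gained", "stop_lost", "splice_site"}


@dataclass(frozen=True)
class VariantCall:
    sample: str
    gene: str
    consequence: str               # e.g. "frameshift", "missense", "synonymous"
    maf: float                     # minor allele frequency, as a fraction
    revel: Optional[float] = None  # REVEL score; only meaningful for missense


def collapse_gene_burden(
    calls: Iterable[VariantCall],
    mask: str = "plof",
    max_maf: float = 0.001,  # MAF <= 0.1%
    min_revel: float = 0.5,
) -> Dict[str, Set[str]]:
    """Return, per gene, the set of samples carrying >= 1 qualifying variant."""
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for call in calls:
        if call.maf > max_maf:
            continue  # not rare enough for this burden
        if mask == "plof":
            qualifies = call.consequence in PLOF_CONSEQUENCES
        elif mask == "missense":
            qualifies = (
                call.consequence == "missense"
                and call.revel is not None
                and call.revel >= min_revel
            )
        else:
            raise ValueError(f"unknown mask: {mask}")
        if qualifies:
            carriers[call.gene].add(call.sample)
    return dict(carriers)
```

Each gene's carrier set becomes a 0/1 predictor per participant, which would then be tested against each EHR phenotype in turn; the `missense` mask mirrors the REVEL ≥ 0.5 follow-up burden. A participant with two qualifying variants in one gene still counts once, which is what "collapsing" refers to here.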


2021
Vol 13 (1)
Author(s):
Ilaria Mannucci
Nghi D. P. Dang
Hannes Huber
Jaclyn B. Murry
Jeff Abramson
...  

Abstract
Background: We aimed to define the clinical and variant spectrum and to provide novel molecular insights into the DHX30-associated neurodevelopmental disorder. Methods: Clinical and genetic data from affected individuals were collected through a Facebook-based family support group, GeneMatcher, and our network of collaborators. We investigated the impact of novel missense variants with respect to ATPase and helicase activity, stress granule (SG) formation, global translation, and their effect on embryonic development in zebrafish. SG formation was additionally analyzed in CRISPR/Cas9-mediated DHX30-deficient HEK293T and zebrafish models, along with in vivo behavioral assays. Results: We identified 25 previously unreported individuals, ten of whom carry novel variants, two of which are recurrent, and provide evidence of gonadal mosaicism in one family. All 19 individuals harboring heterozygous missense variants within helicase core motifs (HCMs) have global developmental delay, intellectual disability, severe speech impairment, and gait abnormalities. These variants impair the ATPase and helicase activity of DHX30, trigger SG formation, interfere with global translation, and cause developmental defects in a zebrafish model. Notably, four individuals harboring heterozygous variants resulting either in haploinsufficiency or in truncated proteins presented with a milder clinical course, similar to an individual harboring a de novo mosaic HCM missense variant. Functionally, we established DHX30 as an ATP-dependent RNA helicase and as an evolutionarily conserved factor in SG assembly. Based on the clinical course, variant location, and variant type, we establish two distinct clinical subtypes. DHX30 loss-of-function variants cause a milder phenotype, whereas a severe phenotype is caused by HCM missense variants that, in addition to the loss of ATPase and helicase activity, lead to a detrimental gain of function with respect to SG formation. Behavioral characterization of dhx30-deficient zebrafish revealed altered sleep-wake activity and social interaction, partially resembling the human phenotype. Conclusions: Our study highlights the usefulness of social media in defining novel Mendelian disorders and exemplifies how functional analyses accompanied by clinical and genetic findings can define clinically distinct subtypes for ultra-rare disorders. Such approaches require close interdisciplinary collaboration between families/legal representatives of the affected individuals, clinicians, molecular genetics diagnostic laboratories, and research laboratories.


Author(s):  
Yuri A. Zarate
Tomoko Uehara
Kota Abe
Masayuki Oginuma
Sora Harako
...  

Author(s):  
Elisabeth Bosch
Moritz Hebebrand
Bernt Popp
Theresa Penger
Bettina Behring
...  

Abstract
Context: CPE encodes carboxypeptidase E, an enzyme that converts proneuropeptides and propeptide hormones to bioactive forms. It is widely expressed in the endocrine and central nervous systems. To date, four individuals from two families harbouring biallelic loss-of-function CPE variants have been reported, with core clinical features including morbid obesity, neurodevelopmental delay and hypogonadotropic hypogonadism. Objective: We describe four affected individuals from three unrelated consanguineous families (two siblings of Syrian descent, one individual of Egyptian descent and one of Pakistani descent), all harbouring novel homozygous CPE loss-of-function variants. Methods: After excluding Prader-Willi syndrome, exome sequencing was performed in both Syrian siblings. The variants identified in the other two individuals were reported as research variants in a large-scale exome study and in the ClinVar database. Computational modelling of all possible missense alterations allowed us to assess CPE tolerance to missense variants. Results: All affected individuals were severely obese with neurodevelopmental delay and other endocrine anomalies. Three individuals from two families shared the same homozygous truncating CPE variant c.361C>T, p.(Arg121*), while the fourth carried the c.994del, p.(Ser333Alafs*22) variant. Comparison of clinical features with previously described cases and standardization according to the Human Phenotype Ontology indicated a recognisable clinical phenotype, which we termed Blakemore-Durmaz-Vasileiou (BDV) syndrome. Computational analysis indicated high conservation of CPE domains and intolerance to missense changes. Conclusions: Biallelic truncating CPE variants are associated with BDV syndrome, a clinically recognisable monogenic recessive syndrome with childhood-onset obesity, neurodevelopmental delay, hypogonadotropic hypogonadism and hypothyroidism. BDV syndrome resembles Prader-Willi syndrome. Our findings suggest that missense variants may also be clinically relevant.
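The Methods mention computational modelling of all possible missense alterations in CPE. A natural first step in any such analysis is enumerating every single-nucleotide substitution in the coding sequence and classifying its amino-acid consequence; the sketch below does only that, using the standard genetic code. The function name is hypothetical, and whatever conservation or structural scoring the authors applied afterwards is not described in the abstract and is not reproduced here.

```python
from collections import Counter
from typing import Dict

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_all_snvs(cds: str) -> Counter:
    """Count the consequence of every possible SNV in a coding sequence.

    `cds` must contain only A/C/G/T and have a length divisible by 3.
    Categories: synonymous, missense, nonsense (gains a stop) and
    stop_lost (a stop codon becomes an amino acid).
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts: Counter = Counter()
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for offset in range(3):
            for alt in BASES:
                if alt == codon[offset]:
                    continue  # not a substitution
                mutant = codon[:offset] + alt + codon[offset + 1:]
                alt_aa = CODON_TABLE[mutant]
                if alt_aa == ref_aa:
                    counts["synonymous"] += 1
                elif alt_aa == "*":
                    counts["nonsense"] += 1
                elif ref_aa == "*":
                    counts["stop_lost"] += 1
                else:
                    counts["missense"] += 1
    return counts
```

Each codon admits nine possible SNVs, so a sequence of n codons yields 9n outcomes; for "ATGTGG" (Met-Trp) the sketch reports 16 missense and 2 nonsense changes. Note that single-nucleotide changes reach only a subset of all possible amino-acid substitutions at each position.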


2021
Author(s):
Jet van der Spek
Joery den Hoed
Lot Snijders Blok
Alexander J. M. Dingemans
Dick Schijven
...  

Interpretation of next-generation sequencing data of individuals with an apparently sporadic neurodevelopmental disorder (NDD) often focusses on pathogenic variants in genes associated with NDD, assuming full clinical penetrance with limited variable expressivity. Consequently, inherited variants in genes associated with dominant disorders may be overlooked when the transmitting parent is clinically unaffected. While de novo variants explain a substantial proportion of cases with NDDs, a significant number of cases remain undiagnosed, possibly because of coding variants associated with reduced penetrance and variable expressivity. We characterized twenty families with inherited heterozygous missense or protein-truncating variants (PTVs) in CHD3, a gene in which de novo variants cause Snijders Blok-Campeau syndrome (SNIBCPS), characterized by intellectual disability, speech delay and recognizable facial features. Notably, the majority of the inherited CHD3 variants were maternally transmitted. Computational facial and Human Phenotype Ontology-based comparisons demonstrated that the phenotypic features of probands with inherited CHD3 variants overlap with the phenotype previously associated with de novo variants in the gene, while carrier parents are mildly affected or unaffected, suggesting variable expressivity. Additionally, similarly reduced expression levels of CHD3 protein in cells of an affected proband and of related healthy carriers with a CHD3 PTV suggested that compensation of expression from the wild-type allele is unlikely to be an underlying mechanism. Our results point to a significant role of inherited variation in SNIBCPS, a finding that is critical for correct variant interpretation and genetic counseling and warrants further investigation towards understanding the broader contributions of such variation to the landscape of human disease.


2021
Author(s):
Xueya Zhou
Pamela Feliciano
Tianyun Wang
Irina Astrovskaya
Chang Shu
...  

Abstract
Despite the known heritable nature of autism spectrum disorder (ASD), studies have primarily identified risk genes with de novo variants (DNVs). To capture the full spectrum of ASD genetic risk, we performed a two-stage analysis of rare de novo and inherited coding variants in 42,607 ASD cases, including 35,130 new cases recruited online by SPARK. In the first stage, we analyzed 19,843 cases with one or both biological parents and found that known ASD or neurodevelopmental disorder (NDD) risk genes explain nearly 70% of the genetic burden conferred by DNVs. In contrast, less than 20% of the genetic risk conferred by rare inherited loss-of-function (LoF) variants is explained by known ASD/NDD genes. We selected 404 genes based on the first stage of analysis and performed a meta-analysis with an additional 22,764 cases and 236,000 population controls. We identified 60 genes with exome-wide significance (p < 2.5e-6), including five new risk genes (NAV3, ITSN1, MARK2, SCAF1, and HNRNPUL2). The association of NAV3 with ASD risk is entirely driven by rare inherited LoF variants, with an average relative risk of 4, consistent with a moderate effect. ASD individuals with LoF variants in the four moderate-risk genes (NAV3, ITSN1, SCAF1, and HNRNPUL2; n = 95) have less cognitive impairment than 129 ASD individuals with LoF variants in well-established, highly penetrant ASD risk genes (CHD8, SCN2A, ADNP, FOXP1, SHANK3) (59% vs. 88%, p = 1.9e-06). These findings will guide future gene discovery efforts and suggest that much larger numbers of ASD cases and controls are needed to identify additional genes that confer moderate risk of ASD through rare, inherited variants.
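The abstract states the exome-wide significance threshold but not its derivation. The value coincides with a Bonferroni correction of a family-wise error rate of 0.05 over roughly 20,000 protein-coding genes, a common convention for exome-wide gene tests; that the authors derived it this way is an assumption.

```latex
\alpha_{\text{exome}} = \frac{\alpha_{\text{FWER}}}{m_{\text{genes}}}
                      = \frac{0.05}{20\,000}
                      = 2.5 \times 10^{-6}
```

Under this reading, each of the roughly 20,000 gene-level tests is held to 2.5 × 10⁻⁶ so that the chance of any false positive across the exome stays at or below 5%.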

