Localized structural frustration for evaluating the impact of sequence variants

2016 ◽  
Author(s):  
Sushant Kumar ◽  
Declan Clarke ◽  
Mark Gerstein

Abstract The rapidly declining cost of sequencing human genomes and exomes is providing deeper insight into genomic variation than was previously possible. Growing sequence datasets are uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions, many of which may be unique to single individuals. The rarity of such variants makes it difficult to use conventional variant-phenotype associations to predict their potential impacts. Protein structures may therefore provide a means of inferring rare SNV-phenotype associations that are otherwise difficult to discern. Previous efforts have sought to quantify the effects of SNVs on structures by evaluating their impacts on global stability. However, local perturbations can severely impair function (such as catalysis, allosteric regulation, interactions and specificity) without strongly disrupting global stability. Here, we describe a workflow in which localized frustration (which quantifies unfavorable residue-residue interactions) is employed as a metric to investigate such effects. We apply frustration to study the impacts of a large number of SNVs drawn from several next-generation sequencing datasets. Most of our observations are intuitively consistent: disease-associated SNVs have a strong proclivity to induce large changes in localized frustration, and rare variants tend to disrupt local interactions to a larger extent than do common variants. Furthermore, somatic SNVs associated with oncogenes induce stronger perturbations at the surface, whereas those associated with tumor suppressor genes (TSGs) induce stronger perturbations in the interior. These findings are consistent with the notion that gain-of-function events (for oncogenes) and loss-of-function events (for TSGs) may act through changes in regulatory interactions and basic functionality, respectively.
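The abstract does not give a formula for localized frustration. The sketch below assumes the Z-score definition associated with the frustratometer: a native residue-residue contact energy is compared with an ensemble of decoy energies, and a variant is scored by the change in that index. The sign convention, the reuse of one decoy set for wild type and variant, and the cutoffs of 0.78 and -1 are all assumptions made for illustration; the energies are invented.

```python
import statistics

# Assumed cutoffs, modeled on commonly cited frustratometer thresholds.
MINIMAL_CUTOFF = 0.78
HIGH_CUTOFF = -1.0


def frustration_index(native_energy: float, decoy_energies: list[float]) -> float:
    """Z-score of a native contact energy against decoy energies.

    Assumed sign convention: lower energy is more favorable, so a positive
    index means the native contact beats the decoys (minimally frustrated)
    and a negative index means it is worse than most decoys (highly frustrated).
    """
    mean_decoy = statistics.fmean(decoy_energies)
    sd_decoy = statistics.pstdev(decoy_energies)
    if sd_decoy == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mean_decoy - native_energy) / sd_decoy


def classify(index: float) -> str:
    """Bin a frustration index using the assumed cutoffs."""
    if index > MINIMAL_CUTOFF:
        return "minimally frustrated"
    if index < HIGH_CUTOFF:
        return "highly frustrated"
    return "neutral"


def delta_frustration(wt_energy: float, variant_energy: float,
                      decoy_energies: list[float]) -> float:
    """Change in frustration when a variant alters the native contact energy.

    Simplification: the same decoy ensemble is reused for wild type and variant.
    """
    return (frustration_index(variant_energy, decoy_energies)
            - frustration_index(wt_energy, decoy_energies))


# Invented toy energies: a favorable wild-type contact made unfavorable by a variant.
decoys = [-1.0, 0.0, 1.0, -0.5, 0.5]
wt_index = frustration_index(-2.0, decoys)
variant_index = frustration_index(1.5, decoys)
print(classify(wt_index), "->", classify(variant_index))  # minimally frustrated -> highly frustrated
print(round(delta_frustration(-2.0, 1.5, decoys), 3))  # -4.95
```

A large negative change, as here, is the kind of local perturbation the workflow is meant to flag even when global stability is barely affected.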

2013 ◽  
Vol 44 (21) ◽  
Author(s):  
Sushant Kumar ◽  
Declan Clarke ◽  
Mark Gerstein

Abstract Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impair protein function without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, which quantifies unfavorable local interactions, is employed as a metric to investigate such effects. Applying this workflow to the Protein Data Bank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease-related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, SNVs associated with TSGs change frustration more in the core than at the surface (by introducing loss-of-function events), whereas those associated with oncogenes show the opposite pattern, creating gain-of-function events.
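The core-versus-surface comparison can be pictured as grouping per-variant frustration changes by gene class and residue location. This is a minimal sketch, not the authors' pipeline: it assumes each variant already has a relative solvent accessibility (RSA) and a frustration change, and the 25% RSA cutoff for "core" is a hypothetical choice. The toy values are invented.

```python
from collections import defaultdict
from statistics import fmean

CORE_RSA_CUTOFF = 0.25  # assumed: relative solvent accessibility below 25% counts as core


def location(rsa: float) -> str:
    """Label a residue as core or surface from its relative solvent accessibility."""
    return "core" if rsa < CORE_RSA_CUTOFF else "surface"


def mean_abs_delta_f(variants):
    """Average |delta F| per (gene class, location).

    variants: iterable of (gene_class, rsa, delta_f) tuples.
    """
    groups = defaultdict(list)
    for gene_class, rsa, delta_f in variants:
        groups[(gene_class, location(rsa))].append(abs(delta_f))
    return {key: fmean(values) for key, values in groups.items()}


# Invented toy values, not data from the study.
toy_variants = [
    ("TSG", 0.05, -1.2), ("TSG", 0.10, -0.8), ("TSG", 0.60, 0.3),
    ("oncogene", 0.10, 0.2), ("oncogene", 0.70, -0.9), ("oncogene", 0.55, 1.1),
]
summary = mean_abs_delta_f(toy_variants)
for key in sorted(summary):
    print(key, round(summary[key], 2))
```

Taking the absolute value treats gains and losses of frustration alike, which matches the abstract's focus on the magnitude of change rather than its direction.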


2019 ◽  
Vol 47 (W1) ◽  
pp. W136-W141 ◽  
Author(s):  
Emidio Capriotti ◽  
Ludovica Montanucci ◽  
Giuseppe Profiti ◽  
Ivan Rossi ◽  
Diana Giannuzzi ◽  
...  

Abstract As the amount of genomic variation data increases, tools that can score the functional impact of single-nucleotide variants are becoming increasingly necessary. While several prediction servers are available for interpreting the effects of variants in the human genome, only a few have been developed for other species, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between pathogenic and benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the Gradient Boosting algorithm. It can classify and score the impact of variants in both coding and non-coding regions, based on sequence features, within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, a Matthews correlation coefficient of 0.77 and an area under the ROC curve of 0.91.
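The three reported figures (accuracy, Matthews correlation coefficient, ROC AUC) are standard binary-classification metrics. The sketch below shows how each is computed; the function names and the counts and scores used are hypothetical and are not taken from Fido-SNP or its validation set.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = pathogenic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, chosen only to illustrate the formulas.
print(accuracy(40, 45, 5, 10))  # 0.85
print(round(mcc(40, 45, 5, 10), 3))
print(roc_auc([0.9, 0.8, 0.85, 0.1], [1, 1, 0, 0]))  # 0.75
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when pathogenic and benign classes are unbalanced.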


Author(s):  
Henne Holstege ◽  
Marc Hulsman ◽  
Camille Charbonnier ◽  
Benjamin Grenier-Boley ◽  
Olivier Quenez ◽  
...  

Background: With the development of next-generation sequencing technologies, it has become possible to identify rare genetic variants that influence the risk of complex disorders. To date, whole exome sequencing (WES) studies have shown that specific clusters of damaging rare variants in the TREM2, SORL1 and ABCA7 genes are associated with an increased risk of developing Alzheimer's disease (AD), with odds ratios comparable to that of the APOE-ε4 allele, the main common genetic risk factor for AD. Here, we set out to identify additional AD-associated genes through an exome-wide investigation of the burden of rare damaging variants in the genomes of AD cases and cognitively healthy controls.
Method: We integrated data from 25,982 samples from the European ADES consortium and the American ADSP consortium, and developed new techniques to homogenise and analyse these data. Carriers of pathogenic variants in genes associated with Mendelian inheritance of dementia were excluded. After quality control, 12,652 AD cases and 8,693 controls were used for analysis. Genes were tested with a burden analysis that included both non-synonymous and loss-of-function rare variants, whose impact was prioritised using REVEL.
Result: We confirmed that carrying rare protein-damaging variants in TREM2, SORL1 or ABCA7 is associated with increased AD risk. Moreover, carrying rare damaging variants in the microglial gene ATP8B4 was significantly associated with AD, and we found suggestive evidence that rare variants in ADAM10, ABCA1, ORC6, B3GNT4 and SRC are associated with increased AD risk. High-impact variants in these genes were mostly extremely rare and were enriched in AD patients with earlier ages at onset. Additionally, we identified two suggestive protective associations, in CBX3 and PRSS3. We are currently replicating these associations in independent datasets.
Conclusion: Using our newly developed homogenisation methods, we identified novel genetic determinants of AD that provide further evidence for a pivotal role of APP processing, lipid metabolism, microglia and neuroinflammatory processes in AD pathophysiology.
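A gene-level burden test collapses all qualifying rare variants in a gene into a single carrier status per sample and compares carrier rates between cases and controls. The sketch below shows that idea with a two-sided Fisher's exact test; it does not reproduce the study's pipeline (which is not detailed in the abstract and may use regression with covariates). The MAF and REVEL cutoffs, the `GENE_X` name and the toy cohort are all hypothetical.

```python
from math import comb

# Hypothetical qualifying-variant thresholds, not the cutoffs used in the study.
MAF_CUTOFF = 0.01
REVEL_CUTOFF = 0.5


def is_qualifying(variant: dict) -> bool:
    """Rare and either loss-of-function or missense with a high REVEL score."""
    if variant["maf"] >= MAF_CUTOFF:
        return False
    if variant["consequence"] == "loss_of_function":
        return True
    return variant["consequence"] == "missense" and variant.get("revel", 0.0) >= REVEL_CUTOFF


def burden_table(samples, gene: str):
    """Count carriers of >=1 qualifying variant in `gene`.

    samples: iterable of (is_case, variants) pairs.
    Returns (case_carriers, case_non_carriers, control_carriers, control_non_carriers).
    """
    counts = [0, 0, 0, 0]
    for is_case, variants in samples:
        carrier = any(v["gene"] == gene and is_qualifying(v) for v in variants)
        row = 0 if is_case else 2
        counts[row + (0 if carrier else 1)] += 1
    return tuple(counts)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Invented toy cohort: 3 cases (2 carriers) and 3 controls (0 carriers).
lof = {"gene": "GENE_X", "maf": 0.0001, "consequence": "loss_of_function"}
samples = [(True, [lof]), (True, [lof]), (True, []), (False, []), (False, []), (False, [])]
table = burden_table(samples, "GENE_X")
print(table, fisher_exact_two_sided(*table))
```

Collapsing to carrier status is what gives power for variants that are individually too rare to test one by one.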


2021 ◽  
Author(s):  
Tony Zeng ◽  
Yang I Li

Recent progress in deep learning approaches has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model that predicts splice site strength in multiple tissues and has been trained on RNA splicing and sequence data from four species. Pangolin outperforms state-of-the-art methods on a variety of RNA splicing prediction tasks. We use Pangolin to study the impact of genetic variants on RNA splicing, including lineage-specific variants and rare variants of uncertain significance. Pangolin predicts loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense (AUPRC = 0.93), demonstrating remarkable potential for identifying pathogenic variants.
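The abstract reports an AUPRC without stating how it was computed. One common estimator is average precision: rank variants by predicted score and average the precision observed at the rank of each true positive. This is a sketch of that estimator only; Pangolin's own evaluation may use a different one (for example, trapezoidal interpolation of the precision-recall curve), and the scores and labels are invented.

```python
def average_precision(scores, labels):
    """Average precision: mean of precision@k over the ranks k of true positives.

    Ties in score keep their input order (a simplification).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


# Invented scores: positives ranked 1st and 3rd give (1/1 + 2/3) / 2.
print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```

Unlike ROC AUC, this measure is sensitive to how many negatives outrank the positives, which makes it informative when pathogenic variants are a small minority.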


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sina Rüeger ◽  
Christian Hammer ◽  
Alexis Loetscher ◽  
Paul J. McLaren ◽  
...  

Abstract Epstein–Barr virus (EBV) is one of the most common viruses latently infecting humans. Little is known about the impact of human genetic variation on the large inter-individual differences observed in response to EBV infection. To search for a potential imprint of host genomic variation on the EBV sequence, we jointly analyzed paired viral and human genomic data from 268 HIV-coinfected individuals with CD4+ T cell counts <200/mm³ and elevated EBV viremia. We hypothesized that the reactivated virus circulating in these patients could carry sequence variants acquired during primary EBV infection, thereby providing a snapshot of early adaptation to the pressure exerted on EBV by the individual immune response. We searched for associations between host and pathogen genetic variants, taking into account human and EBV population structure. Our analyses revealed significant associations between human and EBV sequence variation. Three polymorphic regions in the human genome were found to be associated with EBV variation: one at the amino acid level (BRLF1:p.Lys316Glu) and two at the gene level (burden testing of rare variants in BALF5 and BBRF1). Our findings confirm that jointly analyzing host and pathogen genomes can identify sites of genomic interactions, which could help dissect pathogenic mechanisms and suggest new therapeutic avenues.


2021 ◽  
Author(s):  
Peter Gergics ◽  
Cathy Smith ◽  
Hironori Bando ◽  
Alexander A. L. Jorge ◽  
Denise Rockstroh-Lippold ◽  
...  

Abstract Pituitary hormone deficiency occurs in ∼1:4,000 live births. Approximately 3% of cases are due to mutations in the alpha isoform of POU1F1, a pituitary-specific transcriptional activator. We found four separate heterozygous missense variants in unrelated hypopituitarism patients that were predicted to affect a minor isoform, POU1F1 beta, which can act as a transcriptional repressor. These variants retain repressor activity, but they shift splicing to favor expression of the beta isoform, resulting in a dominant-negative loss of function. Using a high-throughput splicing reporter assay, we tested 1,080 single-nucleotide variants in POU1F1 and identified 113 splice-disruptive variants, including 23 synonymous variants. In separate cohorts of hypopituitarism patients, we found two different synonymous splice-disruptive variants that co-segregate with hypopituitarism. This study underlines the importance of evaluating the impact of variants on splicing and provides a catalog for interpreting variants of unknown significance in the POU1F1 gene.
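A reporter screen of this kind starts from a list of candidate substitutions. The abstract does not say how the 1,080 variants were chosen; if they saturated a contiguous window, 1,080 would correspond to 360 positions × 3 alternative bases, but that is an assumption. The sketch below enumerates every possible single-nucleotide substitution in a sequence; the `start` coordinate origin and the toy sequence are hypothetical.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def enumerate_snvs(sequence: str, start: int = 1) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, ref, alt) for every possible single-nucleotide substitution.

    Positions are 1-based and offset by `start`, a hypothetical coordinate origin.
    """
    for offset, ref in enumerate(sequence.upper()):
        if ref not in BASES:
            raise ValueError(f"unexpected base {ref!r} at offset {offset}")
        for alt in BASES:
            if alt != ref:
                yield start + offset, ref, alt


variants = list(enumerate_snvs("ATG"))
print(len(variants), variants[:3])  # 9 [(1, 'A', 'C'), (1, 'A', 'G'), (1, 'A', 'T')]
print(len(list(enumerate_snvs("A" * 360))))  # 1080
```

Each position contributes exactly three variants, so the library size scales linearly with window length.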


2020 ◽  
Author(s):  
Sina Rüeger ◽  
Christian Hammer ◽  
Alexis Loetscher ◽  
Paul J McLaren ◽  
Dylan Lawless ◽  
...  

Abstract Epstein-Barr virus (EBV) is one of the most common viruses latently infecting humans. Little is known about the impact of human genetic variation on the large inter-individual differences observed in response to EBV infection. To search for a potential imprint of host genomic variation on the EBV sequence, we jointly analyzed paired viral and human genomic data from 268 HIV-coinfected individuals with CD4+ T cell counts <200/mm³ and elevated EBV viremia. We hypothesized that the reactivated virus circulating in these patients could carry sequence variants acquired during primary EBV infection, thereby providing a snapshot of early adaptation to the pressure exerted on EBV by the individual immune response. We searched for associations between host and pathogen genetic variants, taking into account human and EBV population structure. Our analyses revealed significant associations between human and EBV sequence variation. Three polymorphic regions in the human genome were found to be associated with EBV variation: one at the amino acid level (BRLF1:p.Lys316Glu) and two at the gene level (burden testing of rare variants in BALF5 and BBRF1). Our findings confirm that jointly analyzing host and pathogen genomes can identify sites of genomic interactions, which could help dissect pathogenic mechanisms and suggest new therapeutic avenues.


2019 ◽  
Vol 56 (7) ◽  
pp. 453-460 ◽  
Author(s):  
Irene Lopez-Perolio ◽  
Raphaël Leman ◽  
Raquel Behar ◽  
Vanessa Lattimore ◽  
John F Pearson ◽  
...  

Background: PALB2 monoallelic loss-of-function germline variants confer a breast cancer risk comparable to that of the average BRCA2 pathogenic variant, and recommendations for risk-reduction strategies in carriers are similar. Establishing robust criteria to identify loss-of-function variants in PALB2, without incurring overprediction, is thus of paramount clinical relevance. Towards this aim, we performed a comprehensive characterisation of alternative splicing in PALB2 and analysed its relevance for the classification of truncating and splice site variants according to the 2015 American College of Medical Genetics and Genomics-Association for Molecular Pathology guidelines.
Methods: Alternative splicing was characterised in RNAs extracted from blood, breast and fimbriae/ovary-related human specimens (n=112). RNAseq, RT-PCR/CE and CloneSeq experiments were performed by five contributing laboratories, and centralised revision/curation was performed to ensure high-quality annotations. Additional splicing analyses were performed in carriers of PALB2 c.212-1G>A, c.1684+1G>A, c.2748+2T>G, c.3113+5G>A, c.3350+1G>A, c.3350+4A>C and c.3350+5G>A. The impact of the findings on PVS1 status was evaluated for truncating and splice site variants.
Results: We identified 88 naturally occurring alternative splicing events (81 newly described), including 4 in-frame events predicted to be relevant for evaluating the PVS1 status of splice site variants. We did not identify tissue-specific alternative gene transcripts in breast or ovarian-related samples, supporting the clinical relevance of blood-based splicing studies.
Conclusions: PVS1 is not necessarily warranted for splice site variants targeting four PALB2 acceptor sites (exons 2, 5, 7 and 10). As a result, rare variants at these splice sites cannot be assumed to be pathogenic/likely pathogenic without further evidence. Our study raises a warning about up to five PALB2 genetic variants that are currently reported as pathogenic/likely pathogenic in ClinVar.
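The variants listed in the Methods use HGVS coding-DNA notation, where `c.1684+1G>A` means one base into the intron after coding position 1684 (a donor-side change) and `c.212-1G>A` means one base before coding position 212 (an acceptor-side change). A first step when weighing PVS1 is checking whether a variant hits the canonical ±1 or 2 splice positions referred to by the 2015 ACMG/AMP guidelines. The parser below is a minimal sketch that handles only simple substitutions; the function names and category labels are hypothetical, and a "canonical" label does not by itself establish PVS1, which is the point the Conclusions make.

```python
import re
from typing import NamedTuple, Optional

# Handles only simple coding-DNA substitutions such as c.1684+1G>A or c.212-1G>A;
# UTR positions (c.-12, c.*5), ranges and indels are out of scope.
CODING_SNV = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


class CodingSNV(NamedTuple):
    position: int                 # exonic anchor base that any intronic offset counts from
    intron_offset: Optional[int]  # None for exonic variants
    ref: str
    alt: str


def parse_coding_snv(hgvs: str) -> CodingSNV:
    match = CODING_SNV.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS substitution: {hgvs}")
    position, offset, ref, alt = match.groups()
    return CodingSNV(int(position), int(offset) if offset else None, ref, alt)


def splice_category(variant: CodingSNV) -> str:
    """Coarse location relative to the nearest splice site."""
    if variant.intron_offset is None:
        return "exonic"
    side = "donor" if variant.intron_offset > 0 else "acceptor"
    if abs(variant.intron_offset) <= 2:
        return f"canonical {side} site (+/-1 or 2)"
    return f"non-canonical intronic, {side} side"


for hgvs in ["c.212-1G>A", "c.1684+1G>A", "c.3350+5G>A"]:
    print(hgvs, "->", splice_category(parse_coding_snv(hgvs)))
```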


2015 ◽  
Vol 113 (04) ◽  
pp. 826-837 ◽  
Author(s):  
Matthew L. Jones ◽  
Jane E. Norman ◽  
Neil V. Morgan ◽  
Stuart J. Mundell ◽  
Marie Lordkipanidzé ◽  
...  

Summary Platelet responses to activating agonists are influenced by common population variants within or near G protein-coupled receptor (GPCR) genes that affect receptor activity. However, the impact of rare GPCR gene variants is unknown. We describe the rare single-nucleotide variants (SNVs) in the coding and splice regions of 18 GPCR genes in 7,595 exomes from the 1000 Genomes and Exome Sequencing Project databases and in 31 cases with inherited platelet function disorders (IPFDs). In the population databases, the GPCR gene target regions contained 740 SNVs (318 synonymous, 410 missense, 7 stop gain and 6 splice region), of which 70% had a global minor allele frequency (MAF) <0.05%. Functional annotation using six computational algorithms, experimental evidence and structural data identified 156/740 (21%) SNVs as potentially damaging to GPCR function, most commonly in regions encoding the transmembrane and C-terminal intracellular receptor domains. In 31 index cases with IPFDs (Gi-pathway defect n=15; secretion defect n=11; thromboxane pathway defect n=3; complex defect n=2) there were 256 SNVs in the target regions of 15 stimulatory platelet GPCRs (34 unique; 12 with MAF <1% and 22 with MAF ≥1%). These included rare variants predicting R122H, P258T and V207A substitutions in the P2Y12 receptor that were annotated as potentially damaging but only partially explained the platelet function defects in each case. Our data highlight that potentially damaging variants in platelet GPCR genes have low individual frequencies but are collectively abundant in the population. Potentially damaging variants are also present in pedigrees with IPFDs and may contribute to complex laboratory phenotypes.
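The abstract stratifies variants by minor allele frequency using cut-offs of 0.05% and 1%. Percentages are easy to mis-scale when frequencies are stored as fractions, so this sketch spells out the conversion (0.05% = 0.0005, 1% = 0.01). The bin labels and toy frequencies are hypothetical, not values from the study.

```python
from collections import Counter


def maf_bin(maf: float) -> str:
    """Assign a variant to the MAF bands used in the abstract (<0.05%, <1%, >=1%)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if maf < 0.0005:  # 0.05% expressed as a fraction
        return "MAF <0.05%"
    if maf < 0.01:  # 1% expressed as a fraction
        return "0.05% <= MAF <1%"
    return "MAF >=1%"


# Invented toy frequencies, not values from the study.
toy_mafs = [0.0001, 0.0003, 0.002, 0.2, 0.0]
counts = Counter(maf_bin(m) for m in toy_mafs)
print(counts)
```

The upper bound of 0.5 follows from the definition of the minor allele, so larger values indicate a data error rather than a common variant.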


2020 ◽  
Author(s):  
Wei He ◽  
Helen Wang ◽  
Yanjun Wei ◽  
Zhiyun Jiang ◽  
Yitao Tang ◽  
...  

Abstract The efficiency of CRISPR/Cas9-mediated protein knockout is determined by three factors: sequence-specific sgRNA activity, frameshift probability, and the characteristics of the targeted amino acids. A number of computational methods have been developed for predicting sgRNA efficiency from different perspectives. We propose GuidePro, a two-layer ensemble predictor that enables the integration of multiple predictive methods and feature sets. GuidePro leverages information from DNA sequences, amino acids and protein structures, and reduces the impact of dataset-specific biases. Tested on independent datasets, GuidePro demonstrated consistently superior performance in predicting phenotypes caused by protein loss of function. GuidePro is implemented as a web application for prioritizing sgRNAs that target protein-coding genes in human, monkey and mouse genomes, available at https://bioinformatics.mdanderson.org/apps/GuidePro.
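The abstract describes GuidePro only as a "two-layer ensemble". A generic way to build such a predictor is stacking: layer 1 consists of existing predictors whose scores are put on a common scale, and layer 2 learns how to combine them. The sketch below assumes min-max scaling and a logistic-regression combiner fitted by batch gradient descent; these are illustrative choices, not GuidePro's documented design, and the data are invented.

```python
import math

Rows = list[list[float]]


def normalize_columns(rows: Rows) -> Rows:
    """Min-max scale each layer-1 predictor's scores to [0, 1]."""
    columns = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        span = hi - lo
        columns.append([0.5 if span == 0 else (x - lo) / span for x in column])
    return [list(row) for row in zip(*columns)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_second_layer(rows: Rows, labels: list[int], lr: float = 0.5, epochs: int = 2000):
    """Fit a logistic-regression combiner over layer-1 scores by batch gradient descent."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: list[float], bias: float, row: list[float]) -> float:
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Invented toy data: two hypothetical layer-1 predictors reporting on different scales.
raw = [[0.9, 80.0], [0.8, 70.0], [0.2, 20.0], [0.1, 30.0]]
labels = [1, 1, 0, 0]  # 1 = knockout phenotype observed (toy labels)
rows = normalize_columns(raw)
weights, bias = fit_second_layer(rows, labels)
print([round(predict(weights, bias, r), 2) for r in rows])
```

Scaling matters because the layer-1 predictors may report on very different ranges; in practice, new sgRNAs would have to be scaled with the minima and maxima learned on the training data rather than re-normalized on their own.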

