variant annotation Latest Research Papers

SVAT: Secure Outsourcing of Variant Annotation and Genotype Aggregation

Background: Sequencing of thousands of samples provides genetic variants with allele frequencies spanning a very large spectrum and gives invaluable insight into the genetic determinants of diseases. Protecting the genetic privacy of participants is challenging because only a few rare variants can re-identify an individual among millions. In certain cases, there are policy barriers against sharing genetic data from indigenous populations and on stigmatizing conditions. Results: We present SVAT, a method for secure outsourcing of variant annotation and aggregation, which are two basic steps in variant interpretation and detection of causal variants. SVAT uses homomorphic encryption to encrypt the data at the client side. The data always stays encrypted while it is stored, in transit, and most importantly while it is analyzed. SVAT makes use of a vectorized data representation to convert annotation and aggregation into efficient vectorized operations in a single framework. SVAT also uses a secure re-encryption approach so that multiple disparate genotype datasets can be combined for federated aggregation and secure computation of allele frequencies on the aggregated dataset. Conclusions: Overall, SVAT provides a secure, flexible, and practical framework for privacy-aware outsourcing of annotation, filtering, and aggregation of genetic variants. SVAT is publicly available for download from https://github.com/harmancilab/SVAT .
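
The aggregation step described above reduces to element-wise sums over per-variant vectors, which is the kind of arithmetic that additively homomorphic schemes can evaluate on ciphertexts. The following is a minimal plaintext sketch of that arithmetic only, assuming diploid samples and a shared variant ordering across datasets. It performs no encryption and is not SVAT's implementation; the function names and the toy datasets `site_a` and `site_b` are hypothetical.

```python
from typing import List


def alt_allele_counts(genotypes: List[List[int]]) -> List[int]:
    """Sum alternate-allele dosages (0, 1 or 2) per variant across samples.

    genotypes[s][v] is the dosage of sample s at variant v.
    """
    if not genotypes:
        return []
    n_variants = len(genotypes[0])
    counts = [0] * n_variants
    for sample in genotypes:
        if len(sample) != n_variants:
            raise ValueError("all samples must share the same variant vector")
        for v, dosage in enumerate(sample):
            counts[v] += dosage
    return counts


def pooled_allele_frequencies(datasets: List[List[List[int]]]) -> List[float]:
    """Add per-dataset count vectors, then divide by the total chromosome count."""
    total_counts: List[int] = []
    total_samples = 0
    for genotypes in datasets:
        if not genotypes:
            continue  # an empty dataset contributes nothing
        counts = alt_allele_counts(genotypes)
        if total_counts and len(counts) != len(total_counts):
            raise ValueError("datasets must share the same variant vector")
        # Element-wise addition is the only cross-dataset operation needed.
        total_counts = (
            [a + b for a, b in zip(total_counts, counts)] if total_counts else counts
        )
        total_samples += len(genotypes)
    if total_samples == 0:
        return []
    return [c / (2 * total_samples) for c in total_counts]  # diploid assumption


# Hypothetical toy data: two sites, three variants in the same order.
site_a = [[0, 1, 2], [0, 0, 1]]             # 2 samples
site_b = [[1, 0, 0], [0, 0, 0], [0, 1, 2]]  # 3 samples
freqs = pooled_allele_frequencies([site_a, site_b])
print(freqs)
```

Because each site contributes only a count vector and a sample count, the pooled frequency never requires one site to see another site's individual genotypes, which is the property a federated design relies on.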

Using attentive gated neural networks to quantify the impact of non-coding variants on transcription factor binding affinity

10.1101/2021.07.30.454350 ◽ 2021

Author(s): Neel Patel ◽ Haimeng Bai ◽ William Bush

Keyword(s): Neural Network ◽ Binding Affinity ◽ Binding Sites ◽ Complex Disease ◽ Variant Annotation ◽ Gene Regulatory ◽ Using Data ◽ Coding Variants ◽ The Impact

A large proportion of non-coding variants are present within binding sites of transcription factors (TFs), which play a significant role in gene regulation. Thus, deriving the impact of non-coding variants on TF binding is the first step towards unravelling their regulatory roles within their associated disease traits. Most of the modern algorithms used for this purpose are based on convolutional neural network (CNN) architectures. However, these models are incapable of capturing the positional effect of different sub-sequences within the TF binding sites on the binding affinity. In this paper, we utilize the attentive gated neural network (AGNet) architecture to build a set of TF-AGNet models for predicting in vivo TF binding intensities in GM12878 lymphoblastoid cells. These models have novel layers capable of deriving the impact of the relative positions of different DNA sub-sequences within a binding site on TF binding affinity, and of extracting the most relevant prediction features. We show that the TF-AGNet models outperform conventional CNNs for predicting continuous values of TF binding affinity. We also train additional TF-AGNet models for 20 TFs using data from 4 other cell lines to assess the generalizability of their prediction accuracy. Lastly, we show that the TF-AGNet-based models more accurately classify non-coding variants that significantly affect TF binding than models based on 7 variant annotation tools. This accuracy can be leveraged to derive gene regulatory roles of millions of non-coding variants across the genome to further examine their mechanistic associations with complex disease traits.

SvAnna: efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing

10.1101/2021.07.14.452267 ◽ 2021

Author(s): Daniel Danis ◽ Julius O.B. Jacobsen ◽ Parithi Balachandran ◽ Qihui Zhu ◽ Feyza Yilmaz ◽ ...

Keyword(s): Case Reports ◽ Regulatory Sequences ◽ Structural Variants ◽ Variant Annotation ◽ Mendelian Diseases ◽ Topologically Associating Domains ◽ Technological Advances ◽ Pathogenicity Prediction ◽ Phenotype Data ◽ Long Read

Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to limitations of existing technology. Recent technological advances such as long-read sequencing (LRS) enable more comprehensive detection of SVs, but approaches for clinical prioritization of candidate SVs are needed. Existing computational approaches do not specifically target LRS data, thereby missing a substantial proportion of candidate SVs, and do not provide a unified computational model for assessing all types of SVs. Structural Variant Annotation and Analysis (SvAnna) assesses all classes of SV and their intersection with transcripts and regulatory sequences in the context of topologically associating domains, relating predicted effects on gene function with clinical phenotype data. We show with a collection of 182 published case reports with pathogenic SVs that SvAnna places over 90% of pathogenic SVs in the top ten ranks. The interpretable prioritizations provided by SvAnna will facilitate the widespread adoption of LRS in diagnostic genomics.

Watch Out for a Second SNP: Focus on Multi-Nucleotide Variants in Coding Regions and Rescued Stop-Gained

Frontiers in Genetics ◽ 10.3389/fgene.2021.659287 ◽ 2021 ◽ Vol 12

Author(s): Fabien Degalez ◽ Frédéric Jehl ◽ Kévin Muret ◽ Maria Bernard ◽ Frédéric Lecerf ◽ ...

Keyword(s): Companion Paper ◽ Potential Effect ◽ Nucleotide Polymorphisms ◽ Rna Seq ◽ Coding Region ◽ Single Nucleotide ◽ Protein Coding ◽ Variant Annotation ◽ Coding Regions ◽ Reliable Genotypes

Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied is harbored in protein-coding regions because potential impacts on proteins are relatively easy to predict with popular tools such as the Variant Effect Predictor. These tools annotate variants independently, without considering the potential effect of grouped or haplotypic variations, often called “multi-nucleotide variants” (MNVs). Here, we surveyed MNVs in a large RNA-seq dataset comprising 382 chicken samples from 11 populations, analyzed in the companion paper, in which 9.5M SNPs (including 3.3M SNPs with reliable genotypes) were detected. We focused our study on in-codon MNVs and evaluated their potential mis-annotation. Using GATK HaplotypeCaller read-based phasing results, we identified 2,965 MNVs observed in at least five individuals and located in 1,792 genes. Of these, 41.1% showed a novel impact when compared with the effects of their constituent SNPs analyzed separately. The largest shift in annotated impact concerns consequences originally annotated as stop-gained, of which around 95% were rescued, followed by missense consequences, of which 37% were reannotated with a different amino acid. We then present the rescued stop-gained MNVs in more depth and give an illustration in the SLC27A4 gene. As previously shown in human datasets, our results in chicken demonstrate the value of haplotype-aware variant annotation and of considering MNVs in coding regions, particularly when searching for severe functional consequences such as stop-gained variants.
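
To make the rescued stop-gained case concrete, here is a minimal sketch that annotates two SNPs in the same codon first separately and then jointly. It assumes the standard genetic code and a hypothetical reference codon `CAA` with two hypothetical SNPs on the same haplotype. It is an illustration of the idea only, not the authors' pipeline, and the consequence labels are simplified.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snps(codon: str, snps: List[Tuple[int, str]]) -> str:
    """Return the codon after substituting each (position, alt_base) pair."""
    bases = list(codon)
    for position, alt_base in snps:
        bases[position] = alt_base
    return "".join(bases)


def consequence(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change with simplified consequence labels."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


# Hypothetical example: CAA (Gln) carrying C>T at position 0 and A>C at position 2.
ref_codon = "CAA"
snps = [(0, "T"), (2, "C")]

# SNP-by-SNP annotation, as an independent annotator would report it.
separate = [consequence(ref_codon, apply_snps(ref_codon, [snp])) for snp in snps]
# Haplotype-aware annotation of the combined change.
mnv_codon = apply_snps(ref_codon, snps)
joint = consequence(ref_codon, mnv_codon)

print(separate)          # per-SNP consequences
print(mnv_codon, joint)  # combined codon and its consequence
```

Taken alone, the first SNP turns `CAA` into the stop codon `TAA`. With the second SNP on the same haplotype the codon becomes `TAC` (Tyr), so the joint consequence is missense and the stop is rescued.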

RAREsim: A simulation method for very rare genetic variants

10.1101/2021.04.13.439644 ◽ 2021

Author(s): Megan Null ◽ Josée Dupuis ◽ Christopher R. Gignoux ◽ Audrey E. Hendricks

Keyword(s): Rare Variant ◽ Complex Traits ◽ Rare Variants ◽ Simulated Data ◽ Real Data ◽ Simulation Method ◽ Sequencing Data ◽ Variant Annotation ◽ Causal Variants ◽ Rare Genetic Variants

Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirrors the distribution of rare variants and haplotype structure in real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.
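
The singleton and doubleton counts mentioned above are entries of the allele count spectrum, which can be tallied directly from a haplotype matrix. The sketch below shows only that summary statistic, which one might compare between simulated and real data. It is not the RAREsim algorithm, and the function name and toy haplotypes are hypothetical.

```python
from collections import Counter
from typing import List


def allele_count_spectrum(haplotypes: List[List[int]]) -> Counter:
    """Count how many variants are observed at each alternate-allele count.

    haplotypes[h][v] is 1 if haplotype h carries the alternate allele at
    variant v, else 0. Variants with no carriers are left out.
    """
    if not haplotypes:
        return Counter()
    n_variants = len(haplotypes[0])
    allele_counts = [sum(hap[v] for hap in haplotypes) for v in range(n_variants)]
    return Counter(ac for ac in allele_counts if ac > 0)


# Hypothetical toy data: 4 haplotypes, 5 variant sites.
haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
spectrum = allele_count_spectrum(haplotypes)
singletons = spectrum[1]
doubletons = spectrum[2]
print(dict(spectrum), singletons, doubletons)
```

Computing the same spectrum on simulated and on real haplotypes gives a direct check of whether a simulator under- or over-produces the rarest variant classes.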

iVar, an Interpretation-Oriented Tool to Manage the Update and Revision of Variant Annotation and Classification

Genes ◽ 10.3390/genes12030384 ◽ 2021 ◽ Vol 12 (3) ◽ pp. 384

Author(s): Sara Castellano ◽ Federica Cestari ◽ Giovanni Faglioni ◽ Elena Tenedini ◽ Marco Marino ◽ ...

Keyword(s): Rapid Evolution ◽ Clinical Settings ◽ Data Organization ◽ Web Interface ◽ Variant Call ◽ Variant Annotation ◽ Related Data ◽ Text Annotation ◽ Sequencing Technologies ◽ User Friendly

The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of variant reinterpretation given the constantly updated information, require robust data management systems and organized approaches. In this paper, we present iVar: a freely available and highly customizable tool with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts variant call format (VCF) files and text annotation files and processes them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historically tracked attributes, i.e., modifications can be recorded whenever an updated value is imported, thus keeping track of all changes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search function can be used to periodically check whether pathogenicity-related data of a variant have changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients in the database who carry a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance in terms of collecting and managing data from a medium-throughput laboratory.
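
One way to picture a historically tracked attribute is as an append-only list of dated values that grows only when an imported value differs from the current one. The sketch below is an assumption-laden illustration of that idea and does not reflect iVar's actual schema or code. The class name, the attribute name, and the classification values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class TrackedAttribute:
    """An annotation field that keeps every value it has ever held."""

    name: str
    history: List[Tuple[date, str]] = field(default_factory=list)

    def update(self, value: str, imported_on: date) -> bool:
        """Record the value if it differs from the current one.

        Returns True when a change was recorded, False when the import
        carried the same value as before.
        """
        if self.history and self.history[-1][1] == value:
            return False
        self.history.append((imported_on, value))
        return True

    @property
    def current(self) -> Optional[str]:
        """Most recently recorded value, or None if nothing was imported."""
        return self.history[-1][1] if self.history else None


# Hypothetical re-imports of one variant's classification over time.
significance = TrackedAttribute("clinical_significance")
changes = [
    significance.update("uncertain_significance", date(2020, 1, 15)),
    significance.update("uncertain_significance", date(2020, 7, 1)),  # unchanged
    significance.update("likely_pathogenic", date(2021, 2, 10)),
]
print(changes, significance.current, len(significance.history))
```

A periodic search for reclassified variants then amounts to selecting attributes whose history has more than one entry, which is what would make patient recontacting straightforward to trigger.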

Nonsense-mediated decay is highly stable across individuals and tissues

10.1101/2021.02.03.429654 ◽ 2021

Author(s): Nicole A. Teran ◽ Daniel Nachun ◽ Tiffany Eulalio ◽ Nicole M. Ferraro ◽ Craig Smail ◽ ...

Keyword(s): Allele Frequency ◽ Allelic Imbalance ◽ Accurate Determination ◽ Tissue Expression ◽ Peripheral Tissues ◽ Variant Annotation ◽ Nonsense Mediated Decay ◽ High Consistency ◽ Penalized Logistic Regression

Precise interpretation of the effects of protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype-Tissue Expression (GTEx) v8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (Genome Aggregation Database minor allele frequency ≤1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency (including ultra-rare and singleton variants), and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.

Table 7. Variant annotation

Diagnosis and Management of Hereditary Cancer ◽ 10.1016/b978-0-323-90029-4.00007-9 ◽ 2021 ◽ pp. 57-61

Author(s): John W. Henson ◽ Robert G. Resta

Keyword(s): Variant Annotation

iVar, an Interpretation-Oriented Tool to Manage the Update and Revision of Variant Annotation and Classification

10.20944/preprints202012.0387.v1 ◽ 2020

Author(s): Sara Castellano ◽ Federica Cestari ◽ Giovanni Faglioni ◽ Elena Tenedini ◽ Marco Marino ◽ ...

Keyword(s): Rapid Evolution ◽ Clinical Settings ◽ Data Organization ◽ Web Interface ◽ Variant Annotation ◽ Related Data ◽ Text Annotation ◽ Sequencing Technologies ◽ User Friendly ◽ Generation Sequencing

The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of interpreting variants in the light of constantly updated information, require robust data management systems and organized approaches to variant reinterpretation. In this paper, we present iVar: a freely available and highly customizable tool provided with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts, as input, VCF files and text annotation files and processes them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historicized attributes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search functionality can be used to periodically check whether pathogenicity-related data of a variant have changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients present in the database who carry a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance for collecting and managing data from medium-throughput

A Retrospective Comparison of In Silico Pharmaceutical Recommendations with Tumor Board Recommendations in Pediatric Oncology

Proceedings of IMPRS ◽ 10.18060/24786 ◽ 2020 ◽ Vol 3

Author(s): Jacob Turner ◽ Travis Johnson ◽ Bryan Helm ◽ Karen Pollock ◽ Kun Huang

Keyword(s): Pediatric Population ◽ Gene List ◽ High Impact ◽ Negative Regulator ◽ Tumor Board ◽ Adolescent Male ◽ Treatment Recommendation ◽ Gene Interactions ◽ Variant Annotation ◽ Chemotherapy Agents

Background and Hypothesis: The objective of this study was to analyze available whole-genome sequencing data from an adolescent male patient diagnosed with osteosarcoma (OS) in 2014. OS is a primary bone malignancy that most commonly affects the pediatric population. Precision medicine techniques provide new opportunities to improve treatment of OS patients. Pharmaceutical annotation tools such as PharmacoDB and DGIdb can help indicate chemotherapy agents that may benefit patients based on their molecular profiles. We hypothesize that these tools can indicate genome-specific chemotherapy agents for OS after genomic data have been aligned and analyzed. Project Methods: A PDX pipeline and retrospective study were performed that identified pharmaceutical treatment options from software tools and compared them with the chemotherapy provided. Gene alignment and variant calling were used to process and analyze DNA sequencing data; germline and somatic mutations were also identified. Ensembl VEP was used for variant annotation. PharmacoDB and DGIdb were then applied to identify potentially beneficial medications. Results: Gene variant annotation indicated 54 potentially high-impact mutations. Of these, DGIdb identified 15 drug-gene interactions. PharmacoDB identified no drugs that target any of the genes containing the 54 high-impact mutations. For the entire mutated gene list, DGIdb identified 398 drug-gene interactions. After gene set enrichment, DGIdb identified medications targeting genes of pathways such as “O-glycan processing” and “Diseases of glycosylation”. Potentially harmful variants in the NPRL3 gene were identified. Because NPRL3 is a component of the GATOR1 complex that serves as a negative regulator of mammalian target of rapamycin complex 1 (mTORC1), the identified variants in NPRL3 could have played a role in the patient’s OS.
Potential Impact: This study will foster future collaborations to evaluate the pharmaceutical tool recommendations for this patient’s derived cell lines. These efforts will determine the efficacy of and identify improvements for computational treatment recommendation systems.

variant annotation
Recently Published Documents

SVAT: Secure Outsourcing of Variant Annotation and Genotype Aggregation

Using attentive gated neural networks to quantify the impact of non-coding variants on transcription factor binding affinity

SvAnna: efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing

Watch Out for a Second SNP: Focus on Multi-Nucleotide Variants in Coding Regions and Rescued Stop-Gained

RAREsim: A simulation method for very rare genetic variants

iVar, an Interpretation-Oriented Tool to Manage the Update and Revision of Variant Annotation and Classification

Nonsense-mediated decay is highly stable across individuals and tissues

Table 7. Variant annotation

iVar, an Interpretation-Oriented Tool to Manage the Update and Revision of Variant Annotation and Classification

A Retrospective Comparison of In Silico Pharmaceutical Recommendations with Tumor Board Recommendations in Pediatric Oncology
