variant annotation

2021 ◽  
Author(s):  
Miran Kim ◽  
Su Wang ◽  
Xiaoqian Jiang ◽  
Arif Ozgun Harmanci

Background: Sequencing thousands of samples yields genetic variants with allele frequencies spanning a very wide spectrum and gives invaluable insight into the genetic determinants of disease. Protecting the genetic privacy of participants is challenging because a few rare variants can be enough to re-identify an individual among millions. In certain cases, there are policy barriers against sharing genetic data from indigenous populations or data related to stigmatizing conditions. Results: We present SVAT, a method for secure outsourcing of variant annotation and aggregation, which are two basic steps in variant interpretation and detection of causal variants. SVAT uses homomorphic encryption to encrypt the data on the client side. The data stay encrypted while stored, while in transit, and, most importantly, while being analyzed. SVAT uses a vectorized data representation to convert annotation and aggregation into efficient vectorized operations in a single framework. SVAT also uses a secure re-encryption approach so that multiple disparate genotype datasets can be combined for federated aggregation and secure computation of allele frequencies on the aggregated dataset. Conclusions: Overall, SVAT provides a secure, flexible, and practical framework for privacy-aware outsourcing of annotation, filtering, and aggregation of genetic variants. SVAT is publicly available for download from https://github.com/harmancilab/SVAT .
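
The idea of summing allele counts without decrypting any individual genotype can be illustrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier encryption with tiny fixed primes. This is an assumption made purely for illustration: it is not the encryption scheme, the parameters, or the API of SVAT, and it is not secure. The names `encrypt`, `decrypt`, and `aggregate` are hypothetical.

```python
import math
import random

# Toy Paillier key material. The primes are far too small for real use.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = n + 1


def encrypt(m):
    """Encrypt a small non-negative integer m < N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext from ciphertext c."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def aggregate(encrypted_vectors):
    """Element-wise homomorphic sum: multiplying ciphertexts adds plaintexts."""
    totals = list(encrypted_vectors[0])
    for vec in encrypted_vectors[1:]:
        totals = [(t * c) % N2 for t, c in zip(totals, vec)]
    return totals


# Each sample is a vector of alternate-allele counts (0, 1 or 2) per variant.
genotypes = [[0, 1, 2, 0], [1, 1, 0, 0], [0, 2, 2, 0]]
encrypted = [[encrypt(g) for g in sample] for sample in genotypes]

# The aggregating party only ever handles ciphertexts.
encrypted_counts = aggregate(encrypted)

# The key holder decrypts the totals and derives allele frequencies.
allele_counts = [decrypt(c) for c in encrypted_counts]
allele_freqs = [count / (2 * len(genotypes)) for count in allele_counts]
```

Representing each sample as a fixed-length vector over the same variant sites is what lets the aggregation reduce to one element-wise operation per sample, which is the general shape of the vectorized approach the abstract describes.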


2021 ◽  
Author(s):  
Neel Patel ◽  
Haimeng Bai ◽  
William Bush

A large proportion of non-coding variants are present within binding sites of transcription factors (TFs), which play a significant role in gene regulation. Thus, deriving the impact of non-coding variants on TF binding is the first step towards unravelling their regulatory roles within their associated disease traits. Most of the modern algorithms used for this purpose are based on convolutional neural network (CNN) architectures. However, these models are incapable of capturing the positional effect of different sub-sequences within the TF binding sites on the binding affinity. In this paper, we utilize the attentive gated neural network (AGNet) architecture to build a set of TF-AGNet models for predicting in vivo TF binding intensities in the GM12878 lymphoblastoid cells. These models have novel layers capable of deriving the impact of relative positions of different DNA sub-sequences, within a binding site, on TF binding affinity, and of extracting the most relevant prediction features. We show that the TF-AGNet models are able to outperform conventional CNNs for predicting continuous values of TF binding affinity. We also train additional TF-AGNet models for 20 TFs using data from 4 other cell lines to assess the generalizability of their prediction accuracy. Lastly, we show that the TF-AGNet based models more accurately classify non-coding variants that significantly affect TF binding compared to models based on 7 variant annotation tools. This accuracy can be leveraged to derive gene regulatory roles of millions of non-coding variants across the genome to further examine their mechanistic associations with complex disease traits.


2021 ◽  
Author(s):  
Daniel Danis ◽  
Julius O.B. Jacobsen ◽  
Parithi Balachandran ◽  
Qihui Zhu ◽  
Feyza Yilmaz ◽  
...  

Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to limitations of existing technology. Recent technological advances such as long-read sequencing (LRS) enable more comprehensive detection of SVs, but approaches for clinical prioritization of candidate SVs are needed. Existing computational approaches do not specifically target LRS data, thereby missing a substantial proportion of candidate SVs, and do not provide a unified computational model for assessing all types of SVs. Structural Variant Annotation and Analysis (SvAnna) assesses all classes of SV and their intersection with transcripts and regulatory sequences in the context of topologically associating domains, relating predicted effects on gene function with clinical phenotype data. We show with a collection of 182 published case reports with pathogenic SVs that SvAnna places over 90% of pathogenic SVs in the top ten ranks. The interpretable prioritizations provided by SvAnna will facilitate the widespread adoption of LRS in diagnostic genomics.


2021 ◽  
Vol 12 ◽  
Author(s):  
Fabien Degalez ◽  
Frédéric Jehl ◽  
Kévin Muret ◽  
Maria Bernard ◽  
Frédéric Lecerf ◽  
...  

Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied lies in protein-coding regions because potential impacts on proteins are relatively easy to predict with popular tools such as the Variant Effect Predictor. These tools annotate variants independently, without considering the potential effect of grouped or haplotypic variations, often called “multi-nucleotide variants” (MNVs). Here, we used a large RNA-seq dataset comprising 382 chicken samples from 11 populations to survey MNVs; the companion paper detected 9.5M SNPs in this dataset, including 3.3M SNPs with reliable genotypes. We focused our study on in-codon MNVs and evaluated their potential mis-annotation. Using GATK HaplotypeCaller read-based phasing results, we identified 2,965 MNVs observed in at least five individuals and located in 1,792 genes. We found that 41.1% of them showed a novel impact when compared to the effect of their constituent SNPs analyzed separately. The largest shift in predicted impact concerns consequences originally annotated as stop-gained, around 95% of which were rescued; next come missense consequences, 37% of which were reannotated with a different amino acid. We then present the rescued stop-gained MNVs in more depth and give an illustration in the SLC27A4 gene. As previously shown in human datasets, our results in chicken demonstrate the value of haplotype-aware variant annotation and of considering MNVs in the coding region, particularly when searching for severe functional consequences such as stop-gained variants.
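
The difference between annotating SNPs one at a time and annotating them jointly as an in-codon MNV can be shown with a small sketch. The code below uses the standard genetic code and a made-up reference codon; the function names and the consequence labels are illustrative assumptions, not the output of the Variant Effect Predictor or of the authors' pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def consequence(ref_codon, alt_codon):
    """Classify the protein-level effect of replacing ref_codon by alt_codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def apply_snps(codon, snps):
    """Apply (offset, alt_base) substitutions to a three-base codon."""
    bases = list(codon)
    for offset, alt_base in snps:
        bases[offset] = alt_base
    return "".join(bases)


def annotate(ref_codon, snps):
    """Return per-SNP consequences and the joint (haplotype-aware) consequence."""
    separate = [consequence(ref_codon, apply_snps(ref_codon, [snp])) for snp in snps]
    joint = consequence(ref_codon, apply_snps(ref_codon, snps))
    return separate, joint


# Hypothetical example: two phased SNPs in the codon CAA (Gln).
# C>T alone gives TAA (stop); A>C alone gives CCA (Pro); together they give TCA (Ser).
separate, joint = annotate("CAA", [(0, "T"), (1, "C")])
```

Here the first SNP on its own looks like a stop-gained variant, but on the shared haplotype the codon encodes serine, so the stop is "rescued" and the joint consequence is missense.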


2021 ◽  
Author(s):  
Megan Null ◽  
Josée Dupuis ◽  
Christopher R. Gignoux ◽  
Audrey E. Hendricks

Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and the haplotype structure of real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.
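
Checking whether simulated data contain a realistic number of singletons and doubletons comes down to tabulating alternate-allele counts per site. The sketch below does this for a tiny made-up haplotype matrix; it is not RAREsim code, and the function name `allele_count_spectrum` is hypothetical.

```python
from collections import Counter


def allele_count_spectrum(haplotypes):
    """Map each alternate-allele count to the number of variant sites with that count.

    haplotypes: list of rows, one per haplotype, each a list of 0/1 alleles per site.
    Sites where no haplotype carries the alternate allele are ignored.
    """
    site_counts = [sum(site) for site in zip(*haplotypes)]
    return Counter(count for count in site_counts if count > 0)


# Four haplotypes (rows) by five sites (columns); values are made up.
haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]

spectrum = allele_count_spectrum(haplotypes)
singletons = spectrum[1]  # sites where the alternate allele is seen exactly once
doubletons = spectrum[2]  # sites where it is seen exactly twice
```

Comparing such a spectrum between simulated and real haplotypes is one simple way to see the under- or over-estimation the abstract refers to.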


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 384
Author(s):  
Sara Castellano ◽  
Federica Cestari ◽  
Giovanni Faglioni ◽  
Elena Tenedini ◽  
Marco Marino ◽  
...  

The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of variant reinterpretation given the constantly updated information, require robust data management systems and organized approaches. In this paper, we present iVar: a freely available and highly customizable tool with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts variant call format (VCF) files and text annotation files and elaborates them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historically tracked attributes, i.e., modifications can be recorded whenever an updated value is imported, thus keeping track of all changes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search function can be exploited to periodically check if pathogenicity-related data of a variant has changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients present in the database carrying a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance in terms of collecting and managing data from a medium-throughput laboratory.
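
The notion of historically tracked attributes, where a new entry is recorded only when a re-imported value differs from the current one, can be sketched with a small in-memory store. This is not iVar's schema or code; the class `VariantStore`, its methods, the variant key, and the attribute values are all hypothetical.

```python
from datetime import date


class VariantStore:
    """Keeps a dated history of annotation values per (variant, attribute)."""

    def __init__(self):
        self._history = {}   # (variant, attribute) -> list of (date, value)
        self._carriers = {}  # variant -> set of sample identifiers

    def add_carrier(self, variant, sample_id):
        self._carriers.setdefault(variant, set()).add(sample_id)

    def import_annotation(self, variant, attribute, value, when):
        """Record value if it differs from the current one; return True if recorded."""
        entries = self._history.setdefault((variant, attribute), [])
        if entries and entries[-1][1] == value:
            return False
        entries.append((when, value))
        return True

    def history(self, variant, attribute):
        return list(self._history.get((variant, attribute), []))

    def carriers(self, variant):
        """Samples to consider for recontacting when a variant is reinterpreted."""
        return sorted(self._carriers.get(variant, set()))


store = VariantStore()
variant = "chr1:12345:A>G"  # hypothetical variant key
store.add_carrier(variant, "sample_002")
store.add_carrier(variant, "sample_001")

first = store.import_annotation(variant, "significance", "uncertain", date(2020, 1, 10))
same = store.import_annotation(variant, "significance", "uncertain", date(2020, 6, 1))
changed = store.import_annotation(variant, "significance", "likely_pathogenic", date(2021, 2, 3))
```

After the third import, the history holds two dated entries, and `store.carriers(variant)` lists the samples affected by the change.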


2021 ◽  
Author(s):  
Nicole A. Teran ◽  
Daniel Nachun ◽  
Tiffany Eulalio ◽  
Nicole M. Ferraro ◽  
Craig Smail ◽  
...  

Precise interpretation of the effects of protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA-sequencing of the Genotype Tissue Expression v8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, including ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
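
Allelic imbalance at a heterozygous PTV can be summarized as the fraction of RNA-seq reads carrying the truncating allele; values well below 0.5 are consistent with that allele being degraded. The sketch below is a simplified assumption about how such a summary might be computed, not the authors' estimator. The read counts, tissue labels, and the `min_depth` threshold are made up.

```python
from statistics import median


def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate (truncating) allele, or None if no reads."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return alt_reads / total


def summarize_across_tissues(counts_by_tissue, min_depth=10):
    """Median alternate-allele fraction over tissues with at least min_depth reads."""
    fractions = [
        alt_allele_fraction(ref, alt)
        for ref, alt in counts_by_tissue.values()
        if ref + alt >= min_depth
    ]
    return median(fractions) if fractions else None


# Hypothetical (ref_reads, alt_reads) for one PTV in one individual.
counts_by_tissue = {
    "whole_blood": (80, 20),
    "liver": (60, 20),
    "lung": (3, 1),  # too shallow; excluded by min_depth
}

summary = summarize_across_tissues(counts_by_tissue)
```

Taking a median across tissues is only reasonable when the per-tissue values agree, which is the kind of concordance the abstract reports.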


Author(s):  
John W. Henson ◽  
Robert G. Resta

Author(s):  
Sara Castellano ◽  
Federica Cestari ◽  
Giovanni Faglioni ◽  
Elena Tenedini ◽  
Marco Marino ◽  
...  

The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of variant interpretation in the light of constantly updated information, require robust data management systems and organized approaches to variant reinterpretation. In this paper, we present iVar: a freely available and highly customizable tool with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts VCF files and text annotation files as input and elaborates them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historicized attributes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search function can be used to periodically check whether pathogenicity-related data for a variant have changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients in the database who carry a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance for collecting and managing data from medium-throughput


2020 ◽  
Vol 3 ◽  
Author(s):  
Jacob Turner ◽  
Travis Johnson ◽  
Bryan Helm ◽  
Karen Pollock ◽  
Kun Huang

Background and Hypothesis: The objective of this study was to analyze available whole genome sequencing from an adolescent male patient diagnosed with osteosarcoma (OS) in 2014. OS is a primary bone malignancy that most commonly affects the pediatric population. Precision medicine techniques provide new opportunities to improve treatment of OS patients. Pharmaceutical annotation tools such as PharmacoDB and DGIdb can help indicate chemotherapy agents that may benefit patients based on their molecular profiles. We hypothesize that these tools can indicate genome-specific chemotherapy agents for OS after genomic data have been aligned and analyzed. Project Methods: A PDX pipeline and retrospective study were performed that identified pharmaceutical treatment options from software tools and compared them with the chemotherapy provided. Gene alignment and variant calling were used to process and analyze DNA sequencing data; germline and somatic mutations were also identified. Ensembl VEP was used for variant annotation. PharmacoDB and DGIdb were then applied to identify potentially beneficial medications. Results: Gene variant annotation indicated 54 potentially high impact mutations. Of these, DGIdb identified 15 drug-gene interactions. PharmacoDB identified no drugs that target any of the genes containing the 54 high impact mutations. For the entire mutated gene list, DGIdb identified 398 drug-gene interactions. After gene set enrichment, DGIdb identified medications targeting genes of pathways such as “O-glycan processing” and “Diseases of glycosylation”. Potentially harmful variants in the NPRL3 gene were identified. Because NPRL3 is a component of the GATOR1 complex, which serves as a negative regulator of mammalian target of rapamycin complex 1 (mTORC1), the identified variants in NPRL3 could have played a role in the patient’s OS. Potential Impact: This study will foster future collaborations to evaluate the pharmaceutical tool recommendations for this patient’s derived cell lines. These efforts will determine the efficacy of and identify improvements for computational treatment recommendation systems.

