variant call
Recently Published Documents


TOTAL DOCUMENTS: 56 (FIVE YEARS: 35)

H-INDEX: 7 (FIVE YEARS: 2)

2021 ◽  
Author(s):  
Vivekananda Sarangi ◽  
Yeongjun Jang ◽  
Milovan Suvakov ◽  
Taejeong Bae ◽  
Liana Fasching ◽  
...  

Accurate discovery of somatic mutations in a cell is a challenge that lies partly in the immaturity of dedicated analytical approaches. Approaches that compare a cell’s genome to a control bulk sample miss common mutations, while approaches that find such mutations from bulk suffer from low sensitivity. We developed a tool, All2, which enables accurate filtering of mutations in a cell through exhaustive comparison of cells’ genomes to each other, without data for bulk(s). Based on all pair-wise comparisons, every variant call (point mutation, indel, and structural variant) is classified as either a germline variant, mosaic mutation, or false positive. Because All2 allows for considering dropped-out regions, it is applicable to whole-genome and exome analysis of cloned and amplified cells. By applying the approach to a variety of available data, we showed that it reduces false positives, enables sensitive discovery of high-frequency mutations, and is indispensable for conducting high-resolution cell lineage tracing. All2 is freely available at https://github.com/abyzovlab/All2.


2021 ◽  
Vol 7 (8) ◽  
Author(s):  
Stephen J. Bush

Minimizing false positives is a critical issue when variant calling as no method is without error. It is common practice to post-process a variant-call file (VCF) using hard filter criteria intended to discriminate true-positive (TP) from false-positive (FP) calls. These are applied on the simple principle that certain characteristics are disproportionately represented among the set of FP calls and that a user-chosen threshold can maximize the number detected. To provide guidance on this issue, this study empirically characterized all false SNP and indel calls made using real Illumina sequencing data from six disparate species and 166 variant-calling pipelines (the combination of 14 read aligners with up to 13 different variant callers, plus four ‘all-in-one’ pipelines). We did not seek to optimize filter thresholds but instead to draw attention to those filters of greatest efficacy and the pipelines to which they may most usefully be applied. In this respect, this study acts as a coda to our previous benchmarking evaluation of bacterial variant callers, and provides general recommendations for effective practice. The results suggest that, of the pipelines analysed in this study, the most straightforward way of minimizing false positives would simply be to use Snippy. We also find that a disproportionate number of false calls, irrespective of the variant-calling pipeline, are located in the vicinity of indels, and highlight this as an issue for future development.
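
As a rough illustration of the hard-filter post-processing described above, the following Python sketch drops VCF records whose QUAL or INFO/DP values fall below user-chosen thresholds. The function names and the threshold values (QUAL 30, depth 10) are assumptions made for illustration only; the study itself did not seek to optimize thresholds, and real pipelines would normally use an established tool rather than hand-written parsing.

```python
from typing import Dict, Iterable, List


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column (e.g. 'DP=35;MQ=60') into a dict; flags map to ''."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def hard_filter(lines: Iterable[str],
                min_qual: float = 30.0,   # illustrative threshold (assumption)
                min_depth: int = 10) -> List[str]:  # illustrative threshold (assumption)
    """Keep header lines and records that pass both QUAL and INFO/DP thresholds."""
    kept: List[str] = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header lines are passed through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        qual, info = cols[5], parse_info(cols[7])
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low call quality
        depth = info.get("DP")
        if depth is None or int(depth) < min_depth:
            continue  # missing or low read depth
        kept.append(line)
    return kept
```

Records with a missing QUAL or DP are discarded here; whether that is appropriate depends on the caller and is a design choice of this sketch, not a recommendation from the study.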


2021 ◽  
Author(s):  
Henry O Ebili ◽  
Adedeji OJ Agboola ◽  
Emad Rakha

Aim: To demonstrate that MSI-WES is an accurate testing method for microsatellite instability (MSI). Materials & methods: Microsatellite-based indels were counted in the variant call-formatted whole exome sequencing (WES) data of 441 gastric cancer cases using Unix-based algorithms, and the counts expressed as a fraction of the genome sequenced to obtain next-generation sequencing-based MSI indices. Results: The next-generation sequencing-based MSI indices showed a near-perfect concordance with PCR-based MSI status, and moderate to good correlations with the molecular targets of MSI index, MLH1 expression and MLH1 methylation status, at a level comparable to the strengths of correlation between PCR-based MSI status and molecular targets of MSI index/ MLH1 expression and methylation. Conclusion: MSI-WES is a valid, adequate and sensitive approach for testing MSI in cancer.
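
The index described above (microsatellite-based indel count expressed relative to the amount of genome sequenced) can be sketched in Python as follows. This is not the authors' Unix-based implementation: the interval list of microsatellite loci, the tuple layouts, the per-megabase scaling, and the function names are all assumptions introduced for illustration.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str]   # (chrom, pos, ref, alt); hypothetical layout
Interval = Tuple[str, int, int]       # (chrom, start, end), 1-based inclusive


def is_indel(ref: str, alt: str) -> bool:
    """Treat any REF/ALT pair of unequal length as an indel."""
    return len(ref) != len(alt)


def in_any_interval(chrom: str, pos: int, intervals: List[Interval]) -> bool:
    """Linear scan; adequate for a sketch, too slow for genome-scale input."""
    return any(c == chrom and start <= pos <= end for c, start, end in intervals)


def msi_index(variants: Iterable[Variant],
              microsatellites: List[Interval],
              sequenced_bases: int,
              per_bases: int = 1_000_000) -> float:
    """Indels inside microsatellite loci per `per_bases` sequenced bases."""
    if sequenced_bases <= 0:
        raise ValueError("sequenced_bases must be positive")
    count = sum(
        1
        for chrom, pos, ref, alt in variants
        if is_indel(ref, alt) and in_any_interval(chrom, pos, microsatellites)
    )
    return count * per_bases / sequenced_bases
```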


2021 ◽  
Author(s):  
Frank David Vogt ◽  
Gautam Shirsekar ◽  
Detlef Weigel

We present a new software package, vcf2gwas, to perform reproducible genome-wide association studies (GWAS). vcf2gwas is a Python API for bcftools, PLINK and GEMMA. Before running the analysis, a traditional GWAS workflow requires the user to edit and format the genotype information from the commonly used Variant Call Format (VCF) file as well as the phenotype information. Post-processing steps involve summarizing and visualizing the analysis results. This workflow requires the user to work on the command line, edit text manually and know one or more programming or scripting languages, which can be time-consuming, especially when analyzing multiple phenotypes. Our package provides a convenient pipeline that performs all of these steps, reducing the GWAS workflow to a single command-line input without the need to edit or format the VCF file beforehand or to install any additional software. In addition, the package implements features such as reducing the dimensionality of the phenotypic space and performing analyses on the reduced dimensions, or comparing the significant variants from the results to specific genes or regions of interest. By integrating different tools to perform GWAS under one workflow, the package ensures reproducible GWAS while significantly reducing user effort.


2021 ◽  
Author(s):  
Erik Garrison ◽  
Zev N Kronenberg ◽  
Eric T Dawson ◽  
Brent S Pedersen ◽  
Pjotr Prins

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called against a reference genome. Here we present over 125 useful and widely used free and open source software tools and libraries, part of vcflib tools and bio-vcf. We also highlight the cyvcf2, hts-nim and slivar tools. These tools are typically applied to the comparison, filtering, normalisation, smoothing, annotation, statistics, visualisation and exporting of variants. Our tools run daily and invisibly in pipelines and countless shell scripts. They are part of a wider bioinformatics ecosystem, and we consider it very important to make them available as free and open source software to all bioinformaticians so they can be deployed through software distributions such as Debian, GNU Guix and Bioconda. vcflib, for example, was installed over 40,000 times and bio-vcf was installed over 15,000 times through Bioconda by December 2020. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation that cannot easily be represented by the VCF format. All source code is published under free and open source software licenses and can be downloaded and installed from https://github.com/vcflib.
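
To make the variant classes listed above concrete, here is a minimal Python sketch that labels a single REF/ALT pair by comparing allele lengths. It is a simplified heuristic written for illustration, not code from vcflib or bio-vcf; the function name and the returned labels are assumptions, and it ignores normalisation and multi-allelic records.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label one REF/ALT pair using a simple length-based heuristic."""
    # Symbolic alleles (e.g. <DEL>) and breakend notation use angle or square brackets.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic"
    # '*' marks an allele missing due to an overlapping deletion; '.' means no ALT.
    if alt in ("*", "."):
        return "other"
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(ref) < len(alt) else "deletion"
```

For example, `classify_variant("A", "ATT")` yields `"insertion"`, while `classify_variant("N", "<DEL>")` yields `"symbolic"`.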


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gavin W. Wilson ◽  
Mathieu Derouet ◽  
Gail E. Darling ◽  
Jonathan C. Yeung

Abstract: Identifying single nucleotide variants has become common practice for droplet-based single-cell RNA-seq experiments; however, presently, a pipeline does not exist to maximize variant calling accuracy. Furthermore, molecular duplicates generated in these experiments have not been utilized to optimally detect variant co-expression. Herein, we introduce scSNV, designed from the ground up to “collapse” molecular duplicates and accurately identify variants and their co-expression. We demonstrate that scSNV is fast, with a reduced false-positive variant call rate, and enables the co-detection of genetic variants and A>G RNA edits across twenty-two samples.


Author(s):  
Sebastian Deorowicz ◽  
Agnieszka Danek ◽  
Marek Kokot

Summary: Variant Call Format (VCF) files with results of sequencing projects take a lot of space. We propose VCFShark, which is able to compress VCF files up to an order of magnitude better than the de facto standards (gzipped VCF and BCF). The advantage over competitors is greatest when compressing VCF files containing large amounts of genotype data. Processing speeds of up to 100 MB/s and main memory requirements below 30 GB allow our tool to be used on typical workstations, even for large datasets. Availability and implementation: https://github.com/refresh-bio/vcfshark. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Evan McCartney-Melstad ◽  
Ke Bi ◽  
James Han ◽  
Catherine K. Foo

Abstract: The quality of genotyping calls resulting from DNA sequencing relies on high-quality starting genetic material. One factor that can reduce sample quality and lead to misleading genotyping results is genetic contamination of a sample by another source, such as cells or DNA from another sample of the same or a different species. Cross-sample contamination by individuals of the same species is particularly difficult to detect in DNA sequencing data, because the contaminating sequence reads look very similar to those of the intended base sample. We introduce a new method that uses a support vector regression model trained on in silico contaminated datasets to predict empirical contamination using a collection of variables drawn from VCF files, including the fraction of sites that are heterozygous, the fraction of heterozygous sites with imbalanced allele counts, and parameters describing distributions fit to heterozygous allele fractions in a sample. We use the method described here to train a model that can accurately predict the extent of cross-sample contamination within 1% of the actual fraction, for simulated contaminated samples in the 0-5% contamination range, directly from the VCF file.

Definitions

Lesser allele: The allele in a heterozygous position that received less sequencing read support (which may be either the REF or ALT allele).

Lesser allele fraction (LAF): The number of sequencing reads supporting the less frequently observed allele divided by the sum of reads supporting both alleles in the genotype at a given genomic position.
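
The lesser allele fraction definition above translates directly into code. The Python sketch below computes LAF from two read counts and, as a second example of the kind of VCF-derived variable the abstract mentions, the fraction of called genotypes that are heterozygous. The function names are hypothetical, and excluding missing genotypes from the denominator is an assumption of this sketch rather than a detail taken from the paper.

```python
import re
from typing import Iterable


def lesser_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Reads supporting the less-observed allele divided by reads supporting both."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads support either allele")
    return min(ref_reads, alt_reads) / total


def is_heterozygous(genotype: str) -> bool:
    """True for GT strings such as '0/1' or '1|2'; False for '0/0', '1/1', './.'."""
    alleles = re.split(r"[/|]", genotype)
    return "." not in alleles and len(set(alleles)) > 1


def heterozygous_fraction(genotypes: Iterable[str]) -> float:
    """Fraction of called genotypes that are heterozygous (missing calls skipped)."""
    called = [g for g in genotypes if "." not in re.split(r"[/|]", g)]
    if not called:
        raise ValueError("no called genotypes")
    return sum(is_heterozygous(g) for g in called) / len(called)
```

For a heterozygous site with 30 reference reads and 10 alternate reads, `lesser_allele_fraction(30, 10)` gives 0.25, and the result is the same if the two counts are swapped.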


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 384
Author(s):  
Sara Castellano ◽  
Federica Cestari ◽  
Giovanni Faglioni ◽  
Elena Tenedini ◽  
Marco Marino ◽  
...  

The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of variant reinterpretation given the constantly updated information, require robust data management systems and organized approaches. In this paper, we present iVar: a freely available and highly customizable tool with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts variant call format (VCF) files and text annotation files and elaborates them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historically tracked attributes, i.e., modifications can be recorded whenever an updated value is imported, thus keeping track of all changes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search function can be exploited to periodically check if pathogenicity-related data of a variant has changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients present in the database carrying a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22,569 unique variants. iVar has proven to be a useful tool with good performance in terms of collecting and managing data from a medium-throughput laboratory.

