FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads

2016 ◽  
Author(s):  
Fanny-Dhelia Pajuste ◽  
Lauris Kaplinski ◽  
Märt Möls ◽  
Tarmo Puurand ◽  
Maarja Lepamets ◽  
...  

We have developed a computational method, FastGT, that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses these counts to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina “Platinum” genomes [1] is 99.96%, and the concordance with the genotypes from the Illumina HumanOmniExpress array is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including more than 23,000 SNVs from the Y chromosome. The source code of the FastGT software is available on GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
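The core idea, genotyping known variants from k-mer counts without alignment, can be sketched as follows. This is a minimal illustration, not FastGT's implementation: it assumes each known SNV is represented by one k-mer spanning the reference allele and one spanning the alternate allele, counts canonical (strand-independent) k-mers in the reads, and calls a genotype with hypothetical fixed thresholds (`min_depth`, `het_low`, `het_high`) in place of FastGT's own statistical model.

```python
from collections import Counter

# Complement table for DNA bases; other characters (e.g. N) are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers across all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def call_genotype(ref_count: int, alt_count: int, min_depth: int = 3,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Hypothetical threshold caller based on the fraction of alt-allele k-mer hits."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return "./."  # too few informative k-mers to make a call
    alt_fraction = alt_count / depth
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"


def genotype_variant(counts: Counter, ref_kmer: str, alt_kmer: str) -> str:
    """Look up the allele-specific k-mers of one known SNV and call its genotype."""
    return call_genotype(counts[canonical(ref_kmer)], counts[canonical(alt_kmer)])
```

In use, `count_kmers` would run once over all reads, after which each variant in the database is genotyped by two dictionary lookups; this is why no alignment step is needed. Restricting the database to k-mers that are unique in the genome, as the abstract describes, keeps these lookups from being inflated by reads from other loci.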

F1000Research ◽  
2014 ◽  
Vol 2 ◽  
pp. 217 ◽  
Author(s):  
Guillermo Barturen ◽  
Antonio Rueda ◽  
José L. Oliver ◽  
Michael Hackenberg

Whole-genome methylation profiling at single-cytosine resolution is now feasible thanks to high-throughput sequencing combined with bisulfite treatment of DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated reads are first aligned to a reference genome, and methylation levels are then profiled from the alignments. Much effort has gone into aligning these reads quickly and correctly, and many algorithms and programs have been developed for this purpose. The second step is just as crucial and non-trivial, yet much less attention has been paid to the final inference of methylation states. Important error sources exist, including sequencing errors, bisulfite conversion failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high-quality, whole-genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes all major error sources into account. MethylExtract detects variation (SNVs, single nucleotide variants) in a similar way to VarScan, a highly sensitive method widely used for SNV and genotype calling from non-bisulfite-treated reads. The usefulness of MethylExtract is shown through extensive benchmarking on artificial bisulfite-treated reads and a comparison with a recently published method, Bis-SNP. MethylExtract can detect SNVs in high-throughput sequencing experiments of bisulfite-treated DNA while simultaneously generating high-quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. A feature unique to MethylExtract among existing software is the ability to assess bisulfite conversion failure statistically.
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and are also permanently accessible via DOI 10.5281/zenodo.7144.
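The two quantities discussed above, the per-cytosine methylation level and a statistical check against bisulfite conversion failure, can be illustrated with a short sketch. It assumes read counts at one reference cytosine have already been extracted from the alignments (C = unconverted, T = converted) and that a conversion failure rate is known, for example estimated from an unmethylated control. The one-sided binomial test and the default `failure_rate` and `alpha` values are illustrative assumptions, not MethylExtract's documented procedure.

```python
from math import comb
from typing import Optional


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Fraction of reads that retain a C at a reference cytosine after bisulfite treatment."""
    total = c_reads + t_reads
    if total == 0:
        return None  # no coverage at this cytosine
    return c_reads / total


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exceeds_failure_rate(c_reads: int, t_reads: int,
                         failure_rate: float = 0.01, alpha: float = 0.05) -> bool:
    """True if the unconverted C count is unlikely to come from conversion failure alone."""
    n = c_reads + t_reads
    if n == 0:
        return False
    return binomial_sf(c_reads, n, failure_rate) < alpha
```

With a 1% failure rate, one unconverted read out of 100 is easily explained by incomplete conversion, whereas 10 out of 20 is not; only the latter would be reported as genuine methylation under this sketch.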


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Ross G Murphy ◽  
Aideen C Roddy ◽  
Shambhavi Srivastava ◽  
Esther Baena ◽  
David J Waugh ◽  
...  

Combining alignment-free methods for phylogenetic analysis with multi-regional sampling using next-generation sequencing can provide an assessment of intra-patient tumour heterogeneity. From the divergent branching observed across multi-regional samples, we validated two different lesions within a patient’s prostate. Without multi-regional sampling, a single sample from one of these areas could be misleading as to which drugs or therapies would best benefit this patient, because these tumours appear to be genetically different. This approach can resolve intra-patient heterogeneity and identify aberrant biomarkers in a fraction of the time required by other approaches. A second alignment-free method, which calls single-nucleotide variants directly from raw next-generation sequencing samples, identified candidate variants and genomic locations that may characterize the differences between the two main branching patterns. Alignment-free approaches have been applied to relevant clinical multi-regional samples and may be considered a valuable option for comparing and determining heterogeneity, helping to deliver personalized medicine through more robust identification of targetable pathways and therapeutic strategies. Our study highlights the potential of these tools for informing patient-specific treatment decisions.
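Alignment-free phylogenetic comparison typically starts from a pairwise distance matrix computed on k-mer content rather than on alignments. The sketch below assumes each tumour region is summarized by a single sequence (in practice, a k-mer set built from that region's reads) and uses the Jaccard distance between k-mer sets; the abstract does not name the specific distance or tool, so this choice is an assumption. A tree-building method such as neighbour joining would then be applied to the resulting matrix.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set, b: set) -> float:
    """1 minus (shared k-mers / all k-mers): 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(samples: dict, k: int) -> dict:
    """Symmetric pairwise Jaccard distances between samples, keyed by (name, name)."""
    sets = {name: kmer_set(seq, k) for name, seq in samples.items()}
    dist = {(name, name): 0.0 for name in samples}
    for x, y in combinations(samples, 2):
        dist[(x, y)] = dist[(y, x)] = jaccard_distance(sets[x], sets[y])
    return dist
```

Regions whose k-mer content diverges strongly end up far apart in the matrix and therefore on different branches of the inferred tree, which is the signal used above to distinguish two lesions within one prostate.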


2020 ◽  
Author(s):  
Denis Touroutine ◽  
Jessica E. Tanis

With the widespread use of single nucleotide variants generated through mutagenesis screens, the Million Mutation Project, and genome editing technologies, there is a pressing need for an efficient and low-cost strategy to genotype single nucleotide substitutions. We have developed a rapid and inexpensive method for detecting point mutants by optimizing SuperSelective (SS) primers for end-point PCR in Caenorhabditis elegans. Each SS primer consists of a 5’ “anchor” that hybridizes to the template, followed by a non-complementary “bridge,” and a “foot” corresponding to the target allele. The foot sequence is short, so a single mismatch at the terminal 3’ nucleotide destabilizes primer binding and prevents extension, enabling discrimination between alleles. We explored how the length, stability, and sequence composition of each SS primer segment affected selectivity and efficiency in order to develop simple primer design rules that distinguish any mismatch in various genetic contexts over a broad range of annealing temperatures. Manipulating bridge length affects amplification efficiency, while modifying the foot sequence can increase discriminatory power. Flexibility in the positioning of the anchor enables SS primers to be used for genotyping in regions whose sequences are challenging for standard primer design. In summary, we have demonstrated flexibility in the design of SS primers and their utility for genotyping in C. elegans. Since SS primers reliably detect single nucleotide variants, we propose that this method could have broad application for SNP mapping, screening of CRISPR mutants, and colony PCR to identify successful site-directed mutagenesis constructs.


2021 ◽  
Author(s):  
Azza Althagafi ◽  
Lamia Alsubaie ◽  
Nagarajan Kathiresan ◽  
Katsuhiko Mineta ◽  
Taghrid Aloraini ◽  
...  

Motivation: Structural genomic variants account for much of human variability and are involved in several diseases. Structural variants are complex and may affect the coding regions of multiple genes, or affect the functions of genomic regions in ways that differ from single nucleotide variants. Interpreting the phenotypic consequences of structural variants relies on information about gene functions, haploinsufficiency or triplosensitivity, and other genomic features. Phenotype-based methods for identifying variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been applied successfully to single nucleotide variants and to short insertions and deletions, the complexity of structural variants makes them harder to link to phenotypes. Furthermore, structural variants can affect a large number of coding regions, and phenotype information may not be available for all of them.
Results: We developed DeepSVP, a computational method to prioritize structural variants involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions of gene products, gene expression in individual cell types, and anatomical sites of expression, and systematically relate them to their phenotypic consequences through ontologies and machine learning. DeepSVP significantly improves the success rate of finding causative variants in several benchmarks and can identify novel pathogenic structural variants in consanguineous families.
Availability: https://github.com/bio-ontology-research-group/


2019 ◽  
Author(s):  
Egor Dolzhenko ◽  
Viraj Deshpande ◽  
Felix Schlesinger ◽  
Peter Krusche ◽  
Roman Petrovski ◽  
...  

Summary: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci.
Availability and implementation: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/
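To make the idea of a repeat sequence graph concrete, the sketch below encodes a hypothetical locus as a left flank, a CAG repeat node with a self-loop, and a right flank, then finds every graph path that spells a read exactly and reports how many times each path passes through the repeat node. This exact-match walk is only an illustration of the graph representation; ExpansionHunter's actual alignment and genotyping model (mismatch tolerance, reads that do not span the repeat, likelihood-based calls) is not reproduced here. Node names and sequences are invented for the example.

```python
def repeat_counts(read: str, nodes: dict, edges: dict,
                  start: str, end: str, repeat: str) -> list:
    """For every graph path spelling exactly `read`, return its number of repeat-node visits."""
    results = []

    def walk(node: str, pos: int, visits: int) -> None:
        seq = nodes[node]
        if not read.startswith(seq, pos):
            return  # this node's sequence does not match the read at this offset
        pos += len(seq)
        if node == repeat:
            visits += 1
        if node == end and pos == len(read):
            results.append(visits)  # the path spans the read from left to right flank
            return
        for nxt in edges.get(node, ()):
            walk(nxt, pos, visits)

    walk(start, 0, 0)
    return results


# Hypothetical locus: left flank -> (CAG)* via a self-loop -> right flank.
NODES = {"left": "GATC", "cag": "CAG", "right": "TTGA"}
EDGES = {"left": ["cag", "right"], "cag": ["cag", "right"]}
```

A read such as `"GATC" + "CAG" * 3 + "TTGA"` matches a single path with three repeat-node visits, while a read containing an interrupted unit (`CAA`) matches no path. Imperfect repeats or adjacent variants can be handled in the same framework by adding alternative nodes to the graph.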


2018 ◽  
Author(s):  
Jedidiah Carlson ◽  
Jun Z Li ◽  
Sebastian Zöllner

Motivation: The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to large datasets containing thousands of individuals or millions of variants.
Results: We introduce Helmsman, a program designed to rapidly generate mutation spectra matrices from arbitrarily large datasets. Helmsman is up to 300 times faster than existing methods and can provide more than a 100-fold reduction in memory usage, making mutation signature analysis tractable for any collection of single nucleotide variants, no matter how large.
Availability: Helmsman is freely available for download at https://github.com/carjed/helmsman under the MIT license. Detailed documentation can be found at https://www.jedidiahcarlson.com/docs/helmsman/, and an interactive Jupyter notebook containing a guided tutorial can be accessed at https://mybinder.org/v2/gh/carjed/helmsman/
Supplementary information: Supplementary information for this article is available.
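A mutation spectrum matrix of the kind described above is commonly built by classifying each SNV by its substitution and its two flanking bases, with purine reference bases folded onto the pyrimidine strand, giving 96 categories per sample. The sketch below assumes variants arrive as `(sample, trinucleotide_context, alt_base)` tuples; this input format and the category labels are assumptions for illustration and are not Helmsman's interface, which reads standard variant files.

```python
from collections import Counter, defaultdict

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]


def mutation_category(context: str, alt: str) -> str:
    """Label such as 'A[C>T]G'; context is a 3-mer with the reference base in the middle."""
    if context[1] in "AG":  # fold purine references onto the pyrimidine strand
        context, alt = revcomp(context), alt.translate(COMP)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# 2 pyrimidine references x 3 alternate bases x 16 flanking pairs = 96 categories.
CATEGORIES = [f"{left}[{ref}>{alt}]{right}"
              for ref in "CT" for alt in "ACGT" if alt != ref
              for left in "ACGT" for right in "ACGT"]


def spectrum_matrix(variants):
    """Return (sorted sample names, one row of 96 category counts per sample)."""
    counts = defaultdict(Counter)
    for sample, context, alt in variants:
        counts[sample][mutation_category(context, alt)] += 1
    samples = sorted(counts)
    return samples, [[counts[s][c] for c in CATEGORIES] for s in samples]
```

Folding strands means a G>A change in `CGT` and a C>T change in `ACG` land in the same column. Because only one counter per sample is kept in memory, a pass like this can stream over arbitrarily many variants, which is the scaling concern the abstract raises.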


2021 ◽  
Author(s):  
Andreas Rempel ◽  
Roland Wittler

Summary: SANS serif is novel software for alignment-free, whole-genome-based phylogeny estimation that follows a pangenomic approach to efficiently calculate a set of splits in a phylogenetic tree or network.
Availability and implementation: Implemented in C++ and supported on Linux, macOS, and Windows. The source code is freely available for download at https://gitlab.ub.uni-bielefeld.de/gi/
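The notion of deriving splits from a pangenome can be sketched as follows: every k-mer is present in some subset of the input genomes, and that subset together with its complement defines a bipartition (a split) of the genomes. Counting k-mers per split gives a simple weight. This is an illustrative assumption about the general approach, not SANS serif's code; it ignores reverse complements and omits any weighting scheme or filtering of splits down to a tree- or network-compatible set.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_weights(genomes: dict, k: int) -> Counter:
    """Count k-mers supporting each split, keyed by the side without the first genome."""
    names = sorted(genomes)
    everyone = frozenset(names)
    colors = {}  # k-mer -> set of genomes containing it
    for name in names:
        for kmer in kmers(genomes[name], k):
            colors.setdefault(kmer, set()).add(name)
    weights = Counter()
    for color in colors.values():
        side = frozenset(color)
        if side == everyone:
            continue  # shared by all genomes: separates nothing
        if names[0] in side:
            side = everyone - side  # canonical side, so {A|BC} and {BC|A} coincide
        weights[side] += 1
    return weights
```

Because each k-mer is visited once and only its set of host genomes matters, no alignment or pairwise distance computation is required; heavily supported splits correspond to well-supported edges in the resulting tree or network.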


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Claudio Lorenzi ◽  
Sylvain Barriere ◽  
Jean-Philippe Villemin ◽  
Laureline Dejardin Bretones ◽  
Alban Mancheron ◽  
...  

iMOKA (interactive multi-objective k-mer analysis) is software that enables comprehensive analysis of sequencing data from large cohorts, either to generate robust classification models or to explore specific genetic elements associated with disease etiology. iMOKA uses a fast and accurate feature-reduction step that combines a Naïve Bayes classifier augmented by an adaptive entropy filter and a graph-based filter to rapidly reduce the search space. By using a flexible file format and distributed indexing, iMOKA can easily integrate data from multiple experiments while reducing disk-space requirements, and it identifies changes in transcript levels as well as single nucleotide variants. iMOKA is available at https://github.com/RitchieLabIGH/iMOKA and on Zenodo (10.5281/zenodo.4008947).
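Entropy-based feature reduction on a k-mer count matrix can be illustrated with a generic information-gain filter: for each k-mer, samples are split by whether the k-mer is present, and the k-mer is kept if that split reduces the entropy of the class labels enough. This is a stand-in chosen for illustration; iMOKA's adaptive entropy filter, its Naïve Bayes classifier and its graph-based filter are not reproduced, and the presence threshold and `min_gain` cutoff are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(counts: list, labels: list, threshold: int = 0) -> float:
    """Entropy reduction obtained by splitting samples on k-mer presence (count > threshold)."""
    present = [lab for c, lab in zip(counts, labels) if c > threshold]
    absent = [lab for c, lab in zip(counts, labels) if c <= threshold]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def filter_kmers(matrix: dict, labels: list, min_gain: float = 0.5) -> dict:
    """Keep k-mers (rows of per-sample counts) whose information gain reaches min_gain."""
    kept = {}
    for kmer, row in matrix.items():
        gain = information_gain(row, labels)
        if gain >= min_gain:
            kept[kmer] = gain
    return kept
```

A k-mer present in every tumour sample and absent from every normal one yields the maximum gain of 1 bit for two balanced classes, whereas a k-mer scattered evenly across both groups yields 0 and is discarded before any classifier is trained.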


Parasitology ◽  
2013 ◽  
Vol 141 (4) ◽  
pp. 491-500 ◽  
Author(s):  
J. L. Abal-Fabeiro ◽  
X. Maside ◽  
J. Llovo ◽  
X. Bello ◽  
M. Torres ◽  
...  

Summary: The epidemiological study of human cryptosporidiosis requires characterizing the species and subtypes involved in human disease in large sample collections. Molecular genotyping is costly and time-consuming, making low-cost, highly efficient technologies increasingly necessary. Here, we designed a protocol based on MALDI-TOF mass spectrometry for the high-throughput genotyping of a panel of 55 single nucleotide variants (SNVs) selected as markers for identifying common gp60 subtypes of four Cryptosporidium species that infect humans. The method was applied to a panel of 608 human and 63 bovine isolates, and the results were compared with control samples typed by Sanger sequencing. The method identified the species in 610 specimens (90.9%) and the gp60 subtype in 605 (90.2%). It displayed excellent performance, with sensitivity and specificity of 87.3% and 98.0%, respectively. Up to nine genotypes from four different Cryptosporidium species (C. hominis, C. parvum, C. meleagridis and C. felis) were detected in humans; the most common were C. hominis subtype Ib and C. parvum IIa (61.3% and 28.3%, respectively). Of the bovine samples, 96.5% were typed as IIa. The method performs as well as the widely used Sanger sequencing while being more cost-effective and less time-consuming.
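The typing rates reported above are consistent with a denominator of all 671 isolates (608 human plus 63 bovine), which the short check below reproduces. The abstract does not give the true/false positive and negative counts behind the 87.3% sensitivity and 98.0% specificity, so the two helper functions only state the standard definitions and are not applied to the study's data.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def sensitivity(tp: int, fn: int) -> float:
    """True positives divided by all truly positive samples."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives divided by all truly negative samples."""
    return tn / (tn + fp)


total_isolates = 608 + 63                     # human + bovine isolates = 671
species_rate = percent(610, total_isolates)   # 90.9
subtype_rate = percent(605, total_isolates)   # 90.2
```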

