FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads



Prostate cancer heterogeneity assessment with multi-regional sampling and alignment-free methods

NAR Genomics and Bioinformatics ◽ 10.1093/nargab/lqaa062 ◽ 2020 ◽ Vol 2 (3)

Author(s): Ross G Murphy ◽ Aideen C Roddy ◽ Shambhavi Srivastava ◽ Esther Baena ◽ David J Waugh ◽ ...

Keyword(s): Next Generation Sequencing ◽ Next Generation ◽ Single Nucleotide Variants ◽ Single Nucleotide ◽ Cancer Heterogeneity ◽ Alignment Free ◽ Treatment Indications ◽ Patient Heterogeneity ◽ Genomic Locations ◽ Generation Sequencing

Abstract Combining alignment-free methods for phylogenetic analysis with multi-regional sampling using next-generation sequencing can provide an assessment of intra-patient tumour heterogeneity. From multi-regional sampling divergent branching, we validated two different lesions within a patient’s prostate. Where multi-regional sampling has not been used, a single sample from one of these areas could misguide as to which drugs or therapies would best benefit this patient, due to the fact these tumours appear to be genetically different. This application has the power to render, in a fraction of the time used by other approaches, intra-patient heterogeneity and decipher aberrant biomarkers. Another alignment-free method for calling single-nucleotide variants from raw next-generation sequencing samples has determined possible variants and genomic locations that may be able to characterize the differences between the two main branching patterns. Alignment-free approaches have been applied to relevant clinical multi-regional samples and may be considered as a valuable option for comparing and determining heterogeneity to help deliver personalized medicine through more robust efforts in identifying targetable pathways and therapeutic strategies. Our study highlights the application these tools could have on patient-aligned treatment indications.


A rapid, super-selective method for detection of single nucleotide variants in C. elegans

10.1101/2020.04.01.020818 ◽ 2020

Author(s): Denis Touroutine ◽ Jessica E. Tanis

Keyword(s): Low Cost ◽ Primer Design ◽ Site Directed Mutagenesis ◽ Amplification Efficiency ◽ Single Nucleotide Variants ◽ Nucleotide Substitutions ◽ Single Nucleotide ◽ Colony Pcr ◽ Sequence Composition ◽ C Elegans

Abstract With the widespread use of single nucleotide variants generated through mutagenesis screens, the million mutation project, and genome editing technologies, there is pressing need for an efficient and low-cost strategy to genotype single nucleotide substitutions. We have developed a rapid and inexpensive method for detection of point mutants through optimization of SuperSelective (SS) primers for end point PCR in Caenorhabditis elegans. Each SS primer consists of a 5’ “anchor” that hybridizes to the template, followed by a non-complementary “bridge,” and a “foot” corresponding to the target allele. The foot sequence is short, such that a single mismatch at the terminal 3’ nucleotide destabilizes primer binding and prevents extension, enabling discrimination of different alleles. We explored how length, stability, and sequence composition of each SS primer segment affected selectivity and efficiency in order to develop simple rules for primer design that allow for distinction between any mismatches in various genetic contexts over a broad range of annealing temperatures. Manipulating bridge length affects amplification efficiency, while modifying the foot sequence can increase discriminatory power. Flexibility in the positioning of the anchor enables SS primers to be used for genotyping in regions with sequences that are challenging for standard primer design. In summary, we have demonstrated flexibility in design of SS primers and their utility for genotyping in C. elegans. Since SS primers reliably detect single nucleotide variants, we propose that this method could have broad application for SNP mapping, screening of CRISPR mutants, and colony PCR to identify successful site-directed mutagenesis constructs.
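The anchor, bridge and foot layout described in this abstract can be illustrated with a short sketch. This is not the authors' design procedure: the function name `build_ss_primer`, the default segment lengths, and the base-swap rule used to make the bridge non-complementary are all assumptions chosen for illustration, and the bridge is assumed to replace a template stretch of the same length.

```python
# Hypothetical illustration of a SuperSelective-style primer layout.
# Segment lengths and the bridge-scrambling rule are assumptions, not
# values or rules taken from the abstract.

# Swap each base for a different one so the bridge cannot pair with
# the strand that the anchor and foot hybridize to.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}


def build_ss_primer(template, snv_index, allele,
                    anchor_len=20, bridge_len=10, foot_len=7):
    """Return (anchor, bridge, foot), written 5' to 3'.

    template  : sequence of the strand the primer is identical to
    snv_index : 0-based position of the variant in `template`
    allele    : base the primer should select for (3' terminal base)
    """
    foot_start = snv_index - foot_len + 1
    bridge_start = foot_start - bridge_len
    anchor_start = bridge_start - anchor_len
    if anchor_start < 0:
        raise ValueError("not enough upstream sequence for this layout")

    anchor = template[anchor_start:bridge_start]
    bridge = "".join(SWAP[b] for b in template[bridge_start:foot_start])
    # The foot ends on the interrogated position, carrying the target allele.
    foot = template[foot_start:snv_index] + allele
    return anchor, bridge, foot


if __name__ == "__main__":
    toy_template = "ACGT" * 12  # toy sequence, not a real locus
    anchor, bridge, foot = build_ss_primer(toy_template, 40, "T")
    print(anchor + bridge + foot)
```

Because the target allele sits at the 3' end of the short foot, a template carrying the other allele produces the terminal mismatch that the abstract describes as preventing extension.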


DeepSVP: Integration of genotype and phenotype for structural variant prioritization using deep learning

10.1101/2021.01.28.428557 ◽ 2021

Author(s): Azza Althagafi ◽ Lamia Alsubaie ◽ Nagarajan Kathiresan ◽ Katsuhiko Mineta ◽ Taghrid Aloraini ◽ ...

Keyword(s): Research Group ◽ Genetic Diseases ◽ Computational Method ◽ Structural Variants ◽ Single Nucleotide Variants ◽ Structural Genomic ◽ Single Nucleotide ◽ Coding Regions ◽ Molecular Features ◽ Gene Functions

Abstract Motivation: Structural genomic variants account for much of human variability and are involved in several diseases. Structural variants are complex and may affect coding regions of multiple genes, or affect the functions of genomic regions in different ways from single nucleotide variants. Interpreting the phenotypic consequences of structural variants relies on information about gene functions, haploinsufficiency or triplosensitivity, and other genomic features. Phenotype-based methods for identifying variants that are involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been applied successfully to single nucleotide variants, as well as short insertions and deletions, the complexity of structural variants makes it more challenging to link them to phenotypes. Furthermore, structural variants can affect a large number of coding regions, and phenotype information may not be available for all of them. Results: We developed DeepSVP, a computational method to prioritize structural variants involved in genetic diseases by combining genomic information with information about gene functions. We incorporate phenotypes linked to genes, functions of gene products, gene expression in individual cell types, and anatomical sites of expression, and systematically relate them to their phenotypic consequences through ontologies and machine learning. DeepSVP significantly improves the success rate of finding causative variants in several benchmarks and can identify novel pathogenic structural variants in consanguineous families. Availability: https://github.com/bio-ontology-research-group/[email protected]


MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data

F1000Research ◽ 10.12688/f1000research.2-217.v1 ◽ 2013 ◽ Vol 2 ◽ pp. 217 ◽ Cited By ~ 18

Author(s): Guillermo Barturen ◽ Antonio Rueda ◽ José L. Oliver ◽ Michael Hackenberg

Keyword(s): High Throughput ◽ Sequence Variation ◽ High Throughput Sequencing ◽ Whole Genome ◽ Single Nucleotide Variants ◽ High Quality ◽ Single Nucleotide ◽ Error Sources ◽ Link Type ◽ Genome Methylation

Whole genome methylation profiling at single-cytosine resolution is now feasible thanks to high-throughput sequencing combined with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and the methylation levels are then profiled from the alignments. A huge effort has been made to align the reads quickly and correctly, and many different algorithms and programs have been created to do this. The second step is just as crucial and non-trivial, yet much less attention has been paid to the final inference of the methylation states. Important error sources exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high quality, whole genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (single nucleotide variants, SNVs) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs within High-Throughput Sequencing experiments of bisulfite-treated DNA at the same time as it generates high quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way.
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and are also permanently accessible from 10.5281/zenodo.7144.
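The second step described in this abstract, profiling methylation levels from the alignments, is commonly summarized per cytosine as the fraction of reads that still show C (unconverted, hence methylated) among reads showing C or T. The sketch below illustrates only that ratio. It is not MethylExtract's implementation: the function name `methylation_levels`, the toy pileup format, and the `min_depth` threshold are assumptions, and none of the error sources listed above (sequencing errors, bisulfite failure, clonal reads, SNVs) are handled.

```python
from collections import Counter


def methylation_levels(pileup, min_depth=3):
    """Estimate a methylation level per reference cytosine.

    pileup    : dict mapping position -> string of read bases observed
                at that position (hypothetical, simplified input)
    min_depth : minimum number of C/T observations required (assumed)
    """
    levels = {}
    for position, bases in pileup.items():
        counts = Counter(bases.upper())
        # Only C (unconverted) and T (converted) reads are informative.
        informative = counts["C"] + counts["T"]
        if informative < min_depth:
            continue  # too little evidence to report a level
        levels[position] = counts["C"] / informative
    return levels


if __name__ == "__main__":
    toy_pileup = {101: "CCCT", 205: "TTTT", 300: "CT"}
    print(methylation_levels(toy_pileup))
```

In this toy input, position 300 is skipped because only two informative reads cover it.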


ExpansionHunter: A sequence-graph based tool to analyze variation in short tandem repeat regions

10.1101/572545 ◽ 2019 ◽ Cited By ~ 2

Author(s): Egor Dolzhenko ◽ Viraj Deshpande ◽ Felix Schlesinger ◽ Peter Krusche ◽ Roman Petrovski ◽ ...

Keyword(s): Tandem Repeat ◽ Short Tandem Repeat ◽ Broad Class ◽ Source Code ◽ Computational Method ◽ Dna Repeats ◽ Link Type ◽ Sequence Graph ◽ Version 2.0 ◽ Short Tandem

Summary: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci. Availability and implementation: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/[email protected]


Helmsman: fast and efficient generation of input matrices for mutation signature analysis

10.1101/373076 ◽ 2018

Author(s): Jedidiah Carlson ◽ Jun Z Li ◽ Sebastian Zöllner

Keyword(s): Large Datasets ◽ Supplementary Information ◽ Signature Analysis ◽ Single Nucleotide Variants ◽ Cancer Etiology ◽ Single Nucleotide ◽ Link Type ◽ Fold Reduction ◽ Cancer Genomes ◽ Mutation Spectra

Abstract Motivation: The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to analyze large datasets containing thousands of individuals or millions of variants. Results: We introduce Helmsman, a program designed to rapidly generate mutation spectra matrices from arbitrarily large datasets. Helmsman is up to 300 times faster than existing methods and can provide more than a 100-fold reduction in memory usage, making mutation signature analysis tractable for any collection of single nucleotide variants, no matter how large. Availability: Helmsman is freely available for download at https://github.com/carjed/helmsman under the MIT license. Detailed documentation can be found at https://www.jedidiahcarlson.com/docs/helmsman/, and an interactive Jupyter notebook containing a guided tutorial can be accessed at https://mybinder.org/v2/gh/carjed/helmsman/[email protected] Supplementary information: Supplementary information for this article is available.
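A mutation spectra matrix of the kind mentioned in this abstract has one row per sample and one column per mutation type. The sketch below shows one common way to define the columns, using the trinucleotide context of each SNV collapsed onto a pyrimidine reference base. It is an illustration only and says nothing about how Helmsman itself is implemented: the function names, the tuple-based input format, and the label style `A[C>T]G` are assumptions.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_type(context, alt):
    """Label an SNV by trinucleotide context, e.g. 'A[C>T]G'.

    context : three reference bases centred on the variant
    alt     : the alternate base
    Variants with a purine reference base are reverse-complemented so
    that every label has C or T as the reference base.
    """
    ref = context[1]
    if ref in "AG":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def spectra_matrix(variants):
    """Count mutation types per sample.

    variants : iterable of (sample_id, context, alt) tuples
               (hypothetical, simplified input)
    Returns a dict mapping sample_id -> Counter of mutation types.
    """
    counts = defaultdict(Counter)
    for sample, context, alt in variants:
        counts[sample][mutation_type(context, alt)] += 1
    return counts


if __name__ == "__main__":
    toy_variants = [("s1", "ACG", "T"), ("s1", "CGT", "A"), ("s2", "TTA", "C")]
    for sample, row in spectra_matrix(toy_variants).items():
        print(sample, dict(row))
```

The two `s1` variants fall into the same column because the second is the reverse complement of the first.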


SANS serif: alignment-free, whole-genome based phylogenetic reconstruction

10.1101/2020.12.31.424643 ◽ 2021

Author(s): Andreas Rempel ◽ Roland Wittler

Keyword(s): Phylogenetic Tree ◽ Source Code ◽ Phylogenetic Reconstruction ◽ Whole Genome ◽ Link Type ◽ Alignment Free ◽ Phylogeny Estimation

Abstract Summary: SANS serif is a novel software for alignment-free, whole-genome based phylogeny estimation that follows a pangenomic approach to efficiently calculate a set of splits in a phylogenetic tree or network. Availability and implementation: Implemented in C++ and supported on Linux, MacOS, and Windows. The source code is freely available for download at https://gitlab.ub.uni-bielefeld.de/gi/[email protected]


iMOKA: k-mer based software to analyze large collections of sequencing data

Genome Biology ◽ 10.1186/s13059-020-02165-2 ◽ 2020 ◽ Vol 21 (1)

Author(s): Claudio Lorenzi ◽ Sylvain Barriere ◽ Jean-Philippe Villemin ◽ Laureline Dejardin Bretones ◽ Alban Mancheron ◽ ...

Keyword(s): Search Space ◽ Feature Reduction ◽ Bayes Classifier ◽ Sequencing Data ◽ Single Nucleotide Variants ◽ Single Nucleotide ◽ Disease Etiology ◽ Link Type ◽ Space Requirements ◽ Integrate Data

Abstract iMOKA (interactive multi-objective k-mer analysis) is software that enables comprehensive analysis of sequencing data from large cohorts to generate robust classification models or explore specific genetic elements associated with disease etiology. iMOKA uses a fast and accurate feature reduction step that combines a Naïve Bayes classifier augmented by an adaptive entropy filter and a graph-based filter to rapidly reduce the search space. By using a flexible file format and distributed indexing, iMOKA can easily integrate data from multiple experiments; it also reduces disk space requirements and identifies changes in transcript levels and single nucleotide variants. iMOKA is available at https://github.com/RitchieLabIGH/iMOKA and Zenodo 10.5281/zenodo.4008947.


High-throughput genotyping assay for the large-scale genetic characterization of Cryptosporidium parasites from human and bovine samples

Parasitology ◽ 10.1017/s0031182013001807 ◽ 2013 ◽ Vol 141 (4) ◽ pp. 491-500 ◽ Cited By ~ 12

Author(s): J. L. ABAL-FABEIRO ◽ X. MASIDE ◽ J. LLOVO ◽ X. BELLO ◽ M. TORRES ◽ ...

Keyword(s): High Throughput ◽ Sanger Sequencing ◽ Large Scale ◽ Genetic Characterization ◽ Low Cost ◽ Cost Effective ◽ Single Nucleotide Variants ◽ Single Nucleotide ◽ Identification Of Species

SUMMARY The epidemiological study of human cryptosporidiosis requires the characterization of species and subtypes involved in human disease in large sample collections. Molecular genotyping is costly and time-consuming, making the implementation of low-cost, highly efficient technologies increasingly necessary. Here, we designed a protocol based on MALDI-TOF mass spectrometry for the high-throughput genotyping of a panel of 55 single nucleotide variants (SNVs) selected as markers for the identification of common gp60 subtypes of four Cryptosporidium species that infect humans. The method was applied to a panel of 608 human and 63 bovine isolates and the results were compared with control samples typed by Sanger sequencing. The method allowed the identification of species in 610 specimens (90·9%) and gp60 subtype in 605 (90·2%). It displayed excellent performance, with sensitivity and specificity values of 87·3 and 98·0%, respectively. Up to nine genotypes from four different Cryptosporidium species (C. hominis, C. parvum, C. meleagridis and C. felis) were detected in humans; the most common ones were C. hominis subtype Ib, and C. parvum IIa (61·3 and 28·3%, respectively). 96·5% of the bovine samples were typed as IIa. The method performs as well as the widely used Sanger sequencing and is more cost-effective and less time consuming.
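The identification rates quoted in this summary are consistent with a denominator of 671 isolates, that is, the 608 human and 63 bovine isolates pooled. The short check below reproduces the two percentages; the variable and function names are illustrative, and the pooled denominator is an inference from the reported numbers rather than something the summary states explicitly.

```python
# Reproduce the reported identification rates from the counts in the summary.
human_isolates = 608
bovine_isolates = 63
total_isolates = human_isolates + bovine_isolates  # pooled panel (assumed denominator)


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


species_rate = percent(610, total_isolates)   # specimens with species identified
subtype_rate = percent(605, total_isolates)   # specimens with gp60 subtype identified

if __name__ == "__main__":
    print(total_isolates, species_rate, subtype_rate)
```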
