poreTally: run and publish de novo nanopore assembler benchmarks

Carlos de Lannoy; Judith Risse; Dick de Ridder

doi:10.1093/bioinformatics/bty1045


Bioinformatics ◽

10.1093/bioinformatics/bty1045 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2663-2664 ◽

Cited By ~ 2

Author(s):

Carlos de Lannoy ◽

Judith Risse ◽

Dick de Ridder

Keyword(s):

Nucleic Acid ◽

De Novo Assembly ◽

De Novo ◽

Supplementary Information ◽

Nanopore Sequencing ◽

Supplementary Data ◽

Analysis Pipeline ◽

Tool Performance ◽

Nucleic Acid Analysis ◽

Assembly Tool

Abstract:

Summary: Nanopore sequencing is a novel development in nucleic acid analysis. As such, nanopore-sequencing hardware and software are updated frequently and extensively, which quickly renders peer-reviewed publications on analysis pipeline benchmarking efforts outdated. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report, which can immediately be published on Github/Gitlab.

Availability and implementation: poreTally is available on Github at https://github.com/cvdelannoy/poreTally, under an MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
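To make the "continuity" part of such a benchmark concrete, here is a minimal sketch of one widely used continuity statistic, N50, computed per assembly pipeline. It is only an illustration: the abstract does not say which metrics poreTally reports, and the function names `n50` and `summarize` as well as the pipeline labels are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(assemblies):
    """Map each pipeline name to (number of contigs, total length, N50)."""
    return {
        name: (len(lengths), sum(lengths), n50(lengths))
        for name, lengths in assemblies.items()
    }


if __name__ == "__main__":
    # Hypothetical contig lengths for two hypothetical pipelines.
    example = {
        "pipeline_a": [2, 3, 4, 5, 6, 7, 8, 9, 10],
        "pipeline_b": [30, 24],
    }
    for name, stats in summarize(example).items():
        print(name, stats)
```

A larger N50 at equal total length indicates a less fragmented assembly. It says nothing about correctness, which has to be assessed separately, for example against a reference genome.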


poreTally: run and publish de novo Nanopore assembler benchmarks

10.1101/424184 ◽

2018 ◽

Author(s):

Carlos de Lannoy ◽

Judith Risse ◽

Dick de Ridder

Keyword(s):

Best Practices ◽

De Novo ◽

Nanopore Sequencing ◽

Base Calling ◽

Novel Approach ◽

Tool Performance ◽

Assembly Pipeline ◽

Nucleic Acid Analysis ◽

Sequencing Platforms ◽

Assembly Tool

Abstract:

Nanopore sequencing is a novel approach to nucleic acid analysis that generates long, error-prone reads. Since device components, base calling software and best practices for sample preparation are updated frequently and extensively, the nature of the produced data also changes frequently. As a result, peer-reviewed publications on de novo assembly pipeline benchmarking efforts are quickly rendered outdated by the next major improvement to the sequencing platforms. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report. Results can immediately be shared with peers in a Github/Gitlab repository. Furthermore, we aim to give a more inclusive overview of assembly pipeline performance than any individual research group can, by offering users the possibility to submit their results to a collective benchmarking effort. poreTally is available on Github.


KEC: unique sequence search by K-mer exclusion

Bioinformatics ◽

10.1093/bioinformatics/btab196 ◽

2021 ◽

Author(s):

Pavel Beran ◽

Dagmar Stehlíková ◽

Stephen P Cohen ◽

Vladislav Čurn

Keyword(s):

Amino Acid ◽

Nucleic Acid ◽

Source Code ◽

Unique Sequence ◽

Supplementary Information ◽

Supplementary Data ◽

Laptop Computers ◽

Sequence Search ◽

Target Sequences ◽

Cross Reference

Abstract:

Summary: Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications.

Availability and implementation: KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC.

Supplementary information: Supplementary data are available at Bioinformatics online.
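The idea of finding unique sequences by excluding k-mers shared with non-target sequences can be sketched as follows. This is a minimal illustration and not KEC's actual implementation: the function names `kmers` and `unique_regions`, the value of k, and the merging of adjacent unique k-mers into regions are assumptions made for this example, and reverse-complement matching is ignored.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target, non_targets, k):
    """Return maximal stretches of target in which no k-mer occurs in any non-target."""
    # Collect every k-mer seen in the non-target sequences.
    excluded = set()
    for seq in non_targets:
        excluded |= kmers(seq, k)

    regions = []
    start = None  # index of the first k-mer in the current unique run
    for i in range(len(target) - k + 1):
        if target[i:i + k] not in excluded:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at k-mer i-1, which covers bases up to index i-1+k.
            regions.append(target[start:i - 1 + k])
            start = None
    if start is not None:
        # The run extends through the last k-mer, i.e. to the end of target.
        regions.append(target[start:])
    return regions


if __name__ == "__main__":
    print(unique_regions("AAACCCGGGTTT", ["AAACCC", "GGGTTT"], k=3))
```

In this toy input, only the k-mers spanning the junction between the two non-target sequences are absent from the excluded set, so the reported region is the short stretch around that junction. For genome-sized data a memory-efficient k-mer store would be needed in place of a plain Python set.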


A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies

Frontiers in Plant Science ◽

10.3389/fpls.2016.00925 ◽

2016 ◽

Vol 7 ◽

Cited By ~ 5

Author(s):

Li Guo ◽

Kelly S. Allen ◽

Greg Deiulio ◽

Yong Zhang ◽

Angela M. Madeiras ◽

...

Keyword(s):

Data Analysis ◽

De Novo Assembly ◽

De Novo ◽

Analysis Pipeline ◽

Obligate Parasite ◽

Data Analysis Pipeline


Jasmine: a Java pipeline for isomiR characterization in miRNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/btz806 ◽

2019 ◽

Cited By ~ 2

Author(s):

Xiangfu Zhong ◽

Albert Pla ◽

Simon Rayner

Keyword(s):

Population Structure ◽

Software Tool ◽

Supplementary Information ◽

Supplementary Data ◽

Analysis Pipeline ◽

Detailed Characterization ◽

Fasta Format ◽

Java Application

Abstract:

Motivation: The existence of complex subpopulations of miRNA isoforms, or isomiRs, is well established. While many tools exist for investigating isomiR populations, they differ in how they characterize an isomiR, making it difficult to compare results across different tools. Thus, there is a need for a more comprehensive and systematic standard for defining isomiRs. Such a standard would allow investigation of isomiR population structure in progressively more refined sub-populations, permitting the identification of more subtle changes between conditions and leading to an improved understanding of the processes that generate these differences.

Results: We developed Jasmine, a software tool that incorporates a hierarchical framework for characterizing isomiR populations. Jasmine is a Java application that can process raw read data in fastq/fasta format, or mapped reads in SAM format, to produce a detailed characterization of isomiR populations. Thus, Jasmine can reveal structure not apparent in a standard miRNA-Seq analysis pipeline.

Availability and implementation: Jasmine is implemented in Java and R and is freely available on Bitbucket at https://bitbucket.org/bipous/jasmine/src/master/.

Supplementary information: Supplementary data are available at Bioinformatics online.


Author Correction: Rapid de novo assembly of the European eel genome from nanopore sequencing reads

Scientific Reports ◽

10.1038/s41598-019-44275-3 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Hans J. Jansen ◽

Michael Liem ◽

Susanne A. Jong-Raadsen ◽

Sylvie Dufour ◽

Finn-Arne Weltzien ◽

...

Keyword(s):

De Novo Assembly ◽

De Novo ◽

European Eel ◽

Nanopore Sequencing


Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers

Bioinformatics ◽

10.1093/bioinformatics/btaa915 ◽

2020 ◽

Author(s):

Yuansheng Liu ◽

Xiaocai Zhang ◽

Quan Zou ◽

Xiangxiang Zeng

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

De Novo ◽

Supplementary Information ◽

Supplementary Data ◽

Complementary Strand ◽

Short Reads ◽

Sequencing Technologies ◽

Computational Resources

Abstract:

Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources required by downstream applications. Here we develop minirmd, a de novo tool to remove duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand.

Availability and implementation: https://github.com/yuansliu/minirmd.

Supplementary information: Supplementary data are available at Bioinformatics online.
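The general idea of clustering reads by minimizer over several rounds can be sketched as follows. This is a simplified illustration under stated assumptions, not minirmd's algorithm: the function names, the minimizer lengths `ks`, the mismatch threshold, and the use of a lexicographic canonical orientation to handle reverse complements are all choices made for this example. Reads are assumed to be at least as long as the largest k.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def minimizer(seq, k):
    """Return the lexicographically smallest k-mer of seq."""
    return min(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_near_duplicates(reads, ks=(4, 3), max_mismatches=1):
    """Keep one representative per group of near-identical reads.

    Reads are returned in canonical orientation, not necessarily as given.
    """
    # Canonical orientation: a read and its reverse complement become identical.
    kept = [min(r, reverse_complement(r)) for r in reads]
    for k in ks:  # one clustering round per minimizer length
        buckets = {}
        survivors = []
        for read in kept:
            bucket = buckets.setdefault(minimizer(read, k), [])
            # Only reads sharing a minimizer are compared with each other.
            if any(len(read) == len(other)
                   and hamming(read, other) <= max_mismatches
                   for other in bucket):
                continue
            bucket.append(read)
            survivors.append(read)
        kept = survivors
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "TTACGTACGT", "ACGTACGTAC", "AAAACCCCGG"]
    print(remove_near_duplicates(reads))
```

Repeating the bucketing with a different k gives near-duplicates a second chance to meet when a mismatch falls inside the minimizer of one round. A known limitation of this sketch is that a mismatch can flip the canonical orientation of a read, in which case the pair is missed.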


HiLight-PTM: an online application to aid matching peptide pairs with isotopically labelled PTMs

Bioinformatics ◽

10.1093/bioinformatics/btz654 ◽

2019 ◽

Author(s):

Harry J Whitwell ◽

Peter DiMaggio

Keyword(s):

De Novo ◽

De Novo Sequencing ◽

Mass Shift ◽

Supplementary Information ◽

Database Searching ◽

Supplementary Data ◽

Exact Match ◽

High Confidence ◽

Online Application ◽

Internet Browser

Abstract:

Motivation: Database searching of isotopically labelled PTMs can be problematic, and we frequently find that only one, or neither, of the peptides in a heavy/light pair is assigned. In such cases, having a pair of MS/MS spectra that differ due to an isotopic label can assist in identifying the relevant m/z values that support the correct peptide annotation or can be used for de novo sequencing.

Results: We have developed an online application that identifies matching peaks and peaks differing by the appropriate mass shift (difference between heavy and light PTM) between two MS/MS spectra. Furthermore, the application predicts, from the exact-match peaks, the mass of their complementary ions and highlights these as high confidence matches between the two spectra. The result is a tool to visually compare two spectra, and downloadable peak lists that can be used to support de novo sequencing.

Availability and implementation: HiLight-PTM is released using shinyapps.io by RStudio, and can be accessed from any internet browser at https://harrywhitwell.shinyapps.io/hilight-ptm/.

Supplementary information: Supplementary data are available at Bioinformatics online.
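The peak comparison described in the Results can be sketched as follows. This is a rough illustration, not the HiLight-PTM code: the function names, the tolerance, the example m/z values and the mass shift are hypothetical, and all fragments are assumed to be singly charged so that the label's mass shift equals the shift in m/z.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def match_peaks(light, heavy, mass_shift, tol=0.01):
    """Compare two MS/MS peak lists (m/z values).

    Returns (exact, shifted): pairs (light_mz, heavy_mz) that agree within
    tol, and pairs where heavy_mz is about light_mz + mass_shift.
    """
    exact, shifted = [], []
    for l_mz in light:
        for h_mz in heavy:
            delta = h_mz - l_mz
            if abs(delta) <= tol:
                exact.append((l_mz, h_mz))
            elif abs(delta - mass_shift) <= tol:
                shifted.append((l_mz, h_mz))
    return exact, shifted


def complementary_mz(fragment_mz, precursor_mh):
    """Predict the m/z of the complementary singly charged fragment.

    For singly charged b/y pairs, the two fragment m/z values sum to the
    singly protonated precursor mass [M+H]+ plus one proton mass.
    """
    return precursor_mh + PROTON_MASS - fragment_mz


if __name__ == "__main__":
    # Hypothetical peak lists and label shift, for illustration only.
    light = [175.119, 300.150, 500.250]
    heavy = [175.119, 303.169, 620.000]
    exact, shifted = match_peaks(light, heavy, mass_shift=3.0188)
    print("exact:", exact)
    print("shifted:", shifted)
```

Peaks in the exact list belong to fragments that do not carry the label, while peaks in the shifted list belong to fragments that do. For multiply charged fragments the expected shift in m/z would have to be divided by the charge.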


Nanopore sequencing enables near-complete de novo assembly of Saccharomyces cerevisiae reference strain CEN.PK113-7D

FEMS Yeast Research ◽

10.1093/femsyr/fox074 ◽

2017 ◽

Vol 17 (7) ◽

Cited By ~ 43

Author(s):

Alex N. Salazar ◽

Arthur R. Gorter de Vries ◽

Marcel van den Broek ◽

Melanie Wijsman ◽

Pilar de la Torre Cortés ◽

...

Keyword(s):

Saccharomyces Cerevisiae ◽

De Novo Assembly ◽

Reference Strain ◽

De Novo ◽

Nanopore Sequencing


Nanopype: a modular and scalable nanopore data processing pipeline

Bioinformatics ◽

10.1093/bioinformatics/btz461 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4770-4772

Author(s):

Pay Giesselmann ◽

Sara Hetzel ◽

Franz-Josef Müller ◽

Alexander Meissner ◽

Helene Kretzmer

Keyword(s):

Data Processing ◽

Supplementary Information ◽

Nanopore Sequencing ◽

Third Generation ◽

Supplementary Data ◽

Seamless Integration ◽

Short Read ◽

Processing Pipeline ◽

Bioinformatics Software ◽

Long Read

Abstract:

Summary: Long-read third-generation nanopore sequencing enables researchers to now address a range of questions that are difficult to tackle with short read approaches. The rapidly expanding user base and continuously increasing throughput have sparked the development of a growing number of specialized analysis tools. However, streamlined processing of nanopore datasets using reproducible and transparent workflows is still lacking. Here we present Nanopype, a nanopore data processing pipeline that integrates a diverse set of established bioinformatics software while maintaining consistent and standardized output formats. Seamless integration into compute cluster environments makes the framework suitable for high-throughput applications. As a result, Nanopype facilitates comparability of nanopore data analysis workflows and thereby should enhance the reproducibility of biological insights.

Availability and implementation: https://github.com/giesselmann/nanopype, https://nanopype.readthedocs.io.

Supplementary information: Supplementary data are available at Bioinformatics online.


mixtureS: a novel tool for bacterial strain genome reconstruction from reads

Bioinformatics ◽

10.1093/bioinformatics/btaa728 ◽

2020 ◽

Author(s):

Xin Li ◽

Haiyan Hu ◽

Xiaoman Li

Keyword(s):

Environmental Samples ◽

De Novo ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Bacterial Strains ◽

Metagenomic Sample ◽

Almost All ◽

User Friendly ◽

Strain Genome

Abstract:

Motivation: It is essential to study bacterial strains in environmental samples. Existing methods and tools often depend on known strains or known variations, cannot work on individual samples, are not reliable, or are not easy to use. It is thus important to develop more user-friendly tools that can identify bacterial strains more accurately.

Results: We developed a new tool called mixtureS that can de novo identify bacterial strains from shotgun reads of a clonal or metagenomic sample, without prior knowledge about the strains and their variations. Tested on 243 simulated datasets and 195 experimental datasets, mixtureS reliably identified the strains, their numbers and their abundance. Compared with three tools, mixtureS showed better performance in almost all simulated datasets and the vast majority of experimental datasets.

Availability and implementation: The source code and the mixtureS tool are available at http://www.cs.ucf.edu/~xiaoman/mixtureS/.

Supplementary information: Supplementary data are available at Bioinformatics online.
