NanoPack: visualizing and processing long read sequencing data

Abstract. Summary: Here we describe NanoPack, a set of tools developed for visualization and processing of long read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Availability and Implementation: The NanoPack tools are written in Python3 and released under the GNU GPL3.0 Licence. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 Subsystem for Linux and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools. Contact: [email protected] Supplementary information: Supplementary tables and figures are available at Bioinformatics online.

Yersinia canariae sp. nov., isolated from a human yersiniosis case

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijsem.0.004047 ◽

2020 ◽

Vol 70 (4) ◽

pp. 2382-2387 ◽

Cited By ~ 7

Author(s):

Scott V. Nguyen ◽

David R. Greig ◽

Daniel Hurley ◽

Orla Donoghue ◽

Yu Cao ◽

...

Keyword(s):

Ribosomal Rna ◽

Novel Species ◽

Sequencing Data ◽

Illumina Hiseq ◽

Content Type ◽

Link Type ◽

The United Kingdom ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies

A Gram-negative rod from the Yersinia genus was isolated from a clinical case of yersiniosis in the United Kingdom. Long read sequencing data from an Oxford Nanopore Technologies (ONT) MinION in conjunction with Illumina HiSeq reads were used to generate a finished quality genome of this strain. Overall Genome Related Index (OGRI) of the strain was used to determine that it was a novel species within Yersinia, despite biochemical similarities to Yersinia enterocolitica. The 16S ribosomal RNA gene accessions are MN434982-MN434987 and the accession number for the complete and closed chromosome is CP043727. The type strain is CFS3336T (=NCTC 14382T/=LMG 31573T).

Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files

Cancer Informatics ◽

10.4137/cin.s26470 ◽

2015 ◽

Vol 14 ◽

pp. CIN.S26470 ◽

Cited By ~ 2

Author(s):

Richard P. Finney ◽

Qing-Rong Chen ◽

Cu V. Nguyen ◽

Chih Hao Hsu ◽

Chunhua Yan ◽

...

Keyword(s):

Graphical User Interface ◽

Reference Genome ◽

Source Code ◽

Software Tool ◽

Command Line ◽

Sequencing Data ◽

Genome Data ◽

Command Line Tool ◽

Portable Software ◽

Microsoft Windows

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.

idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract. idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV can make calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FASTQ files. Web and equivalent command-line interfaces are available. It can be deployed on any Linux environment, including personal computers, HPC clusters and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
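
The idea of calling a clade from a clade-defining marker list can be illustrated with a minimal Python sketch. This is not idCOV's implementation, which works phylogenetically from raw reads; it only scores an already built consensus sequence against a marker table. The function name `call_clade`, the clade names, and the marker positions and bases are all hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

# A marker is (1-based genome position, expected base).
Marker = Tuple[int, str]


def call_clade(consensus: str, markers: Dict[str, List[Marker]]) -> Tuple[str, float]:
    """Return the clade whose markers best match the consensus, plus the match fraction.

    Assumption: the consensus is already aligned to the reference coordinates,
    so position N in the marker table is character N-1 in the string.
    """
    best_clade, best_score = "unassigned", 0.0
    for clade, sites in markers.items():
        hits = sum(
            1
            for pos, base in sites
            if 0 < pos <= len(consensus) and consensus[pos - 1] == base
        )
        score = hits / len(sites) if sites else 0.0
        if score > best_score:
            best_clade, best_score = clade, score
    return best_clade, best_score


# Toy data only: these are not real SARS-CoV-2 markers.
toy_markers = {
    "cladeA": [(3, "T"), (7, "G")],
    "cladeB": [(3, "C"), (7, "A")],
}
toy_consensus = "ACTGACGTAC"

if __name__ == "__main__":
    print(call_clade(toy_consensus, toy_markers))
```

A real pipeline would also need to handle ambiguous bases, low-coverage positions and ties between clades, none of which this sketch attempts.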

Yersinia canariae sp. nov., isolated from a human yersiniosis case

10.1101/803825 ◽

2019 ◽

Author(s):

Scott V. Nguyen ◽

David R. Greig ◽

Daniel Hurley ◽

Yu Cao ◽

Evonne McCabe ◽

...

Keyword(s):

Ribosomal Rna ◽

Novel Species ◽

Accession Number ◽

Sequencing Data ◽

Illumina Hiseq ◽

Link Type ◽

The United Kingdom ◽

Oxford Nanopore ◽

16S Ribosomal Rna Gene ◽

Long Read

Abstract. A Gram-negative rod from the Yersinia genus was isolated from a clinical case of yersiniosis in the United Kingdom. Long read sequencing data from an Oxford Nanopore Technologies (ONT) MinION in conjunction with Illumina HiSeq reads were used to generate a finished quality genome of this strain. Overall Genome Related Index (OGRI) of the strain was used to determine that it was a novel species within Yersinia, despite biochemical similarities to Yersinia enterocolitica. The 16S ribosomal RNA gene accessions are MN434982-MN434987 and the accession number for the complete and closed chromosome is CP043727. The type strain is CFS3336T (=NCTC 14382T/ =LMG Accession under process).

LongTron: Automated Analysis of Long Read Spliced Alignment Accuracy

10.1101/2020.11.10.376871 ◽

2020 ◽

Author(s):

Christopher Wilks ◽

Michael C. Schatz

Keyword(s):

Random Forest ◽

Cancer Cell Line ◽

Automated Analysis ◽

Error Rates ◽

Supplementary Information ◽

Splice Sites ◽

Link Type ◽

Spliced Alignment ◽

Oxford Nanopore ◽

Long Read

Abstract. Motivation: Long read sequencing has increased the accuracy and completeness of assemblies of various organisms’ genomes in recent years. Similarly, spliced alignments of long read RNA sequencing hold the promise of delivering much longer transcripts of existing and novel isoforms in known genes without the need for error-prone transcript assemblies from short reads. However, low coverage and high error rates potentially hamper the widespread adoption of long-read spliced alignments in annotation updates and isoform-level expression quantifications. Results: Addressing these issues, we first develop a simulation of error modes for both Oxford Nanopore and PacBio CCS spliced alignments. Based on this we train a Random Forest classifier to assign new long-read alignments to one of two error categories, a novel category, or label them as non-error. We use this classifier to label reads from the spliced alignments of the popular aligner minimap2, run on three long read sequencing datasets, including NA12878 from Oxford Nanopore and PacBio CCS, as well as a PacBio SKBR3 cancer cell line. Finally, we compare the intron chains of the three long read alignments against individual splice sites, short read assemblies, and the output from the FLAIR pipeline on the same samples. Our results demonstrate a substantial lack of precision in determining exact splice sites for long reads during alignment on both platforms while showing some benefit from postprocessing. This work motivates the need for both better aligners and additional post-alignment processing to adjust incorrectly called putative splice sites and clarify support for novel transcripts. Availability and implementation: Source code for the random forest implemented in Python is available at https://github.com/schatzlab/LongTron under the MIT license. The modified version of GffCompare used to construct Table 3 and related is available at https://github.com/ChristopherWilks/gffcompare/releases/tag/0.11.2LT Supplementary Information: Supplementary notes and figures are available online.
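
The comparison of intron chains against individual splice sites can be pictured with a short Python sketch. It is not taken from LongTron; it only shows, under simple assumptions, how an intron chain can be read off a spliced alignment's CIGAR string (the `N` operations) and how many of its splice sites land exactly on annotated positions. The function names and the toy coordinates are hypothetical.

```python
import re
from typing import List, Set, Tuple

# An intron is a (start, end) pair in 0-based, half-open reference coordinates.
Intron = Tuple[int, int]


def introns_from_cigar(ref_start: int, cigar: str) -> List[Intron]:
    """Walk a CIGAR string and collect the reference intervals skipped by N operations."""
    introns: List[Intron] = []
    pos = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":
            introns.append((pos, pos + n))
        if op in "MDN=X":  # only these operations consume the reference
            pos += n
    return introns


def exact_splice_site_fraction(introns: List[Intron], annotated: Set[int]) -> float:
    """Fraction of intron boundaries that coincide exactly with an annotated splice site."""
    sites = [coordinate for intron in introns for coordinate in intron]
    if not sites:
        return 0.0
    return sum(site in annotated for site in sites) / len(sites)


if __name__ == "__main__":
    chain = introns_from_cigar(100, "50M200N30M2I20M150N40M")
    print(chain)
    # Toy annotation: the third boundary is off by two bases.
    print(exact_splice_site_fraction(chain, {150, 350, 402, 550}))
```

In this toy case one of the four boundaries misses the annotation by two bases, which is the kind of small splice-site imprecision the abstract describes.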

UROPA GUI: A web platform for genomic region annotation

10.1101/302091 ◽

2018 ◽

Author(s):

Hendrik Schultheis ◽

Jens Preussner ◽

Annika Fust ◽

Mette Bentsen ◽

Carsten Kuenne ◽

...

Keyword(s):

Graphical User Interface ◽

Bioinformatics Analysis ◽

Source Code ◽

Genomic Region ◽

Command Line ◽

Web Based ◽

Link Type ◽

R Shiny ◽

Considerable Impact ◽

Web Platform

Abstract. The annotation of genomic ranges such as peaks resulting from ChIP-seq/ATAC-seq or other techniques represents a fundamental task of bioinformatics analysis with considerable impact on many downstream analyses. In our previous work, we introduced the Universal Robust Peak Annotator (UROPA), a flexible command line based tool which improves upon the functionality of existing annotation software. In order to reduce the complexity for biologists and clinicians, we have implemented an intuitive web-based graphical user interface (GUI) and fully functional service platform for UROPA. This extension will empower all users to generate annotations for regions of interest interactively. Availability and Implementation: The open source UROPA GUI server was implemented in R Shiny and Python and is available from http://loosolab.mpi-bn.mpg.de. The source code of our app can be downloaded at https://github.molgen.mpg.de/loosolab/UROPA_GUI under the MIT license.

SVJedi: genotyping structural variations with long reads

Bioinformatics ◽

10.1093/bioinformatics/btaa527 ◽

2020 ◽

Vol 36 (17) ◽

pp. 4568-4575

Author(s):

Lolita Lecompte ◽

Pierre Peterlongo ◽

Dominique Lavenier ◽

Claire Lemaitre

Keyword(s):

Supplementary Information ◽

Sequencing Data ◽

Structural Variations ◽

Short Read ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Clinical Diagnoses ◽

Long Read ◽

The One

Abstract Motivation Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third generation sequencing technologies, the number of discovered SVs is increasing, especially in the human genome. At the same time, for several applications such as clinical diagnoses, it is important to genotype newly sequenced individuals on well-defined and characterized SVs. Whereas several SV genotypers have been developed for short read data, there is a lack of a dedicated tool to assess whether known SVs are present or not in a new long read sequenced sample, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies. Results We present a novel method to genotype known SVs from long read sequencing data. The method is based on the generation of a set of representative allele sequences that represent the two alleles of each structural variant. Long reads are aligned to these allele sequences. Alignments are then analyzed and filtered to keep only informative ones, to quantify and estimate the presence of each SV allele and the allele frequencies. We provide an implementation of the method, SVJedi, to genotype SVs with long reads. The tool has been applied to both simulated and real human datasets and achieves high genotyping accuracy. We show that SVJedi performs better than other existing long read genotyping tools and we also demonstrate that SV genotyping is considerably improved with SVJedi compared to other approaches, namely SV discovery and short read SV genotyping approaches. Availability and implementation https://github.com/llecompte/SVJedi.git Supplementary information Supplementary data are available at Bioinformatics online.
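
The last step of the method, turning counts of informative alignments per allele into a genotype, can be sketched in Python as follows. This is an illustrative model only, not SVJedi's code: it assumes a plain binomial likelihood with a fixed per-read error rate, and the function names, the `error` value and the `min_reads` threshold are hypothetical.

```python
import math
from typing import Dict


def genotype_log_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.05) -> Dict[str, float]:
    """Log-likelihood of each diploid genotype under a simple binomial model.

    The binomial coefficient is omitted because it is identical for all genotypes.
    """
    # Assumed probability that an informative read supports the alternative allele.
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        genotype: alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for genotype, p in p_alt.items()
    }


def call_genotype(ref_reads: int, alt_reads: int, min_reads: int = 3) -> str:
    """Pick the most likely genotype, or './.' when too few informative reads remain."""
    if ref_reads + alt_reads < min_reads:
        return "./."
    likelihoods = genotype_log_likelihoods(ref_reads, alt_reads)
    return max(likelihoods, key=likelihoods.get)


if __name__ == "__main__":
    for counts in [(10, 0), (5, 6), (0, 12), (1, 1)]:
        print(counts, call_genotype(*counts))
```

The read counts would come from the filtered alignments against the two representative allele sequences of each SV; how those alignments are judged informative is outside this sketch.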

ARBitR: An overlap-aware genome assembly scaffolder for linked reads

10.1101/2020.04.29.065847 ◽

2020 ◽

Author(s):

Markus Hiltunen ◽

Martin Ryberg ◽

Hanna Johannesson

Keyword(s):

Genome Assembly ◽

General Public ◽

Source Code ◽

Draft Genome ◽

Supplementary Information ◽

Ltr Retrotransposons ◽

Sequencing Data ◽

Long Read ◽

Genome Assemblies ◽

General Public License

Abstract. 10X Genomics Chromium linked reads contain information that can be used to link sequences together into scaffolds in draft genome assemblies. Existing software for this purpose performs the scaffolding by joining sequences together with a gap between them, without considering potential contig overlaps. Such overlaps can be particularly prominent in genome drafts assembled from long-read sequencing data where an overlap-layout-consensus (OLC) algorithm has been used. Ignoring overlapping contig ends may result in genes and other features being incomplete or fragmented in the resulting scaffolds. We developed the application ARBitR to generate scaffolds from genome drafts using 10X Chromium data, with a focus on minimizing the number of gaps in resulting scaffolds by incorporating an OLC step to resolve junctions between linked contigs. We tested the performance of ARBitR on three published and simulated datasets and compared it to the previously published tools ARCS and ARKS. The results revealed that ARBitR performed similarly in terms of contiguity statistics, and the advantage of the overlapping step was revealed by fewer long and short variants in ARBitR-produced scaffolds, in addition to a higher proportion of completely assembled LTR retrotransposons. We expect ARBitR to have broad applicability in genome assembly projects that utilize 10X Chromium linked reads. Availability and implementation: ARBitR is written and implemented in Python3 for Unix-like operating systems. All source code is available at https://github.com/markhilt/ARBitR under the GNU General Public License. Contact: [email protected] Supplementary information: available online.
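
The difference between joining two linked contigs with a gap and merging them at an overlap can be shown with a small Python sketch. It is not ARBitR's algorithm: it only looks for an exact suffix-prefix match, whereas a real OLC step would align the contig ends and tolerate mismatches. The function names and the `min_len` and `gap_size` parameters are hypothetical.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int = 4) -> int:
    """Length of the longest exact suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(left: str, right: str, gap_size: int = 10, min_len: int = 4) -> str:
    """Merge two linked contigs at their overlap, or fall back to an N-filled gap."""
    k = suffix_prefix_overlap(left, right, min_len)
    if k:
        # Overlap found: keep the shared bases once, so no gap is introduced.
        return left + right[k:]
    # No overlap: conventional scaffolding with a gap of unknown sequence.
    return left + "N" * gap_size + right


if __name__ == "__main__":
    print(join_contigs("ACGTTGCA", "TGCAGGCT"))        # overlapping ends
    print(join_contigs("AAAA", "CCCC", gap_size=3))    # no overlap, gap inserted
```

Gap-based joining in the first example would duplicate the shared `TGCA` on both sides of the gap, which is how features spanning the junction end up fragmented.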

Methplotlib: analysis of modified nucleotides from nanopore sequencing

Bioinformatics ◽

10.1093/bioinformatics/btaa093 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3236-3238 ◽

Cited By ~ 2

Author(s):

Wouter De Coster ◽

Endre Bakken Stovner ◽

Mojca Strazisar

Keyword(s):

Gene Expression Regulation ◽

Supplementary Information ◽

Command Line ◽

Modified Nucleotides ◽

Oxford Nanopore ◽

Command Line Tool ◽

Within Subjects ◽

Allele Specific ◽

Sequencing Platforms ◽

Oxford Nanopore Technologies

Abstract Summary Modified nucleotides play a crucial role in gene expression regulation. Here, we describe methplotlib, a tool developed for the visualization of modified nucleotides detected from Oxford Nanopore Technologies sequencing platforms, together with additional scripts for statistical analysis of allele-specific modification within-subjects and differential modification frequency across subjects. Availability and implementation The methplotlib command-line tool is written in Python3, is compatible with Linux, Mac OS and the MS Windows 10 Subsystem for Linux and released under the MIT license. The source code can be found at https://github.com/wdecoster/methplotlib and can be installed from PyPI and bioconda. Our repository includes test data, and the tool is continuously tested at travis-ci.com. Supplementary information Supplementary data are available at Bioinformatics online.

IsoTV: processing and visualizing functional features of translated transcript isoforms

Bioinformatics ◽

10.1093/bioinformatics/btab103 ◽

2021 ◽

Author(s):

Siddharth Annaldasula ◽

Martyna Gajos ◽

Andreas Mayer

Keyword(s):

Cell Types ◽

Supplementary Information ◽

Sequencing Data ◽

Data Types ◽

Transcript Isoforms ◽

Oxford Nanopore ◽

Long Read ◽

Functional Consequences ◽

Functional Features ◽

Eukaryotic Organisms

Abstract Summary Despite the continuous discovery of new transcript isoforms, fueled by the recent increase in accessibility and accuracy of long-read RNA sequencing data, functional differences between isoforms originating from the same gene often remain obscure. To address this issue and enable researchers to assess potential functional consequences of transcript isoform variation on the proteome, we developed IsoTV. IsoTV is a versatile pipeline to process, predict, and visualize the functional features of translated transcript isoforms. Attributes such as gene and isoform expression, transcript composition, and functional features are summarized in an easy-to-interpret visualization. IsoTV is able to analyze a variety of data types from all eukaryotic organisms, including short- and long-read RNA-seq data. Using Oxford Nanopore long read data, we demonstrate that IsoTV facilitates the understanding of potential protein isoform function in different cancer cell types. Availability IsoTV is available at https://github.molgen.mpg.de/MayerGroup/IsoTV, with the corresponding documentation at https://isotv.readthedocs.io/. Supplementary information Supplementary data are available at Bioinformatics online.
