Phigaro: high throughput prophage sequence annotation

Abstract. Summary: Phigaro is a standalone command-line application that detects prophage regions, taking raw genome and metagenome assemblies as input. It also produces dynamic annotated “prophage genome maps” and marks possible transposon insertion spots inside prophages. It provides putative taxonomic annotations that can distinguish tailed from non-tailed phages. It is applicable for mining prophage regions from large metagenomic datasets. Availability: Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python.

Phigaro: high-throughput prophage sequence annotation

Bioinformatics ◽

10.1093/bioinformatics/btaa250 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3882-3884 ◽

Cited By ~ 2

Author(s):

Elizaveta V Starikova ◽

Polina O Tikhonova ◽

Nikita A Prianichnikov ◽

Chris M Rands ◽

Evgeny M Zdobnov ◽

...

Keyword(s):

Test Data ◽

High Throughput ◽

Source Code ◽

Supplementary Information ◽

Sequence Annotation ◽

Command Line ◽

Supplementary Data ◽

Genome Maps ◽

Transposon Insertion ◽

Prophage Sequence

Abstract. Summary: Phigaro is a standalone command-line application that detects prophage regions, taking raw genome and metagenome assemblies as input. It also produces dynamic annotated ‘prophage genome maps’ and marks possible transposon insertion spots inside prophages. It is applicable for mining prophage regions from large metagenomic datasets. Availability and implementation: Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python. Supplementary information: Supplementary data are available at Bioinformatics online.

idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract. idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed on any Linux environment, including a personal computer, HPC and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.

ASaiM: a Galaxy-based framework to analyze raw shotgun data from microbiota

10.1101/183970 ◽

2017 ◽

Cited By ~ 2

Author(s):

Bérénice Batut ◽

Kévin Gravouil ◽

Clémence Defois ◽

Saskia Hiltemann ◽

Jean-François Brugère ◽

...

Keyword(s):

Technological Progress ◽

Source Code ◽

Command Line ◽

Bioinformatic Tools ◽

Link Type ◽

Data Analyses ◽

The Galaxy ◽

Sequencing Platforms ◽

User Friendly ◽

New Generation

Abstract. Background: A new generation of sequencing platforms, coupled with numerous bioinformatics tools, has led to rapid technological progress in metagenomics and metatranscriptomics for investigating complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides a curated collection of tools to explore and visualize taxonomic and functional information from raw amplicon, metagenomic or metatranscriptomic sequences. To guide different analyses, several customizable workflows are included. All workflows are supported by tutorials and Galaxy interactive tours that guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to many thousands of datasets, but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io/). Conclusions: Based on the Galaxy framework, ASaiM offers sophisticated analyses to scientists without command-line knowledge. ASaiM provides a powerful framework to easily and quickly explore microbiota data in a reproducible and transparent environment.

fluff: exploratory analysis and visualization of high-throughput sequencing data

PeerJ ◽

10.7717/peerj.2209 ◽

2016 ◽

Vol 4 ◽

pp. e2209 ◽

Cited By ~ 28

Author(s):

Georgios Georgiou ◽

Simon J. van Heeringen

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Developmental Stages ◽

Command Line ◽

Clustering Methods ◽

Sequencing Data ◽

Link Type ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

Genome Wide Data

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff.

AGEpy: a Python package for computational biology

10.1101/450890 ◽

2018 ◽

Cited By ~ 1

Author(s):

Franziska Metge ◽

Robert Sehlke ◽

Jorge Boucas

Keyword(s):

Computational Biology ◽

Open Source ◽

High Throughput ◽

Biological Data ◽

Command Line ◽

High Throughput Analysis ◽

Throughput Analysis ◽

Link Type ◽

Biological Meaning ◽

Python Package

Abstract. Summary: AGEpy is a Python package focused on the transformation of interpretable data into biological meaning. It is designed to support high-throughput analysis of pre-processed biological data using either local Python based processing or Python based API calls to local or remote servers. In this application note we describe its different Python modules as well as its command line accessible tools aDiff, abed, blasto, david, and obo2tsv. Availability: The open source AGEpy Python package is freely available at: https://github.com/mpg-age-bioinformatics/AGEpy. Contact: [email protected]

MultiPhATE2: Code for Functional Annotation and Comparison of Bacteriophage Genomes

10.1101/2020.10.05.324566 ◽

2020 ◽

Author(s):

Carol L. Ecale Zhou ◽

Jeffrey Kimbrel ◽

Robert Edwards ◽

Katlyn McNair ◽

Brian A. Souza ◽

...

Keyword(s):

Comparative Genomics ◽

Functional Annotation ◽

Input Data ◽

Search Space ◽

Search Algorithms ◽

Third Party ◽

Data Sets ◽

Sequence Annotation ◽

Command Line ◽

Link Type

Abstract. To address the need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of the multiPhATE code, multiPhATE2 performs gene finding and functional sequence annotation of predicted gene and protein sequences, and additional search algorithms and databases extend the search space of the original functional annotation subsystem. MultiPhATE2 includes comparative genomics codes for gene matching among sets of input bacteriophage genomes, and scales well to large input data sets with the incorporation of multiprocessing in the functional annotation and comparative genomics subsystems. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or MAC-OS. MultiPhATE2 is freely available under an open-source GPL-3 license at https://github.com/carolzhou/multiPhATE2. Instructions for acquiring the databases and third party codes used by multiPhATE2 are found in the README file included with the distribution. Users may report bugs by submitting issues to the project GitHub repository webpage. Contact: [email protected] or [email protected]. Supplementary materials, which demonstrate the outputs of multiPhATE2, are available in a GitHub repository, at https://github.com/carolzhou/multiPhATE2_supplementaryData/.

NanoPack: visualizing and processing long read sequencing data

10.1101/237180 ◽

2017 ◽

Cited By ~ 2

Author(s):

Wouter De Coster ◽

Svenn D’Hert ◽

Darrin T. Schultz ◽

Marc Cruts ◽

Christine Van Broeckhoven

Keyword(s):

Web Service ◽

Graphical User Interface ◽

Source Code ◽

Supplementary Information ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies

Abstract. Summary: Here we describe NanoPack, a set of tools developed for visualization and processing of long read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Availability and Implementation: The NanoPack tools are written in Python3 and released under the GNU GPL3.0 Licence. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 subsystem for Linux and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools. Contact: [email protected] Supplementary information: Supplementary tables and figures are available at Bioinformatics online.

myCircos: Facilitating the Creation and Use of Circos Plots Online

10.1101/052605 ◽

2016 ◽

Author(s):

Caroline Labelle ◽

Geneviève Boucher ◽

Sébastien Lemieux

Keyword(s):

User Interface ◽

Web Application ◽

Graphical Representation ◽

Source Code ◽

Command Line ◽

Genomic Information ◽

Link Type ◽

Intuitive User Interface ◽

The Creation

Abstract. Circos plots were designed to display large amounts of processed genomic information in a single graphical representation. The creation of such plots remains challenging for less technical users, as the leading tool requires command-line proficiency. Here, we introduce myCircos, a web application that facilitates the generation of Circos plots by providing an intuitive user interface, adding interactive functionalities to the representation and providing persistence of previous requests. myCircos is available at: http://mycircos.iric.ca. Non-registered users can explore the application through the Guest user. Source code (for local server installation) is available upon request.

Edlib: a C/C++ library for fast, exact sequence alignment using edit distance

10.1101/070649 ◽

2016 ◽

Cited By ~ 2

Author(s):

Martin Šošić ◽

Mile Šikić

Keyword(s):

Exact Sequence ◽

Open Source ◽

Sequence Alignment ◽

Test Data ◽

Edit Distance ◽

Source Code ◽

Memory Usage ◽

Pairwise Sequence Alignment ◽

Link Type ◽

Bioinformatics Tools

Abstract. We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality, and can also easily handle very large sequences. Being easy to use, flexible, fast and low on memory usage, we expect it to be a cornerstone for many future bioinformatics tools. Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, implemented in C/C++ and supported on Linux, MS Windows, and Mac OS. Contact: [email protected]

UROPA GUI: A web platform for genomic region annotation

10.1101/302091 ◽

2018 ◽

Author(s):

Hendrik Schultheis ◽

Jens Preussner ◽

Annika Fust ◽

Mette Bentsen ◽

Carsten Kuenne ◽

...

Keyword(s):

Graphical User Interface ◽

Bioinformatics Analysis ◽

Source Code ◽

Genomic Region ◽

Command Line ◽

Web Based ◽

Link Type ◽

R Shiny ◽

Considerable Impact ◽

Web Platform

Abstract. The annotation of genomic ranges such as peaks resulting from ChIP-seq/ATAC-seq or other techniques represents a fundamental task of bioinformatics analysis with considerable impact on many downstream analyses. In our previous work, we introduced the Universal Robust Peak Annotator (UROPA), a flexible command line based tool which improves upon the functionality of existing annotation software. In order to reduce the complexity for biologists and clinicians, we have implemented an intuitive web-based graphical user interface (GUI) and fully functional service platform for UROPA. This extension will empower all users to generate annotations for regions of interest interactively. Availability and Implementation: The open source UROPA GUI server was implemented in R Shiny and Python and is available from http://loosolab.mpi-bn.mpg.de. The source code of our App can be downloaded at https://github.molgen.mpg.de/loosolab/UROPA_GUI under the MIT license.
