CONSTAX2: Improved taxonomic classification of environmental DNA markers

10.1101/2021.02.15.430803 ◽

2021 ◽

Author(s):

Julian Liber ◽

Gregory Bonito ◽

Gian Maria Niccolò Benucci

Keyword(s):

Dna Markers ◽

Environmental Dna ◽

Taxonomic Classification ◽

Command Line ◽

Consensus Approach ◽

Link Type ◽

Command Line Tool ◽

High Level ◽

Taxonomic Annotation

Summary: CONSTAX - the CONSensus TAXonomy classifier - was developed for accurate and reproducible taxonomic annotation of fungal rDNA amplicons and is based upon a consensus approach of RDP, SINTAX and UTAX algorithms. CONSTAX2 can be used to classify prokaryotes and incorporates BLAST-based classifiers to reduce classification errors. Additionally, CONSTAX2 implements a conda-installable, command line tool with improved classification metrics, faster training, multithreading support, capacity to incorporate external taxonomic databases, new isolate matching and high-level taxonomy tools, replete with documentation and example tutorials. Availability and Implementation: CONSTAX2 is available at https://github.com/liberjul/CONSTAXv2, and is packaged for Linux and MacOS from Bioconda. A tutorial and documentation are available at https://constax.readthedocs.io/en/latest/.
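The consensus idea described in the summary can be illustrated with a small sketch. This is not CONSTAX2's actual decision rule: the function name `consensus_taxonomy`, the `min_votes` threshold, and the choice to stop at the first rank without agreement are all assumptions made for illustration. Each classifier's output is represented as a plain dictionary mapping rank to taxon name.

```python
from collections import Counter

# Ranks ordered from broadest to most specific.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def consensus_taxonomy(assignments, min_votes=2):
    """Combine per-classifier assignments into one consensus lineage.

    assignments: list of dicts mapping rank -> taxon name; a missing or
                 empty entry means that classifier made no call at that rank.
    min_votes:   hypothetical threshold; a taxon is accepted at a rank only
                 if at least this many classifiers agree on it.
    """
    consensus = {}
    for index, rank in enumerate(RANKS):
        # Count only the classifiers that made a call at this rank.
        votes = Counter(a[rank] for a in assignments if a.get(rank))
        taxon, count = votes.most_common(1)[0] if votes else (None, 0)
        if count >= min_votes:
            consensus[rank] = taxon
        else:
            # No sufficient agreement: mark this rank and all lower ranks
            # as unclassified, since a lineage cannot be resolved below
            # an unresolved parent.
            for lower_rank in RANKS[index:]:
                consensus[lower_rank] = "Unclassified"
            break
    return consensus
```

With three classifiers and `min_votes=2`, a taxon is kept only when a majority agrees, so a single erroneous call at a given rank cannot determine the result on its own.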

Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data

10.1101/647958 ◽

2019 ◽

Cited By ~ 3

Author(s):

Lucas Czech ◽

Pierre Barbera ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Trees ◽

Command Line ◽

Computationally Efficient ◽

Data Types ◽

Low Level ◽

Phylogenetic Placement ◽

Link Type ◽

Phylogenetic Data ◽

Command Line Tool ◽

High Level

Summary: We present GENESIS, a library for working with phylogenetic data, and GAPPA, an accompanying command line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies, and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested, and field-proven. Availability and Implementation: Both GENESIS and GAPPA are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa.

Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data

Bioinformatics ◽

10.1093/bioinformatics/btaa070 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3263-3265 ◽

Cited By ~ 14

Author(s):

Lucas Czech ◽

Pierre Barbera ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Trees ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Computationally Efficient ◽

Data Types ◽

Low Level ◽

Phylogenetic Placement ◽

Command Line Tool ◽

High Level

Abstract Summary We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information Supplementary data are available at Bioinformatics online.

Contig annotation tool CAT robustly classifies assembled metagenomic contigs and long sequences

10.1101/072868 ◽

2016 ◽

Cited By ~ 13

Author(s):

Diego D. Cambuy ◽

Felipe H. Coutinho ◽

Bas E. Dutilh

Keyword(s):

Single Molecule ◽

Dna Sequences ◽

Taxonomic Classification ◽

Annotation Tool ◽

Single Molecule Sequencing ◽

Short Read ◽

Long Read ◽

Micro Organisms ◽

Taxonomic Annotation

Abstract: In modern-day metagenomics, there is an increasing need for robust taxonomic annotation of long DNA sequences from unknown micro-organisms. Long metagenomic sequences may be derived from assembly of short-read metagenomes, or from long-read single molecule sequencing. Here we introduce CAT, a pipeline for robust taxonomic classification of long DNA sequences. We show that CAT correctly classifies contigs at different taxonomic levels, even in simulated metagenomic datasets that are very distantly related to the sequences in the database. CAT is implemented in Python and the required scripts can be freely downloaded from GitHub.

CoRC: the COPASI R Connector

Bioinformatics ◽

10.1093/bioinformatics/btab033 ◽

2021 ◽

Author(s):

Jonas Förster ◽

Frank T Bergmann ◽

Jürgen Pahle

Keyword(s):

Graphical User Interface ◽

Academic Research ◽

R Package ◽

Supplementary Information ◽

Command Line ◽

Graphical Interface ◽

Thought Process ◽

Extensive Analysis ◽

Command Line Tool ◽

High Level

Abstract Motivation COPASI is a biochemical simulator and model analyzer which has found widespread use in academic research, teaching and beyond. One of COPASI’s strengths is its graphical user interface, and this is what most users work with. COPASI also provides a command-line tool. So far, however, an intuitive scripting interface that allows the creation and documentation of systems biology workflows has been missing. Results We have developed CoRC, the COPASI R Connector, an R package which provides a high-level scripting interface for COPASI. It closely mirrors the thought process of a (graphical interface) user and should therefore be very easy to use. This allows for complex workflows to be reproducibly scripted, utilizing COPASI’s powerful analytic toolset in combination with R’s extensive analysis and package ecosystem. Availability and implementation CoRC is a free and open-source R package, available via GitHub at https://jpahle.github.io/CoRC/ under the Artistic-2.0 license. Supplementary information: We provide tutorial articles as well as several example scripts on the project’s website.

Megadepth: efficient coverage quantification for BigWigs and BAMs

10.1101/2020.12.17.423317 ◽

2020 ◽

Author(s):

Christopher Wilks ◽

Omar Ahmed ◽

Daniel N. Baker ◽

David Zhang ◽

Leonardo Collado-Torres ◽

...

Keyword(s):

Gene Annotation ◽

Command Line ◽

Bioconductor Package ◽

Input File ◽

Link Type ◽

Command Line Tool

Abstract Motivation: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19,000 GTExV8 BigWig files in approximately one hour using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. Availability: https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/[email protected]
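To make "quantifying coverage within genomic intervals" concrete, here is a minimal sketch of the underlying computation. It is not Megadepth's implementation and does not read BigWig or BAM files: it assumes the per-base coverage of one chromosome is already available as a list of numbers, and the function name `interval_mean_coverage` and the 0-based, half-open interval convention are choices made for this example.

```python
from itertools import accumulate


def interval_mean_coverage(coverage, intervals):
    """Return the mean per-base coverage for each interval.

    coverage:  list of per-base coverage values for one chromosome.
    intervals: list of (start, end) pairs, 0-based and half-open
               (an assumption of this sketch, as in BED files).
    """
    # prefix[i] holds the sum of coverage[:i], so any interval sum is
    # a single subtraction instead of a loop over the interval.
    prefix = [0] + list(accumulate(coverage))
    means = []
    for start, end in intervals:
        length = end - start
        if length <= 0 or start < 0 or end > len(coverage):
            raise ValueError(f"invalid interval: ({start}, {end})")
        means.append((prefix[end] - prefix[start]) / length)
    return means
```

The prefix-sum table is built once in linear time, after which each interval costs constant time, which matters when an annotation contains many intervals per chromosome.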

Taxonomic identification from metagenomic and metabarcoding data using any genetic marker

10.1101/253377 ◽

2018 ◽

Author(s):

Johan Bengtsson-Palme ◽

Rodney T. Richardson ◽

Marco Meola ◽

Christian Wurzbacher ◽

Émilie D. Tremblay ◽

...

Keyword(s):

Genetic Marker ◽

Dna Sequences ◽

Sequence Data ◽

Taxonomic Diversity ◽

Taxonomic Classification ◽

Taxonomic Identification ◽

Link Type

Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, there is no genetic marker that gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. Here we present a major update to Metaxa2 (http://microbiology.se/software/metaxa2/) that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data.

Easily phylotyping E. coli via the EzClermont web app and command-line tool

10.1101/317610 ◽

2018 ◽

Cited By ~ 3

Author(s):

Nicholas R. Waters ◽

Florence Abram ◽

Fiona Brennan ◽

Ashleigh Holmes ◽

Leighton Pritchard

Keyword(s):

Supplementary Information ◽

Validation Dataset ◽

Command Line ◽

E Coli ◽

Link Type ◽

Command Line Tool ◽

Pcr Method ◽

Web App ◽

Local Use ◽

Genome Assemblies

Summary: The Clermont PCR method of phylotyping Escherichia coli has remained a useful classification scheme despite the proliferation of higher-resolution sequence typing schemes. We have implemented an in silico Clermont PCR method as both a web app and a command-line tool to allow researchers to easily apply this phylotyping scheme to genome assemblies. Availability and Implementation: EzClermont is available as a web app at http://www.ezclermont.org. For local use, EzClermont can be installed with pip or installed from the source code at https://github.com/nickp60/ezclermont. All analysis was done with version [email protected], [email protected] Supplementary information: Table S1: test dataset; S2: validation dataset; S3: results.

Easy phylotyping of Escherichia coli via the EzClermont web app and command-line tool

Access Microbiology ◽

10.1099/acmi.0.000143 ◽

2020 ◽

Vol 2 (9) ◽

Cited By ~ 2

Author(s):

Nicholas R. Waters ◽

Florence Abram ◽

Fiona Brennan ◽

Ashleigh Holmes ◽

Leighton Pritchard

Keyword(s):

Escherichia Coli ◽

Type Species ◽

Whole Genome ◽

Command Line ◽

Content Type ◽

Link Type ◽

Command Line Tool ◽

Pcr Method ◽

Web App ◽

Genome Assemblies

The Clermont PCR method for phylotyping Escherichia coli remains a useful classification scheme even though genome sequencing is now routine, and higher-resolution sequence typing schemes are now available. Relating present-day whole-genome E. coli classifications to legacy phylotyping is essential for harmonizing the historical literature and understanding of this important organism. Therefore, we present EzClermont – a novel in silico Clermont PCR phylotyping tool to enable ready application of this phylotyping scheme to whole-genome assemblies. We evaluate this tool against phylogenomic classifications, and an alternative software implementation of Clermont typing. EzClermont is available as a web app at www.ezclermont.org, and as a command-line tool at https://nickp60.github.io/EzClermont/.

phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics

Journal of Applied Crystallography ◽

10.1107/s0021889810015608 ◽

2010 ◽

Vol 43 (4) ◽

pp. 669-676 ◽

Cited By ~ 86

Author(s):

Pavel V. Afonine ◽

Ralf W. Grosse-Kunstleve ◽

Vincent B. Chen ◽

Jeffrey J. Headd ◽

Nigel W. Moriarty ◽

...

Keyword(s):

Experimental Data ◽

Data Analysis ◽

Protein Data Bank ◽

Data Bank ◽

Command Line ◽

Percentage Points ◽

Command Line Tool ◽

Data Statistics ◽

High Level

phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
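For readers unfamiliar with the statistic, the conventional crystallographic R factor is R = Σ| |F_obs| - k|F_calc| | / Σ|F_obs|, summed over reflections. The sketch below implements only this textbook formula. It is not how phenix.model_vs_data computes its statistics, which involves scaling and solvent modelling that are not reproduced here; the function name `r_factor` and the fixed scale factor `k` are assumptions of this example.

```python
def r_factor(f_obs, f_calc, k=1.0):
    """Compute the conventional crystallographic R factor.

    f_obs:  observed structure factor amplitudes, one per reflection.
    f_calc: calculated (model) amplitudes for the same reflections.
    k:      scale factor applied to f_calc; fixed by the caller here,
            whereas real programs refine it (assumption of this sketch).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    # Sum of absolute differences between observed and scaled model amplitudes.
    numerator = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator
```

An R factor is usually quoted as a percentage, so a difference of "a few percentage points" corresponds to a change of a few hundredths in the value returned here.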

iMLP, a predictor for internal matrix targeting-like sequences in mitochondrial proteins

Biological Chemistry ◽

10.1515/hsz-2021-0185 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Kevin Schneider ◽

David Zimmer ◽

Henrik Nielsen ◽

Johannes M. Herrmann ◽

Timo Mühlhaus

Keyword(s):

Neural Network ◽

Deep Learning ◽

Recurrent Neural Network ◽

Protein Sequences ◽

Structural Features ◽

Mitochondrial Proteins ◽

Learning Approach ◽

Command Line ◽

Link Type ◽

Command Line Tool

Abstract: Matrix targeting sequences (MTSs) direct proteins from the cytosol into mitochondria. Efficient targeting often relies on internal matrix targeting-like sequences (iMTS-Ls) which share structural features with MTSs. Predicting iMTS-Ls was tedious and required multiple tools and webservices. We present iMLP, a deep learning approach for the prediction of iMTS-Ls in protein sequences. A recurrent neural network has been trained to predict iMTS-L propensity profiles for protein sequences of interest. The iMLP predictor considerably exceeds the speed of existing approaches. Expanding on our previous work on iMTS-L prediction, we now provide an intuitive iMLP webservice, available at http://iMLP.bio.uni-kl.de, as well as a stand-alone command line tool for power users.
