CONSTAX2: Improved taxonomic classification of environmental DNA markers

2021 ◽  
Author(s):  
Julian Liber ◽  
Gregory Bonito ◽  
Gian Maria Niccolò Benucci

Summary: CONSTAX - the CONSensus TAXonomy classifier - was developed for accurate and reproducible taxonomic annotation of fungal rDNA amplicons and is based upon a consensus approach of RDP, SINTAX and UTAX algorithms. CONSTAX2 can be used to classify prokaryotes and incorporates BLAST-based classifiers to reduce classification errors. Additionally, CONSTAX2 implements a conda-installable, command-line tool with improved classification metrics, faster training, multithreading support, capacity to incorporate external taxonomic databases, new isolate matching and high-level taxonomy tools, replete with documentation and example tutorials. Availability and Implementation: CONSTAX2 is available at https://github.com/liberjul/CONSTAXv2, and is packaged for Linux and macOS from Bioconda. A tutorial and documentation are available at https://constax.readthedocs.io/en/latest/.
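The consensus idea described in this abstract can be illustrated with a small sketch. It is only a hedged illustration of majority voting across several classifiers at a single taxonomic rank; the function name `consensus_label`, the `min_conf` cutoff and the tie-handling rule are assumptions made for the example and are not taken from CONSTAX2 itself.

```python
from collections import Counter


def consensus_label(calls, min_conf=0.8):
    """Return the majority label among classifier calls at one rank.

    calls    -- list of (label, confidence) pairs, one per classifier
    min_conf -- hypothetical cutoff; calls below it are ignored

    Returns None when no call passes the cutoff or when the top
    labels are tied, i.e. when there is no clear consensus.
    """
    kept = [label for label, conf in calls if conf >= min_conf]
    if not kept:
        return None
    counts = Counter(kept).most_common()
    # A tie between the two most frequent labels means no consensus.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Three hypothetical classifier outputs for one sequence at genus rank.
example_calls = [("Mortierella", 0.95), ("Mortierella", 0.85), ("Mucor", 0.90)]
print(consensus_label(example_calls))
```

Returning `None` rather than forcing a label keeps disagreement visible, which is one plausible way a consensus approach can reduce classification errors.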

2019 ◽  
Author(s):  
Lucas Czech ◽  
Pierre Barbera ◽  
Alexandros Stamatakis

Summary: We present GENESIS, a library for working with phylogenetic data, and GAPPA, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies, and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested, and field-proven. Availability and Implementation: Both GENESIS and GAPPA are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/[email protected] and [email protected].


2020 ◽  
Vol 36 (10) ◽  
pp. 3263-3265 ◽  
Author(s):  
Lucas Czech ◽  
Pierre Barbera ◽  
Alexandros Stamatakis

Summary: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Diego D. Cambuy ◽  
Felipe H. Coutinho ◽  
Bas E. Dutilh

Abstract: In modern-day metagenomics, there is an increasing need for robust taxonomic annotation of long DNA sequences from unknown micro-organisms. Long metagenomic sequences may be derived from assembly of short-read metagenomes, or from long-read single-molecule sequencing. Here we introduce CAT, a pipeline for robust taxonomic classification of long DNA sequences. We show that CAT correctly classifies contigs at different taxonomic levels, even in simulated metagenomic datasets that are very distantly related to the sequences in the database. CAT is implemented in Python and the required scripts can be freely downloaded from GitHub.


Author(s):  
Jonas Förster ◽  
Frank T Bergmann ◽  
Jürgen Pahle

Motivation: COPASI is a biochemical simulator and model analyzer which has found widespread use in academic research, teaching and beyond. One of COPASI’s strengths is its graphical user interface, and this is what most users work with. COPASI also provides a command-line tool. Until now, however, an intuitive scripting interface for creating and documenting systems biology workflows has been missing. Results: We have developed CoRC, the COPASI R Connector, an R package which provides a high-level scripting interface for COPASI. It closely mirrors the thought process of a (graphical interface) user and should therefore be very easy to use. This allows complex workflows to be scripted reproducibly, combining COPASI’s powerful analytic toolset with R’s extensive analysis and package ecosystem. Availability and implementation: CoRC is a free and open-source R package, available via GitHub at https://jpahle.github.io/CoRC/ under the Artistic-2.0 license. Supplementary information: We provide tutorial articles as well as several example scripts on the project’s website.


2020 ◽  
Author(s):  
Christopher Wilks ◽  
Omar Ahmed ◽  
Daniel N. Baker ◽  
David Zhang ◽  
Leonardo Collado-Torres ◽  
...  

Motivation: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19,000 GTExV8 BigWig files in approximately one hour using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. Availability: https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/[email protected]
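The task this abstract describes, summarizing coverage inside genomic intervals, can be sketched on a toy scale. The example below is an assumption-laden illustration using prefix sums over an in-memory list of per-base coverage; it does not read BigWig or BAM/CRAM files and says nothing about how Megadepth is implemented. The name `interval_coverage_sums` is hypothetical.

```python
from itertools import accumulate


def interval_coverage_sums(coverage, intervals):
    """Sum per-base coverage inside each half-open interval [start, end).

    coverage  -- list of per-base depths for one contig (toy, in memory)
    intervals -- list of (start, end) pairs, 0-based, end exclusive
    """
    # prefix[i] holds the total coverage of bases 0 .. i-1, so any
    # interval sum is a single subtraction instead of a loop.
    prefix = [0] + list(accumulate(coverage))
    return [prefix[end] - prefix[start] for start, end in intervals]


toy_coverage = [0, 2, 2, 3, 1, 0]
toy_intervals = [(1, 4), (4, 6)]
print(interval_coverage_sums(toy_coverage, toy_intervals))  # [7, 1]
```

Building the prefix array once makes each interval query constant time, which matters when the same coverage track is summarized over many intervals.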


2018 ◽  
Author(s):  
Johan Bengtsson-Palme ◽  
Rodney T. Richardson ◽  
Marco Meola ◽  
Christian Wurzbacher ◽  
Émilie D. Tremblay ◽  
...  

Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no single genetic marker performs sufficiently well across all the biological kingdoms, which hampers studies of taxonomic diversity in many groups of organisms. Here we present a major update to Metaxa2 (http://microbiology.se/software/metaxa2/) that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data.


2018 ◽  
Author(s):  
Nicholas R. Waters ◽  
Florence Abram ◽  
Fiona Brennan ◽  
Ashleigh Holmes ◽  
Leighton Pritchard

Summary: The Clermont PCR method of phylotyping Escherichia coli has remained a useful classification scheme despite the proliferation of higher-resolution sequence typing schemes. We have implemented an in silico Clermont PCR method as both a web app and a command-line tool to allow researchers to easily apply this phylotyping scheme to genome assemblies. Availability and Implementation: EzClermont is available as a web app at http://www.ezclermont.org. For local use, EzClermont can be installed with pip or installed from the source code at https://github.com/nickp60/ezclermont. All analysis was done with version [email protected], [email protected] Supplementary information: Table S1: test dataset; S2: validation dataset; S3: results.


2020 ◽  
Vol 2 (9) ◽  
Author(s):  
Nicholas R. Waters ◽  
Florence Abram ◽  
Fiona Brennan ◽  
Ashleigh Holmes ◽  
Leighton Pritchard

The Clermont PCR method for phylotyping Escherichia coli remains a useful classification scheme even though genome sequencing is now routine, and higher-resolution sequence typing schemes are now available. Relating present-day whole-genome E. coli classifications to legacy phylotyping is essential for harmonizing the historical literature and understanding of this important organism. Therefore, we present EzClermont – a novel in silico Clermont PCR phylotyping tool to enable ready application of this phylotyping scheme to whole-genome assemblies. We evaluate this tool against phylogenomic classifications, and an alternative software implementation of Clermont typing. EzClermont is available as a web app at www.ezclermont.org, and as a command-line tool at https://nickp60.github.io/EzClermont/.


2010 ◽  
Vol 43 (4) ◽  
pp. 669-676 ◽  
Author(s):  
Pavel V. Afonine ◽  
Ralf W. Grosse-Kunstleve ◽  
Vincent B. Chen ◽  
Jeffrey J. Headd ◽  
Nigel W. Moriarty ◽  
...  

phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
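For readers unfamiliar with the statistic being recomputed, the conventional crystallographic R factor is the sum of absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The sketch below computes that textbook quantity only; the function name `r_factor` and the single `scale` parameter are assumptions for illustration, and the scaling and solvent treatment used by phenix.model_vs_data are not described in the abstract and are not reproduced here.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R factor: sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|).

    f_obs  -- observed structure-factor amplitudes
    f_calc -- calculated amplitudes, same order and length as f_obs
    scale  -- hypothetical overall scale factor k applied to f_calc
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(
        abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc)
    )
    return numerator / denominator


# Toy amplitudes: differences of 2, 2 and 0 over a total of 60.
print(round(r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]), 4))
```

A difference of "a few percentage points" in the abstract corresponds to a change of a few hundredths in the value this function returns.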


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Kevin Schneider ◽  
David Zimmer ◽  
Henrik Nielsen ◽  
Johannes M. Herrmann ◽  
Timo Mühlhaus

Abstract: Matrix targeting sequences (MTSs) direct proteins from the cytosol into mitochondria. Efficient targeting often relies on internal matrix targeting-like sequences (iMTS-Ls) which share structural features with MTSs. Predicting iMTS-Ls was tedious and required multiple tools and webservices. We present iMLP, a deep learning approach for the prediction of iMTS-Ls in protein sequences. A recurrent neural network has been trained to predict iMTS-L propensity profiles for protein sequences of interest. The iMLP predictor considerably exceeds the speed of existing approaches. Expanding on our previous work on iMTS-L prediction, we now provide an intuitive iMLP webservice at http://iMLP.bio.uni-kl.de as well as a stand-alone command-line tool for power users.

