Bakta: Rapid & standardized annotation of bacterial genomes via alignment-free sequence identification

2021 ◽  
Author(s):  
Oliver Schwengers ◽  
Lukas Jelonek ◽  
Marius Dieckmann ◽  
Sebastian Beyvers ◽  
Jochen Blom ◽  
...  

Abstract. Command line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command line software pipelines heavily depend on taxon specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command line software tool for the robust, taxon-independent, thorough and nonetheless fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross references. Annotation results are exported in GFF3 and INSDC-compliant flat files as well as comprehensive JSON files facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references whilst providing comparable wall clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.

2021 ◽  
Vol 7 (11) ◽  
Author(s):  
Oliver Schwengers ◽  
Lukas Jelonek ◽  
Marius Alfred Dieckmann ◽  
Sebastian Beyvers ◽  
Jochen Blom ◽  
...  

Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
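The abstract does not spell out how the alignment-free identification works. One plausible reading is an exact-match lookup: each predicted protein is reduced to a fixed-length digest, and the digest is looked up in a precomputed index instead of being aligned against a database. The sketch below illustrates that idea only. The MD5 digest, the function names, and the toy database are all assumptions made for illustration and are not taken from the source.

```python
import hashlib
from typing import Dict, Optional


def sequence_digest(protein: str) -> str:
    """Return a hex MD5 digest of a normalized amino acid sequence.

    Normalization (assumed): trim whitespace, upper-case, and drop a
    trailing stop-codon marker so equivalent sequences hash identically.
    """
    normalized = protein.strip().upper().rstrip("*")
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_index(known: Dict[str, str]) -> Dict[str, str]:
    """Map digest -> identifier for every known sequence."""
    return {sequence_digest(seq): ident for ident, seq in known.items()}


def identify(protein: str, index: Dict[str, str]) -> Optional[str]:
    """Look up a protein by digest; None means no exact match was found."""
    return index.get(sequence_digest(protein))


# Hypothetical toy database; real identifiers would be database cross-references.
KNOWN = {"toy_protein_A": "MKTAYIAKQR", "toy_protein_B": "MSLNNPWQ"}
INDEX = build_index(KNOWN)

hit = identify("mktayiakqr*", INDEX)   # same sequence, different formatting
miss = identify("MKTAYIAKQA", INDEX)   # one residue differs, so no exact match
```

A lookup of this kind is constant time per sequence, which fits the abstract's claim of acceleration, but it only finds identical sequences. Anything without an exact match would still need a slower, similarity-based step.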


2016 ◽  
Author(s):  
Dmitry Antipov ◽  
Nolan Hartwick ◽  
Max Shen ◽  
Mikhail Raiko ◽  
Alla Lapidus ◽  
...  

Abstract. Motivation: Plasmids are stably maintained extra-chromosomal genetic elements that replicate independently from the host cell’s chromosomes. Although plasmids harbor biomedically important genes (such as genes involved in virulence and antibiotic resistance), there is a shortage of specialized software tools for extracting and assembling plasmid data from whole genome sequencing projects. Results: We present the plasmidSPAdes algorithm and software tool for assembling plasmids from whole genome sequencing data and benchmark its performance on a diverse set of bacterial genomes. Availability and implementation: plasmidSPAdes is publicly available at http://spades.bioinf.spbau.ru/plasmidSPAdes/.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Congyu Lu ◽  
Zheng Zhang ◽  
Zena Cai ◽  
Zhaozhong Zhu ◽  
Ye Qiu ◽  
...  

Abstract. Background: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools that determine the taxonomic assignment, host range, and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the method for determining the viral host is either to culture the virus, which is low-throughput, time-consuming, and expensive, or to computationally predict the viral hosts, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods. Results: We present here Prokaryotic virus Host Predictor (PHP), a software tool using a Gaussian model, to predict hosts for prokaryotic viruses using the differences of k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses which cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
Conclusions: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
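Two ideas from this abstract can be sketched in a few lines: the feature vector built from differences of k-mer frequencies between a viral and a host sequence, and the thresholding plus top-N consensus applied to scored predictions. This is a minimal illustration and not the PHP implementation. The choice of k, the function names, the score scale, and the majority-vote rule are assumptions, and the Gaussian model that would produce the scores is not reproduced here.

```python
from collections import Counter
from itertools import product
from typing import List, Optional, Tuple


def kmer_frequencies(sequence: str, k: int = 4) -> List[float]:
    """Relative frequency of every DNA k-mer, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count windows made purely of A/C/G/T; ambiguous bases are skipped.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def frequency_difference(virus: str, host: str, k: int = 4) -> List[float]:
    """Feature vector: per-k-mer frequency difference (virus minus host)."""
    v = kmer_frequencies(virus, k)
    h = kmer_frequencies(host, k)
    return [a - b for a, b in zip(v, h)]


def consensus_host(predictions: List[Tuple[str, float]],
                   top_n: int = 30,
                   min_score: float = 0.0) -> Optional[str]:
    """Majority vote over the top_n highest-scoring hosts above min_score.

    `predictions` holds (host_genus, score) pairs from some scoring model;
    returns None when no prediction passes the threshold.
    """
    kept = [p for p in predictions if p[1] >= min_score]
    kept.sort(key=lambda p: p[1], reverse=True)
    votes = Counter(genus for genus, _ in kept[:top_n])
    return votes.most_common(1)[0][0] if votes else None
```

With k = 4 the feature vector has 256 entries. Returning None when nothing passes the threshold mirrors the trade-off the abstract describes: fewer viruses receive a prediction, but the predictions that are made are more reliable.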


2016 ◽  
Vol 106 (07-08) ◽  
pp. 544-549
Author(s):  
V. K. Bellmann ◽  
P. Prof. Nyhuis

Companies apply both process-improving and competence-increasing methods to maintain their competitiveness. However, the large number of existing methods makes an application-specific selection difficult. A software tool is therefore needed that considers the individual objectives as well as the prerequisites for successfully applying the methods. This paper describes the development of a software tool for targeted decision support.


2015 ◽  
Vol 14 ◽  
pp. CIN.S26470 ◽  
Author(s):  
Richard P. Finney ◽  
Qing-Rong Chen ◽  
Cu V. Nguyen ◽  
Chih Hao Hsu ◽  
Chunhua Yan ◽  
...  

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.


2018 ◽  
Vol 13 (1) ◽  
pp. 44-61 ◽  
Author(s):  
Lorraine Lee ◽  
William Kerler ◽  
Daniel Ivancevich

The ability to use various software and tools is important for students entering the accounting profession. In an exploratory study, we develop a survey to assess accounting practitioners' evaluations of the importance of various software tools, as well as the importance of data analytics and data visualization skills. Responses from 197 practitioners indicate that Excel is the most frequently utilized software / tool, the most important software tool for new hires, and that Excel should be emphasized in university accounting programs. We find that the importance of Excel is consistent across different accounting areas (audit, tax, advisory, and corporate) and across all experience levels. In addition, Adobe Acrobat, PowerPoint, accounting / ERP software, and the FASB Codification were identified as frequently utilized across the various accounting areas and experience levels. Finally, practitioners in each of the different accounting areas and at all experience levels indicate data analytic skills and data visualization skills are important, but that data analytic skills are perceived as more important than data visualization skills. Our study contributes to the accounting information systems literature by identifying the specific software and tools that are relevant to the profession and provides guidance on the software and tools that should be emphasized in university accounting programs.


2021 ◽  
Author(s):  
Weiqian Cao ◽  
Siyuan Kong ◽  
Wenfeng Zeng ◽  
Pengyun Gong ◽  
Biyun Jiang ◽  
...  

Interpreting large-scale glycoproteomic data for intact glycopeptide identification has been tremendously advanced by software tools. However, software tools for quantitative analysis of intact glycopeptides still lag behind, which greatly hinders exploring the differential expression and functions of site-specific glycosylation in organisms. Here, we report pGlycoQuant, a generic software tool for accurate and convenient quantitative intact glycopeptide analysis, supporting both primary and tandem mass spectrometry quantitation for multiple quantitative strategies. pGlycoQuant enables intact glycopeptide quantitation with very low missing values via a deep residual network, thus greatly expanding the quantitative function of several powerful search engines, currently including pGlyco 2.0, pGlyco3, Byonic and MSFragger-Glyco. The pGlycoQuant-based site-specific N-glycoproteomic study conducted here quantifies 6435 intact N-glycopeptides in three hepatocellular carcinoma cell lines with different metastatic potentials and, together with in vitro molecular biology experiments, illustrates core fucosylation at site 979 of the L1 cell adhesion molecule (L1CAM) as a potential regulator of HCC metastasis. pGlycoQuant is freely available at https://github.com/expellir-arma/pGlycoQuant/releases/. We have demonstrated pGlycoQuant to be a powerful tool for the quantitative analysis of site-specific glycosylation and the exploration of potential glycosylation-related biomarker candidates, and we expect further applications in glycoproteomic studies.


2020 ◽  
Author(s):  
Robert A. Petit ◽  
Timothy D. Read

Abstract. Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs) where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to thousands, and it allows for great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.


2015 ◽  
Vol 8 (1) ◽  
Author(s):  
Emanuel Weitschek ◽  
Fabio Cunial ◽  
Giovanni Felici

Author(s):  
Helena Hashemi Farzaneh ◽  
Lorenz Neuner

Abstract. Much of the work in design research focusses on the development of methods and tools to support engineering designers. Many of these tools are nowadays implemented in software. Due to the strongly growing use of computers and smart devices in the last two decades, the expectations of users have increased dramatically. In particular, users expect good usability, for example little effort in learning to apply the software. Therefore, the usability evaluation of design software tools is crucial. A software tool with bad usability will not be used in industrial practice. Recommendations for usability evaluation of software often stem from the field of Human Computer Interaction. The aim of this paper is to tailor these general approaches to the specific needs of engineering design. In addition, we propose a method to analyse the results of the evaluation and to derive suggestions for improving the design software tool. We apply the usability evaluation method to a use case: the KoMBi software tool for bio-inspired design. The case study provides additional insights with regard to problem, cause and improvement categories.

