Bakta: Rapid & standardized annotation of bacterial genomes via alignment-free sequence identification

Mapping Intimacies ◽

10.1101/2021.09.02.458689 ◽

2021 ◽

Author(s):

Oliver Schwengers ◽

Lukas Jelonek ◽

Marius Dieckmann ◽

Sebastian Beyvers ◽

Jochen Blom ◽

...

Keyword(s):

Software Tool ◽

Software Tools ◽

Command Line ◽

Bacterial Genomes ◽

Functional Annotations ◽

Small Proteins ◽

Alignment Free ◽

Sequence Identification ◽

Identification Approach ◽

Downstream Analysis

Command line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command line software pipelines heavily depend on taxon specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command line software tool for the robust, taxon-independent, thorough and nonetheless fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross references. Annotation results are exported in GFF3 and INSDC-compliant flat files as well as comprehensive JSON files facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references whilst providing comparable wall clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
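
Since the abstract names GFF3 as an export format intended for automated downstream analysis, the following is a minimal sketch of such an analysis step. It relies only on the general GFF3 layout (nine tab-separated columns, feature type in column 3, an optional `##FASTA` section at the end). The function name `count_feature_types` and the sample records are hypothetical and are not taken from actual Bakta output.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff3_lines: Iterable[str]) -> Counter:
    """Count features by type (column 3) in GFF3 text."""
    counts: Counter = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # A "##FASTA" directive marks the start of embedded sequences.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only.
sample = [
    "##gff-version 3",
    "contig_1\texample\tCDS\t1\t300\t.\t+\t0\tID=cds_1;product=hypothetical protein",
    "contig_1\texample\tCDS\t400\t900\t.\t-\t0\tID=cds_2",
    "contig_1\texample\ttRNA\t1000\t1075\t.\t+\t.\tID=trna_1",
    "##FASTA",
    ">contig_1",
    "ATGC",
]
print(count_feature_types(sample))
```

In practice the lines would come from an open file handle rather than a list; the function accepts any iterable of strings.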

Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification

Microbial Genomics ◽

10.1099/mgen.0.000685 ◽

2021 ◽

Vol 7 (11) ◽

Author(s):

Oliver Schwengers ◽

Lukas Jelonek ◽

Marius Alfred Dieckmann ◽

Sebastian Beyvers ◽

Jochen Blom ◽

...

Keyword(s):

Software Tool ◽

Software Tools ◽

Command Line ◽

Bacterial Genomes ◽

Functional Annotations ◽

Link Type ◽

Small Proteins ◽

Alignment Free ◽

Sequence Identification ◽

Downstream Analysis

Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
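
The abstract does not spell out how the alignment-free identification works. One general way to identify a sequence without aligning it is an exact-match lookup by digest, sketched below under that assumption. The choice of SHA-256, the function names (`sequence_digest`, `build_index`, `identify`), and the reference identifiers are all hypothetical and should not be read as Bakta's actual implementation.

```python
import hashlib
from typing import Dict, Optional


def sequence_digest(protein: str) -> str:
    """Return a hex digest of a normalized protein sequence."""
    # Normalize case and drop a trailing stop-codon marker, if any.
    normalized = protein.strip().upper().rstrip("*")
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_index(known: Dict[str, str]) -> Dict[str, str]:
    """Map digest -> identifier for a set of known reference sequences."""
    return {sequence_digest(seq): ident for ident, seq in known.items()}


def identify(protein: str, index: Dict[str, str]) -> Optional[str]:
    """Look up a query by digest; None means no exact match is known."""
    return index.get(sequence_digest(protein))


# Hypothetical reference entries for illustration only.
index = build_index({"REF_A": "MKT", "REF_B": "MSTNPKPQR"})
print(identify("mkt*", index))   # exact match after normalization
print(identify("MKTA", index))   # one residue longer, so no match
```

A lookup of this kind costs one hash computation and one dictionary access per query, which is why it can be much faster than alignment, at the price of finding only identical sequences.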

plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data

10.1101/048942 ◽

2016 ◽

Cited By ~ 15

Author(s):

Dmitry Antipov ◽

Nolan Hartwick ◽

Max Shen ◽

Mikhail Raiko ◽

Alla Lapidus ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Software Tool ◽

Software Tools ◽

Whole Genome Sequencing Data ◽

Antibiotics Resistance ◽

Whole Genome ◽

Sequencing Data ◽

Bacterial Genomes ◽

Specialized Software

Motivation: Plasmids are stably maintained extra-chromosomal genetic elements that replicate independently from the host cell’s chromosomes. Although plasmids harbor biomedically important genes (such as genes involved in virulence and antibiotics resistance), there is a shortage of specialized software tools for extracting and assembling plasmid data from whole genome sequencing projects. Results: We present the plasmidSPAdes algorithm and software tool for assembling plasmids from whole genome sequencing data and benchmark its performance on a diverse set of bacterial genomes. Availability and implementation: plasmidSPAdes is publicly available at http://spades.bioinf.spbau.ru/plasmidSPAdes/.

Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics

BMC Biology ◽

10.1186/s12915-020-00938-6 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Congyu Lu ◽

Zheng Zhang ◽

Zena Cai ◽

Zhaozhong Zhu ◽

Ye Qiu ◽

...

Keyword(s):

Prediction Accuracy ◽

Functional Characterization ◽

Gaussian Model ◽

Software Tool ◽

Biological Properties ◽

Rapid Identification ◽

Taxonomic Assignment ◽

Genus Level ◽

Alignment Free ◽

Archaeal Viruses

Background: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools that determine the taxonomic assignment, host range, and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods. Results: We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, taking the differences of k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also clearly outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses that cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
Conclusions: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
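
The feature described in the abstract, the difference of k-mer frequencies between a viral and a host sequence, can be sketched as follows. This is only an illustration of the feature computation, not of the Gaussian model itself. The function names, the default `k = 2`, and the toy sequences are assumptions made for the example and are not taken from the PHP software.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "ACGT"


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Relative frequency of every k-mer over ACGT, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Only k-mers made of A, C, G, T are counted; others (e.g. with N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def frequency_difference(virus: str, host: str, k: int = 2) -> List[float]:
    """Element-wise difference of the two k-mer frequency vectors."""
    v = kmer_frequencies(virus, k)
    h = kmer_frequencies(host, k)
    return [a - b for a, b in zip(v, h)]


# Toy sequences for illustration only.
features = frequency_difference("ACGTACGTTTGA", "ACGTACGGGCTA", k=2)
print(len(features))  # 4**2 = 16 features for k = 2
```

A vector of this kind has 4^k entries, so the feature dimension grows quickly with k; a classifier would then be trained on such vectors for known virus-host pairs.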

Effiziente Produktionsgestaltung / Efficient production design - Development of a software tool for a process- and competence-oriented decision support

wt Werkstattstechnik online ◽

10.37544/1436-4980-2016-07-08-78 ◽

2016 ◽

Vol 106 (07-08) ◽

pp. 544-549

Author(s):

V. K. Bellmann ◽

P. Nyhuis

Keyword(s):

Decision Support ◽

Software Tool ◽

Software Tools ◽

Efficient Production ◽

Design Development ◽

Huge Amount

To maintain their competitiveness, companies apply both process-improving and competence-building methods. However, the large number of available methods makes an application-specific selection difficult. A software tool is therefore needed that considers individual objectives as well as the prerequisites for successfully applying the methods. This paper describes the development of a software tool for targeted decision support.

Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files

Cancer Informatics ◽

10.4137/cin.s26470 ◽

2015 ◽

Vol 14 ◽

pp. CIN.S26470 ◽

Cited By ~ 2

Author(s):

Richard P. Finney ◽

Qing-Rong Chen ◽

Cu V. Nguyen ◽

Chih Hao Hsu ◽

Chunhua Yan ◽

...

Keyword(s):

Graphical User Interface ◽

Reference Genome ◽

Source Code ◽

Software Tool ◽

Command Line ◽

Sequencing Data ◽

Genome Data ◽

Command Line Tool ◽

Portable Software ◽

Microsoft Windows

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.

Beyond Excel: Software Tools and the Accounting Curriculum

AIS Educator Journal ◽

10.3194/1935-8156-13.1.44 ◽

2018 ◽

Vol 13 (1) ◽

pp. 44-61 ◽

Cited By ~ 5

Author(s):

Lorraine Lee ◽

William Kerler ◽

Daniel Ivancevich

Keyword(s):

Data Visualization ◽

Software Tool ◽

Software Tools ◽

Adobe Acrobat ◽

Accounting Information Systems ◽

Accounting Curriculum ◽

Experience Levels ◽

Accounting Practitioners ◽

Analytic Skills ◽

Data Analytic

The ability to use various software and tools is important for students entering the accounting profession. In an exploratory study, we develop a survey to assess accounting practitioners' evaluations of the importance of various software tools, as well as the importance of data analytics and data visualization skills. Responses from 197 practitioners indicate that Excel is the most frequently utilized software / tool, the most important software tool for new hires, and that Excel should be emphasized in university accounting programs. We find that the importance of Excel is consistent across different accounting areas (audit, tax, advisory, and corporate) and across all experience levels. In addition, Adobe Acrobat, PowerPoint, accounting / ERP software, and the FASB Codification were identified as frequently utilized across the various accounting areas and experience levels. Finally, practitioners in each of the different accounting areas and at all experience levels indicate data analytic skills and data visualization skills are important, but that data analytic skills are perceived as more important than data visualization skills. Our study contributes to the accounting information systems literature by identifying the specific software and tools that are relevant to the profession and provides guidance on the software and tools that should be emphasized in university accounting programs.

pGlycoQuant with a deep residual network for precise and minuscule-missing-value quantitative glycoproteomics enabling the functional exploration of site-specific glycosylation

10.1101/2021.11.15.468561 ◽

2021 ◽

Author(s):

Weiqian Cao ◽

Siyuan Kong ◽

Wenfeng Zeng ◽

Pengyun Gong ◽

Biyun Jiang ◽

...

Keyword(s):

Quantitative Analysis ◽

Large Scale ◽

Missing Values ◽

Software Tool ◽

Software Tools ◽

Residual Network ◽

Site Specific ◽

Carcinoma Cell Lines ◽

L1 Cell Adhesion Molecule

Interpreting large-scale glycoproteomic data for intact glycopeptide identification has been tremendously advanced by software tools. However, software tools for quantitative analysis of intact glycopeptides still lag behind, which greatly hinders the exploration of differential expression and functions of site-specific glycosylation in organisms. Here, we report pGlycoQuant, a generic software tool for accurate and convenient quantitative intact glycopeptide analysis, supporting both primary and tandem mass spectrometry quantitation for multiple quantitative strategies. pGlycoQuant enables intact glycopeptide quantitation with very low missing values via a deep residual network, thus greatly expanding the quantitative function of several powerful search engines, currently including pGlyco 2.0, pGlyco3, Byonic and MSFragger-Glyco. The pGlycoQuant-based site-specific N-glycoproteomic study conducted here quantifies 6435 intact N-glycopeptides in three hepatocellular carcinoma (HCC) cell lines with different metastatic potentials and, together with in vitro molecular biology experiments, illustrates core fucosylation at site 979 of the L1 cell adhesion molecule (L1CAM) as a potential regulator of HCC metastasis. pGlycoQuant is freely available at https://github.com/expellir-arma/pGlycoQuant/releases/. We have demonstrated pGlycoQuant to be a powerful tool for the quantitative analysis of site-specific glycosylation and the exploration of potential glycosylation-related biomarker candidates, and we expect further applications in glycoproteomic studies.

Bactopia: a flexible pipeline for complete analysis of bacterial genomes

10.1101/2020.02.28.969394 ◽

2020 ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Standard Procedure ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Genomic Analyses ◽

Conserved Genes ◽

Downstream Analysis

Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs) where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to thousands of genomes and that allows for great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.

LAF: Logic Alignment Free and its application to bacterial genomes classification

BioData Mining ◽

10.1186/s13040-015-0073-1 ◽

2015 ◽

Vol 8 (1) ◽

Cited By ~ 12

Author(s):

Emanuel Weitschek ◽

Fabio Cunial ◽

Giovanni Felici

Keyword(s):

Bacterial Genomes ◽

Alignment Free

Usability Evaluation of Software Tools for Engineering Design

Proceedings of the Design Society: International Conference on Engineering Design ◽

10.1017/dsi.2019.136 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1303-1312

Author(s):

Helena Hashemi Farzaneh ◽

Lorenz Neuner

Keyword(s):

Engineering Design ◽

Design Research ◽

Evaluation Method ◽

Software Tool ◽

Software Tools ◽

Usability Evaluation ◽

Smart Devices ◽

Design Software ◽

Support Engineering

Much of the work in design research focusses on the development of methods and tools to support engineering designers. Many of these tools are nowadays implemented in software. Due to the strongly growing use of computers and smart devices in the last two decades, the expectations of users have increased dramatically. In particular, users expect good usability, for example little effort in learning to apply the software. Therefore, the usability evaluation of design software tools is crucial. A software tool with bad usability will not be used in industrial practice. Recommendations for usability evaluation of software often stem from the field of Human Computer Interaction. The aim of this paper is to tailor these general approaches to the specific needs of engineering design. In addition, we propose a method to analyse the results of the evaluation and to derive suggestions for improving the design software tool. We apply the usability evaluation method to a use case: the KoMBi software tool for bio-inspired design. The case study provides additional insights with regard to problem, cause and improvement categories.
