Bactopia: a flexible pipeline for complete analysis of bacterial genomes

10.1101/2020.02.28.969394 ◽

2020 ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Standard Procedure ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Genomic Analyses ◽

Conserved Genes ◽

Downstream Analysis

Abstract Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs) where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to projects with thousands of genomes, and that allows for great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.

Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

mSystems ◽

10.1128/msystems.00190-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Open Source ◽

Genome Analysis ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Data Sets ◽

Bacterial Genomes ◽

Data Set ◽

Content Type

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.

IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.

Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries

mSystems ◽

10.1128/msystems.00731-19 ◽

2020 ◽

Vol 5 (1) ◽

Cited By ~ 14

Author(s):

Matthew R. Olm ◽

Alexander Crits-Christoph ◽

Spencer Diamond ◽

Adi Lavy ◽

Paula B. Matheus Carnevali ◽

...

Keyword(s):

Bacterial Diversity ◽

Ribosomal Proteins ◽

Large Scale ◽

Bacterial Species ◽

Bacterial Genome ◽

16S Rrna Genes ◽

Rrna Genes ◽

Species Discrimination ◽

Bacterial Genomes ◽

Discrimination Power

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination.

IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
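
The 95% ANI threshold mentioned in this abstract can be made concrete with a toy calculation. The sketch below is not the authors' method: real ANI tools first fragment and align whole genomes, whereas this example assumes the aligned fragment pairs are already available as equal-length strings. The function names and the example fragments are hypothetical.

```python
def fragment_identity(a, b):
    """Fraction of identical bases between two aligned, equal-length fragments.

    Columns containing a gap ('-') in either fragment are ignored.
    Returns None when no column can be compared.
    """
    if len(a) != len(b):
        raise ValueError("aligned fragments must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != "-" and y != "-"]
    if not compared:
        return None
    matches = sum(x == y for x, y in compared)
    return matches / len(compared)


def average_nucleotide_identity(aligned_pairs):
    """Mean per-fragment identity, in percent, over all comparable pairs."""
    identities = [
        ident
        for ident in (fragment_identity(a, b) for a, b in aligned_pairs)
        if ident is not None
    ]
    if not identities:
        raise ValueError("no comparable fragments")
    return 100.0 * sum(identities) / len(identities)


def same_species(aligned_pairs, threshold=95.0):
    """Apply the 95% ANI species threshold quoted in the abstract."""
    return average_nucleotide_identity(aligned_pairs) >= threshold


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
    print(average_nucleotide_identity(pairs))
    print(same_species(pairs))
```

Averaging per-fragment identities, rather than pooling all bases, follows the usual definition of ANI as a mean over aligned fragments. The alignment step itself is deliberately left out of this sketch.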

A Universal, Genomewide GuideFinder for CRISPR/Cas9 Targeting in Microbial Genomes

mSphere ◽

10.1128/msphere.00086-20 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Michelle Spoto ◽

Changhui Guan ◽

Elizabeth Fleming ◽

Julia Oh

Keyword(s):

Gene Function ◽

Large Scale ◽

Essential Gene ◽

Bacterial Species ◽

Bacterial Genome ◽

Model Organisms ◽

Design Parameters ◽

Bacterial Genomes ◽

Wide Range ◽

User Friendly

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides in draft bacterial genomes genomewide. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes by the use of an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, whose validity we tested experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful (focusing on CRISPRi as a potential application), as demonstrated by essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies of a variety of bacterial species.

IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted for the broad needs of microbial targeting, which include extraordinary species and subspecies genetic diversity, the overwhelming majority of which is characterized by draft genomes. In addition, flexibility in guide design parameters is important to consider the wide range of factors that can affect guide efficacy, many of which can be species and strain specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software and that can design guides for any annotated bacterial genome with numerous features that facilitate guide design in a wide variety of microorganisms.
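
The first step described in this abstract, locating NGG PAM sites and taking the adjacent protospacer as a candidate guide, can be sketched in a few lines. This is an illustration only and is not GuideFinder's code: the function name, the default 20-nt guide length, and the restriction to the forward strand are assumptions, and the off-target filtering and parameter iteration described in the abstract are not implemented.

```python
import re


def find_ngg_guides(seq, guide_len=20):
    """List candidate guides upstream of every forward-strand NGG PAM.

    Returns (guide_start, guide_sequence, pam) tuples using 0-based
    coordinates. PAM sites too close to the start of the sequence to
    leave room for a full-length guide are skipped.
    """
    seq = seq.upper()
    guides = []
    # A lookahead is used so that overlapping PAM sites (e.g. in "AGGG")
    # are all reported; a plain pattern would skip the second one.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= guide_len:
            guide = seq[pam_start - guide_len:pam_start]
            guides.append((pam_start - guide_len, guide, match.group(1)))
    return guides


if __name__ == "__main__":
    # Hypothetical 24-nt sequence containing two overlapping PAM sites.
    example = "ACGTACGTACGTACGTACGT" + "AGGG"
    for start, guide, pam in find_ngg_guides(example):
        print(start, guide, pam)
```

A fuller version would also scan the reverse complement and restrict hits to annotated gene coordinates, which is where the annotated genome input mentioned in the abstract would come in.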

Decoding the Genomic Variability among Members of the Bifidobacterium dentium Species

Microorganisms ◽

10.3390/microorganisms8111720 ◽

2020 ◽

Vol 8 (11) ◽

pp. 1720

Author(s):

Gabriele Andrea Lugli ◽

Chiara Tarracchini ◽

Giulia Alessandri ◽

Christian Milani ◽

Leonardo Mancabelli ◽

...

Keyword(s):

Defense Mechanisms ◽

Ursus Arctos ◽

Primate Species ◽

Comparative Genomic ◽

Glycosyl Hydrolases ◽

Genomic Variability ◽

Fecal Samples ◽

Genomic Analyses ◽

Conserved Genes ◽

Genetic Features

Members of the Bifidobacterium dentium species are usually identified in the oral cavity of humans and associated with the development of plaque and dental caries. Nevertheless, they have also been detected in fecal samples, highlighting a widespread distribution among mammals. To explore the genetic variability of this species, we isolated and sequenced the genomes of 18 different B. dentium strains collected from fecal samples of several primate species and an Ursus arctos. We then investigated the genomic variability and metabolic abilities of the new B. dentium isolates together with 20 public genome sequences. Comparative genomic analyses provided insights into the vast metabolic repertoire of the species, highlighting 19 glycosyl hydrolase families shared among all analyzed strains. Phylogenetic analysis of the B. dentium taxon, involving 1140 conserved genes, revealed a very close phylogenetic relatedness among members of this species. Furthermore, low genomic variability between strains was also confirmed by an average nucleotide identity analysis showing values higher than 98.2%. Investigating the genetic features of each strain, few putative functional mobile elements were identified. Moreover, a consistent occurrence of defense mechanisms such as CRISPR–Cas and restriction–modification systems may be responsible for the high genome synteny identified among members of this taxon.

Antimicrobial Peptides, Polymorphic Toxins, and Self-Nonself Recognition Systems in Archaea: an Untapped Armory for Intermicrobial Conflicts

mBio ◽

10.1128/mbio.00715-19 ◽

2019 ◽

Vol 10 (3) ◽

Cited By ~ 13

Author(s):

Kira S. Makarova ◽

Yuri I. Wolf ◽

Svetlana Karamycheva ◽

Dapeng Zhang ◽

L. Aravind ◽

...

Keyword(s):

Antimicrobial Peptides ◽

Comparative Genomic ◽

Protein Families ◽

Bacterial Genomes ◽

Uncharacterized Protein ◽

Genomic Locus ◽

Genomic Analyses ◽

Nonself Recognition ◽

Guilt By Association ◽

Recognition Systems

ABSTRACT Numerous, diverse, highly variable defense and offense genetic systems are encoded in most bacterial genomes and are involved in various forms of conflict among competing microbes or their eukaryotic hosts. Here we focus on the offense and self-versus-nonself discrimination systems encoded by archaeal genomes that so far have remained largely uncharacterized and unannotated. Specifically, we analyze archaeal genomic loci encoding polymorphic and related toxin systems and ribosomally synthesized antimicrobial peptides. Using sensitive methods for sequence comparison and the “guilt by association” approach, we identified such systems in 141 archaeal genomes. These toxins can be classified into four major groups based on the structure of the components involved in the toxin delivery. The toxin domains are often shared between and within each system. We revisit halocin families and substantially expand the halocin C8 family, which was identified in diverse archaeal genomes and also certain bacteria. Finally, we employ features of protein sequences and genomic locus organization characteristic of archaeocins and polymorphic toxins to identify candidates for analogous but not necessarily homologous systems among uncharacterized protein families. This work confidently predicts that more than 1,600 archaeal proteins, currently annotated as “hypothetical” in public databases, are components of conflict and self-versus-nonself discrimination systems.

IMPORTANCE Diverse and highly variable systems involved in biological conflicts and self-versus-nonself discrimination are ubiquitous in bacteria but much less studied in archaea. We performed comprehensive comparative genomic analyses of the archaeal systems that share components with analogous bacterial systems and propose an approach to identify new systems that could be involved in these functions. We predict polymorphic toxin systems in 141 archaeal genomes and identify new, archaea-specific toxin and immunity protein families. These systems are widely represented in archaea and are predicted to play major roles in interactions between species and in intermicrobial conflicts. This work is expected to stimulate experimental research to advance the understanding of poorly characterized major aspects of archaeal biology.

Operon Prediction for Sequenced Bacterial Genomes without Experimental Information

Applied and Environmental Microbiology ◽

10.1128/aem.01686-06 ◽

2006 ◽

Vol 73 (3) ◽

pp. 846-854 ◽

Cited By ~ 25

Author(s):

Nicholas H. Bergman ◽

Karla D. Passalacqua ◽

Philip C. Hanna ◽

Zhaohui S. Qin

Keyword(s):

Bacterial Genome ◽

Experimental Information ◽

Prediction Algorithm ◽

Comparative Genomic ◽

Small Subset ◽

Bacterial Genomes ◽

Functional Relationships ◽

Operon Prediction ◽

Wide Range ◽

Generic Set

ABSTRACT Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are only available for a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that incorporates comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources and so it is easier to implement, and the method is more broadly applicable than previous methods—it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested their cotranscriptional nature. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.

Searching for a “Hidden” Prophage in a Marine Bacterium

Applied and Environmental Microbiology ◽

10.1128/aem.01450-09 ◽

2009 ◽

Vol 76 (2) ◽

pp. 589-595 ◽

Cited By ~ 20

Author(s):

Yanlin Zhao ◽

Kui Wang ◽

Hans-Wolfgang Ackermann ◽

Rolf U. Halden ◽

Nianzhi Jiao ◽

...

Keyword(s):

Sequence Similarity ◽

Genomic Analysis ◽

Bioinformatic Analysis ◽

Open Reading Frames ◽

Careful Examination ◽

Comparative Genomic ◽

Bacterial Genomes ◽

Genomic Analyses ◽

Experimental Laboratory ◽

Bacterial Genes

ABSTRACT Prophages are common in many bacterial genomes. Distinguishing putatively viable prophages from nonviable sequences can be a challenge, since some prophages are remnants of once-functional prophages that have been rendered inactive by mutational changes. In some cases, a putative prophage may be missed due to the lack of recognizable prophage loci. The genome of a marine roseobacter, Roseovarius nubinhibens ISM (hereinafter referred to as ISM), was recently sequenced and was reported to contain no intact prophage based on customary bioinformatic analysis. However, prophage induction experiments performed with this organism led to a different conclusion. In the laboratory, virus-like particles in the ISM culture increased more than 3 orders of magnitude following induction with mitomycin C. After careful examination of the ISM genome sequence, a putative prophage (ISM-pro1) was identified. Although this prophage contains only minimal phage-like genes, we demonstrated that this “hidden” prophage is inducible. Genomic analysis and reannotation showed that most of the ISM-pro1 open reading frames (ORFs) display the highest sequence similarity with Rhodobacterales bacterial genes and some ORFs are only distantly related to genes of other known phages or prophages. Comparative genomic analyses indicated that ISM-pro1-like prophages or prophage remnants are also present in other Rhodobacterales genomes. In addition, the lysis of ISM by this previously unrecognized prophage appeared to increase the production of gene transfer agents (GTAs). Our study suggests that a combination of in silico genomic analyses and experimental laboratory work is needed to fully understand the lysogenic features of a given bacterium.

Novel metrics for quantifying bacterial genome composition skews

10.1101/176370 ◽

2017 ◽

Author(s):

Lena M. Joesch-Cohen ◽

Max Robinson ◽

Neda Jabbari ◽

Christopher Lausted ◽

Gustavo Glusman

Keyword(s):

Gene Annotation ◽

Bacterial Species ◽

Bacterial Genome ◽

Gc Content ◽

Bacterial Genomes ◽

Genome Composition ◽

Single Genome ◽

A Genome ◽

Dna Strands ◽

Interactive Visualizations

Abstract Background: Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences. Results: We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially annotated genomes. The first two metrics (dot-skew and cross-skew) depend on sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to all 7738 available bacterial genomes, including partial drafts, and identified outlier species. A number of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns despite only distant phylogenetic relationships. While unrelated, some of the outlier bacterial species share lifestyle characteristics, in particular intracellularity and biosynthetic dependence on their hosts. Conclusions: Our novel metrics appear to reflect the effects of biosynthetic constraints and adaptations to life within one or more hosts on genome composition. We provide results for each analyzed genome, software and interactive visualizations at http://db.systemsbiology.net/gestalt/skew_metrics.
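
For readers unfamiliar with compositional skew, the sketch below computes the classic windowed GC skew, (G − C) / (G + C), along one strand. This is the standard textbook quantity, swapped in for illustration: the abstract does not define dot-skew, cross-skew, or residual skew, so none of the paper's three metrics is reproduced here. The function name and window size are assumptions.

```python
def gc_skew(seq, window=1000):
    """Classic GC skew, (G - C) / (G + C), in non-overlapping windows.

    A trailing partial window is dropped. Windows that contain neither
    G nor C are reported as 0.0 rather than raising a division error.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


if __name__ == "__main__":
    # Hypothetical 8-nt toy sequence: a G-rich window, then a C-rich one.
    print(gc_skew("GGGCCCCG", window=4))
```

In a real genome the sign of this value tends to flip near the replication origin and terminus, which is the leading versus lagging strand asymmetry the abstract refers to.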

Reads2Resistome: An adaptable and high-throughput whole-genome sequencing pipeline for bacterial resistome characterization

10.1101/2020.05.18.102715 ◽

2020 ◽

Author(s):

Reed Woyda ◽

Adelumola Oladeinde ◽

Zaid Abdo

Keyword(s):

Antibiotic Resistance ◽

Bacterial Isolate ◽

Bacterial Genome ◽

Hybrid Approach ◽

Antibiotic Resistance Genes ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Comprehensive Description ◽

Short Read Sequencing ◽

Sequencing Technologies

Abstract Summary: The bacterial resistome is the collection of all the antibiotic resistance genes, virulence genes, and other resistance elements within a bacterial isolate genome including plasmids and bacteriophage regions. Accurately characterizing the resistome is crucial for prevention and mitigation of emerging antibiotic resistance threats to animal and human health. Reads2Resistome is a tool which allows researchers to assemble and annotate bacterial genomes using long or short read sequencing technologies or both in a hybrid approach. Using a massively parallel analysis pipeline, Reads2Resistome performs assembly, annotation and resistome characterization with the goal of producing an accurate and comprehensive description of a bacterial genome and resistome contents. Key features of the Reads2Resistome pipeline include quality control of input sequencing reads, genome assembly, genome annotation, resistome characterization and alignment. All prerequisite dependencies come packaged together in a single suite which can easily be downloaded and run on Linux and Mac operating systems. Availability: Reads2Resistome is freely available as an open-source package under the MIT license, and can be downloaded via GitHub (https://github.com/BioRRW/Reads2Resistome).

A Mating Procedure for Genetic Transfer of Integrative and Conjugative Elements (ICEs) of Streptococci and Enterococci

Methods and Protocols ◽

10.3390/mps4030059 ◽

2021 ◽

Vol 4 (3) ◽

pp. 59

Author(s):

Francesco Iannelli ◽

Francesco Santoro ◽

Valeria Fox ◽

Gianni Pozzi

Keyword(s):

Sequence Data ◽

Bacterial Species ◽

Bacterial Genome ◽

Conjugative Transfer ◽

Bacterial Genomes ◽

Genetic Elements ◽

Integrative And Conjugative Elements ◽

Solid Media ◽

Donor Cells

DNA sequencing of whole bacterial genomes has revealed that the entire set of mobile genes (mobilome) represents as much as 25% of the bacterial genome. Despite the huge availability of sequence data, the functional analysis of the mobile genetic elements (MGEs) is rarely reported. Therefore, established laboratory protocols are needed to investigate the biology of this important part of the bacterial genome. Conjugation is a mechanism of horizontal gene transfer which allows the exchange of MGEs among strains of the same or different bacterial species. In streptococci and enterococci, integrative and conjugative elements (ICEs) represent a large part of the mobilome. Here, we describe an efficient and easy-to-perform plate mating protocol for in vitro conjugative transfer of ICEs in streptococci (Streptococcus pneumoniae, Streptococcus agalactiae, Streptococcus gordonii, Streptococcus pyogenes), Enterococcus faecalis, and Bacillus subtilis. Conjugative transfer is carried out on solid media and selection of transconjugants is performed with a multilayer plating. This protocol allows the transfer of large genetic elements with a size up to 81 kb, and a transfer frequency up to 6.7 × 10⁻³ transconjugants/donor cells.
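
The transfer frequency quoted above is a ratio of transconjugants to donor cells. The sketch below shows the arithmetic, assuming that both populations are estimated from plate counts as colony-forming units (CFU) per ml. The helper names and all colony counts, dilution factors, and plated volumes are hypothetical, invented for illustration, and are not data from the paper.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Convert a plate count to CFU/ml.

    dilution_factor is the fold dilution of the plated sample
    (e.g. 100 for a 10^-2 dilution).
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def transfer_frequency(transconjugants_per_ml, donors_per_ml):
    """Transconjugants per donor cell."""
    if donors_per_ml <= 0:
        raise ValueError("donor count must be positive")
    return transconjugants_per_ml / donors_per_ml


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    transconjugants = cfu_per_ml(colonies=67, dilution_factor=100,
                                 plated_volume_ml=0.1)
    donors = cfu_per_ml(colonies=100, dilution_factor=10_000,
                        plated_volume_ml=0.1)
    print(f"{transfer_frequency(transconjugants, donors):.1e}")
```

Normalizing to donors, rather than recipients, matches the units given in the abstract (transconjugants/donor cells).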
