Bactopia: a flexible pipeline for complete analysis of bacterial genomes

2020 ◽  
Author(s):  
Robert A. Petit ◽  
Timothy D. Read

Abstract. Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs) where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to projects with thousands of genomes, and it allows for great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.

mSystems ◽  
2020 ◽  
Vol 5 (4) ◽  
Author(s):  
Robert A. Petit ◽  
Timothy D. Read

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. 
Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.


mSystems ◽  
2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Matthew R. Olm ◽  
Alexander Crits-Christoph ◽  
Spencer Diamond ◽  
Adi Lavy ◽  
Paula B. Matheus Carnevali ◽  
...  

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
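
The ~95% ANI species threshold mentioned in this abstract can be illustrated with a small grouping routine. This is a minimal sketch, not the authors' method: it assumes pairwise ANI values are already computed, the function name `cluster_by_ani` is hypothetical, and single-linkage grouping is an assumption chosen for simplicity.

```python
from typing import Dict, List, Tuple


def cluster_by_ani(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    """Group genomes whose pairwise ANI is at or above `threshold`.

    Single-linkage grouping via union-find: two genomes end up in the
    same group if a chain of pairs at or above the threshold links them.
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Walk up to the root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical ANI values (percent) for four genomes.
example_ani = {
    ("A", "B"): 98.5,
    ("B", "C"): 96.0,
    ("C", "D"): 88.0,
    ("A", "D"): 85.0,
}
species_groups = cluster_by_ani(["A", "B", "C", "D"], example_ani)
```

With these made-up values, A, B, and C fall into one group and D stands alone, since no pair involving D reaches the threshold.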


mSphere ◽  
2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Michelle Spoto ◽  
Changhui Guan ◽  
Elizabeth Fleming ◽  
Julia Oh

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides in draft bacterial genomes genomewide. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes by the use of an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, whose validity we tested experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful—focusing on CRISPRi as a potential application—as demonstrated by essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies of a variety of bacterial species. 
IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted for the broad needs of microbial targeting, which include extraordinary species and subspecies genetic diversity, the overwhelming majority of which is characterized by draft genomes. In addition, flexibility in guide design parameters is important to consider the wide range of factors that can affect guide efficacy, many of which can be species and strain specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software and that can design guides for any annotated bacterial genome with numerous features that facilitate guide design in a wide variety of microorganisms.
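
The idea of designing guides from NGG PAM sites can be sketched as follows. This is an illustration only and is not GuideFinder's code: the function names and the 20-nt default guide length are assumptions, and the off-target and parameter filtering described in the abstract is deliberately omitted.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_guides(seq: str, guide_len: int = 20) -> List[Tuple[str, int, str]]:
    """List candidate guides that sit immediately 5' of an NGG PAM.

    Returns (strand, start, guide) tuples. For the "-" strand, `start`
    is a coordinate on the reverse-complemented sequence, not on the
    input sequence.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    guides = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            guides.append((strand, match.start(), match.group(1)))
    return guides


# Toy input: a 20-nt stretch followed by a TGG PAM on the forward strand.
candidates = find_ngg_guides("A" * 20 + "TGG")
```

A real design tool would then score each candidate and discard any with off-target matches elsewhere in the genome, as the abstract describes.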


2020 ◽  
Vol 8 (11) ◽  
pp. 1720
Author(s):  
Gabriele Andrea Lugli ◽  
Chiara Tarracchini ◽  
Giulia Alessandri ◽  
Christian Milani ◽  
Leonardo Mancabelli ◽  
...  

Members of the Bifidobacterium dentium species are usually identified in the oral cavity of humans and associated with the development of plaque and dental caries. Nevertheless, they have also been detected in fecal samples, highlighting a widespread distribution among mammals. To explore the genetic variability of this species, we isolated and sequenced the genomes of 18 different B. dentium strains collected from fecal samples of several primate species and an Ursus arctos. We then investigated the genomic variability and metabolic abilities of the new B. dentium isolates together with 20 public genome sequences. Comparative genomic analyses provided insights into the vast metabolic repertoire of the species, highlighting 19 glycosyl hydrolase families shared among all analyzed strains. Phylogenetic analysis of the B. dentium taxon, involving 1140 conserved genes, revealed a very close phylogenetic relatedness among members of this species. Furthermore, low genomic variability between strains was also confirmed by an average nucleotide identity analysis showing values higher than 98.2%. When the genetic features of each strain were investigated, few putative functional mobile elements were identified. In addition, a consistent occurrence of defense mechanisms such as CRISPR–Cas and restriction–modification systems may be responsible for the high genome synteny identified among members of this taxon.


mBio ◽  
2019 ◽  
Vol 10 (3) ◽  
Author(s):  
Kira S. Makarova ◽  
Yuri I. Wolf ◽  
Svetlana Karamycheva ◽  
Dapeng Zhang ◽  
L. Aravind ◽  
...  

ABSTRACT Numerous, diverse, highly variable defense and offense genetic systems are encoded in most bacterial genomes and are involved in various forms of conflict among competing microbes or their eukaryotic hosts. Here we focus on the offense and self-versus-nonself discrimination systems encoded by archaeal genomes that so far have remained largely uncharacterized and unannotated. Specifically, we analyze archaeal genomic loci encoding polymorphic and related toxin systems and ribosomally synthesized antimicrobial peptides. Using sensitive methods for sequence comparison and the “guilt by association” approach, we identified such systems in 141 archaeal genomes. These toxins can be classified into four major groups based on the structure of the components involved in the toxin delivery. The toxin domains are often shared between and within each system. We revisit halocin families and substantially expand the halocin C8 family, which was identified in diverse archaeal genomes and also certain bacteria. Finally, we employ features of protein sequences and genomic locus organization characteristic of archaeocins and polymorphic toxins to identify candidates for analogous but not necessarily homologous systems among uncharacterized protein families. This work confidently predicts that more than 1,600 archaeal proteins, currently annotated as “hypothetical” in public databases, are components of conflict and self-versus-nonself discrimination systems. IMPORTANCE Diverse and highly variable systems involved in biological conflicts and self-versus-nonself discrimination are ubiquitous in bacteria but much less studied in archaea. We performed comprehensive comparative genomic analyses of the archaeal systems that share components with analogous bacterial systems and propose an approach to identify new systems that could be involved in these functions.
We predict polymorphic toxin systems in 141 archaeal genomes and identify new, archaea-specific toxin and immunity protein families. These systems are widely represented in archaea and are predicted to play major roles in interactions between species and in intermicrobial conflicts. This work is expected to stimulate experimental research to advance the understanding of poorly characterized major aspects of archaeal biology.


2006 ◽  
Vol 73 (3) ◽  
pp. 846-854 ◽  
Author(s):  
Nicholas H. Bergman ◽  
Karla D. Passalacqua ◽  
Philip C. Hanna ◽  
Zhaohui S. Qin

ABSTRACT Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are only available for a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that incorporates comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources and so it is easier to implement, and the method is more broadly applicable than previous methods—it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested their cotranscriptional nature. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.


2009 ◽  
Vol 76 (2) ◽  
pp. 589-595 ◽  
Author(s):  
Yanlin Zhao ◽  
Kui Wang ◽  
Hans-Wolfgang Ackermann ◽  
Rolf U. Halden ◽  
Nianzhi Jiao ◽  
...  

ABSTRACT Prophages are common in many bacterial genomes. Distinguishing putatively viable prophages from nonviable sequences can be a challenge, since some prophages are remnants of once-functional prophages that have been rendered inactive by mutational changes. In some cases, a putative prophage may be missed due to the lack of recognizable prophage loci. The genome of a marine roseobacter, Roseovarius nubinhibens ISM (hereinafter referred to as ISM), was recently sequenced and was reported to contain no intact prophage based on customary bioinformatic analysis. However, prophage induction experiments performed with this organism led to a different conclusion. In the laboratory, virus-like particles in the ISM culture increased more than 3 orders of magnitude following induction with mitomycin C. After careful examination of the ISM genome sequence, a putative prophage (ISM-pro1) was identified. Although this prophage contains only minimal phage-like genes, we demonstrated that this “hidden” prophage is inducible. Genomic analysis and reannotation showed that most of the ISM-pro1 open reading frames (ORFs) display the highest sequence similarity with Rhodobacterales bacterial genes and some ORFs are only distantly related to genes of other known phages or prophages. Comparative genomic analyses indicated that ISM-pro1-like prophages or prophage remnants are also present in other Rhodobacterales genomes. In addition, the lysis of ISM by this previously unrecognized prophage appeared to increase the production of gene transfer agents (GTAs). Our study suggests that a combination of in silico genomic analyses and experimental laboratory work is needed to fully understand the lysogenic features of a given bacterium.


2017 ◽  
Author(s):  
Lena M. Joesch-Cohen ◽  
Max Robinson ◽  
Neda Jabbari ◽  
Christopher Lausted ◽  
Gustavo Glusman

Abstract. Background: Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences. Results: We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially-annotated genomes. The first two metrics (dot-skew and cross-skew) depend on sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to all 7738 available bacterial genomes, including partial drafts, and identified outlier species. A number of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns despite only a distant phylogenetic relationship. While unrelated, some of the outlier bacterial species share lifestyle characteristics, in particular intracellularity and biosynthetic dependence on their hosts. Conclusions: Our novel metrics appear to reflect the effects of biosynthetic constraints and adaptations to life within one or more hosts on genome composition. We provide results for each analyzed genome, software and interactive visualizations at http://db.systemsbiology.net/gestalt/skew_metrics.
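
For readers unfamiliar with compositional skew, the conventional GC skew, (G − C) / (G + C), can be computed over sliding windows as sketched below. This is background illustration only: it is the classic single-strand measure, not the dot-skew, cross-skew, or residual skew metrics introduced in this abstract, and the function names are hypothetical.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C), for one sequence window.

    Returns 0.0 when the window contains no G or C at all.
    """
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> List[float]:
    """GC skew for each full-length window, advancing `step` bases at a time."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a G-rich window followed by a C-rich window.
profile = windowed_gc_skew("GGCC", window=2, step=2)
```

In a complete circular genome, a sign change in such a profile is commonly used as a hint for the replication origin and terminus; draft assemblies make this harder, which is the gap the abstract's metrics aim to address.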


2020 ◽  
Author(s):  
Reed Woyda ◽  
Adelumola Oladeinde ◽  
Zaid Abdo

Abstract. Summary: The bacterial resistome is the collection of all the antibiotic resistance genes, virulence genes, and other resistance elements within a bacterial isolate genome including plasmids and bacteriophage regions. Accurately characterizing the resistome is crucial for prevention and mitigation of emerging antibiotic resistance threats to animal and human health. Reads2Resistome is a tool which allows researchers to assemble and annotate bacterial genomes using long or short read sequencing technologies or both in a hybrid approach. Using a massively parallel analysis pipeline, Reads2Resistome performs assembly, annotation and resistome characterization with the goal of producing an accurate and comprehensive description of a bacterial genome and resistome contents. Key features of the Reads2Resistome pipeline include quality control of input sequencing reads, genome assembly, genome annotation, resistome characterization and alignment. All prerequisite dependencies come packaged together in a single suite which can easily be downloaded and run on Linux and Mac operating systems. Availability: Reads2Resistome is freely available as an open-source package under the MIT license, and can be downloaded via GitHub (https://github.com/BioRRW/Reads2Resistome).


2021 ◽  
Vol 4 (3) ◽  
pp. 59
Author(s):  
Francesco Iannelli ◽  
Francesco Santoro ◽  
Valeria Fox ◽  
Gianni Pozzi

DNA sequencing of whole bacterial genomes has revealed that the entire set of mobile genes (mobilome) represents as much as 25% of the bacterial genome. Despite the huge availability of sequence data, the functional analysis of the mobile genetic elements (MGEs) is rarely reported. Therefore, established laboratory protocols are needed to investigate the biology of this important part of the bacterial genome. Conjugation is a mechanism of horizontal gene transfer which allows the exchange of MGEs among strains of the same or different bacterial species. In streptococci and enterococci, integrative and conjugative elements (ICEs) represent a large part of the mobilome. Here, we describe an efficient and easy-to-perform plate mating protocol for in vitro conjugative transfer of ICEs in streptococci (Streptococcus pneumoniae, Streptococcus agalactiae, Streptococcus gordonii, Streptococcus pyogenes), Enterococcus faecalis, and Bacillus subtilis. Conjugative transfer is carried out on solid media and selection of transconjugants is performed with a multilayer plating. This protocol allows the transfer of large genetic elements with a size up to 81 kb, and a transfer frequency up to 6.7 × 10⁻³ transconjugants/donor cells.
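
A transfer frequency expressed as transconjugants per donor cell is a simple ratio of two viable counts. The sketch below shows that arithmetic under stated assumptions: the colony counts, dilutions, and plated volumes are invented for illustration, the function names are hypothetical, and the abstract does not specify how the counts were obtained.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_ml: float) -> float:
    """Viable count per ml from one plate.

    `dilution` is the dilution factor of the plated sample
    (for example 1e-2 for a 1:100 dilution).
    """
    return colonies / (dilution * plated_ml)


def transfer_frequency(transconjugants_per_ml: float, donors_per_ml: float) -> float:
    """Conjugation frequency as transconjugants per donor cell."""
    return transconjugants_per_ml / donors_per_ml


# Hypothetical plate counts from one mating experiment.
transconjugants = cfu_per_ml(colonies=67, dilution=1e-2, plated_ml=0.1)
donors = cfu_per_ml(colonies=100, dilution=1e-5, plated_ml=0.1)
frequency = transfer_frequency(transconjugants, donors)
```

With these invented numbers the frequency is on the order of 10⁻⁴ transconjugants per donor, below the maximum the abstract reports.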

