Genome-Wide Analyses in Bacteria Show Small-RNA Enrichment for Long and Conserved Intergenic Regions

Interest in finding small RNAs (sRNAs) in bacteria has significantly increased in recent years due to their regulatory functions. Development of high-throughput methods and more sophisticated computational algorithms has allowed rapid identification of sRNA candidates in different species. However, given their various sizes (50 to 500 nucleotides [nt]) and their potential genomic locations in the 5′ and 3′ untranslated regions as well as in intergenic regions, identification and validation of true sRNAs have been challenging. In addition, the evolution of bacterial sRNAs across different species continues to be puzzling, given that they can exert similar functions with various sequences and structures. In this study, we analyzed the enrichment patterns of sRNAs in 13 well-annotated bacterial species using existing transcriptome and experimental data. All intergenic regions were analyzed by WU-BLAST to examine conservation levels relative to species within or outside their genus. In total, more than 900 validated bacterial sRNAs and 23,000 intergenic regions were analyzed. The results indicate that sRNAs are enriched in intergenic regions, which are longer and more conserved than the average intergenic regions in the corresponding bacterial genome. We also found that sRNA-coding regions have different conservation levels relative to their flanking regions. This work provides a way to analyze how noncoding RNAs are distributed in bacterial genomes and also shows conserved features of intergenic regions that encode sRNAs. These results also provide insight into the functions of regions surrounding sRNAs and into optimization of RNA search algorithms.
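The comparison described in this abstract (are sRNA-bearing intergenic regions longer than the genome average?) can be illustrated with a small sketch. The coordinates below are toy values, the function names are hypothetical, and the genome is treated as linear with strand ignored; this is not the authors' pipeline, which also scored conservation with WU-BLAST.

```python
from statistics import mean


def intergenic_regions(genes, genome_length):
    """Return half-open (start, end) intervals not covered by any gene.

    genes: iterable of half-open (start, end) gene coordinates.
    Overlapping genes are merged implicitly by tracking a cursor.
    """
    regions = []
    cursor = 0
    for start, end in sorted(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        regions.append((cursor, genome_length))
    return regions


def mean_length(regions):
    """Average length of a list of half-open intervals."""
    return mean(end - start for start, end in regions)


def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


# Toy annotation (assumed values, for illustration only).
genes = [(100, 400), (450, 900), (1500, 2000), (1900, 2600)]
srnas = [(1000, 1120)]

igrs = intergenic_regions(genes, genome_length=3000)
with_srna = [r for r in igrs if any(overlaps(r, s) for s in srnas)]

print("all intergenic regions:", mean_length(igrs))
print("sRNA-bearing regions:  ", mean_length(with_srna))
```

On real data the same two averages would be computed per genome, ideally followed by a statistical test rather than a simple comparison of means.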


A universal, genome-wide guide finder for CRISPR/Cas9 targeting in microbial genomes

10.1101/194241 ◽

2017 ◽

Author(s):

Michelle Spoto ◽

Elizabeth Fleming ◽

Julia Oh

Keyword(s):

Large Scale ◽

Essential Gene ◽

Bacterial Species ◽

Bacterial Genome ◽

Design Parameters ◽

Bacterial Genomes ◽

Microbial Genomes ◽

Genome Wide ◽

Cas9 Protein ◽

User Friendly

Abstract

Background: The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression or activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, no pan-bacterial, genome-wide tools exist for guide discovery. We have created Guide Finder: a customizable, user-friendly program that can design guides for any annotated bacterial genome.

Results: Guide Finder designs guides from NGG PAM sites for any number of genes using an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, a feature unique to Guide Finder. Guide Finder has been tested on a variety of diverse bacterial genomes, on average finding guides for 95% of genes. Moreover, guides designed by the program are functionally useful (focusing on CRISPRi as a potential application), as demonstrated by essential gene knockdown in two staphylococcal species.

Conclusions: Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies for a variety of bacterial species.
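To make the idea of designing guides "from NGG PAM sites" concrete, here is a minimal sketch that scans both strands of a sequence for an NGG motif and reports the 20 nt immediately 5' of it. The function names and the 20 nt guide length are assumptions for illustration; this is not Guide Finder's code, and it applies none of the filtering or off-target checks the abstract mentions.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def find_ngg_guides(seq, guide_len=20):
    """Return (strand, pam_start, guide) for every NGG PAM that has a
    full-length protospacer directly 5' of it.

    pam_start is 0-based and, for the '-' strand, is given in the
    coordinates of the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the position of the 'N' in NGG; i+1 and i+2 must be 'GG'.
        for i in range(guide_len, len(s) - 2):
            if s[i + 1 : i + 3] == "GG":
                hits.append((strand, i, s[i - guide_len : i]))
    return hits


# Toy sequence: a 20 nt protospacer followed by the PAM 'TGG'.
example = "A" * 20 + "TGG"
for hit in find_ngg_guides(example):
    print(hit)
```

A real tool would additionally restrict hits to annotated gene coordinates and discard guides that match elsewhere in the genome.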


Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries

mSystems ◽

10.1128/msystems.00731-19 ◽

2020 ◽

Vol 5 (1) ◽

Cited By ~ 14

Author(s):

Matthew R. Olm ◽

Alexander Crits-Christoph ◽

Spencer Diamond ◽

Adi Lavy ◽

Paula B. Matheus Carnevali ◽

...

Keyword(s):

Bacterial Diversity ◽

Ribosomal Proteins ◽

Large Scale ◽

Bacterial Species ◽

Bacterial Genome ◽

16S Rrna Genes ◽

Rrna Genes ◽

Species Discrimination ◽

Bacterial Genomes ◽

Discrimination Power

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination. IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. 
Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
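The 95% ANI species threshold discussed above can be illustrated with a small sketch that groups genomes by single-linkage clustering on precomputed pairwise ANI values. The genome names and ANI values are invented, the function name is hypothetical, and single linkage is a simplifying assumption rather than the clustering procedure used in the study.

```python
def cluster_by_ani(genomes, ani, threshold=95.0):
    """Group genomes so that any pair with ANI >= threshold ends up in the
    same cluster (single linkage, implemented with union-find).

    genomes: list of genome names.
    ani: dict mapping (name_a, name_b) -> ANI in percent.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Toy pairwise ANI values (assumed, for illustration only).
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    ("g1", "g2"): 98.7,
    ("g2", "g3"): 96.1,
    ("g1", "g3"): 94.8,
    ("g3", "g4"): 82.0,
}
print(cluster_by_ani(genomes, ani))
```

Note that g1 and g3 fall just under the threshold yet still cluster together through g2; whether that chaining is desirable is a design choice, and average or complete linkage would behave differently.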


Bactopia: a flexible pipeline for complete analysis of bacterial genomes

10.1101/2020.02.28.969394 ◽

2020 ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Standard Procedure ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Bacterial Genomes ◽

Analysis Pipeline ◽

Genomic Analyses ◽

Conserved Genes ◽

Downstream Analysis

Abstract: Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a dataset setup step (Bactopia Datasets; BaDs) where a series of customizable datasets are created for the species of interest; the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly and several other functions based on the available datasets and outputs the processed data to a structured directory format; and a series of Bactopia Tools (BaTs) that perform specific post-processing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on L. crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to projects including thousands of genomes, and that allows for great flexibility in choosing comparison datasets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.


A Universal, Genomewide GuideFinder for CRISPR/Cas9 Targeting in Microbial Genomes

mSphere ◽

10.1128/msphere.00086-20 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Michelle Spoto ◽

Changhui Guan ◽

Elizabeth Fleming ◽

Julia Oh

Keyword(s):

Gene Function ◽

Large Scale ◽

Essential Gene ◽

Bacterial Species ◽

Bacterial Genome ◽

Model Organisms ◽

Design Parameters ◽

Bacterial Genomes ◽

Wide Range ◽

User Friendly

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides in draft bacterial genomes genomewide. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes by the use of an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, whose validity we tested experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful—focusing on CRISPRi as a potential application—as demonstrated by essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies of a variety of bacterial species. 
IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted for the broad needs of microbial targeting, which include extraordinary species and subspecies genetic diversity, the overwhelming majority of which is characterized by draft genomes. In addition, flexibility in guide design parameters is important to consider the wide range of factors that can affect guide efficacy, many of which can be species and strain specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software and that can design guides for any annotated bacterial genome with numerous features that facilitate guide design in a wide variety of microorganisms.
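The "iteration with lowered parameter thresholds" described in this abstract can be sketched as a loop that retries only the genes that produced no guides under the stricter settings. GC content is used here as a stand-in design parameter; the function names, the GC ranges, and the toy guides are assumptions, not GuideFinder's actual parameters or code.

```python
def gc_fraction(guide):
    """Fraction of G and C bases in a guide sequence."""
    return (guide.count("G") + guide.count("C")) / len(guide)


def design_with_relaxation(candidates_by_gene, gc_ranges):
    """Select guides per gene, relaxing the GC filter round by round.

    candidates_by_gene: dict gene -> list of candidate guide strings.
    gc_ranges: list of (min_gc, max_gc) tuples, most stringent first.
    Returns (chosen, unresolved), where chosen maps each gene to the
    round index that succeeded and the guides that passed in that round.
    """
    chosen = {}
    pending = set(candidates_by_gene)
    for round_index, (low, high) in enumerate(gc_ranges):
        for gene in sorted(pending):
            passing = [
                g for g in candidates_by_gene[gene]
                if low <= gc_fraction(g) <= high
            ]
            if passing:
                chosen[gene] = (round_index, passing)
        # Only genes still without guides go on to the next, looser round.
        pending -= set(chosen)
    return chosen, sorted(pending)


# Toy candidates (assumed sequences, for illustration only).
candidates = {
    "geneA": ["GCGCATATGCATGCATATGC"],  # 50% GC
    "geneB": ["ATATATATATATATGCATAT"],  # 10% GC
    "geneC": ["ATATATATATATATATATAT"],  # 0% GC
}
chosen, unresolved = design_with_relaxation(
    candidates, gc_ranges=[(0.40, 0.60), (0.05, 0.95)]
)
print(chosen)
print(unresolved)
```

Recording the round index alongside each guide keeps track of which genes needed relaxed criteria, which is useful when judging how much to trust a given guide.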


Genome-wide DNA methylation profiles of porcine ovaries in estrus and proestrus

Physiological Genomics ◽

10.1152/physiolgenomics.00052.2017 ◽

2018 ◽

Vol 50 (9) ◽

pp. 714-723 ◽

Cited By ~ 1

Author(s):

Xiaolong Zhou ◽

Songbai Yang ◽

Feifei Yan ◽

Ke He ◽

Ayong Zhao

Keyword(s):

Dna Methylation ◽

Epigenetic Modification ◽

Cpg Islands ◽

Biological Processes ◽

Hormone Regulation ◽

Methylated Dna ◽

Coding Regions ◽

Genome Wide ◽

Differentially Methylated Genes ◽

Flanking Regions

DNA methylation is an important epigenetic modification involved in the estrous cycle and the regulation of reproduction. Here, we investigated the genome-wide profiles of DNA methylation in porcine ovaries in proestrus and estrus using methylated DNA immunoprecipitation sequencing. The results showed that DNA methylation was enriched in intergenic and intron regions. The methylation levels of coding regions were higher than those of the 5′- and 3′-flanking regions of genes. There were 4,813 differentially methylated regions (DMRs) of CpG islands in the estrus vs. proestrus ovarian genomes. Additionally, 3,651 differentially methylated genes (DMGs) were identified in pigs in estrus and proestrus. The DMGs were significantly enriched in biological processes and pathways related to reproduction and hormone regulation. We identified 90 DMGs associated with regulating reproduction in pigs. Our findings can serve as resources for DNA methylome research focused on porcine ovaries and further our understanding of epigenetically regulated reproduction in mammals.


Novel metrics for quantifying bacterial genome composition skews

10.1101/176370 ◽

2017 ◽

Author(s):

Lena M. Joesch-Cohen ◽

Max Robinson ◽

Neda Jabbari ◽

Christopher Lausted ◽

Gustavo Glusman

Keyword(s):

Gene Annotation ◽

Bacterial Species ◽

Bacterial Genome ◽

Gc Content ◽

Bacterial Genomes ◽

Genome Composition ◽

Single Genome ◽

A Genome ◽

Dna Strands ◽

Interactive Visualizations

Abstract

Background: Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences.

Results: We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially annotated genomes. The first two metrics (dot-skew and cross-skew) depend on sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to all 7738 available bacterial genomes, including partial drafts, and identified outlier species. A number of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns despite only a distant phylogenetic relationship. While unrelated, some of the outlier bacterial species share lifestyle characteristics, in particular intracellularity and biosynthetic dependence on their hosts.

Conclusions: Our novel metrics appear to reflect the effects of biosynthetic constraints and adaptations to life within one or more hosts on genome composition. We provide results for each analyzed genome, software and interactive visualizations at http://db.systemsbiology.net/gestalt/skew_metrics.
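As background for the compositional skews discussed above, the following sketch computes the classical windowed GC skew, (G - C) / (G + C), and its running sum. This is the long-established measure, shown only for orientation; it is not the dot-skew, cross-skew, or residual skew metric introduced in the abstract, whose definitions are not given here. The function names, window size, and toy sequence are assumptions.

```python
def gc_skew(window):
    """Classical GC skew of one window: (G - C) / (G + C); 0.0 if no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq, window=1000, step=1000):
    """GC skew for each full window along the sequence."""
    seq = seq.upper()
    return [
        gc_skew(seq[i : i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def cumulative(values):
    """Running sum of a list of numbers."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGC" * 2 + "GCCC" * 2
skews = windowed_gc_skew(toy, window=8, step=8)
print(skews)
print(cumulative(skews))
```

In real bacterial chromosomes the cumulative curve typically changes direction near the replication origin and terminus, which is why skew is usually read in terms of leading and lagging strands.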


Lineage-Specific Distribution of Insertion Sequence Excision Enhancer in Enterotoxigenic Escherichia coli Isolated from Swine

Applied and Environmental Microbiology ◽

10.1128/aem.03696-13 ◽

2013 ◽

Vol 80 (4) ◽

pp. 1394-1402 ◽

Cited By ~ 7

Author(s):

Masahiro Kusumoto ◽

Dai Fukamizu ◽

Yoshitoshi Ogura ◽

Eiji Yoshida ◽

Fumiko Yamamoto ◽

...

Keyword(s):

Escherichia Coli ◽

Bacterial Species ◽

Bacterial Genome ◽

Enterotoxigenic Escherichia Coli ◽

Content Type ◽

E Coli ◽

Genomic Deletions ◽

Specific Distribution ◽

Multiple Copies ◽

Genomic Locations

ABSTRACT Insertion sequences (ISs) are the simplest transposable elements and are widely distributed in bacteria; however, they also play important roles in genome evolution. We recently identified a protein called IS excision enhancer (IEE) in enterohemorrhagic Escherichia coli (EHEC) O157. IEE promotes the excision of IS elements belonging to the IS3 family, such as IS629, as well as several other families. IEE-mediated IS excision generates various genomic deletions that lead to the diversification of the bacterial genome. IEE has been found in a broad range of bacterial species; however, among sequenced E. coli strains, IEE is primarily found in EHEC isolates. In this study, we investigated non-EHEC pathogenic E. coli strains isolated from domestic animals and found that IEE is distributed in specific lineages of enterotoxigenic E. coli (ETEC) strains of serotypes O139 or O149 isolated from swine. The iee gene is located within integrative elements that are similar to SpLE1 of EHEC O157. All iee-positive ETEC lineages also contained multiple copies of IS629, a preferred substrate of IEE, and their genomic locations varied significantly between strains, as observed in O157. These data suggest that IEE may have been transferred among EHEC and ETEC in swine via SpLE1 or SpLE1-like integrative elements. In addition, IS629 is actively moving in the ETEC O139 and O149 genomes and, as in EHEC O157, is promoting the diversification of these genomes in combination with IEE.


Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

mSystems ◽

10.1128/msystems.00190-20 ◽

2020 ◽

Vol 5 (4) ◽

Author(s):

Robert A. Petit ◽

Timothy D. Read

Keyword(s):

Open Source ◽

Genome Analysis ◽

Bacterial Species ◽

Bacterial Genome ◽

Complete Analysis ◽

Comparative Genomic ◽

Data Sets ◽

Bacterial Genomes ◽

Data Set ◽

Content Type

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. 
Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.


An RNase III processed, antisense RNA pair regulates a Campylobacter jejuni colonization factor

10.1101/2021.03.19.434396 ◽

2021 ◽

Author(s):

Sarah L Svensson ◽

Cynthia M. Sharma

Keyword(s):

Campylobacter Jejuni ◽

Stress Responses ◽

Foodborne Pathogen ◽

Rnase E ◽

Rnase Iii ◽

Cis Acting ◽

Colonization Factor ◽

Genome Wide ◽

Intergenic Regions ◽

Genomic Locations

Small RNAs (sRNAs) are emerging as important and diverse post-transcriptional gene expression regulators in bacterial stress responses and virulence. While sRNAs were originally identified mainly in intergenic regions, genome-wide approaches have revealed sRNAs encoded in diverse contexts, including those processed from parental transcripts by RNase E. Despite its well-known roles in rRNA processing, RNA decay, and cleavage of sRNA-mRNA duplexes, the role of RNase III in sRNA biogenesis is less well understood. Here, we show that a pair of cis-encoded sRNAs (CJnc190 and CJnc180) are processed by RNase III in the foodborne pathogen Campylobacter jejuni. While CJnc180 processing requires CJnc190, RNase III cleaves an intramolecular duplex in CJnc190, independent of CJnc180. Moreover, we demonstrate that CJnc190 directly represses translation of the colonization factor PtmG by binding its G-rich ribosome binding site, and show that CJnc180 is a cis-acting antagonist of CJnc190, thereby indirectly affecting ptmG regulation. Our results expand the diversity of known genomic locations of bacterial sRNA sponges and highlight a role for bacterial RNase III that parallels miRNA processing by related eukaryotic Dicer and Drosha.


A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between kmers and genetic events

10.1101/297754 ◽

2018 ◽

Cited By ~ 3

Author(s):

Magali Jaillard ◽

Leandro Lima ◽

Maud Tournoud ◽

Pierre Mahé ◽

Alex van Belkum ◽

...

Keyword(s):

Antibiotic Resistance ◽

Genetic Variants ◽

Association Studies ◽

Bacterial Species ◽

Bacterial Genome ◽

Genome Wide Association ◽

Compact Set ◽

Genome Wide Association Studies ◽

Alignment Free ◽

Genome Wide

Abstract

Motivation: Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or fine-assessment of marker effect. Recently, alignment-free methods based on kmer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are hard to interpret.

Methods: Here, we introduce DBGWAS, an extended kmer-based GWAS method producing interpretable genetic variants associated with phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes identified by the association model into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is fast, alignment-free and only requires a set of contigs and phenotypes. It produces annotated subgraphs representing local polymorphisms as well as mobile genetic elements (MGE) and offers a graphical framework to interpret GWAS results.

Results: We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa – along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature.

Conclusion: Our novel method proved its efficiency to retrieve any type of phenotype-associated genetic variant without prior knowledge. All experiments were computed in less than two hours and produced a compact set of meaningful subgraphs, thereby outperforming other GWAS approaches and facilitating the interpretation of the results.

Availability: Open-source tool available at https://gitlab.com/leoisl/dbgwas
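The kmer-based, alignment-free starting point mentioned in this abstract can be sketched as a presence/absence table of kmers across genomes, keeping only the kmers that vary between strains. This is a deliberately simplified illustration with hypothetical function names and toy contigs; it does not build or compact a De Bruijn graph, ignores reverse complements, and is not the DBGWAS implementation.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def presence_matrix(genomes, k):
    """Map each kmer to a 0/1 row over genomes (columns in sorted name order).

    genomes: dict name -> list of contig strings.
    """
    per_genome = {
        name: set().union(*(kmers(contig, k) for contig in contigs))
        for name, contigs in genomes.items()
    }
    names = sorted(genomes)
    all_kmers = sorted(set().union(*per_genome.values()))
    return {
        km: [int(km in per_genome[name]) for name in names]
        for km in all_kmers
    }


def variable_kmers(matrix):
    """Keep kmers present in some but not all genomes; only these can be
    associated with a phenotype that differs between strains."""
    return {
        km: row for km, row in matrix.items() if 0 < sum(row) < len(row)
    }


# Toy contigs differing by a single substitution (assumed data).
genomes = {"s1": ["ACGTAC"], "s2": ["ACGTTC"]}
matrix = presence_matrix(genomes, k=3)
print(variable_kmers(matrix))
```

A single substitution already yields four variable kmers here, which hints at the redundancy the abstract describes and at why grouping overlapping kmers into graph nodes makes results easier to interpret.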
