Prediction and analysis of Metagenomic operons via MetaRon: a Pipeline for Prediction of Metagenomic OpeRons

2020 ◽  
Author(s):  
Syed Shujaat Ali Zaidi ◽  
Masood Ur Rehman Kayani ◽  
Xuegong Zhang ◽  
Imran Haider Shamsi

Abstract Background: Efficient regulation of bacterial genes in response to environmental stimuli results in unique operonic organizations. Lack of complete reference and functional information makes metagenomic operon prediction challenging and therefore opens new perspectives on the interpretation of the host-microbe interactions. Methods: Here we present MetaRon (pipeline for the prediction of Metagenomic operons), an open-source pipeline explicitly designed for metagenomic shotgun sequencing data. It recreates the operonic structure without functional information. MetaRon identifies closely packed co-directional gene clusters with a promoter upstream and downstream of the first and last gene, respectively. Promoter prediction marks the transcriptional unit boundary (TUB) of closely packed co-directional gene clusters. Results: Escherichia coli (E. coli) K-12 MG1655 presents a gold standard for operon prediction. Therefore, MetaRon was initially implemented on two simulated Illumina datasets: (1) the E. coli MG1655 genome and (2) a mixture of the E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 168 genomes. Operons were predicted in the single genome and the mixture of genomes with a sensitivity of 97.8% and 93.7%, respectively. In the next phase, operons predicted from the E. coli c20 draft genome isolated from a chicken gut metagenome achieved a sensitivity of 94.1%. Lastly, the application of MetaRon to 145 paired-end gut metagenome samples identified 1,232,407 unique operons. Conclusion: MetaRon removes two notable limitations of existing methods: (1) dependency on functional information, and (2) the burden of managing enormous metagenomic data. The current study showed the idea of using operons as a subset to represent the whole metagenome in terms of secondary metabolites and demonstrated its effectiveness in explaining the occurrence of a disease condition. 
This will significantly reduce the hefty whole-metagenome data to a smaller, more precise data set. Furthermore, metabolic pathways from the operonic sequences were identified in association with the occurrence of type 2 diabetes (T2D). Presumably, this is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case T2D. The application of MetaRon to metagenome data at diverse scales will be beneficial for understanding gene regulation and therapeutic metagenomics.
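
The clustering idea in this abstract (closely packed, co-directional genes grouped into candidate operons) can be illustrated with a minimal sketch. This is not MetaRon's implementation: the `Gene` record, the `cluster_codirectional` function and the `max_gap` threshold of 100 bp are all assumptions made for illustration, and the promoter-prediction step that marks transcriptional unit boundaries is not modeled here.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate on the contig
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def cluster_codirectional(genes: List[Gene], max_gap: int = 100) -> List[List[Gene]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap.

    max_gap is a hypothetical threshold chosen for this example only.
    """
    clusters: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters:
            prev = clusters[-1][-1]
            gap = gene.start - prev.end - 1  # bases between the two genes
            if gene.strand == prev.strand and gap <= max_gap:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    # Only clusters of two or more genes are kept as candidate operons.
    return [c for c in clusters if len(c) >= 2]


# Toy coordinates, invented for illustration.
genes = [
    Gene("a", 1, 300, "+"),
    Gene("b", 350, 900, "+"),
    Gene("c", 950, 1500, "+"),
    Gene("d", 2500, 3000, "+"),
    Gene("e", 3050, 3600, "-"),
]
candidates = cluster_codirectional(genes)
print([[g.name for g in c] for c in candidates])
```

In this toy input, genes a, b and c form one candidate cluster; d is too far from c, and e lies on the opposite strand. A fuller pipeline would then check each cluster's flanks for predicted promoters, as the abstract describes.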

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Syed Shujaat Ali Zaidi ◽  
Masood Ur Rehman Kayani ◽  
Xuegong Zhang ◽  
Younan Ouyang ◽  
Imran Haider Shamsi

Abstract Background Efficient regulation of bacterial genes in response to the environmental stimulus results in unique gene clusters known as operons. Lack of complete operonic reference and functional information makes the prediction of metagenomic operons a challenging task, thus opening new perspectives on the interpretation of the host-microbe interactions. Results In this work, we identified whole-genome and metagenomic operons via MetaRon (Metagenome and whole-genome opeRon prediction pipeline). MetaRon identifies operons without any experimental or functional information. MetaRon was implemented on datasets with different levels of complexity and information. It was applied first to a whole genome, then to a simulated mixture of three whole genomes (E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 168), then to the E. coli c20 draft genome extracted from chicken gut, and finally to 145 whole-metagenome data samples from human gut. MetaRon consistently achieved high operon prediction sensitivity, specificity and accuracy across the E. coli whole genome (97.8, 94.1 and 92.4%), the simulated genome (93.7, 75.5 and 88.1%) and E. coli c20 (87, 91 and 88%), respectively. Finally, we identified 1,232,407 unique operons from 145 paired-end human gut metagenome samples. We also report a strong association of type 2 diabetes with maltose phosphorylase (K00691), 3-deoxy-D-glycero-D-galacto-nononate 9-phosphate synthase (K21279) and an uncharacterized protein (K07101). Conclusion With MetaRon, we were able to remove two notable limitations of existing whole-genome operon prediction methods: (1) generalizability (ability to predict operons in unrelated bacterial genomes), and (2) whole-genome and metagenomic data management. We also demonstrate the use of operons as a subset to represent the trends of secondary metabolites in whole-metagenome data and the role of secondary metabolites in the occurrence of a disease condition. 
Using operonic data from the metagenome to study secondary metabolic trends will significantly reduce the data volume to a more precise data set. Furthermore, the identification of metabolic pathways associated with the occurrence of type 2 diabetes (T2D) also presents another dimension of analyzing the human gut metagenome. Presumably, this study is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case type 2 diabetes. The application of MetaRon to metagenomic data at diverse scales will be beneficial for understanding gene regulation and therapeutic metagenomics.
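
The sensitivity, specificity and accuracy figures quoted above follow the standard confusion-matrix definitions. The sketch below shows those definitions only; the abstract does not state how true and false positives were counted against the reference operons, and the function name `operon_metrics` and the counts used are hypothetical.

```python
def operon_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and accuracy as percentages."""
    return {
        # Share of true operonic calls that were recovered.
        "sensitivity": 100.0 * tp / (tp + fn),
        # Share of true non-operonic calls that were correctly rejected.
        "specificity": 100.0 * tn / (tn + fp),
        # Share of all calls that were correct.
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, not taken from the study.
metrics = operon_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```

With these illustrative counts, sensitivity is 90%, specificity 80% and accuracy 85%, which shows how a method can score high on one measure and lower on another, as in the simulated-genome results.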


2003 ◽  
Vol 185 (5) ◽  
pp. 1634-1641 ◽  
Author(s):  
Luis Izquierdo ◽  
Susana Merino ◽  
Miguel Regué ◽  
Florencia Rodriguez ◽  
Juan M. Tomás

ABSTRACT A recombinant clone encoding enzymes for Klebsiella pneumoniae O12-antigen lipopolysaccharide (LPS) was found when we screened for serum resistance of a cosmid-based genomic library of K. pneumoniae KT776 (O12:K80) introduced into Escherichia coli DH5α. A total of eight open reading frames (ORFs) (wb O12 gene cluster) were necessary to produce K. pneumoniae O12-antigen LPS in E. coli K-12. A complete analysis of the K. pneumoniae wb O12 cluster revealed an interesting coincidence with the wb O4 cluster of Serratia marcescens from ORF5 to ORF8 (or WbbL to WbbA). This prompted us to generate mutants of K. pneumoniae strain KT776 (O12) and to study complementation between the two enterobacterial wb clusters using mutants of S. marcescens N28b (O4) obtained previously. Both wb gene clusters are examples of ABC 2 transporter-dependent pathways for O-antigen heteropolysaccharides. The wzm-wzt genes and the wbbA or wbbB genes were not interchangeable between the two gene clusters despite their high level of similarity. However, introduction of three cognate genes (wzm-wzt-wbbA or wzm-wzt-wbbB) into mutants unable to produce O antigen allowed production of the specific O antigen. The K. pneumoniae O12 WbbL protein performs the same function as WbbL from S. marcescens O4 in either the S. marcescens O4 or E. coli K-12 genetic background.


Microbiology ◽  
2005 ◽  
Vol 151 (2) ◽  
pp. 385-398 ◽  
Author(s):  
Jana Hejnova ◽  
Ulrich Dobrindt ◽  
Radka Nemcova ◽  
Christophe Rusniok ◽  
Alojz Bomba ◽  
...  

Colonization by the commensal Escherichia coli strain A0 34/86 (O83 : K24 : H31) has proved to be safe and efficient in the prophylaxis and treatment of nosocomial infections and diarrhoea of preterm and newborn infants in Czech paediatric clinics over the past three decades. In searching for traits contributing to this beneficial effect related to the gut colonization capacity of the strain, the authors have analysed its genome by DNA–DNA hybridization to E. coli K-12 (MG1655) genomic DNA arrays and to ‘Pathoarrays’, as well as by multiplex PCR, bacterial artificial chromosome (BAC) library cloning and shotgun sequencing. Four hundred and ten E. coli K-12 ORFs were absent from A0 34/86, while 72 out of 456 genes associated with pathogenicity islands of E. coli and Shigella were also detected in E. coli A0 34/86. Furthermore, extraintestinal pathogenic E. coli-related genes involved in iron uptake and adhesion were detected by multiplex PCR, and genes encoding the HlyA and cytotoxic necrotizing factor toxins, together with 21 genes of the uropathogenic E. coli 536 pathogenicity island II, were identified by analysis of 2304 shotgun and 1344 BAC clone sequences of A0 34/86 DNA. Multiple sequence comparisons identified 31 kb of DNA specific for E. coli A0 34/86; some of the genes carried by this DNA may prove to be implicated in the colonization capacity of the strain, enabling it to outcompete pathogens. Among 100 examined BAC clones roughly covering the A0 34/86 genome, one reproducibly conferred on the laboratory strain DH10B an enhanced capacity to persist in the intestine of newborn piglets. Sequencing revealed that this BAC clone carried gene clusters encoding gluconate and mannonate metabolism, adhesion (fim), invasion (ibe) and restriction/modification functions. Hence, the genome of this clinically safe and highly efficient colonizer strain appears to harbour many ‘virulence-associated’ genes. 
These results highlight the thin line between bacterial ‘virulence’ and ‘fitness’ or ‘colonization’ factors, and question the definition of enterobacterial virulence factors.


2002 ◽  
Vol 184 (16) ◽  
pp. 4374-4383 ◽  
Author(s):  
Abel Ferrández ◽  
Andrew C. Hawkins ◽  
Douglas T. Summerfield ◽  
Caroline S. Harwood

ABSTRACT Pseudomonas aeruginosa, a γ-proteobacterium, is motile by means of a single polar flagellum and is chemotactic to a variety of organic compounds and phosphate. P. aeruginosa has multiple homologues of Escherichia coli chemotaxis genes that are organized into five gene clusters. Previously, it was demonstrated that genes in cluster I and cluster V are essential for chemotaxis. A third cluster (cluster II) contains a complete set of che genes, as well as two genes, mcpA and mcpB, encoding methyl-accepting chemotaxis proteins. Mutations were constructed in several of the cluster II che genes and in the mcp genes to examine their possible contributions to P. aeruginosa chemotaxis. A cheB2 mutant was partially impaired in chemotaxis in soft-agar swarm plate assays. Providing cheB2 in trans complemented this defect. Further, overexpression of CheB2 restored chemotaxis to a completely nonchemotactic, cluster I, cheB-deficient strain to near wild-type levels. An mcpA mutant was defective in chemotaxis in media that were low in magnesium. The defect could be relieved by the addition of magnesium to the swarm plate medium. An mcpB mutant was defective in chemotaxis when assayed in dilute rich soft-agar swarm medium or in minimal-medium swarm plates containing any 1 of 60 chemoattractants. The mutant phenotype could be complemented by the addition of mcpB in trans. Overexpression of either McpA or McpB in P. aeruginosa or Escherichia coli resulted in impairment of chemotaxis, and these cells had smooth-swimming phenotypes when observed under the microscope. Expression of P. aeruginosa cheA2, cheB2, or cheW2 in E. coli K-12 completely disrupted wild-type chemotaxis, while expression of cheY2 had no effect. These results indicate that che cluster II genes are expressed in P. aeruginosa and are required for an optimal chemotactic response.


2017 ◽  
Vol 5 (4) ◽  
Author(s):  
Annika Cimdins ◽  
Petra Lüthje ◽  
Fengyang Li ◽  
Irfan Ahmad ◽  
Annelie Brauner ◽  
...  

ABSTRACT Strains of Escherichia coli exhibit diverse biofilm formation capabilities. E. coli K-12 expresses the red, dry, and rough (rdar) morphotype below 30°C, whereas clinical isolates frequently display the rdar morphotype semiconstitutively. We sequenced the genomes of eight E. coli strains to subsequently investigate the molecular basis of semiconstitutive rdar morphotype expression.


2003 ◽  
Vol 185 (5) ◽  
pp. 1659-1671 ◽  
Author(s):  
Emilisa Frirdich ◽  
Buko Lindner ◽  
Otto Holst ◽  
Chris Whitfield

ABSTRACT The waa gene cluster is responsible for the biosynthesis of the lipopolysaccharide (LPS) core region in Escherichia coli and Salmonella. Homologs of the waaZ gene product are encoded by the waa gene clusters of Salmonella enterica and E. coli strains with the K-12 and R2 core types. Overexpression of WaaZ in E. coli and S. enterica led to a modified LPS structure showing core truncations and (where relevant) to a reduction in the amount of O-polysaccharide side chains. Mass spectrometry and nuclear magnetic resonance spectroscopy were used to determine the predominant LPS structures in an E. coli isolate with an R1 core (waaZ is lacking from the type R1 waa gene cluster) with a copy of the waaZ gene added on a plasmid. Novel truncated LPS structures, lacking up to 3 hexoses from the outer core, resulted from WaaZ overexpression. The truncated molecules also contained a KdoIII residue not normally found in the R1 core.


1998 ◽  
Vol 66 (9) ◽  
pp. 4305-4312 ◽  
Author(s):  
Michael McClelland ◽  
Richard K. Wilson

ABSTRACT Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with an average spacing of once every 5,000 bases. A total of 339,243 bases of unique sequence was generated (approximately 7% representation). The sample of 870 sequences was compared to the complete Escherichia coli K-12 genome and to the rest of the GenBank database, which can also be considered a collection of sampled sequences. Despite the incomplete S. typhi data set, interesting categories could easily be discerned. Sixteen percent of the sequences determined from S. typhi had close homologs among known Salmonella sequences (P < 1e−40 in BlastX or BlastN), reflecting the proportion of these genomes that have been sequenced previously; 277 sequences (32%) had no apparent orthologs in the complete E. coli K-12 genome (P > 1e−20), of which 155 sequences (18%) had no close similarities to any sequence in the database (P > 1e−5). Eight of the 277 sequences had similarities to genes in other strains of E. coli or plasmids, and six sequences showed evidence of novel phage lysogens or sequence remnants of phage integrations, including a member of the lambda family (P < 1e−15). Twenty-three sample sequences had a significantly closer similarity to a sequence in the database from organisms other than the E. coli/Salmonella clade (which includes Shigella and Citrobacter). These sequences are new candidate lateral transfer events to the S. typhi lineage or deletions on the E. coli K-12 lineage. Eleven putative junctions of insertion/deletion events greater than 100 bp were observed in the sample, indicating that well over 150 such events may distinguish S. typhi from E. coli K-12. 
The need for automatic methods to more effectively exploit sample sequences is discussed.


2017 ◽  
Author(s):  
Alberto Santos-Zavaleta ◽  
Mishael Sánchez-Pérez ◽  
Heladia Salgado ◽  
David A. Velázquez-Ramírez ◽  
Socorro Gama-Castro ◽  
...  

ABSTRACT Our understanding of the regulation of gene expression has benefited strongly from the availability of high-throughput technologies that enable questioning the whole genome for the binding of specific transcription factors and expression profiles. In the case of genome models, such as Escherichia coli K-12, this knowledge needs to be integrated with the legacy of accumulated genetics and molecular biology pre-genomic knowledge in order to attain deeper levels in the understanding of their biology. In spite of the several repositories and curated databases, there is no effort, nor electronic site yet, to comprehensively integrate the available knowledge from all these different sources around the regulation of gene expression of E. coli K-12. In this paper, we describe a first effort to expand RegulonDB, the database containing the rich legacy of decades of classic molecular biology experiments supporting what we know about gene regulation and operon organization in E. coli K-12, to include the genome-wide data set collections from 25 ChIP and 18 gSELEX publications, respectively, in addition to around 60 expression profiles used in their curation. Three essential features for the integration of this information coming from different methodological approaches are: first, a controlled vocabulary within an ontology for precisely defining growth conditions; second, the criteria to separate elements with enough evidence to consider them involved in gene regulation from isolated sites; and third, an expanded computational model supporting this knowledge. Altogether, this constitutes the basis for adequately gathering and enabling the comparisons and integration strongly needed to manage and access such a wealth of knowledge. This version of RegulonDB is a first step toward what should become the unifying access point for current and future knowledge on gene regulation in E. coli K-12. 
Furthermore, this model platform and its associated methodologies and criteria can well be emulated for gathering knowledge on other microbial organisms.


2017 ◽  
Vol 5 (27) ◽  
Author(s):  
Daniela Dimitrova ◽  
Kathleen C. Engelbrecht ◽  
Catherine Putonti ◽  
David W. Koenig ◽  
Alan J. Wolfe

ABSTRACT Here, we present the draft genome sequence of Escherichia coli ATCC 10798. E. coli ATCC 10798 is a K-12 strain, one of the most well-studied model microorganisms. The size of the genome was 4,685,496 bp, with a G+C content of 50.70%. This assembly consists of 62 contigs and the F plasmid.
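
Summary statistics of the kind reported here (total assembly size and G+C content across contigs) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `gc_content` and the toy contigs are invented for illustration and are not the ATCC 10798 assembly, and ambiguous bases such as N are counted in the length but not as G or C.

```python
from typing import Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """Return the pooled G+C percentage across all contigs."""
    seqs = [s.upper() for s in contigs]
    total = sum(len(s) for s in seqs)
    if total == 0:
        raise ValueError("no sequence data supplied")
    # Ambiguous bases (e.g. N) count toward the length but not toward G+C.
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return 100.0 * gc / total


# Toy contigs, invented for illustration.
contigs = ["ATGCGC", "ATAT", "GGCC"]
assembly_size = sum(len(s) for s in contigs)
print(assembly_size, round(gc_content(contigs), 2))
```

Pooling counts across contigs, rather than averaging per-contig percentages, keeps short contigs from being over-weighted in the final figure.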

