Comparison of Sample Sequences of the Salmonella typhi Genome to the Sequence of the Complete Escherichia coli K-12 Genome

1998 ◽  
Vol 66 (9) ◽  
pp. 4305-4312 ◽  
Author(s):  
Michael McClelland ◽  
Richard K. Wilson

ABSTRACT Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with an average spacing of once every 5,000 bases. A total of 339,243 bases of unique sequence was generated (approximately 7% representation). The sample of 870 sequences was compared to the complete Escherichia coli K-12 genome and to the rest of the GenBank database, which can also be considered a collection of sampled sequences. Despite the incomplete S. typhi data set, interesting categories could easily be discerned. Sixteen percent of the sequences determined from S. typhi had close homologs among known Salmonella sequences (P < 1e−40 in BlastX or BlastN), reflecting the proportion of these genomes that have been sequenced previously; 277 sequences (32%) had no apparent orthologs in the complete E. coli K-12 genome (P > 1e−20), of which 155 sequences (18%) had no close similarities to any sequence in the database (P > 1e−5). Eight of the 277 sequences had similarities to genes in other strains of E. coli or plasmids, and six sequences showed evidence of novel phage lysogens or sequence remnants of phage integrations, including a member of the lambda family (P < 1e−15). Twenty-three sample sequences had a significantly closer similarity to a sequence in the database from organisms other than the E. coli/Salmonella clade (which includes Shigella and Citrobacter). These sequences are new candidate lateral transfer events to the S. typhi lineage or deletions on the E. coli K-12 lineage. Eleven putative junctions of insertion/deletion events greater than 100 bp were observed in the sample, indicating that well over 150 such events may distinguish S. typhi from E. coli K-12. The need for automatic methods to more effectively exploit sample sequences is discussed.
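
The sampling figures and P-value cutoffs quoted in this abstract can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function names and label strings are hypothetical, the thresholds are the ones stated in the abstract, and the genome size is simply back-calculated from the stated "approximately 7%" representation.

```python
from typing import List

# Figures quoted in the abstract.
N_READS = 870
UNIQUE_BASES = 339_243
REPRESENTATION = 0.07  # "approximately 7% representation"


def implied_genome_size(unique_bases: int, representation: float) -> float:
    """Genome size implied by the sampled fraction (a back-calculation)."""
    return unique_bases / representation


def unique_bases_per_read(unique_bases: int, n_reads: int) -> float:
    """Average amount of unique sequence contributed by each read."""
    return unique_bases / n_reads


def label_read(p_known_salmonella: float,
               p_ecoli_k12: float,
               p_any_database: float) -> List[str]:
    """Apply the abstract's P-value cutoffs to one read's best hits.

    The three arguments are the best (smallest) P values against known
    Salmonella sequences, the E. coli K-12 genome, and the whole database.
    The label names are hypothetical.
    """
    labels = []
    if p_known_salmonella < 1e-40:
        labels.append("known_salmonella_homolog")
    if p_ecoli_k12 > 1e-20:
        labels.append("no_k12_ortholog")
        # Subset of the reads above: nothing similar anywhere in the database.
        if p_any_database > 1e-5:
            labels.append("no_database_similarity")
    return labels
```

Under these assumptions, the quoted numbers imply a genome of a little under 5 Mb and roughly 390 bases of unique sequence per read.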

2020 ◽  
Author(s):  
Syed Shujaat Ali Zaidi ◽  
Masood Ur Rehman Kayani ◽  
Xuegong Zhang ◽  
Imran Haider Shamsi

Abstract Background: Efficient regulation of bacterial genes in response to environmental stimuli results in unique operonic organizations. Lack of complete reference and functional information makes metagenomic operon prediction challenging and therefore opens new perspectives on the interpretation of host-microbe interactions. Methods: Here we present MetaRon (pipeline for the prediction of Metagenomic operons), an open-source pipeline explicitly designed for metagenomic shotgun sequencing data. It recreates the operonic structure without functional information. MetaRon identifies closely packed co-directional gene clusters with a promoter upstream and downstream of the first and last gene, respectively. Promoter prediction marks the transcriptional unit boundary (TUB) of closely packed co-directional gene clusters. Results: Escherichia coli (E. coli) K-12 MG1655 presents a gold standard for operon prediction. Therefore, MetaRon was initially implemented on two simulated Illumina datasets: (1) the E. coli MG1655 genome and (2) a mixture of E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 168 genomes. Operons were predicted in the single genome and the mixture of genomes with a sensitivity of 97.8% and 93.7%, respectively. In the next phase, operons predicted from the E. coli c20 draft genome, isolated from a chicken gut metagenome, achieved a sensitivity of 94.1%. Lastly, the application of MetaRon to 145 paired-end gut metagenome samples identified 1,232,407 unique operons. Conclusion: MetaRon removes two notable limitations of existing methods: (1) the dependency on functional information and (2) the burden of managing enormous metagenomic data. The current study showed the idea of using operons as a subset to represent the whole metagenome in terms of secondary metabolites and demonstrated its effectiveness in explaining the occurrence of a disease condition. This will significantly reduce the hefty whole-metagenome data to a smaller, more precise data set. Furthermore, metabolic pathways from the operonic sequences were identified in association with the occurrence of type 2 diabetes (T2D). Presumably, this is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case T2D. The application of MetaRon to metagenome data at diverse scales will be beneficial for understanding gene regulation and therapeutic metagenomics.
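
The idea of grouping closely packed, co-directional genes and then scoring predictions by sensitivity can be sketched as follows. This is not MetaRon's code: the `Gene` record, the function names, and the 50 bp intergenic cutoff are assumptions made for illustration, and the promoter-prediction step described in the abstract is left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def cluster_codirectional(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group neighbouring same-strand genes separated by at most max_gap bp.

    max_gap=50 is an assumed cutoff, not a value taken from the abstract.
    Only clusters of two or more genes are returned.
    """
    clusters: List[List[str]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if (current
                and gene.strand == current[-1].strand
                and gene.start - current[-1].end - 1 <= max_gap):
            current.append(gene)
        else:
            if len(current) > 1:
                clusters.append([g.name for g in current])
            current = [gene]
    if len(current) > 1:
        clusters.append([g.name for g in current])
    return clusters


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known operons that were recovered: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)
```

For example, a sensitivity of 97.8% corresponds to recovering 978 of every 1,000 reference operons.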


2015 ◽  
Author(s):  
Ivan Sovic ◽  
Kresimir Krizanovic ◽  
Karolj Skala ◽  
Mile Sikic

The recent emergence of nanopore sequencing technology set a challenge for the established assembly methods, which are not optimized for the combination of read lengths and high error rates of nanopore reads. In this work we assessed how existing de novo assembly methods perform on these reads. We benchmarked three non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of E. coli K-12, using several sequencing coverages of nanopore data (20x, 30x, 40x and 50x). We attempted to assess the quality of assembly at each of these coverages, to estimate the requirements for closed bacterial genome assembly. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data, and perform relatively well on lower nanopore coverages. Furthermore, when coverage is above 40x, all non-hybrid methods correctly assemble the E. coli genome, even a non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower.


2021 ◽  
Vol 12 ◽  
Author(s):  
Steven P.T. Hooton ◽  
Alexander C.W. Pritchard ◽  
Karishma Asiani ◽  
Charlotte J. Gray-Hammerton ◽  
Dov J. Stekel ◽  
...  

Salmonella Typhimurium carrying the multidrug resistance (MDR) plasmid pMG101 was isolated from three burns patients in Boston, United States, in 1973. pMG101 was transferrable into other Salmonella spp. and Escherichia coli hosts and carried what was a novel and unusual combination of AMR genes and silver resistance. Previously published short-read DNA sequence of pMG101 showed that it was a 183.5 Kb IncHI plasmid, where a Tn7-mediated transposition of pco/sil resistance genes into the chromosome of the E. coli K-12 J53 host strain had occurred. We noticed differences in streptomycin resistance and plasmid size between two stocks of E. coli K-12 J53 pMG101 we possessed, which had been obtained from two different laboratories (pMG101-A and pMG101-B). Long-read sequencing (PacBio) of the two strains unexpectedly revealed plasmid and chromosomal rearrangements in both. pMG101-A is a non-transmissible 383 Kb closed-circular plasmid consisting of an IncHI2 plasmid sequence fused to an IncFI/FIIA plasmid. pMG101-B is a mobile closed-circular 154 Kb IncFI/FIIA plasmid. Sequence identity of pMG101-B with the fused IncFI/IncFIIA region of pMG101-A was >99%. Assembled host sequence reads of pMG101-B showed Tn7-mediated transposition of pco/sil into the E. coli J53 chromosome between yhiM and yhiN. Long read sequence data in combination with laboratory experiments have demonstrated large scale changes in pMG101. Loss of conjugation function and movement of resistance genes into the chromosome suggest that even under long-term laboratory storage, mobile genetic elements such as transposons and insertion sequences can drive the evolution of plasmids and host. This study emphasises the importance of utilising long read sequencing technologies of plasmids and host strains at the earliest opportunity.


2014 ◽  
Author(s):  
Josh Quick ◽  
Aaron Quinlan ◽  
Nicholas Loman

Background: The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. By measuring the change in current produced when DNA strands translocate through and interact with a charged protein nanopore the device is able to deduce the underlying nucleotide sequence. Findings: We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Conclusions: Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods. Datasets are provided through the GigaDB database at http://gigadb.org/dataset/100102


1999 ◽  
Vol 67 (2) ◽  
pp. 772-781 ◽  
Author(s):  
Christos Stathopoulos ◽  
David L. Provence ◽  
Roy Curtiss

ABSTRACT We reported earlier that a single gene, tsh, isolated from a strain of avian pathogenic Escherichia coli (APEC) was sufficient to confer on E. coli K-12 a hemagglutinin-positive phenotype and that the deduced sequence of the Tsh protein shared homology to the serine-type immunoglobulin A (IgA) proteases of Neisseria gonorrhoeae and Haemophilus influenzae. In this report we show that E. coli K-12 containing the recombinant tsh gene produced two proteins, a 106-kDa extracellular protein and a 33-kDa outer membrane protein, and was also able to agglutinate chicken erythrocytes. N-terminal sequence data indicated that the 106-kDa protein, designated Tshs, was derived from the N-terminal end of Tsh after the removal of a 52-amino-acid N-terminal signal peptide, while the 33-kDa protein, designated Tshβ, was derived from the C-terminal end of Tsh starting at residue N1101. The Tshs domain contains the 7-amino-acid serine protease motif that includes the active-site serine (S259), found also in the secreted domains of the IgA proteases. However, site-directed mutagenesis of S259 did not abolish the hemagglutinin activity or the extracellular secretion of Tshs, indicating that host-directed proteolysis was mediating the release of Tshs. Studies with an E. coli K-12 ompT mutant strain showed that the surface protease OmpT was not needed for the secretion of Tshs. Tsh belongs to a subclass of the IgA protease family, which also includes EspC of enteropathogenic E. coli, EspP of enterohemorrhagic E. coli, and SepA and VirG of Shigella flexneri, which seem to involve a host endopeptidase to achieve extracellular release of their N-terminal domains. In proteolytic studies conducted in vitro, Tshs did not cleave the substrate of the IgA proteases, human IgA1 or chicken IgA, and did not show proteolytic activity in a casein-based assay. Correlation of Tsh expression and hemagglutination activity appears to be a very complex phenomenon, influenced by strain and environmental conditions. Nevertheless, for both APEC and recombinant E. coli K-12 strains containing the tsh gene, it was only the whole bacterial cells and not the cell-free supernatants that could confer hemagglutinin activity. Our results provide insights into the expression, secretion, and proteolytic features of the Tsh protein, which belongs to the growing family of gram-negative bacterial extracellular virulence factors, named autotransporters, which utilize a self-mediated mechanism to achieve export across the bacterial cell envelope.


2003 ◽  
Vol 185 (6) ◽  
pp. 1831-1840 ◽  
Author(s):  
Ulrich Dobrindt ◽  
Franziska Agerer ◽  
Kai Michaelis ◽  
Andreas Janka ◽  
Carmen Buchrieser ◽  
...  

ABSTRACT Genomes of prokaryotes differ significantly in size and DNA composition. Escherichia coli is considered a model organism to analyze the processes involved in bacterial genome evolution, as the species comprises numerous pathogenic and commensal variants. Pathogenic and nonpathogenic E. coli strains differ in the presence and absence of additional DNA elements contributing to specific virulence traits and also in the presence and absence of additional genetic information. To analyze the genetic diversity of pathogenic and commensal E. coli isolates, a whole-genome approach was applied. Using DNA arrays, the presence of all translatable open reading frames (ORFs) of nonpathogenic E. coli K-12 strain MG1655 was investigated in 26 E. coli isolates, including various extraintestinal and intestinal pathogenic E. coli isolates, 3 pathogenicity island deletion mutants, and commensal and laboratory strains. Additionally, the presence of virulence-associated genes of E. coli was determined using a DNA “pathoarray” developed in our laboratory. The frequency and distributional pattern of genomic variations vary widely in different E. coli strains. Up to 10% of the E. coli K-12-specific ORFs were not detectable in the genomes of the different strains. DNA sequences described for extraintestinal or intestinal pathogenic E. coli are more frequently detectable in isolates of the same origin than in other pathotypes. Several genes coding for virulence or fitness factors are also present in commensal E. coli isolates. Based on these results, the conserved E. coli core genome is estimated to consist of at least 3,100 translatable ORFs. The absence of K-12-specific ORFs was detectable in all chromosomal regions. These data demonstrate the great genome heterogeneity and genetic diversity among E. coli strains and underline the fact that both the acquisition and deletion of DNA elements are important processes involved in the evolution of prokaryotes.
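
The core-genome estimate described in this abstract amounts to asking which reference ORFs are detected in every isolate. A minimal sketch of that set logic follows; the function names and the toy ORF identifiers used with it are hypothetical, and real array data would first need a detection call per ORF, which is not modelled here.

```python
from typing import Dict, Set


def core_orfs(detected: Dict[str, Set[str]]) -> Set[str]:
    """ORFs detected in every isolate (the intersection of all sets)."""
    sets = list(detected.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def missing_fraction(reference: Set[str], detected: Set[str]) -> float:
    """Fraction of reference ORFs that were not detected in one isolate."""
    return len(reference - detected) / len(reference)
```

With a reference of ten ORFs and an isolate in which nine are detected, `missing_fraction` returns 0.1, which is the scale of the "up to 10%" figure quoted above.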


2017 ◽  
Author(s):  
Alberto Santos-Zavaleta ◽  
Mishael Sánchez-Pérez ◽  
Heladia Salgado ◽  
David A. Velázquez-Ramírez ◽  
Socorro Gama-Castro ◽  
...  

ABSTRACT Our understanding of the regulation of gene expression has benefited strongly from the availability of high throughput technologies that enable questioning the whole genome for the binding of specific transcription factors and expression profiles. In the case of genome models, such as Escherichia coli K-12, this knowledge needs to be integrated with the legacy of accumulated genetics and molecular biology pre-genomic knowledge in order to attain deeper levels in the understanding of their biology. In spite of the several repositories and curated databases, there is no effort, nor electronic site yet, to comprehensively integrate the available knowledge from all these different sources around the regulation of gene expression of E. coli K-12. In this paper, we describe a first effort to expand RegulonDB, the database containing the rich legacy of decades of classic molecular biology experiments supporting what we know about gene regulation and operon organization in E. coli K-12, to include the genome-wide data set collections from 25 ChIP and 18 gSELEX publications, respectively, in addition to around 60 expression profiles used in their curation. Three essential features for the integration of this information coming from different methodological approaches are: first, a controlled vocabulary within an ontology for precisely defining growth conditions; second, the criteria to separate elements with enough evidence to consider them involved in gene regulation from isolated sites; and third, an expanded computational model supporting this knowledge. Altogether, this constitutes the basis for adequately gathering and enabling the comparisons and integration strongly needed to manage and access such wealth of knowledge. This version of RegulonDB is a first step toward what should become the unifying access point for current and future knowledge on gene regulation in E. coli K-12. Furthermore, this model platform and associated methodologies and criteria can well be emulated for gathering knowledge on other microbial organisms.


2006 ◽  
Vol 3 (1) ◽  
pp. 1-11
Author(s):  
Will Rosellini ◽  
Frank McEachern

Abstract A database is defined as “any organized collection of information” and can include any number of different categories, from client lists to phonebooks to nucleotide sequence data for E. coli. As the speed and storage capacity of next generation computers continue to follow Moore's law, doubling every 12–18 months, databases are becoming vital tools to extract information that to date has gone unnoticed. Nowhere is this more applicable or more evident than in the burgeoning field of bioinformatics, the science of applying computers to biological problems. Bioinformaticians are a loose consort of biologists, physicists, chemists, and mathematicians who also understand principles of computer science. These bioinformaticians invest substantial resources in the form of money, time, and personnel in gathering information, verifying the accuracy of that information, and bringing it together in one location. Until very recently, scientists who fit this definition have largely been academic researchers, but with the mapping of the Human Genome completed, these scientists have now begun to become more and more prevalent in commercial settings. To date, these academic scientists have participated in open source data sharing to speed scientific progress, but as the cost of development increases and commercial entities continue to exploit bioinformatics tools for the production of new drug candidates, this compiled information has become the topic of hot debate. The difficulty associated with a discussion of such a highly technical subject, spanning both intellectual property and genomics, coupled with a paradigm shift in the science, makes for a very difficult and perhaps insurmountable barrier. In spite of these difficulties, one should not be discouraged, as the innovation at this level is the beginning of the end of death and disease in human beings.


2001 ◽  
Vol 183 (23) ◽  
pp. 6943-6946 ◽  
Author(s):  
L. SaiSree ◽  
Manjula Reddy ◽  
J. Gowrishankar

ABSTRACT The radiation sensitivity of Escherichia coli B was first described more than 50 years ago, and the genetic locus responsible for the trait was subsequently identified as lon (encoding Lon protease). We now show that both E. coli B and the first reported E. coli K-12 lon mutant, AB1899, carry IS186 insertions in opposite orientations at a single site in the lon promoter region and that this site represents a natural hot spot for transposition of the insertion sequence (IS) element. Our analysis of deposited sequence data for a number of other IS186 insertion sites permitted the deductions that (i) the consensus target site sequence for IS186 transposition is 5′-(G)≥4(N)3–6(C)≥4-3′, (ii) the associated host sequence duplication varies within the range of 6 to 12 bp and encompasses the N(3–6) sequence, and (iii) in a majority of instances, at least one end of the duplication is at the G-N (or N-C) junction. IS186-related sequences were absent in the closely related bacterium Salmonella enterica serovar Typhimurium, indicating that this IS element is a recent acquisition in the evolutionary history of E. coli.
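
The consensus target site stated in deduction (i), a run of at least four G, then three to six bases of any kind, then a run of at least four C, maps directly onto a regular expression. The sketch below is only an illustration of that pattern: the function name is hypothetical, the input is assumed to contain only unambiguous A/C/G/T bases, and matches are reported left to right without overlaps.

```python
import re
from typing import List, Tuple

# 5'-(G)>=4 (N)3-6 (C)>=4-3', with N restricted to A, C, G or T.
TARGET_SITE = re.compile(r"G{4,}[ACGT]{3,6}C{4,}")


def find_candidate_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group())
            for m in TARGET_SITE.finditer(sequence.upper())]
```

Because the reverse complement of a G-run, spacer, C-run sequence is again a G-run, spacer, C-run sequence, scanning one strand is enough to find candidate sites for this particular consensus.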


2020 ◽  
Author(s):  
Barbara Zehentner ◽  
Zachary Ardern ◽  
Michaela Kreitmeier ◽  
Siegfried Scherer ◽  
Klaus Neuhaus

SUMMARY The genetic code allows six reading frames at a double-stranded DNA locus, and many open reading frames (ORFs) overlap extensively with ORFs of annotated genes (e.g., at least 30 bp or having an embedded ORF). Currently, bacterial genome annotation systematically discards embedded overlapping ORFs of genes (OLGs) due to an assumed information-content constraint, and, consequently, very few OLGs are known. Here we use strand-specific RNAseq and ribosome profiling, detecting about 200 embedded or partially overlapping ORFs of gene candidates in the pathogen E. coli O157:H7 EDL933. These are typically short, many of them show clear promoter motifs as determined by Cappable-seq, indistinguishable from those of annotated genes, and are expressed at a low level. We could express most of them as stable proteins, and 49 displayed a potential phenotype. Ribosome profiling analyses in three other E. coli strains predicted between 84 and 190 embedded antisense OLGs per strain except in E. coli K-12, which is an atypical lab strain. We also found evidence of homology to annotated genes for 100 to 300 OLGs per E. coli strain investigated. Based on this evidence we suggest that bacterial OLGs deserve attention with respect to genome annotation and coding complexity of bacterial genomes. Such sequences may constitute an important coding reserve, opening up new research in genetics and evolutionary biology.
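
The "six reading frames" mentioned at the start of this summary, and the 30 bp overlap criterion, can be made concrete with a short sketch. The function names and frame labels are hypothetical, the input is assumed to contain only A/C/G/T, and this is not the authors' detection method, which relies on RNAseq and ribosome profiling.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A/C/G/T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def six_frames(seq: str) -> Dict[str, str]:
    """Codon-aligned sequence for each of the six reading frames.

    Keys "+1".."+3" are frames on the given strand and "-1".."-3" on the
    reverse complement; trailing bases that do not fill a codon are dropped.
    """
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames: Dict[str, str] = {}
    for offset in range(3):
        for label, strand in (("+", forward), ("-", reverse)):
            sub = strand[offset:]
            frames[f"{label}{offset + 1}"] = sub[: len(sub) - len(sub) % 3]
    return frames


def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of shared positions between two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
```

An ORF spanning positions 71 to 200 overlaps an annotated gene at positions 1 to 100 by exactly 30 bp, the minimum overlap quoted in the summary.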

