RFPlasmid: Predicting plasmid sequences from short read assembly data using machine learning

Author(s):  
Linda van der Graaf van Bloois ◽  
Jaap A. Wagenaar ◽  
Aldert L. Zomer

ABSTRACT Antimicrobial resistance (AMR) genes in bacteria are often carried on plasmids and these plasmids can transfer AMR genes between bacteria. For molecular epidemiology purposes and risk assessment, it is important to know whether the genes are located on highly transferable plasmids or in the more stable chromosomes. However, draft whole-genome sequences are fragmented, making it difficult to discriminate plasmid and chromosomal contigs. Current methods that predict plasmid sequences from draft genome sequences rely on single features, like k-mer composition, circularity of the DNA molecule, copy number or sequence identity to plasmid replication genes, all of which have their drawbacks, especially when faced with large single-copy plasmids, which often carry resistance genes. With our newly developed prediction tool RFPlasmid, we use a combination of multiple features, including k-mer composition and databases with plasmid and chromosomal marker proteins, to predict whether the likely source of a contig is plasmid or chromosomal. The tool RFPlasmid supports models for 17 different bacterial species, including Campylobacter, E. coli and Salmonella, and has a species-agnostic model for metagenomic assemblies or unsupported organisms. RFPlasmid is available both as a standalone tool and via a web interface.
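Of the features listed above, k-mer composition is the easiest to illustrate. The Python sketch below converts a contig into a normalized k-mer frequency vector. The helper name `kmer_profile`, the default k = 3 and the choice to skip k-mers containing ambiguous bases are assumptions for illustration; this is not RFPlasmid's own implementation.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer in seq.

    Hypothetical helper: k-mers containing non-ACGT characters (e.g. N)
    are skipped, and all 4**k k-mers are always reported so that contigs
    of any length map to vectors of the same dimension.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips k-mers with ambiguous bases
            counts[kmer] += 1
            total += 1
    if total == 0:  # e.g. a contig made only of Ns
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}


profile = kmer_profile("ACGTACGTNNACG", k=2)
print(len(profile))  # 16
```

Because every profile has 4^k entries, contigs of different lengths become directly comparable vectors. Many tools also merge each k-mer with its reverse complement; this sketch does not.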

2021 ◽  
Vol 7 (11) ◽  
Author(s):  
Linda van der Graaf-van Bloois ◽  
Jaap A. Wagenaar ◽  
Aldert L. Zomer

Antimicrobial-resistance (AMR) genes in bacteria are often carried on plasmids and these plasmids can transfer AMR genes between bacteria. For molecular epidemiology purposes and risk assessment, it is important to know whether the genes are located on highly transferable plasmids or in the more stable chromosomes. However, draft whole-genome sequences are fragmented, making it difficult to discriminate plasmid and chromosomal contigs. Current methods that predict plasmid sequences from draft genome sequences rely on single features, like k-mer composition, circularity of the DNA molecule, copy number or sequence identity to plasmid replication genes, all of which have their drawbacks, especially when faced with large single-copy plasmids, which often carry resistance genes. With our newly developed prediction tool RFPlasmid, we use a combination of multiple features, including k-mer composition and databases with plasmid and chromosomal marker proteins, to predict whether the likely source of a contig is plasmid or chromosomal. The tool RFPlasmid supports models for 17 different bacterial taxa, including Campylobacter, Escherichia coli and Salmonella, and has a taxon-agnostic model for metagenomic assemblies or unsupported organisms. RFPlasmid is available both as a standalone tool and via a web interface.
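The central point of this abstract is that several signals are combined rather than relied on singly. The sketch below shows one way to do that under stated assumptions: the per-contig inputs (gene count and numbers of hits to plasmid and chromosomal marker proteins) and the function names are hypothetical, and a nearest-centroid classifier is swapped in so the example needs only the Python standard library. It is not RFPlasmid's model, which is a trained machine-learning classifier (the name suggests a random forest).

```python
import math
from statistics import mean


def contig_features(length: int, n_genes: int,
                    plasmid_hits: int, chrom_hits: int) -> list[float]:
    """Combine several signals for one contig into a single feature vector.

    Hypothetical inputs: hit counts would come from searching the contig's
    predicted proteins against plasmid and chromosomal marker databases.
    """
    genes = max(n_genes, 1)  # avoid division by zero on gene-less contigs
    return [
        plasmid_hits / genes,        # share of genes resembling plasmid markers
        chrom_hits / genes,          # share of genes resembling chromosomal markers
        math.log10(max(length, 1)),  # contig length on a log scale
    ]


def train_centroids(examples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Average the feature vectors of each label (e.g. 'plasmid', 'chromosome')."""
    by_label: dict[str, list[list[float]]] = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}


def predict(centroids: dict[str, list[float]], features: list[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], features))


# Toy, hypothetical training contigs (length, genes, plasmid hits, chromosomal hits).
training = [
    (contig_features(45_000, 50, 20, 1), "plasmid"),
    (contig_features(120_000, 110, 30, 2), "plasmid"),
    (contig_features(300_000, 280, 2, 60), "chromosome"),
    (contig_features(80_000, 75, 0, 25), "chromosome"),
]
centroids = train_centroids(training)
print(predict(centroids, contig_features(60_000, 55, 18, 2)))  # plasmid
```

A distance-based classifier like this one is sensitive to feature scale (here the log-length term spans the widest range), whereas tree-based models such as random forests split on each feature separately and do not need the features rescaled.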


2017 ◽  
Vol 5 (44) ◽  
Author(s):  
Yohei Kumagai ◽  
Susumu Yoshizawa ◽  
Keiji Nakamura ◽  
Yoshitoshi Ogura ◽  
Tetsuya Hayashi ◽  
...  

ABSTRACT Pseudomonas aeruginosa is one of the most common model bacterial species, and genomes of hundreds of strains of this species have been sequenced to date. However, currently there is only one available genome of an oceanic isolate. Here, we report two complete and six draft genome sequences of P. aeruginosa isolates from the open ocean.


2019 ◽  
Vol 8 (33) ◽  
Author(s):  
Antony T. Vincent ◽  
Alain Le Breton ◽  
Alex Bernatchez ◽  
Cynthia Gagné-Thivierge ◽  
Valérie E. Paquet ◽  
...  

The bacterial species Aeromonas salmonicida has five officially recognized subspecies. A large majority of the currently available sequences come from Aeromonas salmonicida subsp. salmonicida, which causes furunculosis in salmonids. We present the genome sequences of four Aeromonas salmonicida subsp. achromogenes strains. These genomes will help increase the robustness of genomic analyses for this subspecies.


Author(s):  
Angelina A. Kislichkina ◽  
Mikhail E. Platonov ◽  
Yury P. Skryabin ◽  
Angelika A. Sizova ◽  
Lidia A. Shishkina ◽  
...  

Yersinia kristensenii is one of the Yersinia enterocolitica-like bacterial species, which are considered nonpathogenic to humans. Here, we report the draft genome sequences of six Yersinia kristensenii strains. These draft genomes will help to better characterize Yersinia kristensenii at the genomic level.


2016 ◽  
Vol 4 (4) ◽  
Author(s):  
Brock A. Arivett ◽  
Dave C. Ream ◽  
Steven E. Fiester ◽  
Destaalem Kidane ◽  
Luis A. Actis

Members of the Escherichia coli bacterial family have been grouped as ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens because of their extensive drug resistance phenotypes and increasing threat to human health. The genomes of six extended-spectrum β-lactamase (ESBL)-producing E. coli strains isolated from wounded military personnel were sequenced and annotated.


2018 ◽  
Vol 6 (21) ◽  
Author(s):  
Aixia Xu ◽  
James R. Johnson ◽  
Shiowshuh Sheen ◽  
David S. Needleman ◽  
Christopher Sommers

ABSTRACT Potential extraintestinal pathogenic Escherichia coli strains DP254, WH333, WH398, F356, FEX675, and FEX725 were isolated from retail chicken meat products. Here, we report the draft genome sequences for these six E. coli isolates, which are currently being used in food safety research.


2017 ◽  
Vol 5 (11) ◽  
Author(s):  
Soo Jin Jeon ◽  
Federico Cunha ◽  
Amber Ginn ◽  
KwangCheol Casey Jeong ◽  
Klibs N. Galvão

ABSTRACT Escherichia coli is involved in the pathogenesis of metritis in cows. We report here the genome sequences of E. coli strains isolated at calving from the uterus, vagina, vulva, and rectoanal junction of a dairy cow that later developed metritis. The genomic similarities among these strains will give insight into their phylogenetic relationships.


2021 ◽  
Vol 9 (3) ◽  
pp. 608
Author(s):  
Dustin A. Therrien ◽  
Kranti Konganti ◽  
Jason J. Gill ◽  
Brian W. Davis ◽  
Andrew E. Hillhouse ◽  
...  

In 2013, the U.S. Department of Agriculture Food Safety and Inspection Service (USDA-FSIS) began transitioning to whole-genome sequencing (WGS) for foodborne disease outbreak- and recall-associated isolate identification of select bacterial species. While WGS offers greater precision, certain hurdles must be overcome before widespread application within the food industry is plausible. Challenges include the diversity of sequencing platform outputs and the lack of standardized bioinformatics workflows for data analyses. We sequenced DNA from USDA-FSIS-approved, non-pathogenic E. coli surrogates and a derivative group of rifampicin-resistant mutants (rifR) via both Oxford Nanopore MinION and Illumina MiSeq platforms to generate and annotate complete genomes. Genome sequences from each clone were assembled separately so that long-read, short-read, and combined sequence assemblies could be directly compared. The combined sequence data approach provides more accurate completed genomes. The genomes from these isolates were verified to lack key functional E. coli elements commonly associated with pathogenesis. Genetic alterations known to confer rifR were also identified. As the food industry adopts WGS within its food safety programs, these data provide completed genomes for commonly used surrogate strains, with a direct comparison of sequence platforms and assembly strategies relevant to research/testing workflows applicable for both processors and regulators.
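The abstract does not say which metrics were used to compare the long-read, short-read and combined assemblies. Contig count, total length and N50 are common first-pass contiguity summaries, and the sketch below computes them. The metric choice, the function names and the toy contig lengths are assumptions for illustration only, not data from the study.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(name: str, contig_lengths: list[int]) -> str:
    """One-line contiguity summary for an assembly."""
    return (f"{name}: {len(contig_lengths)} contigs, "
            f"{sum(contig_lengths)} bp, N50 = {n50(contig_lengths)}")


# Hypothetical toy contig lengths for one isolate assembled three ways.
assemblies = {
    "short-read": [120, 90, 60, 40, 30, 10],
    "long-read": [300, 40],
    "combined": [310, 40],
}
for name, lengths in assemblies.items():
    print(summarize(name, lengths))
```

N50 rewards contiguity but says nothing about base-level accuracy, so contiguity summaries like these are usually read alongside error-focused checks.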


2021 ◽  
Vol 10 (23) ◽  
Author(s):  
Michael J. Sikorski ◽  
Tracy H. Hazen ◽  
Gopi Vyas ◽  
Jane M. Michalski ◽  
David A. Rasko

There are six described pathotypes of Escherichia coli that cause significant clinical illness in humans. Enteroinvasive E. coli (EIEC) strains have been shown to be separated into three phylogenomic clades. To add to a limited body of EIEC genomic data, we report two high-quality draft genomes representing different EIEC phylogenomic clades.


2019 ◽  
Author(s):  
Kaleb Abram ◽  
Zulema Udaondo ◽  
Carissa Bleker ◽  
Visanu Wanchai ◽  
Trudy M. Wassenaar ◽  
...  

ABSTRACT The explosion of microbial genome sequences in public databases allows for large-scale population genomic studies of bacterial species, such as Escherichia coli. In this study, we examine and classify more than one hundred thousand E. coli and Shigella genomes. After removing outliers, a semi-automated Mash-based analysis of 10,667 assembled genomes reveals 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup serves as a proxy to classify more than 95,000 unassembled genomes. This analysis shows that most sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). The authenticity of the 14 phylogroups described is supported by pangenomic and phylogenetic analyses, which show differences in gene preservation between phylogroups. A phylogenetic tree constructed with 2,613 single-copy core genes, along with a matrix of phylogenetic profiles, is used to confirm that the 14 phylogroups change at different rates of gene gain/loss/duplication. The methodology used in this work is able to identify previously uncharacterized phylogroups in the E. coli species. Some of these new phylogroups harbor clonal strains that have undergone a process of genomic adaptation to the acquisition of new genomic elements related to virulence or antibiotic resistance. This is, to our knowledge, the largest E. coli genome dataset analyzed to date and provides valuable insights into the population structure of the species.
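The Mash-based workflow summarized above can be illustrated with a simplified sketch: bottom-s MinHash sketches of k-mer sets, the Mash distance D = -(1/k) ln(2j/(1+j)) computed from the estimated Jaccard index j, one medoid chosen per group, and classification of a new genome by its nearest medoid. SHA-1 stands in for the hash function Mash itself uses, reverse complements are not merged, and the simulated genomes, k = 11 and sketch size 200 are hypothetical choices, not the study's parameters.

```python
import hashlib
import math
import random
from collections.abc import Callable


def kmer_set(seq: str, k: int) -> set[str]:
    """All k-mers of seq (reverse complements are NOT merged in this sketch)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq: str, k: int, size: int) -> list[int]:
    """Bottom-`size` MinHash sketch: the smallest 64-bit hashes of the k-mer set."""
    hashes = {int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
              for km in kmer_set(seq, k)}
    return sorted(hashes)[:size]


def mash_distance(a: list[int], b: list[int], k: int, size: int) -> float:
    """D = (1/k) * ln((1 + j) / (2j)), i.e. -(1/k) * ln(2j / (1 + j))."""
    union_bottom = set(sorted(set(a) | set(b))[:size])
    if not union_bottom:
        return 1.0
    # Jaccard estimate: share of the union's bottom hashes present in both sketches.
    j = len(union_bottom & set(a) & set(b)) / len(union_bottom)
    if j == 0:
        return 1.0  # no shared hashes: treat as maximally distant
    return math.log((1 + j) / (2 * j)) / k


def medoid(names: list[str], dist: Callable[[str, str], float]) -> str:
    """The member with the smallest summed distance to all other members."""
    return min(names, key=lambda x: sum(dist(x, y) for y in names))


# --- Toy demonstration with simulated (hypothetical) genomes ---
rng = random.Random(7)


def random_genome(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


def mutate(seq: str, rate: float) -> str:
    """Substitute each base with a different base with probability `rate`."""
    return "".join(rng.choice("ACGT".replace(b, "")) if rng.random() < rate else b
                   for b in seq)


K, SIZE = 11, 200
ancestors = {"group1": random_genome(3000), "group2": random_genome(3000)}
genomes = {f"{g}_{i}": mutate(seq, 0.01)
           for g, seq in ancestors.items() for i in range(3)}
sketches = {name: sketch(seq, K, SIZE) for name, seq in genomes.items()}


def genome_distance(x: str, y: str) -> float:
    return mash_distance(sketches[x], sketches[y], K, SIZE)


medoids = {g: medoid([n for n in genomes if n.startswith(g + "_")], genome_distance)
           for g in ancestors}

# Classify a new ("unassembled") genome by its nearest medoid.
query = sketch(mutate(ancestors["group2"], 0.02), K, SIZE)
nearest = min(medoids,
              key=lambda g: mash_distance(query, sketches[medoids[g]], K, SIZE))
print(nearest)
```

Comparing a query against one medoid per group, rather than against every genome, is what lets a small set of representatives stand in for a very large collection.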

