Long-read-sequenced reference genomes of the seven major lineages of enterotoxigenic Escherichia coli (ETEC) circulating in modern time
Abstract

Background: Enterotoxigenic Escherichia coli (ETEC) is an enteric pathogen responsible for the majority of diarrhoeal cases worldwide. ETEC infections are estimated to cause 80,000 fatalities per year, with the highest burden, approximately 75 million cases per year, amongst children under five years of age in resource-poor countries. It is also the leading cause of diarrhoea in travellers. Previous large-scale sequencing studies have found seven major ETEC lineages currently in circulation worldwide.

Methods: Here we have used PacBio long-read sequencing in combination with Illumina sequencing to create high-quality complete reference genomes for each of these lineages, with manually curated chromosomes and plasmids. The plasmids carrying ETEC virulence genes were compared to other available long-read-sequenced ETEC strains using blastn.

Results: The ETEC reference strains harbour between two and five plasmids, including virulence plasmids, antibiotic resistance plasmids and phage-plasmids. The virulence plasmids carrying the colonization factors are highly conserved, as shown by comparison with plasmids from other ETEC strains. This confirms that both the plasmids and the chromosomes of ETEC are crucial for its virulence and its superiority as a pathogen.

Conclusion: We confirm that the major ETEC lineages all harbour conserved plasmids that have been associated with their respective background genomes for decades. The in-depth analysis of gene content and order, together with correct annotation of the plasmids, will help to elucidate other plasmids with and without virulence factors in related bacterial species. These reference genomes allow for rapid and accurate comparison between different ETEC strains, and these data will form the foundation of ETEC genomics research for years to come.