Comparative Pathogenomics of Escherichia coli: Polyvalent Vaccine Target Identification Through Virulome Analysis
Comparative genomics of bacterial pathogens has been useful for revealing potential virulence factors. Escherichia coli is a significant cause of human morbidity and mortality worldwide but can also exist as a commensal in the human gastrointestinal tract. With many sequenced genomes, it has served as a model organism for comparative genomic studies to understand the link between genetic content and potential for virulence. To date, however, no comprehensive analysis of its complete “virulome” has been performed for the purpose of identifying universal or pathotype-specific targets for vaccine development. Here, we describe the construction of a pathotype database of 107 well-characterized, completely sequenced pathogenic and non-pathogenic E. coli strains, which we annotated for major virulence factors (VFs). Data are cross-referenced for patterns against pathotype, phylogroup, and sequence type, and results are verified against all 1,348 complete E. coli chromosomes in the NCBI RefSeq database. Our results demonstrate that phylogroup drives many of the “pathotype-associated” VFs, and that ExPEC-associated VFs are found predominantly within the B2/D/F/G phylogenetic clade, suggesting these phylogroups are more adapted to infect human hosts. Finally, we use this information to propose polyvalent vaccine targets with specificity towards extraintestinal strains, targeting key invasive strategies including immune evasion (group 2 capsule), iron acquisition (FyuA, IutA, Sit), adherence (SinH, Afa, Pap, Sfa, Iha), and toxins (Usp, Sat, Vat, Cdt, Cnf1, HlyA). While many of these targets have been proposed before, this work is the first to examine their pathotype and phylogroup distribution and how they may be targeted together to prevent disease.