Towards routine employment of computational tools for antimicrobial resistance determination via high-throughput sequencing
Antimicrobial resistance (AMR) is a growing threat to public health and farming at large. Without appropriate interventions, it can lead to millions of deaths per year and substantial economic loss worldwide. In clinical and veterinary practice, a timely characterization of the antibiotic susceptibility profile of bacterial infections is a crucial step in optimizing treatment. Fast turnaround of AMR testing is also needed in food safety and infection control surveillance (e.g., contamination of healthcare or long-term nursing facilities). High-throughput sequencing is a promising option for clinical point-of-care and ecological surveillance, opening the opportunity to develop genotyping-based AMR determination as a possibly faster alternative to phenotypic testing. In the present work, we compare the performance of state-of-the-art methods for detection of AMR from high-throughput sequencing data in healthcare settings. We consider five computational tools spanning complementary approaches: alignment (AMRPlusPlus), deep learning (DeepARG), k-mer genomic signatures (KARGA, ResFinder), and hidden Markov models (Meta-MARC). We evaluate them on an extensive collection of clinical studies never employed for model training. To do so, we assemble data from multiple, independent AMR high-throughput sequencing experiments collected in a variety of hospital settings, comprising 585 isolates with AMR profiles determined by phenotypic tests across nine antibiotic classes. We show that the prediction landscape of AMR classifiers is highly heterogeneous, with balanced accuracy varying from 0.4 to 0.92.
Although some algorithms (ResFinder, KARGA, and AMRPlusPlus) exhibit overall better balanced accuracy than others, the high per-AMR-class variance and related findings suggest that: (1) all algorithms might be subject to sampling bias present both in the data repositories used for training and in experimental/clinical settings; and (2) a portion of clinical samples might contain uncharacterized AMR genes to which the algorithms, mostly trained on known AMR genes, fail to generalize. These results lead us to formulate practical advice for software configuration and application, and to suggest directions for future study design to further develop AMR prediction tools from proof-of-concept to bedside.
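
For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to an uneven split between resistant and susceptible isolates. The sketch below is only an illustration of how a per-antibiotic-class balanced accuracy could be computed from phenotypic labels and genotype-based predictions. The function names, the record layout, and the toy data are assumptions made for this example; they are not taken from the study's pipeline or from any of the tools compared.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (True = resistant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    positives, negatives = tp + fn, tn + fp
    if positives == 0 or negatives == 0:
        # The metric is undefined when one phenotype is absent from the sample.
        raise ValueError("need at least one resistant and one susceptible isolate")
    return 0.5 * (tp / positives + tn / negatives)


def per_class_balanced_accuracy(records):
    """records: iterable of (antibiotic_class, phenotype_resistant, predicted_resistant).

    This record layout is hypothetical and chosen only for the example.
    """
    grouped = defaultdict(lambda: ([], []))
    for antibiotic_class, truth, prediction in records:
        grouped[antibiotic_class][0].append(bool(truth))
        grouped[antibiotic_class][1].append(bool(prediction))
    return {
        antibiotic_class: balanced_accuracy(truths, predictions)
        for antibiotic_class, (truths, predictions) in grouped.items()
    }


# Toy data, invented for illustration only (not results from the study).
toy_records = [
    ("beta-lactam", True, True),
    ("beta-lactam", True, False),
    ("beta-lactam", False, False),
    ("beta-lactam", False, False),
    ("tetracycline", True, True),
    ("tetracycline", False, True),
]
toy_scores = per_class_balanced_accuracy(toy_records)
```

Reporting the score per antibiotic class, rather than pooled across classes, is what exposes the kind of per-class variance described above.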