Comparison Of Multi-locus Sequence Typing software For next generation sequencing data
ABSTRACTMulti-locus sequence typing (MLST) is a widely used method for categorising bacteria. Increasingly MLST is being performed using next generation sequencing data by reference labs and for clinical diagnostics. Many software applications have been developed to calculate sequence types from NGS data; however, there has been no comprehensive review to date on these methods. We have compared six of these applications against real and simulated data and present results on: 1. the accuracy of each method against traditional typing methods, 2. the performance on real outbreak datasets, 3. in the impact of contamination and varying depth of coverage, and 4. the computational resource requirements.DATA SUMMARYSimulated reads for datasets testing coverage and mixed samples have been deposited in Figshare; DOI:https://doi.org/10.6084/m9.figshare.4602301.vlOutbreak databases are available from Github; url -https://github.com/WGS-standards-and-analysis/datasetsDocker containers used to run each of the applications are available from Github; url –https://tinyurl.com/z7ks2ftAccession numbers for the data used in this paper are available in the Supplementary material.We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files. ☒IMPACT STATEMENTSequence typing is rapidly transitioning from traditional sequencing methods to using whole genome sequencing. A number ofin silicoprediction methods have been developed on anad hocbasis and aim to replicate Multi-locus sequence typing (MLST). This is the first study to comprehensively evaluate multiple MLST software applications on real validated datasets and on common simulated difficult cases. It will give researchers a clearer understanding of the accuracy, limitations and computational performance of the methods they use, and will assist future researchers to choose the most appropriate method for their experimental goals.