Background:
Bacteria are among the earliest known organisms, and their diversity and biology have long attracted
interest, particularly in relation to human health. Recent advances in metagenomic next-generation sequencing
(mNGS), a culture-independent sequencing technology, have accelerated progress in clinical
microbiology and our understanding of pathogens.
Objective:
For mNGS to become feasible in routine clinical practice, a practical and scalable
strategy for analyzing mNGS data is essential. This study presents a robust automated pipeline to analyze clinical
metagenomic data for pathogen identification and classification.
Method:
The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly
framework written with the Snakemake workflow management system. The implementation spares users the
manual installation and configuration of multiple command-line tools and their dependencies. The pipeline
screens for pathogens directly from raw clinical reads and generates a consolidated report for each sample.
Results:
The pipeline is demonstrated using publicly available data and was tested on a desktop Linux system and a
high-performance computing cluster. The study compares the variability in results across different tools and tool
versions, and the tool versions are user-configurable. The pipeline outputs quality-control results, filtered reads,
host-subtracted reads, assembled contigs, assembly metrics, relative abundances of bacterial species, and identified
antimicrobial resistance genes, plasmids, and virulence factors. The pipeline's results are evaluated in terms of
sensitivity and positive predictive value.
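
As an illustration of the two evaluation metrics, the following minimal Python sketch computes sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP), by comparing the set of species reported for a sample against the set known to be present. The function name and the species lists are hypothetical and are not taken from the Clin-mNGS pipeline or its data.

```python
def sensitivity_and_ppv(expected, detected):
    """Return (sensitivity, PPV) for one sample.

    expected: species known to be present (hypothetical ground truth)
    detected: species reported by the classifier
    """
    expected, detected = set(expected), set(detected)
    tp = len(expected & detected)   # true positives: present and reported
    fn = len(expected - detected)   # false negatives: present but missed
    fp = len(detected - expected)   # false positives: reported but absent
    # Return NaN when a metric is undefined (empty denominator).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical example: four species present, three of them recovered,
# plus one species reported that is not in the sample.
truth = ["Escherichia coli", "Klebsiella pneumoniae",
         "Staphylococcus aureus", "Pseudomonas aeruginosa"]
calls = ["Escherichia coli", "Klebsiella pneumoniae",
         "Staphylococcus aureus", "Enterococcus faecalis"]
print(sensitivity_and_ppv(truth, calls))
```

Comparing sets of names treats each species as either detected or not; an evaluation that weights calls by read count or abundance would need a different formulation.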
Conclusion:
Clin-mNGS is an automated Snakemake pipeline, validated for the analysis of clinical microbial
metagenomic reads, that performs taxonomic classification and antimicrobial resistance prediction.