Graphical illustration for explaining mass spectrum fingerprinting in microbial identification
Pattern recognition is a common approach for identifying an unknown entity by comparing it against a set of known objects curated in a database, and it finds use in various data processing applications such as microbial identification. Mass spectrometry techniques, whether matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) or electrospray ionization tandem mass spectrometry (ESI MS/MS), are increasingly used for identifying microbes in research and clinical settings via species- or strain-specific mass spectrum signatures. Although the existence of unique biomarkers (such as ribosomal proteins) underpins mass spectrometry-enabled microbial identification, the lack of corresponding genome or proteome information in publicly accessible databases for a large fraction of extant microbes significantly hampers biomarker (and species) assignment. Nevertheless, the reproducible generation of species-specific mass spectra across different growth and environmental conditions opens up the possibility of identifying unknown microbes by comparing peak positions between mass spectra, without requiring knowledge of the biomarkers’ molecular identities. Thus, the mass spectrum fingerprinting (or pattern recognition) approach circumvents the need for biomarker information. Alignment of as many mass peaks as possible (particularly those of phylogenetic significance) between spectra is the basis of mass spectrum fingerprinting. In contrast, because gene expression and metabolism (and hence biomolecules’ abundances) vary with environmental and nutritional factors, alignment of peak intensities, though desirable, is not a strict requirement for identification.
Given the large diversity of biomolecules present in each microbial species, mass spectrometry-based microbial identification is inherently data-intensive and therefore requires statistical tools and a computational implementation of the pattern recognition approach, which is incorporated in the software packages of microbial typing instruments. However, relegating the algorithmic details of pattern recognition to the software backend obscures the approach’s conceptual underpinnings and hinders students’ understanding. More importantly, mathematics-centric approaches for explaining the conceptual basis of pattern recognition, though useful, are generally less pedagogically accessible to life science students than visual illustration techniques. This short primer describes a simple graphical illustration (featuring three examples common in mass spectrometry-based biotyping workflows) that attempts to explain the conceptual underpinnings of mass spectrum fingerprinting and highlights caveats for avoiding misidentification.
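For readers who prefer code to a picture, the peak-position comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm used by any commercial biotyping software. The function names, the 500 ppm mass tolerance, the Jaccard-style score, and all m/z values and species labels are hypothetical choices made for the example. In keeping with the text, only peak positions are compared and peak intensities are ignored.

```python
from bisect import bisect_left


def count_matched_peaks(query, reference, tol_ppm=500.0):
    """Count query peaks that align with a reference peak within tol_ppm.

    Matching is greedy and one-to-one: each reference peak can be
    claimed by at most one query peak. Intensities are not used.
    """
    ref = sorted(reference)
    used = [False] * len(ref)
    matched = 0
    for mz in sorted(query):
        tol = mz * tol_ppm / 1e6           # absolute tolerance at this m/z
        i = bisect_left(ref, mz - tol)     # first reference peak >= mz - tol
        while i < len(ref) and ref[i] <= mz + tol:
            if not used[i]:
                used[i] = True
                matched += 1
                break
            i += 1
    return matched


def fingerprint_score(query, reference, tol_ppm=500.0):
    """Jaccard-style similarity: shared peaks / all distinct peaks."""
    if not query and not reference:
        return 0.0
    m = count_matched_peaks(query, reference, tol_ppm)
    return m / (len(query) + len(reference) - m)


def rank_references(query, library, tol_ppm=500.0):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = {
        name: fingerprint_score(query, peaks, tol_ppm)
        for name, peaks in library.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical peak lists (m/z values invented for illustration only).
query = [4365.2, 5096.1, 6255.0, 7274.3, 9536.8]
library = {
    "species_A": [4365.0, 5096.5, 6255.4, 7274.0, 9536.0],
    "species_B": [4365.3, 5381.0, 6411.2, 7158.9, 9742.1],
}

ranking = rank_references(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these invented peak lists, all five query peaks align with species_A, whereas only one aligns with species_B, so species_A ranks first. A top-ranked entry is only the best match among the spectra present in the library. If the true species is absent from the library, a low-scoring entry can still rank first, which is one route to misidentification; applying a minimum score cutoff is a common safeguard.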