Graphical illustration for explaining mass spectrum fingerprinting in microbial identification
Pattern recognition is commonly used to identify an unknown entity by comparison against a set of known objects curated in a database, and finds use in applications such as fingerprint matching and microbial identification. Mass spectrometry, whether matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) or electrospray ionization tandem mass spectrometry (ESI MS/MS), is increasingly used to identify microbes in research and clinical settings via species- or strain-specific mass spectrum signatures. Although the existence of unique biomarkers, such as ribosomal proteins, underpins mass spectrometry-based microbial identification, the absence of corresponding genome or proteome information in publicly accessible databases for a large fraction of extant microbes significantly hampers biomarker (and species) assignment. Nevertheless, the reproducible generation of species-specific mass spectra across different growth and environmental conditions opens up the possibility of identifying unknown microbes by comparing peak positions between mass spectra, without knowledge of biomarker identities. Thus, the mass spectrum fingerprinting (pattern recognition) approach circumvents the need for biomarker information: alignment of as many mass peaks as possible (particularly those of phylogenetic significance) between spectra is the basis for identification. In contrast, because gene expression and metabolism (and hence biomolecules’ abundances) vary with environmental and nutritional factors, alignment of peak intensities, though desirable, is not a strict requirement in species annotation. Given the large diversity of biomolecules present in each microbial species, mass spectrometry-based microbial identification is inherently data-intensive and requires statistical tools and computers for implementing pattern recognition.
However, relegation of algorithmic details to the backend of software obfuscates the approach’s conceptual underpinnings and hinders students’ understanding. More importantly, mathematics-centric approaches for explaining the conceptual basis of pattern recognition, though useful, are generally less pedagogically accessible to students than visual illustration techniques. This short primer describes a simple graphical illustration (featuring three examples common in mass spectrometry-based biotyping workflows) that attempts to explain the conceptual underpinnings of mass spectrum fingerprinting and highlights caveats for avoiding misidentifications.
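As a complement to the graphical illustration, the idea of matching peak positions while ignoring peak intensities can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the algorithm used by any particular biotyping software: the function names, the m/z tolerance of 2.0, the Jaccard-style score, and all peak lists and species labels are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def count_matched_peaks(query, reference, tol=2.0):
    """Count query peaks that align with a distinct reference peak.

    Two peaks are treated as aligned when their m/z values differ by at
    most `tol` (an assumed tolerance). Intensities are deliberately ignored.
    Matching is greedy, which is adequate for a sketch but not guaranteed
    to be optimal when peaks are crowded.
    """
    ref = sorted(reference)
    used = [False] * len(ref)
    matched = 0
    for mz in sorted(query):
        i = bisect_left(ref, mz - tol)  # first reference peak >= mz - tol
        while i < len(ref) and ref[i] <= mz + tol:
            if not used[i]:
                used[i] = True
                matched += 1
                break
            i += 1
    return matched


def similarity(query, reference, tol=2.0):
    """Jaccard-style score: matched peaks / all distinct peaks in both spectra."""
    if not query or not reference:
        return 0.0
    m = count_matched_peaks(query, reference, tol)
    return m / (len(query) + len(reference) - m)


# Hypothetical reference library of peak positions (m/z); not real data.
library = {
    "species_A": [4365.0, 5096.0, 6255.0, 7274.0, 9742.0],
    "species_B": [4365.0, 5381.0, 6411.0, 7158.0, 9536.0],
}
# Hypothetical peak list from an unknown isolate.
unknown = [4364.2, 5097.1, 6254.5, 7275.0, 8370.0]

scores = {name: similarity(unknown, peaks) for name, peaks in library.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy case the unknown shares four peaks with the hypothetical species_A and only one with species_B, so species_A ranks first. A top-ranked entry is only the best match within the library searched; if the true species is absent from the library, the highest score can still be a misidentification.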