Bacterial species identification from MALDI-TOF mass spectra through data analysis and machine learning

2011 ◽  
Vol 34 (1) ◽  
pp. 20-29 ◽  
Author(s):  
Katrien De Bruyne ◽  
Bram Slabbinck ◽  
Willem Waegeman ◽  
Paul Vauterin ◽  
Bernard De Baets ◽  
...  

2022 ◽  
Author(s):  
Caroline Weis ◽  
Aline Cuénod ◽  
Bastian Rieck ◽  
Olivier Dubuis ◽  
Susanne Graf ◽  
...  

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i30-i38
Author(s):  
Caroline Weis ◽  
Max Horn ◽  
Bastian Rieck ◽  
Aline Cuénod ◽  
Adrian Egli ◽  
...  

Abstract

Motivation: Microbial species identification based on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has become a standard tool in clinical microbiology. The resulting MALDI-TOF mass spectra also harbour the potential to deliver prediction results for other phenotypes, such as antibiotic resistance. However, the development of machine learning algorithms specifically tailored to MALDI-TOF MS-based phenotype prediction is still in its infancy. Moreover, current spectral pre-processing typically involves a parameter-heavy chain of operations without analyzing their influence on the prediction results. In addition, classification algorithms lack quantification of uncertainty, which is indispensable for predictions potentially influencing patient treatment.

Results: We present a novel prediction method for antimicrobial resistance based on MALDI-TOF mass spectra. First, we compare the complex conventional pre-processing to a new approach that exploits topological information and requires only a single parameter, namely the number of peaks of a spectrum to keep. Second, we introduce PIKE, the peak information kernel, a similarity measure specifically tailored to MALDI-TOF mass spectra which, combined with a Gaussian process classifier, provides well-calibrated uncertainty estimates about predictions. We demonstrate the utility of our approach by predicting antibiotic resistance of three clinically highly relevant bacterial species. Our method consistently outperforms competitor approaches, while demonstrating improved performance and security by rejecting out-of-distribution samples, such as bacterial species that are not represented in the training data. Ultimately, our method could contribute to an earlier and precise antimicrobial treatment in clinical patient care.

Availability and implementation: We make our code publicly available as an easy-to-use Python package under https://github.com/BorgwardtLab/maldi_PIKE.
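
The single-parameter, topology-based pre-processing described above can be illustrated with a small sketch. It assumes that "topological information" means ranking local maxima by their 0-dimensional persistence (how far the intensity must drop before a peak merges into a higher one) and keeping the k most persistent peaks. The function names `peak_persistence` and `top_k_peaks` are hypothetical, and this is not the code of the maldi_PIKE package.

```python
import numpy as np


def peak_persistence(intensities):
    """Return {index of local maximum: persistence} for a 1-D signal.

    Samples are added from highest to lowest intensity. Each connected
    run of added samples is a component labelled by its highest point.
    When two components meet, the one with the lower peak "dies" and its
    persistence is (peak height - height at which the merge happens).
    """
    y = np.asarray(intensities, dtype=float)
    n = len(y)
    if n == 0:
        return {}
    parent = [-1] * n          # -1 means "not added yet"
    peak_of = {}               # component root -> index of its highest point
    persistence = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in (int(v) for v in np.argsort(-y, kind="stable")):
        parent[i] = i
        peak_of[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Make ri the component with the higher peak (the survivor).
                if y[peak_of[ri]] < y[peak_of[rj]]:
                    ri, rj = rj, ri
                dying = peak_of[rj]
                if y[dying] > y[i]:          # skip zero-persistence merges
                    persistence[dying] = y[dying] - y[i]
                parent[rj] = ri

    # The global maximum never dies; measure it against the global minimum.
    top = peak_of[find(0)]
    persistence[top] = y[top] - y.min()
    return persistence


def top_k_peaks(mz, intensities, k):
    """Keep the k most persistent peaks, returned in order of m/z."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(intensities, dtype=float)
    pers = peak_persistence(y)
    keep = sorted(sorted(pers, key=pers.get, reverse=True)[:k])
    return mz[keep], y[keep]


# Toy spectrum (made-up values, for illustration only).
mz_kept, int_kept = top_k_peaks(
    [100, 101, 102, 103, 104, 105, 106],
    [0.0, 3.0, 1.0, 5.0, 2.0, 4.5, 0.0],
    k=2,
)
```

The only tunable quantity here is `k`, which mirrors the single parameter named in the abstract; whether the published method ranks peaks in exactly this way is an assumption.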


2018 ◽  
Author(s):  
Wenfa Ng

Microbes are identified based on their distinguishing characteristics such as gene sequence or metabolic profile. Nucleic acid approaches such as 16S rRNA gene sequencing provide the gold standard method for microbial identification in the contemporary era. However, mass spectrometry-based microbial identification is gaining credence through ease of use, speed, and reliability. Specifically, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been used to identify bacteria, fungi, molds and archaea to the species level with high accuracy. The approach relies on the existence of a unique mass spectrum fingerprint for each microbial species. By comparing the mass spectrum of an unknown microbe with those catalogued in a reference database of known microorganisms, microbes can be identified through mass spectrum fingerprinting. However, the approach lacks a fundamental biological basis given the relative difficulty of assigning a specific protein to a particular mass peak in the profiled mass spectrum, which hampers a deeper understanding of the mass spectrum obtained. This study seeks to examine the existence of conserved mass peaks in MALDI-TOF mass spectra of bacterial strains belonging to the same species in open access data from SpectraBank. Results revealed that conserved mass peaks existed for all bacterial species examined (Bacillus subtilis, Bacillus thuringiensis, Carnobacterium maltaromaticum, Escherichia coli, Proteus vulgaris, Pseudomonas fluorescens, Pseudomonas fragi, Pseudomonas putida, Pseudomonas syringae, Serratia marcescens, Serratia proteamaculans, Staphylococcus aureus, and Stenotrophomonas maltophilia). A large number of conserved mass peaks, as in E. coli, might suggest more closely related strains within a species, though functional annotation of the mass peaks is required to provide a deeper understanding of the mechanisms underlying the conservation of specific proteins. In contrast, strains of S. aureus and P. putida had the fewest conserved mass peaks. The presence of conserved mass peaks in the genera Pseudomonas and Serratia provided further evidence that MALDI-TOF MS microbial identification has a biological basis for identifying microbes to the genus level. It also highlighted that a subset of proteins could define the taxonomical boundary between the species and genus level. Overall, the existence of conserved mass peaks in strains of the same bacterial species provided evidence of a firm biological basis for the mass spectrum fingerprinting approach of MALDI-TOF MS microbial identification. This could help identify specific species in the mass spectrum of single or multiple microbial species. Further functional annotation of the conserved mass peaks could illuminate in greater detail why certain proteins are conserved in specific genera and species.
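
The idea of a conserved mass peak can be made concrete with a short sketch. It assumes each strain is represented by a list of peak masses in Da, and that a peak counts as conserved when every other strain of the species has a peak within a fixed tolerance. The function name `conserved_peaks`, the tolerance, and the peak values are illustrative assumptions, not the procedure or data of the study.

```python
from bisect import bisect_left


def conserved_peaks(spectra, tol=1.0):
    """Return peaks of the first strain that every other strain also shows.

    spectra: list of peak lists (masses in Da), one list per strain.
    tol: maximum allowed mass difference (Da) for two peaks to match.
    """
    if not spectra:
        return []
    ordered = [sorted(s) for s in spectra]
    reference, others = ordered[0], ordered[1:]

    def has_match(peaks, mass):
        # First peak at or above (mass - tol); it matches if it is
        # also at or below (mass + tol).
        i = bisect_left(peaks, mass - tol)
        return i < len(peaks) and peaks[i] <= mass + tol

    return [m for m in reference if all(has_match(p, m) for p in others)]


# Made-up peak lists for three hypothetical strains of one species.
strains = [
    [4365.2, 5096.8, 7273.5],
    [4365.6, 6255.0, 7274.1],
    [4364.9, 7273.0, 9060.3],
]
shared = conserved_peaks(strains, tol=1.0)
```

Comparing every strain against the first one keeps the sketch simple; the result can depend on which strain is chosen as the reference when peaks sit near the tolerance boundary.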


2020 ◽  
Author(s):  
Wenfa Ng

Abstract

Although MALDI-TOF mass spectrometry-based microbial identification has achieved a level of accuracy that facilitates its use in classifying microbes to the species and strain level, questions remain on the identities of the mass peaks profiled from individual microbial species. Specifically, in the popular approach of comparing the mass spectra of known and unknown microbes for identification purposes, the identities of the mass peaks are not taken into consideration. This study sought to determine whether ribosomal proteins could account for some of the mass peaks profiled in MALDI-TOF mass spectra of different bacterial species. Using the calculated molecular masses of ribosomal proteins to annotate mass peaks in bacterial MALDI-TOF mass spectra downloaded from the SpectraBank database, this study revealed that ribosomal proteins could account for the low molecular weight mass peaks of <10000 Da. However, contrary to published reports, ribosomal proteins could not account for most of the mass peaks profiled. In particular, the data revealed that between 1 and 6 ribosomal protein mass peaks could be annotated in each mass spectrum. Annotated ribosomal proteins were S16, S17, S18, S20 and S21 from the small ribosome subunit, and L27, L28, L29, L30, L31, L31 Type B, L32, L33, L34, L35 and L36 from the large ribosome subunit. The ribosomal proteins with the largest number of mass peak annotations were L36 and L29, with L34, L33, and L31 completing the list of ribosomal proteins with a large number of annotations. Given the highly conserved nature of most ribosomal proteins, the possible phylogenetic significance of the annotated ribosomal proteins was investigated through reconstruction of maximum likelihood phylogenetic trees. Results revealed that, except for ribosomal proteins L34, L31, L36 and S18, all annotated ribosomal proteins hold phylogenetic significance under the criterion of recapitulating the phylogenetic cluster groups present in the phylogeny of 16S rRNA. The phylogenetic significance of the annotated ribosomal proteins was further verified by the phylogenetic tree constructed from the concatenated amino acid sequences of L29, S16, S20, S17, L27 and L35. Finally, analysis of the structure of the annotated ribosomal proteins did not reveal a high conservation of structure. Collectively, small, low molecular weight (<10000 Da) ribosomal proteins could annotate some of the mass peaks in MALDI-TOF mass spectra of various bacterial species, and most of these ribosomal proteins hold phylogenetic significance. However, structural analysis did not identify a conserved structure for the annotated ribosomal proteins. Annotation of ribosomal protein mass peaks in MALDI-TOF mass spectra highlighted the deep biological basis inherent in the mass spectrometry-based microbial identification method.

Subject areas: biochemistry, biotechnology, microbiology, evolution, ecology

Significance of the work

While MALDI-TOF MS has been successfully used to identify different microbes to the species and strain level through the comparison of mass spectra of known and unknown microbes, the approach (known as mass spectrum fingerprinting) still lacks an established biological basis. This study sought to uncover some of the biological basis that underpins MALDI-TOF MS microbial identification through the annotation of profiled mass peaks with ribosomal proteins. Previous studies have linked different ribosomal proteins to mass peaks in MALDI-TOF mass spectra of bacteria; however, broad verification of the finding across multiple species and genera remains lacking. Using a collection of MALDI-TOF mass spectra of 110 bacterial species and strains catalogued in SpectraBank, this study sought to annotate ribosomal protein mass peaks in the mass spectra. Results revealed that small, low molecular weight ribosomal proteins of molecular mass < 10000 Da could annotate between 1 and 6 mass peaks in the catalogued mass spectra. This was smaller than the number of ribosomal protein mass peaks postulated by previous studies. Overall, 16 ribosomal proteins (S16, S17, S18, S20, S21, L27, L28, L29, L30, L31, L31 Type B, L32, L33, L34, L35, and L36) were annotated, with the largest number of mass peak annotations coming from L36 and L29. Reconstruction of phylogenetic trees of the annotated ribosomal proteins revealed that most of the ribosomal proteins hold phylogenetic significance with respect to the phylogeny of 16S rRNA. This provided further evidence that a deep biological basis is present in the approach of using mass spectrometry profiling of biomolecules for identifying bacterial species.

Highlights

Ribosomal protein mass peaks were annotated in MALDI-TOF mass spectra of bacterial species across multiple genera.
Annotated ribosomal proteins were S16, S17, S18, S20, S21 for the small ribosome subunit, and L27, L28, L29, L30, L31, L31 Type B, L32, L33, L34, L35, L36 for the large ribosome subunit.
Between 1 and 6 ribosomal protein mass peaks were annotated per mass spectrum, a number significantly lower than that implied by other studies.
Annotated ribosomal proteins were small, low molecular weight ribosomal proteins of molecular mass < 10000 Da.
Phylogenetic tree reconstruction revealed the phylogenetic significance of most annotated ribosomal proteins except ribosomal proteins L34, L31, L36 and S18.
Multi-locus sequence typing of L29, S16, S20, S17, L27 and L35 further showed the phylogenetic significance of ribosomal proteins in recapitulating the phylogeny of 16S rRNA.
Structural analysis of annotated ribosomal proteins did not find conserved structure. Thus, the reasons for the annotation of particular ribosomal proteins over others remain unknown.
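
Annotating mass peaks with calculated ribosomal protein masses, as described in this abstract, amounts to a nearest-mass lookup within a tolerance. The sketch below assumes a fixed tolerance in Da; the function name `annotate_peaks`, the tolerance, and all mass values are placeholders chosen for illustration, not reference masses or the settings used in the study.

```python
def annotate_peaks(observed, theoretical, tol=2.0):
    """Label each observed peak with the closest theoretical protein mass.

    observed: iterable of peak masses (Da) from one spectrum.
    theoretical: dict mapping protein name -> calculated mass (Da).
    tol: maximum allowed difference (Da) for a peak to be annotated.
    Returns {observed mass: protein name} for annotated peaks only.
    """
    annotations = {}
    if not theoretical:
        return annotations
    for peak in observed:
        # Closest candidate by absolute mass difference.
        name, mass = min(theoretical.items(), key=lambda kv: abs(kv[1] - peak))
        if abs(mass - peak) <= tol:
            annotations[peak] = name
    return annotations


# Placeholder masses (illustrative only, not calculated from sequences).
calculated = {"L36": 4364.0, "L34": 5380.0, "L29": 7273.0}
spectrum = [4364.8, 6100.0, 7272.1]
labels = annotate_peaks(spectrum, calculated, tol=2.0)
```

Peaks with no candidate inside the tolerance are left unannotated, which is consistent with the abstract's observation that only a few peaks per spectrum receive a ribosomal protein label.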


2021 ◽  
Vol 9 (2) ◽  
pp. 416
Author(s):  
Charles Dumolin ◽  
Charlotte Peeters ◽  
Evelien De Canck ◽  
Nico Boon ◽  
Peter Vandamme

Culturomics-based bacterial diversity studies benefit from the implementation of MALDI-TOF MS to remove genomically redundant isolates from isolate collections. We previously introduced SPeDE, a novel tool designed to dereplicate spectral datasets at an infraspecific level into operational isolation units (OIUs) based on unique spectral features. However, biological and technical variation may result in methodology-induced differences in MALDI-TOF mass spectra and hence provoke the detection of genomically redundant OIUs. In the present study, we used three datasets to analyze the extent to which hierarchical clustering and network analysis could eliminate redundant OIUs arising from biological and technical sample variation, and to describe the diversity within a set of spectra obtained from 134 unknown soil isolates. Overall, network analysis based on unique spectral features in MALDI-TOF mass spectra enabled a superior selection of genomically diverse OIUs compared to hierarchical clustering analysis and provided a better understanding of the inter-OIU relationships.


2019 ◽  
Vol 18 (12) ◽  
pp. 2492-2505 ◽  
Author(s):  
Florence Roux-Dalvai ◽  
Clarisse Gotti ◽  
Mickaël Leclercq ◽  
Marie-Claude Hélie ◽  
Maurice Boissinot ◽  
...  

2017 ◽  
Vol 53 (2) ◽  
pp. 162-171 ◽  
Author(s):  
Andrea R. Kelley ◽  
Madeline E. Colley ◽  
George Perry ◽  
Stephan B.H. Bach
