PeakBot: Machine learning based chromatographic peak picking
Abstract

Motivation
Chromatographic peak picking is among the first steps in software pipelines for processing LC-HRMS datasets in untargeted metabolomics applications. Its performance is crucial for the holistic detection of all metabolic features as well as their relative quantification for statistical analysis and metabolite identification. Unfortunately, random noise, non-baseline-separated compounds and unspecific background signals complicate this task.

Results
A machine-learning framework named PeakBot was developed for detecting chromatographic peaks in LC-HRMS profile-mode data. It first detects all local signal maxima in a chromatogram, which are then extracted as super-sampled standardized areas (retention time vs. m/z). These areas are subsequently inspected by a custom-trained convolutional neural network that forms the basis of PeakBot's architecture. The model reports whether the respective local maximum is the apex of a chromatographic peak, as well as its peak center and bounding box.

In independent training and validation datasets used for development, PeakBot achieved high performance in discriminating between chromatographic peaks and background signals (F1 score of 0.99). A comparison of different sets of reference features showed that at least 100 reference features (including isotopologs) should be provided to achieve high-quality results for detecting new chromatographic peaks.

PeakBot is implemented in Python (3.8) and uses the TensorFlow (2.4.1) package for machine-learning-related tasks. It has been tested on Linux and Windows operating systems.

Availability
The framework is available free of charge for non-commercial use (CC BY-NC-SA). It is available at https://github.com/christophuv/[email protected]

Supplementary information
Supplementary data are available at Bioinformatics online.
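
The first two steps described under Results (finding local signal maxima and extracting a standardized area around each) can be illustrated with a minimal sketch. This is not PeakBot's code or API: the function names `local_maxima` and `extract_area`, the 8-neighbour maximum test, the use of bilinear interpolation for super-sampling and the scaling of each area to a maximum of 1 are all assumptions made for illustration. The input is assumed to be a small intensity grid with retention-time scans as rows and m/z bins as columns.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # rows: retention-time scans, columns: m/z bins


def local_maxima(grid: Grid, min_intensity: float = 0.0) -> List[Tuple[int, int]]:
    """Return (scan, mz_bin) cells strictly greater than all existing neighbours.

    Hypothetical helper: an 8-neighbour test is assumed here.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    maxima = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = grid[r][c]
            if value <= min_intensity:
                continue  # ignore cells at or below the intensity floor
            is_max = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        if grid[rr][cc] >= value:
                            is_max = False  # a neighbour ties or exceeds this cell
            if is_max:
                maxima.append((r, c))
    return maxima


def _sample(grid: Grid, r: float, c: float) -> float:
    """Bilinear interpolation at fractional (r, c); cells outside the grid count as 0."""
    n_rows, n_cols = len(grid), len(grid[0])
    r0, c0 = math.floor(r), math.floor(c)
    fr, fc = r - r0, c - c0
    total = 0.0
    for rr, wr in ((r0, 1.0 - fr), (r0 + 1, fr)):
        for cc, wc in ((c0, 1.0 - fc), (c0 + 1, fc)):
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                total += wr * wc * grid[rr][cc]
    return total


def extract_area(grid: Grid, center: Tuple[int, int], half_rt: float,
                 half_mz: float, out_rt: int, out_mz: int) -> Grid:
    """Resample the window around `center` onto a fixed out_rt x out_mz grid.

    Assumption: "standardized" is modelled as scaling the area so its maximum is 1.
    Requires out_rt >= 2 and out_mz >= 2.
    """
    r_c, c_c = center
    area = []
    for i in range(out_rt):
        r = r_c - half_rt + 2.0 * half_rt * i / (out_rt - 1)
        row = []
        for j in range(out_mz):
            c = c_c - half_mz + 2.0 * half_mz * j / (out_mz - 1)
            row.append(_sample(grid, r, c))
        area.append(row)
    peak = max(max(row) for row in area)
    if peak > 0:
        area = [[v / peak for v in row] for row in area]
    return area
```

In a pipeline of the kind the abstract describes, each fixed-size area produced this way would be passed to the convolutional network, which decides whether the local maximum is a peak apex. The fixed output shape is what allows windows of different physical extent to share one model input.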