MaSS-Simulator: A highly configurable MS/MS simulator for generating test datasets for big data algorithms
Abstract
Mass spectrometry (MS) based proteomics has become an essential tool in the study of proteins. The big data generated by MS machines has led to the development of novel serial and parallel algorithmic tools. However, the absence of data benchmarks and ground truth makes testing algorithmic integrity and reproducibility a challenging problem. To this end, we present MaSS-Simulator, an easy-to-use simulator that can be configured to generate MS/MS datasets for a wide variety of conditions with known ground truths. MaSS-Simulator offers a large number of configuration options for simulating control datasets with desired properties, thus enabling rigorous and large-scale algorithmic testing. We assessed 8,031 spectra generated by MaSS-Simulator by comparing them against experimentally generated spectra of the same peptides. Our results showed that the spectra generated by MaSS-Simulator were very close to the real experimental spectra and had a relative-error distribution centered around 25%. In contrast, the theoretical spectra for the same peptides had a relative-error distribution centered around 150%. Source code, executables, and a user manual can be downloaded from https://github.com/pcdslab/MaSS-Simulator