A workflow to integrate pre-processing, analysis and comparison of MALDI-ToF mass spectra in GeenaR
Many large-scale proteomics studies have been carried out in recent years, and the field continues to expand. While the analysis of a single spectrum can be performed with the tools supplied with the mass spectrometry (MS) instrumentation, the comparison of spectra on a large scale remains a complex part of the analysis and interpretation of a study. We recently developed Geena 2, a tool that automates several steps of MALDI-ToF MS data analysis. Further tools can be integrated to improve several aspects of the workflow: support for more input data formats, new algorithms for data cleaning, graphical visualization and reporting of the results, and advanced statistics for the comparison of mass spectra. For these reasons, we are now developing GeenaR, a new, robust web tool for pre-processing, analysing, visualizing and comparing a set of MALDI-ToF mass spectra. This work presents the ongoing developments; the first results will be presented at the conference.
GeenaR is being written in PHP, Perl (inherited from Geena 2) and R. The R packages used are MALDIquant and MALDIquantForeign for mass spectra pre-processing and analysis, OrgMassSpecR for mass spectra comparison, dendextend and pvclust for clustering, and sda and crossval for variable selection. The system is being implemented in a LAMP (Linux, Apache, MySQL, PHP) environment, with appropriate interfaces between PHP on one side and Perl and R on the other.
The aim of GeenaR is to offer users a wider range of statistical methods and graphical results without making the tool more difficult to use for researchers with little programming expertise. To achieve this goal, we take advantage of several R packages for the statistical analysis of mass spectrometry data, which are being integrated into the system.
The complete GeenaR pipeline includes features already available in Geena 2 plus others under development through the integration of the R environment. Geena 2 already provides an original set of heuristic algorithms: the identification of isotopic peaks, which takes into account the molecular weight of the signals and the related trend of abundances; normalization on the basis of a reference standard molecule; peak selection by means of a threshold line built by linearly interpolating the values provided for given m/z values; and alignment, which selects the nearest peaks, within a limited m/z difference, across the different mass spectra. Through R packages, GeenaR adds new statistical methods that are highly relevant for mass spectra analysis. (Abstract truncated at 3,000 characters - the full version is available in the pdf file)
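As an illustration of two of the heuristics described above (peak selection against an interpolated threshold line, and nearest-peak alignment within a limited m/z difference), the following is a minimal Python sketch. It is not the Geena 2 implementation, which the abstract does not show. All function names, the clamping of the threshold outside the anchor range, and the choice of aligning each spectrum to a list of reference m/z values are assumptions made for the example.

```python
from bisect import bisect_left


def threshold_at(mz, anchors):
    """Linearly interpolate the threshold intensity at a given m/z.

    `anchors` is a list of (mz, intensity) pairs sorted by strictly
    increasing m/z. Outside the anchor range the nearest anchor value
    is used (an assumption of this sketch).
    """
    xs = [a[0] for a in anchors]
    if mz <= xs[0]:
        return anchors[0][1]
    if mz >= xs[-1]:
        return anchors[-1][1]
    i = bisect_left(xs, mz)  # first anchor with m/z >= mz
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)


def select_peaks(peaks, anchors):
    """Keep the (mz, intensity) peaks lying on or above the threshold line."""
    return [(mz, inten) for mz, inten in peaks
            if inten >= threshold_at(mz, anchors)]


def align_to_reference(reference_mz, peaks, max_diff):
    """For each reference m/z, return the nearest peak within max_diff.

    Positions with no peak inside the tolerance are reported as None.
    """
    aligned = []
    for ref in reference_mz:
        best = min(peaks, key=lambda p: abs(p[0] - ref), default=None)
        if best is not None and abs(best[0] - ref) <= max_diff:
            aligned.append(best)
        else:
            aligned.append(None)
    return aligned


# Hypothetical values, chosen only to exercise the functions.
anchors = [(1000.0, 10.0), (2000.0, 30.0)]
peaks = [(1200.0, 15.0), (1500.0, 19.0), (1800.0, 40.0)]
selected = select_peaks(peaks, anchors)
aligned = align_to_reference([1200.4, 1500.0], selected, max_diff=0.5)
```

In this example the threshold line rises from 10 at m/z 1000 to 30 at m/z 2000, so the peak at m/z 1500 (intensity 19, threshold 20) is discarded, and the reference position 1500.0 has no aligned peak within the 0.5 tolerance.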