A workflow to integrate pre-processing, analysis and comparison of MALDI-ToF mass spectra in GeenaR

Many large-scale proteomics studies have been performed in recent years, and this field of investigation is expanding. While the analysis of any single spectrum can be performed with the tools supplied with the mass spectrometry (MS) instrumentation, the comparison of spectra on a large scale remains a complex aspect of the analysis and interpretation of a study. Recently, we developed Geena 2, a tool for the automation of different steps in MALDI-ToF MS data analysis. Further tools can be integrated to improve several aspects of the whole workflow: the input of more data formats, the implementation of new algorithms for data cleaning, the graphical visualization and reporting of the results, and the use of advanced statistics for the comparison of mass spectra. For these reasons, we are now developing GeenaR, a new robust web tool for pre-processing, analysing, visualizing and comparing a set of MALDI-ToF mass spectra. The aim of this work is to present the on-going developments. The first results will be presented at the conference. GeenaR is being written in PHP, Perl (from Geena 2) and R. The R packages used are MALDIquant and MALDIquantForeign for mass spectra pre-processing and analysis, OrgMassSpecR for mass spectra comparison, dendextend and pvclust for clustering, and sda and crossval for variable selection. The system is being implemented in a LAMP (Linux, Apache, MySQL, PHP) environment, with suitable interfaces between PHP on one side and Perl and R on the other. The aim of GeenaR is to offer users a wider range of statistical methods and graphical results without making the tool more difficult to use for researchers with little programming expertise. To achieve this goal, we have taken advantage of several R packages for the statistical analysis of mass spectrometry data, which are being integrated into the system.
The complete pipeline of GeenaR includes some features already available in Geena 2 plus others under development thanks to the integration of the R environment. An original set of heuristic algorithms is already available in Geena 2: the identification of isotopic peaks, which takes into account the molecular weight of the signals and the related trend of abundances; normalization on the basis of a reference standard molecule; peak selection by means of a threshold line, built by linearly interpolating values provided for given m/z values; and alignment, which selects the nearest peaks, within a limited m/z difference, in the different mass spectra. By means of some R packages, GeenaR adds new statistical methods that are highly relevant for mass spectra analysis. (Abstract truncated at 3,000 characters - the full version is available in the pdf file)
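
The peak selection and alignment heuristics described above can be illustrated with a minimal Python sketch. It is not the Geena 2 code (which the abstract says is written in Perl): the function names, the clamping of the threshold outside the anchor range, and the sample values are all assumptions made for illustration.

```python
from bisect import bisect_left

def threshold_at(mz, anchors):
    """Linearly interpolate the threshold line at a given m/z.

    anchors: list of (mz, intensity) points sorted by m/z.
    Assumption: outside the anchor range the threshold is clamped
    to the first or last anchor value.
    """
    xs = [x for x, _ in anchors]
    if mz <= xs[0]:
        return anchors[0][1]
    if mz >= xs[-1]:
        return anchors[-1][1]
    i = bisect_left(xs, mz)
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)

def select_peaks(peaks, anchors):
    """Keep the peaks whose intensity reaches the interpolated threshold."""
    return [(mz, inten) for mz, inten in peaks
            if inten >= threshold_at(mz, anchors)]

def align_nearest(reference, other, max_diff):
    """Pair each reference m/z with the nearest m/z in another spectrum,
    but only if the two differ by at most max_diff."""
    pairs = []
    for ref_mz in reference:
        nearest = min(other, key=lambda mz: abs(mz - ref_mz), default=None)
        if nearest is not None and abs(nearest - ref_mz) <= max_diff:
            pairs.append((ref_mz, nearest))
    return pairs

# Hypothetical data: a threshold line through two anchor points.
anchors = [(1000.0, 10.0), (2000.0, 30.0)]
peaks = [(1200.0, 12.0), (1500.0, 25.0), (1800.0, 20.0), (1900.0, 40.0)]
selected = select_peaks(peaks, anchors)

# Align the selected m/z values against a second (hypothetical) spectrum.
pairs = align_nearest([mz for mz, _ in selected],
                      [1500.4, 1750.0, 1903.0], max_diff=0.5)
```

With these values the threshold is 14 at m/z 1200 and 26 at m/z 1800, so those two peaks are discarded, and only the peak at 1500.0 finds a partner within the 0.5 m/z tolerance.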

Suitable reporting for the reproducible research: an added value in the analysis of proteomics data

10.7287/peerj.preprints.2823 ◽

2017 ◽

Author(s):

Eugenio Del Prete ◽

Angelo Facchiano ◽

Aldo Profumo ◽

Claudia Angelini ◽

Paolo Romano

Keyword(s):

Mass Spectra ◽

Computer Code ◽

Added Value ◽

Reproducible Research ◽

Proteomics Data ◽

Statistical Validation ◽

Current Output ◽

Maldi Tof ◽

Maldi Tof Mass Spectra ◽

R Packages

Introduction Computational reproducibility refers to the possibility of reconstructing all the steps of a workflow that connects raw data, processed data and results. It is a fundamental issue in omics studies because of the complex and high-dimensional nature of the data involved. The analysis of omics data relies on multi-step workflows that include pre-processing, elaboration, statistical validation, interpretation and presentation. Although some analysis platforms are able to ensure computational reproducibility for different omics studies, they do not provide explicit information about the executed code. The availability of the code increases the quality of research in terms of transparency and knowledge transfer. Moreover, it allows other researchers to reproduce the results on a local system, compare results and re-use the code for analyzing different datasets.
Methods Geena 2 is a robust web tool for MALDI-ToF mass spectra pre-processing. Its main output is the list of common peaks identified by aligning the average spectra obtained from groups of replicates of different samples. Intermediate results are also made available. GeenaR is an extension of Geena 2 that is still under development. Its objective is to integrate into the platform some R libraries that provide advanced statistical analyses, thus enriching the current output. It is noteworthy that many R packages follow the reproducible research philosophy. For the aims of GeenaR, the following R packages and tools have been considered: R-Markdown, knitr and spin. Implementing these resources on an existing web platform adds value to its reporting features, since it improves the report on the work carried out, especially with reference to the code.
Results and Discussion One of the aims of both Geena 2 and GeenaR is to make it easier for users to analyze MALDI-ToF mass spectra by providing a web interface that allows them to upload data, select different algorithms and parameters, and execute the analysis to obtain results that meet a specific need. Thanks to the novel reproducible research module implemented in GeenaR, the system generates a report containing all the steps performed. In more detail, the report will provide: the date and time of the execution, the R libraries used for the process, chunks of code for the main elaborations, the selected parameters (chosen either by the users or by the system), the uploaded data as objects of the MALDIquant MassSpectrum class, numerical and graphical results, a short explanation of the workflow, and the versions of the system and of the packages. GeenaR delivers the results in a compressed archive, with separate log and graphical results, and a report in both R-Markdown and HTML format. It is important to stress that reproducible research is not optional but a fundamental component of good computational practice, and it becomes essential in computational biology.
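
As a rough illustration of the kind of report described above, the following Python sketch assembles a plain Markdown document from the execution time, the selected parameters, the package versions and the code chunks. It is only a sketch under stated assumptions: GeenaR itself relies on R-Markdown, knitr and spin, and the function name `build_report`, the parameter names and the package entry used here are hypothetical.

```python
from datetime import datetime, timezone
import platform

def build_report(params, packages, code_chunks, when=None):
    """Assemble a Markdown report recording what was run and how.

    params:      dict of parameter name -> value (hypothetical names)
    packages:    dict of package name -> version string
    code_chunks: list of source-code strings to include verbatim
    when:        execution time; defaults to the current UTC time
    """
    when = when or datetime.now(timezone.utc)
    lines = [
        "# Analysis report",
        "",
        f"Executed: {when.isoformat()}",
        f"Interpreter: Python {platform.python_version()}",
        "",
        "## Parameters",
    ]
    for name, value in sorted(params.items()):
        lines.append(f"- {name}: {value}")
    lines += ["", "## Packages"]
    for name, version in sorted(packages.items()):
        lines.append(f"- {name} {version}")
    lines += ["", "## Code"]
    for chunk in code_chunks:
        # Indent by four spaces so Markdown renders the chunk as code.
        lines += ["", "    " + chunk.replace("\n", "\n    ")]
    return "\n".join(lines) + "\n"

report = build_report(
    params={"tolerance": 0.5, "smoothing": "none"},   # hypothetical
    packages={"examplepkg": "1.0"},                   # hypothetical
    code_chunks=["x = 1\ny = x + 1"],
    when=datetime(2017, 1, 1, tzinfo=timezone.utc),
)
```

Sorting the parameters and packages keeps the report stable between runs, so two reports can be compared line by line.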

Characterization of aliphatic hyperbranched polyesters by MALDI-TOF mass spectrometry

Hemijska industrija ◽

10.2298/hemind0706333v ◽

2007 ◽

Vol 61 (6) ◽

pp. 333-341

Author(s):

Jasna Vukovic ◽

Slobodan Jovanovic ◽

Manfred Lechner

Keyword(s):

Mass Spectrometry ◽

Mass Spectra ◽

Degree Of Polymerization ◽

Side Reactions ◽

Maldi Tof Ms ◽

Hyperbranched Polyesters ◽

Maldi Tof ◽

Maldi Tof Mass Spectra ◽

Tof Ms

In this work, MALDI-TOF mass spectrometry was used for the characterization of aliphatic hyperbranched polyesters (AHBP) synthesized from 2,2-bis(hydroxymethyl)propionic acid (bis-MPA) and di-trimethylolpropane. The results showed that it was not possible to take full advantage of MALDI-TOF MS in this particular case, since the AHBP used in this work were polydisperse. The intensity of the signals from the high mass tail of these samples (pseudo generation higher than four) was underestimated, and was too low to be distinguished from the baseline and used in the analysis of the spectra. As a consequence, lower values of Mn were obtained. At the same time, Mw was also underestimated, which led to very low values of the polydispersity index. On the other hand, it was possible to obtain the molar masses of individual molecules from the MALDI-TOF mass spectra of AHBP and to qualitatively determine the extent of cyclization (side reactions) at each degree of polymerization. Using an appropriate set of equations and the results obtained from the MALDI-TOF mass spectra of AHBP, every signal in the spectra was identified. The results show that the formation of poly(bis-MPA), intramolecular esterification and intramolecular etherification occurred as side reactions during the synthesis of these polyesters. The relative amount of cycles increases with the pseudo generation number (from the second up to the fifth pseudo generation). It was also observed that the relative proportion of the signals that represent cyclic structures increases with increasing degree of polymerization. This work also presents the basic principles of MALDI-TOF MS, as well as a review of relevant published articles.
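
The effect of a lost high mass tail on Mn, Mw and the polydispersity index can be seen in a small numerical sketch. It uses the standard definitions Mn = ΣNiMi / ΣNi and Mw = ΣNiMi² / ΣNiMi, and it assumes that signal intensity can stand in for the number of chains Ni. The masses and intensities are invented for illustration and are not data from the paper.

```python
def molar_mass_averages(peaks):
    """Return (Mn, Mw, PDI) from a list of (mass, intensity) pairs.

    Assumption: intensity is proportional to the number of chains
    of that mass, so it is used directly as N_i.
    """
    total_n = sum(n for _, n in peaks)
    total_nm = sum(m * n for m, n in peaks)
    total_nm2 = sum(m * m * n for m, n in peaks)
    mn = total_nm / total_n      # number-average molar mass
    mw = total_nm2 / total_nm    # weight-average molar mass
    return mn, mw, mw / mn       # PDI = Mw / Mn

# Hypothetical distribution, including a weak high mass tail.
full = [(1000, 10), (2000, 20), (3000, 20), (4000, 10), (5000, 5)]

# The same distribution with the tail lost in the baseline.
observed = full[:3]

mn_full, mw_full, pdi_full = molar_mass_averages(full)
mn_obs, mw_obs, pdi_obs = molar_mass_averages(observed)
```

For these values, dropping the tail lowers Mn from about 2692 to 2200, Mw from about 3171 to about 2455, and the polydispersity index from about 1.18 to about 1.12, which is the direction of bias the abstract reports.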

MALDI-TOF mass spectrometry for sub-typing of Streptococcus pneumoniae

BMC Microbiology ◽

10.1186/s12866-020-02052-7 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Sivkheng Kann ◽

Sena Sao ◽

Chanleakhena Phoeung ◽

Youlet By ◽

Juliet Bryant ◽

...

Keyword(s):

Mass Spectrometry ◽

Mass Spectrum ◽

Streptococcus Pneumoniae ◽

Random Forest ◽

Hierarchical Clustering ◽

Mass Spectra ◽

Discriminatory Ability ◽

Maldi Tof Mass Spectrometry ◽

Maldi Tof ◽

Maldi Tof Mass Spectra

Abstract Background Serotyping of Streptococcus pneumoniae is important for monitoring of vaccine impact. Unfortunately, conventional and molecular serotyping is expensive and technically demanding. This study aimed to determine the ability of matrix-assisted laser desorption-ionisation time-of-flight (MALDI-TOF) mass spectrometry to discriminate between pneumococcal serotypes and genotypes (defined by global pneumococcal sequence cluster, GPSC). In this study, MALDI-TOF mass spectra were generated for a diverse panel of whole genome sequenced pneumococcal isolates using the bioMerieux VITEK MS in clinical diagnostic (IVD) mode. Discriminatory mass peaks were identified and hierarchical clustering was performed to visually assess discriminatory ability. Random forest and classification and regression tree (CART) algorithms were used to formally determine how well serotypes and genotypes were identified by MALDI-TOF mass spectrum. Results One hundred and ninety-nine pneumococci, comprising 16 serotypes and non-typeable isolates from 46 GPSC, were analysed. In the primary experiment, hierarchical clustering revealed poor congruence between MALDI-TOF mass spectrum and serotype. The correct serotype was identified from MALDI-TOF mass spectrum in just 14.6% (random forest) or 35.4% (CART) of 130 isolates. Restricting the dataset to the nine dominant GPSC (61 isolates / 13 serotypes), discriminatory ability improved slightly: the correct serotype was identified in 21.3% (random forest) and 41.0% (CART). Finally, analysis of 69 isolates of three dominant serotype-genotype pairs (6B-GPSC1, 19F-GPSC23, 23F-GPSC624) resulted in the correct serotype identification in 81.1% (random forest) and 94.2% (CART) of isolates. Conclusions This work suggests that MALDI-TOF is not a useful technique for determination of pneumococcal serotype. MALDI-TOF mass spectra appear more associated with isolate genotype, which may still have utility for future pneumococcal surveillance activities.

GeenaR: A Web Tool for Reproducible MALDI-TOF Analysis

Frontiers in Genetics ◽

10.3389/fgene.2021.635814 ◽

2021 ◽

Vol 12 ◽

Author(s):

Eugenio Del Prete ◽

Angelo Facchiano ◽

Aldo Profumo ◽

Claudia Angelini ◽

Paolo Romano

Keyword(s):

Mass Spectrometry ◽

Mass Spectra ◽

Reproducible Research ◽

Strong Impact ◽

Web Tool ◽

Maldi Tof ◽

Maldi Tof Mass Spectra ◽

Applied Technology ◽

User Friendly ◽

Different Sources

Mass spectrometry is a widely applied technology with a strong impact in the proteomics field. MALDI-TOF is a combined technology in mass spectrometry with many applications in characterizing biological samples from different sources, such as the identification of cancer biomarkers, the detection of food fraud, and the identification of doping substances in athletes’ fluids. The massive quantity of data, in the form of mass spectra, is often biased and altered by different sources of noise. Therefore, extracting the most relevant features that characterize the samples is often challenging and requires combining several computational methods. Here, we present GeenaR, a novel web tool that provides a complete workflow for pre-processing, analyzing, visualizing, and comparing MALDI-TOF mass spectra. GeenaR is user-friendly, provides many different functionalities for the analysis of the mass spectra, and supports reproducible research since it produces a human-readable report that contains function parameters, results, and the code used for processing the mass spectra. First, we illustrate the features available in GeenaR. Then, we describe its internal structure. Finally, we demonstrate its capabilities in analyzing oncological datasets by presenting two case studies related to ovarian cancer and colorectal cancer. GeenaR is available at http://proteomics.hsanmartino.it/geenar/.

Network Analysis Based on Unique Spectral Features Enables an Efficient Selection of Genomically Diverse Operational Isolation Units

Microorganisms ◽

10.3390/microorganisms9020416 ◽

2021 ◽

Vol 9 (2) ◽

pp. 416

Author(s):

Charles Dumolin ◽

Charlotte Peeters ◽

Evelien De Canck ◽

Nico Boon ◽

Peter Vandamme

Keyword(s):

Network Analysis ◽

Hierarchical Clustering ◽

Mass Spectra ◽

Spectral Features ◽

Maldi Tof ◽

Maldi Tof Mass Spectra ◽

Diversity Studies ◽

Technical Sample ◽

Efficient Selection ◽

Selection Of

Culturomics-based bacterial diversity studies benefit from the implementation of MALDI-TOF MS to remove genomically redundant isolates from isolate collections. We previously introduced SPeDE, a novel tool designed to dereplicate spectral datasets at an infraspecific level into operational isolation units (OIUs) based on unique spectral features. However, biological and technical variation may result in methodology-induced differences in MALDI-TOF mass spectra and hence lead to the detection of genomically redundant OIUs. In the present study, we used three datasets to analyze the extent to which hierarchical clustering and network analysis made it possible to eliminate redundant OIUs arising from biological and technical sample variation, and to describe the diversity within a set of spectra obtained from 134 unknown soil isolates. Overall, network analysis based on unique spectral features in MALDI-TOF mass spectra enabled a superior selection of genomically diverse OIUs compared to hierarchical clustering analysis and provided a better understanding of the inter-OIU relationships.
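
To give a feel for how a network view of spectra can group related isolates, the sketch below links two spectra when their peak sets are similar enough and then reads off the connected components of the resulting graph. This is a generic illustration and not the SPeDE algorithm or the analysis used in the study: the Jaccard similarity, the threshold, the function names and the toy peak sets (integer m/z bins) are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of peak positions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_components(spectra, min_similarity):
    """Group spectra into connected components of a similarity graph.

    spectra: dict of spectrum name -> set of binned peak positions.
    Two spectra are linked when their Jaccard similarity is at least
    min_similarity; linked spectra end up in the same group.
    """
    names = sorted(spectra)
    neighbours = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(spectra[a], spectra[b]) >= min_similarity:
                neighbours[a].add(b)
                neighbours[b].add(a)

    seen, components = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components

# Hypothetical peak sets for four spectra.
spectra = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {2, 3, 5, 6},
    "D": {10, 11, 12},
}
groups = similarity_components(spectra, min_similarity=0.5)
```

In this toy example A and C are not similar enough to be linked directly, but both are linked to B, so all three fall into one group while D stays on its own. A pairwise comparison of A and C alone would miss this chained relationship.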
