GeenaR: A Web Tool for Reproducible MALDI-TOF Analysis

2021 ◽  
Vol 12 ◽  
Author(s):  
Eugenio Del Prete ◽  
Angelo Facchiano ◽  
Aldo Profumo ◽  
Claudia Angelini ◽  
Paolo Romano

Mass spectrometry is a widely applied technology with a strong impact in the proteomics field. MALDI-TOF is a combined technology in mass spectrometry with many applications in characterizing biological samples from different sources, such as the identification of cancer biomarkers, the detection of food fraud, the identification of doping substances in athletes' fluids, and so on. The large amount of data generated, in the form of mass spectra, is often biased and altered by different sources of noise. Therefore, extracting the most relevant features that characterize the samples is often challenging and requires combining several computational methods. Here, we present GeenaR, a novel web tool that provides a complete workflow for pre-processing, analyzing, visualizing, and comparing MALDI-TOF mass spectra. GeenaR is user-friendly, provides many different functionalities for the analysis of mass spectra, and supports reproducible research, since it produces a human-readable report that contains the function parameters, the results, and the code used for processing the mass spectra. First, we illustrate the features available in GeenaR. Then, we describe its internal structure. Finally, we demonstrate its capabilities in analyzing oncological datasets by presenting two case studies related to ovarian cancer and colorectal cancer. GeenaR is available at http://proteomics.hsanmartino.it/geenar/.

2017 ◽  
Author(s):  
Eugenio Del Prete ◽  
Angelo Facchiano ◽  
Aldo Profumo ◽  
Claudia Angelini ◽  
Paolo Romano

Introduction Computational reproducibility refers to the possibility of reconstructing all the steps of a workflow that connects raw data, processed data and results. It is a fundamental issue in omics studies because of the complex and high-dimensional nature of the data involved. The analysis of omics data relies on multi-step workflows that include pre-processing, elaboration, statistical validation, interpretation and presentation. Although some analysis platforms ensure computational reproducibility for different omics studies, they do not provide explicit information about the executed code. Making the code available increases the quality of research in terms of transparency and knowledge transfer. Moreover, it allows other researchers to reproduce the results on a local system, compare results, and re-use the code to analyze different datasets.
Methods Geena 2 is a robust web tool for MALDI-ToF mass spectra pre-processing. Its main output is the list of common peaks identified by aligning the average spectra obtained from groups of replicates of different samples. Intermediate results are also made available. GeenaR, an extension of Geena 2, is still under development. Its objective is to integrate into the platform R libraries that provide advanced statistical analyses, thus enriching the current output. Notably, many R packages follow the reproducible research philosophy. For the aims of GeenaR, the following R packages and tools have been considered: R-Markdown, knitr and spin. Implementing these resources on an existing web platform can add value to its reporting features, since it improves the creation of a report on the work carried out, especially with reference to the code.
Results and Discussion One of the aims of both Geena 2 and GeenaR is to help users analyze MALDI-ToF mass spectra by providing a web interface that allows them to upload data, select algorithms and parameters, and execute the analysis to obtain results that meet their specific needs. Thanks to the novel reproducible research module implemented in GeenaR, the system generates a report containing all the steps performed. In more detail, the report provides: the date and time of execution, the R libraries used, chunks of code for the main elaborations, the selected parameters (set either by the user or by the system), the uploaded data as MALDIquant 'MassSpectrum' objects, numerical and graphical results, a short explanation of the workflow, and the versions of the system and of the packages. GeenaR delivers the results in a compressed archive containing separate log and graphical results, together with a report in both R-Markdown and HTML format. It must be stressed that reproducible research is not optional but a fundamental component of good computational practice, and it is essential in computational biology.
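GeenaR builds its report with R-Markdown and knitr. Purely as a language-neutral illustration of the kind of information such a report records, the Python sketch below assembles a small Markdown report with the execution timestamp, interpreter version, selected parameters, package versions and code chunks. The function name `build_report`, its arguments and the example values are hypothetical; this is not GeenaR's implementation.

```python
import sys
from datetime import datetime, timezone


def build_report(parameters, package_versions, code_chunks):
    """Assemble a Markdown report recording when, how and with what an analysis ran."""
    lines = ["# Analysis report", ""]
    # Execution timestamp in UTC, so reports produced on different machines compare cleanly.
    lines.append(f"Executed: {datetime.now(timezone.utc).isoformat(timespec='seconds')}")
    lines.append(f"Interpreter: Python {sys.version.split()[0]}")
    lines += ["", "## Parameters"]
    for name, value in sorted(parameters.items()):
        lines.append(f"- {name}: {value!r}")
    lines += ["", "## Package versions"]
    for name, version in sorted(package_versions.items()):
        lines.append(f"- {name} {version}")
    lines += ["", "## Code"]
    for title, code in code_chunks:
        # Tilde fences keep each embedded code chunk readable when the report is rendered.
        lines += ["", f"### {title}", "~~~python", code, "~~~"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    report = build_report(
        {"snr": 3, "tolerance": 0.002},          # hypothetical parameter names
        {"example-pkg": "0.0.0"},                # placeholder package name and version
        [("Normalization", "normalized = [v / 100 for v in values]")],
    )
    print(report)
```

Keeping parameters, versions and code together in one generated document is what lets a reader rerun the same steps later; sorting the entries keeps reports from repeated runs easy to diff.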


2007 ◽  
Vol 61 (6) ◽  
pp. 333-341
Author(s):  
Jasna Vukovic ◽  
Slobodan Jovanovic ◽  
Manfred Lechner

In this work, MALDI-TOF mass spectrometry was used to characterize aliphatic hyperbranched polyesters (AHBP) synthesized from 2,2-bis(hydroxymethyl)propionic acid (bis-MPA) and di-trimethylolpropane. From the results obtained, it was concluded that it was not possible to take full advantage of MALDI-TOF MS in this particular case, since the AHBP used in this work were polydisperse. The intensity of the signals from the high-mass tail of these samples (pseudo generation higher than four) was underestimated and insufficient to distinguish it from the baseline and to use it in the analysis of the spectra. As a consequence, lower values of the number-average molar mass (Mn) were obtained. At the same time, the weight-average molar mass (Mw) was also underestimated, which led to very low values of the polydispersity index. On the other hand, it was possible to obtain the molar masses of individual molecules from the MALDI-TOF mass spectra of AHBP and to qualitatively determine the extent of cyclization (side reactions) at each degree of polymerization. Using an appropriate set of equations and the results obtained from the MALDI-TOF mass spectra of AHBP, every signal in the spectra was identified. The results show that the formation of poly(bis-MPA), intramolecular esterification and intramolecular etherification occurred as side reactions during the synthesis of these polyesters. The relative amount of cycles increases with the number of the pseudo generation (from the second up to the fifth pseudo generation). It was also observed that the relative proportion of the signals representing cyclic structures increases with increasing degree of polymerization. This work also presents the basic principles of MALDI-TOF MS, as well as a review of relevant published articles.
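The underestimation described above follows from the standard definitions Mn = Σ NiMi / Σ Ni, Mw = Σ NiMi² / Σ NiMi and PDI = Mw / Mn. The Python sketch below computes them from (molar mass, intensity) pairs under the simplifying assumption, made here only for illustration, that each peak intensity is proportional to the number of molecules Ni of molar mass Mi. The peak values are made up, and the function name `molar_mass_averages` is hypothetical. Dropping the high-mass tail, as happens when those signals cannot be distinguished from the baseline, lowers Mn, Mw and the PDI together.

```python
def molar_mass_averages(peaks):
    """Return (Mn, Mw, PDI) from (molar mass, intensity) pairs.

    Assumption: intensity is proportional to the number of molecules N_i.
    """
    total_n = sum(n for _, n in peaks)
    total_nm = sum(n * m for m, n in peaks)
    total_nm2 = sum(n * m * m for m, n in peaks)
    mn = total_nm / total_n      # number-average molar mass
    mw = total_nm2 / total_nm    # weight-average molar mass
    return mn, mw, mw / mn       # polydispersity index = Mw / Mn


if __name__ == "__main__":
    # Illustrative (made-up) peaks: molar mass in g/mol, relative intensity.
    full = [(1000.0, 10.0), (2000.0, 10.0), (4000.0, 10.0)]
    # High-mass tail lost in the baseline: only the two lightest species are counted.
    truncated = full[:2]
    for label, peaks in (("full", full), ("truncated", truncated)):
        mn, mw, pdi = molar_mass_averages(peaks)
        print(f"{label}: Mn={mn:.0f} Mw={mw:.0f} PDI={pdi:.3f}")
```

With these values the full set gives Mn ≈ 2333, Mw = 3000 and PDI ≈ 1.29, while the truncated set gives Mn = 1500, Mw ≈ 1667 and PDI ≈ 1.11.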


2016 ◽  
Author(s):  
Eugenio Del Prete ◽  
Angelo Facchiano ◽  
Aldo Profumo ◽  
Claudia Angelini ◽  
Paolo Romano

Many large-scale proteomics studies have been performed in recent years, and this field of investigation is still expanding. While individual spectra can be analyzed with the tools supplied with the mass spectrometry (MS) instrumentation, the large-scale comparison of spectra is a complex part of the analysis and interpretation of a study. We recently developed Geena 2, a tool that automates different steps of MALDI/ToF MS data analysis. Further tools can be integrated to improve several aspects of the overall workflow: support for more input data formats, new algorithms for data cleaning, graphical visualization and reporting of the results, and advanced statistics for the comparison of mass spectra. For these reasons, we are now developing GeenaR, a new robust web tool for pre-processing, analysing, visualizing and comparing sets of MALDI-ToF mass spectra. This work presents the ongoing developments; the first results will be presented at the conference. GeenaR is being written in PHP, Perl (inherited from Geena 2) and R. The R packages used are MALDIquant and MALDIquantForeign for mass spectra pre-processing and analysis, OrgMassSpecR for mass spectra comparison, dendextend and pvclust for clustering, and sda and crossval for variable selection. The system is being implemented in a LAMP (Linux, Apache, MySQL, PHP) environment, with interfaces between PHP on one side and Perl and R on the other. The aim of GeenaR is to provide users with a wider range of statistical methods and graphical results without making the tool harder to use for researchers with little programming expertise. To achieve this goal, we are taking advantage of several R packages for the statistical analysis of mass spectrometry data, which are being integrated into the system.
The complete pipeline of GeenaR includes some features already available in Geena 2, plus others under development thanks to the integration of the R environment. Geena 2 already provides an original set of heuristic algorithms: the identification of isotopic peaks, based on the molecular weight of the signals and the related trend of abundances; normalization against a reference standard molecule; peak selection by means of a threshold line, built by linearly interpolating values provided for given m/z values; and alignment, by selecting the nearest peaks, within a limited m/z difference, across the different mass spectra. Through several R packages, GeenaR adds new statistical methods that are highly relevant to mass spectra analysis. (Abstract truncated at 3,000 characters - the full version is available in the pdf file)
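Three of the Geena 2 steps listed above (normalization, threshold-line peak selection and nearest-peak alignment) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Geena 2 or GeenaR code, which is written in Perl, PHP and R: peaks are (m/z, intensity) pairs; the reference standard is located by its known m/z within a tolerance; the threshold line is clamped to the first or last knot outside the supplied m/z range; a peak is kept when its intensity is at or above the line; and alignment matches each m/z of a reference spectrum to the nearest m/z of every other spectrum within a tolerance. All function names and values are hypothetical, and isotopic-peak identification is omitted.

```python
from bisect import bisect_left


def normalize_to_reference(peaks, ref_mz, tol):
    """Scale intensities so the peak nearest ref_mz (within tol) has intensity 1."""
    candidates = [(abs(mz - ref_mz), inten) for mz, inten in peaks if abs(mz - ref_mz) <= tol]
    if not candidates:
        raise ValueError(f"no reference peak within {tol} of m/z {ref_mz}")
    ref_intensity = min(candidates)[1]  # intensity of the closest candidate
    return [(mz, inten / ref_intensity) for mz, inten in peaks]


def threshold_at(mz, knots):
    """Linearly interpolate the threshold line at mz from (m/z, threshold) knots sorted by m/z."""
    xs = [x for x, _ in knots]
    if mz <= xs[0]:
        return knots[0][1]          # assumption: clamp below the first knot
    if mz >= xs[-1]:
        return knots[-1][1]         # assumption: clamp above the last knot
    i = bisect_left(xs, mz)         # xs[i - 1] < mz <= xs[i]
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)


def select_peaks(peaks, knots):
    """Keep peaks whose intensity reaches the interpolated threshold line."""
    return [(mz, inten) for mz, inten in peaks if inten >= threshold_at(mz, knots)]


def align_to_reference(reference_mz, other_spectra, tol):
    """For each reference m/z, pick the nearest m/z in each other spectrum within tol (None if absent)."""
    table = []
    for ref in reference_mz:
        row = [ref]
        for spectrum in other_spectra:
            near = [mz for mz in spectrum if abs(mz - ref) <= tol]
            row.append(min(near, key=lambda mz: abs(mz - ref)) if near else None)
        table.append(row)
    return table


if __name__ == "__main__":
    peaks = [(1000.0, 50.0), (1500.0, 5.0), (2000.0, 80.0)]
    knots = [(1000.0, 10.0), (3000.0, 30.0)]   # threshold rises linearly from 10 to 30
    kept = select_peaks(peaks, knots)          # the m/z 1500 peak falls below its threshold of 15
    print(kept)
    print(normalize_to_reference(kept, ref_mz=2000.0, tol=1.0))
    rows = align_to_reference([1000.0, 2000.0], [[1000.4, 1999.0], [1500.0, 2000.2]], tol=1.0)
    common = [row for row in rows if None not in row]  # peaks present in every spectrum
    print(common)
```

Rows without a `None` correspond to peaks found in every spectrum, which is the kind of common-peak list the text describes as the main output of Geena 2; a real implementation would also need to prevent one peak from being matched to two reference positions.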


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Sivkheng Kann ◽  
Sena Sao ◽  
Chanleakhena Phoeung ◽  
Youlet By ◽  
Juliet Bryant ◽  
...  

Abstract Background Serotyping of Streptococcus pneumoniae is important for monitoring vaccine impact. Unfortunately, conventional and molecular serotyping are expensive and technically demanding. This study aimed to determine the ability of matrix-assisted laser desorption-ionisation time-of-flight (MALDI-TOF) mass spectrometry to discriminate between pneumococcal serotypes and genotypes (defined by global pneumococcal sequence cluster, GPSC). In this study, MALDI-TOF mass spectra were generated for a diverse panel of whole-genome-sequenced pneumococcal isolates using the bioMerieux VITEK MS in clinical diagnostic (IVD) mode. Discriminatory mass peaks were identified and hierarchical clustering was performed to visually assess discriminatory ability. Random forest and classification and regression tree (CART) algorithms were used to formally determine how well serotypes and genotypes were identified by MALDI-TOF mass spectrum. Results One hundred and ninety-nine pneumococci, comprising 16 serotypes and non-typeable isolates from 46 GPSC, were analysed. In the primary experiment, hierarchical clustering revealed poor congruence between MALDI-TOF mass spectrum and serotype. The correct serotype was identified from the MALDI-TOF mass spectrum in just 14.6% (random forest) or 35.4% (CART) of 130 isolates. Restricting the dataset to the nine dominant GPSC (61 isolates / 13 serotypes), discriminatory ability improved slightly: the correct serotype was identified in 21.3% (random forest) and 41.0% (CART). Finally, analysis of 69 isolates of three dominant serotype-genotype pairs (6B-GPSC1, 19F-GPSC23, 23F-GPSC624) resulted in correct serotype identification in 81.1% (random forest) and 94.2% (CART) of isolates. Conclusions This work suggests that MALDI-TOF is not a useful technique for the determination of pneumococcal serotype. MALDI-TOF mass spectra appear more closely associated with isolate genotype, which may still have utility for future pneumococcal surveillance activities.


2021 ◽  
Vol 9 (2) ◽  
pp. 416
Author(s):  
Charles Dumolin ◽  
Charlotte Peeters ◽  
Evelien De Canck ◽  
Nico Boon ◽  
Peter Vandamme

Culturomics-based bacterial diversity studies benefit from the implementation of MALDI-TOF MS to remove genomically redundant isolates from isolate collections. We previously introduced SPeDE, a novel tool designed to dereplicate spectral datasets at an infraspecific level into operational isolation units (OIUs) based on unique spectral features. However, biological and technical variation may introduce methodology-induced differences in MALDI-TOF mass spectra and hence lead to the detection of genomically redundant OIUs. In the present study, we used three datasets to analyze to what extent hierarchical clustering and network analysis allowed the elimination of redundant OIUs arising from biological and technical sample variation, and to describe the diversity within a set of spectra obtained from 134 unknown soil isolates. Overall, network analysis based on unique spectral features in MALDI-TOF mass spectra enabled a better selection of genomically diverse OIUs than hierarchical clustering analysis and provided a better understanding of the inter-OIU relationships.


Molecules ◽  
2019 ◽  
Vol 24 (12) ◽  
pp. 2226 ◽  
Author(s):  
Alexander O. Chizhov ◽  
Yury E. Tsvetkov ◽  
Nikolay E. Nifantiev

Modern mass spectrometry, including electrospray and MALDI, is applied for analysis and structure elucidation of carbohydrates. Cyclic oligosaccharides isolated from different sources (bacteria and plants) have been known for decades and some of them (cyclodextrins and their derivatives) are widely used in drug design, as food additives, in the construction of nanomaterials, etc. The peculiarities of the first- and second-order mass spectra of cyclic oligosaccharides (natural, synthetic and their derivatives and modifications: cyclodextrins, cycloglucans, cyclofructans, cyclooligoglucosamines, etc.) are discussed in this minireview.


2017 ◽  
Vol 53 (2) ◽  
pp. 162-171 ◽  
Author(s):  
Andrea R. Kelley ◽  
Madeline E. Colley ◽  
George Perry ◽  
Stephan B.H. Bach

2007 ◽  
Vol 79 (4) ◽  
pp. 1639-1645 ◽  
Author(s):  
Alena Krupková ◽  
Jan Čermák ◽  
Zuzana Walterová ◽  
Jiří Horský