ColocML: machine learning quantifies co-localization between mass spectrometry images

2020 ◽  
Vol 36 (10) ◽  
pp. 3215-3224 ◽  
Author(s):  
Katja Ovchinnikova ◽  
Lachlan Stuart ◽  
Alexander Rakhlin ◽  
Sergey Nikolenko ◽  
Theodore Alexandrov

Abstract. Motivation: Imaging mass spectrometry (imaging MS) is a prominent technique for capturing distributions of molecules in tissue sections. Various computational methods for imaging MS rely on quantifying spatial correlations between ion images, referred to as co-localization. However, no comprehensive evaluation of co-localization measures has ever been performed; this leads to arbitrary choices and hinders method development. Results: We present ColocML, a machine learning approach addressing this gap. With the help of 42 imaging MS experts from nine laboratories, we created a gold standard of 2210 pairs of ion images ranked by their co-localization. We evaluated existing co-localization measures and developed novel measures using term frequency–inverse document frequency and deep neural networks. The semi-supervised deep learning Pi model and the cosine score applied after median thresholding performed the best (Spearman 0.797 and 0.794 with expert rankings, respectively). We illustrate these measures by inferring co-localization properties of 10 273 molecules from 3685 public METASPACE datasets. Availability and implementation: https://github.com/metaspace2020/coloc. Supplementary information: Supplementary data are available at Bioinformatics online.
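As a rough illustration of the "cosine score applied after median thresholding" named in the abstract, the sketch below treats each ion image as a flat list of pixel intensities, zeroes every pixel below that image's median, and then takes the cosine similarity of the two thresholded images. This is one plausible reading of the phrase, not the authors' implementation (which lives in the repository linked above); the function names `median_threshold`, `cosine`, and `coloc_score` are hypothetical.

```python
import math
from statistics import median


def median_threshold(image):
    """Zero out every pixel whose intensity is below the image's median.

    Assumption: `image` is a flat list of non-negative intensities.
    """
    m = median(image)
    return [v if v >= m else 0.0 for v in image]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coloc_score(img_a, img_b):
    """Hypothetical co-localization score: cosine after median thresholding."""
    return cosine(median_threshold(img_a), median_threshold(img_b))


if __name__ == "__main__":
    same_region = coloc_score([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5])
    disjoint = coloc_score([5, 4, 0, 0], [0, 0, 4, 5])
    print(same_region, disjoint)
```

Thresholding at the median suppresses low-intensity background, so the score is driven by where the brighter half of each image lies rather than by shared noise.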

2019 ◽  
Author(s):  
Katja Ovchinnikova ◽  
Alexander Rakhlin ◽  
Lachlan Stuart ◽  
Sergey Nikolenko ◽  
Theodore Alexandrov

Abstract. Motivation: Imaging mass spectrometry (imaging MS) is a prominent technique for capturing distributions of molecules in tissue sections. Various computational methods for imaging MS rely on quantifying spatial correlations between ion images, referred to as co-localization. However, no comprehensive evaluation of co-localization measures has ever been performed; this leads to arbitrary choices and hinders method development. Results: We present ColocAI, an artificial intelligence approach addressing this gap. With the help of 42 imaging MS experts from 9 labs, we created a gold standard of 2210 pairs of ion images ranked by their co-localization. We evaluated existing co-localization measures and developed novel measures using tf-idf and deep neural networks. The semi-supervised deep learning Pi model and the cosine score applied after median thresholding performed the best (Spearman 0.797 and 0.794 with expert rankings, respectively). We illustrate these measures by inferring co-localization properties of 10273 molecules from 3685 public METASPACE datasets. Availability and Implementation: https://github.com/metaspace2020/[email protected]


2020 ◽  
Vol 20 (1) ◽  
pp. 841-857
Author(s):  
Malena Manzi ◽  
Martín Palazzo ◽  
María Elena Knott ◽  
Pierre Beauseroy ◽  
Patricio Yankilevich ◽  
...  

2020 ◽  
Author(s):  
Leonoor E.M. Tideman ◽  
Lukasz G. Migas ◽  
Katerina V. Djambazova ◽  
Nathan Heath Patterson ◽  
Richard M. Caprioli ◽  
...  

Abstract. The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species’ biomarker potential, our workflow delivers spatially localized explanations of the classification model’s decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative.
Our automated approach to estimating a molecular species’ potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and yields a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.
Highlights:
Our workflow automates the discovery of biomarker candidates in imaging mass spectrometry data by using state-of-the-art machine learning methodology to produce a shortlist of molecular species that are differentially expressed with regard to a user-provided biological class.
A model interpretability method called Shapley additive explanations (SHAP), with observational Shapley values, enables us to quantify the local and global predictive importance of molecular species with respect to recognizing a user-provided biological class.
By providing spatially localized explanations for a classification model’s decision-making process, SHAP maps deliver insight into the spatial specificity of biomarker candidates and enable one to determine whether (and where) the relationship between a biomarker candidate and the class of interest is correlative or anticorrelative.
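The feature-ranking idea described above can be sketched on a toy problem. The code below computes exact Shapley values for a tiny hypothetical scoring function by enumerating every feature subset, then ranks features by their mean absolute Shapley value across pixels. It is a simplified illustration only: features outside a subset are replaced by a fixed baseline value, which is not the observational Shapley formulation the abstract refers to, and `toy_model`, `shapley_values`, and `global_ranking` are hypothetical names rather than part of the authors' workflow or of any SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` by enumerating feature subsets.

    Assumption: features not in a subset are set to `baseline` values.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def global_ranking(model, pixels, baseline):
    """Rank feature indices by mean absolute Shapley value over all pixels."""
    n = len(baseline)
    per_pixel = [shapley_values(model, px, baseline) for px in pixels]
    importance = [sum(abs(p[i]) for p in per_pixel) / len(pixels) for i in range(n)]
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order, importance


def toy_model(z):
    """Hypothetical class score: feature 0 raises it, feature 1 lowers it."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.0 * z[2]


if __name__ == "__main__":
    pixels = [[1.0, 3.0, 5.0], [2.0, 1.0, 0.0]]
    baseline = [0.0, 0.0, 0.0]
    print(global_ranking(toy_model, pixels, baseline))
```

In this toy setting the per-pixel values play the role of a SHAP map: their sign indicates whether a feature pushes the score towards or away from the class at that pixel, and their magnitude indicates how much.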


Molecules ◽  
2019 ◽  
Vol 24 (21) ◽  
pp. 3837 ◽  
Author(s):  
Seong-Eun Park ◽  
Seung-Ho Seo ◽  
Eun-Ju Kim ◽  
Dae-Hun Park ◽  
Kyung-Mok Park ◽  
...  

The purpose of this study was to analyze metabolic differences of ginseng berries according to cultivation age and ripening stage using a gas chromatography-mass spectrometry (GC-MS)-based metabolomics method. Ginseng berries were harvested every week during five different ripening stages of three-year-old and four-year-old ginseng. Using identified metabolites, a random forest machine learning approach was applied to obtain predictive models for the classification of cultivation age or ripening stage. The principal component analysis (PCA) score plot showed a clear separation by ripening stage, indicating that continuous metabolic changes occurred until the fifth ripening stage. Three-year-old ginseng berries had higher levels of valine, glutamic acid, and tryptophan, but lower levels of lactic acid and galactose, than four-year-old ginseng berries at the fully ripened stage. Metabolic pathways affected by cultivation age were related to amino acid metabolism. A random forest machine learning approach extracted some important metabolites for predicting cultivation age or ripening stage with a low error rate. This study demonstrates that different cultivation ages or ripening stages of ginseng berry can be successfully discriminated using a GC-MS-based metabolomic approach together with random forest analysis.


2020 ◽  
Vol 3 (1) ◽  
pp. 61-87 ◽  
Author(s):  
Theodore Alexandrov

Spatial metabolomics is an emerging field of omics research that has enabled localizing metabolites, lipids, and drugs in tissue sections, a feat considered impossible just two decades ago. Spatial metabolomics and its enabling technology—imaging mass spectrometry—generate big hyperspectral imaging data that have motivated the development of tailored computational methods at the intersection of computational metabolomics and image analysis. Experimental and computational developments have recently opened doors to applications of spatial metabolomics in life sciences and biomedicine. At the same time, these advances have coincided with a rapid evolution in machine learning, deep learning, and artificial intelligence, which are transforming our everyday life and promise to revolutionize biology and healthcare. Here, we introduce spatial metabolomics through the eyes of a computational scientist, review the outstanding challenges, provide a look into the future, and discuss opportunities granted by the ongoing convergence of human and artificial intelligence.


Author(s):  
Charan Lokku

Abstract: To reduce the harm caused by fraudulent job postings on the internet, we use a machine learning approach to predict the likelihood that a job posting is fake, so that candidates can stay alert and make informed decisions. The model uses NLP to analyze the sentiment and patterns in a job posting and a TF-IDF vectorizer for feature extraction. We use the Synthetic Minority Oversampling Technique (SMOTE) to balance the data and a Random Forest for classification, which predicts with high accuracy, runs efficiently even on large datasets, and helps prevent overfitting. The final model takes in any relevant job posting data and determines whether the job is real or fake. Keywords: Natural Language Processing (NLP), Term Frequency-Inverse Document Frequency (TF-IDF), Synthetic Minority Oversampling Technique (SMOTE), Random Forest.
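For readers unfamiliar with the TF-IDF feature extraction step mentioned in this abstract, here is a minimal sketch of the textbook weighting: term frequency within a document multiplied by the log of the inverse document frequency. It is not the paper's pipeline; the function name `tfidf` and the example postings are hypothetical, library vectorizers typically use smoothed and normalized variants, and the SMOTE and Random Forest stages are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / tokens in doc) * ln(N / docs containing term)
    Assumption: whitespace tokenization and lowercasing are enough for the sketch.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    postings = [
        "earn money fast from home",
        "software engineer role in berlin",
        "earn money now no experience",
    ]
    for vector in tfidf(postings):
        print(vector)
```

Terms that occur in every document receive a weight of zero, so the representation emphasizes words that distinguish one posting from the rest.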


Metabolites ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 477
Author(s):  
Don D. Nguyen ◽  
Veronika Saharuka ◽  
Vitaly Kovalev ◽  
Lachlan Stuart ◽  
Massimo Del Prete ◽  
...  

Metabolite annotation from imaging mass spectrometry (imaging MS) data is a difficult undertaking that is extremely resource intensive. Here, we adapted METASPACE, cloud software for imaging MS metabolite annotation and data interpretation, to quickly annotate microbial specialized metabolites from high-resolution and high-mass accuracy imaging MS data. Compared with manual ion image and MS1 annotation, METASPACE is faster and, with the appropriate database, more accurate. We applied it to data from microbial colonies grown on agar containing 10 diverse bacterial species and showed that METASPACE was able to annotate 53 ions corresponding to 32 different microbial metabolites. This demonstrates METASPACE to be a useful tool to annotate the chemistry and metabolic exchange factors found in microbial interactions, thereby elucidating the functions of these molecules.


2021 ◽  
Author(s):  
Tobias Greisager Rehfeldt ◽  
Konrad Krawczyk ◽  
Mathias Bøgebjerg ◽  
Veit Schwämmle ◽  
Richard Röttger

Abstract. Motivation: Liquid-chromatography mass-spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identification and quantification of thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data; however, there is not yet any tool that mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (1) absence of balanced training data with large sample size; (2) unclear definition of sufficiently information-rich data representations for e.g. peptide identification; (3) lack of benchmarking of ML methods on specific LC-MS problems. Results: We created the MS2AI pipeline that automates the process of gathering vast quantities of mass spectrometry (MS) data for large scale ML applications. The software retrieves raw data from either in-house sources or from the proteomics identifications database, PRIDE. Subsequently, the raw data is stored in a standardized format amenable for ML encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to this effect we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides. Availability: An open source implementation of the software can be found freely available for non-commercial use at https://gitlab.com/roettgerlab/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

