yHydra: Deep Learning enables an Ultra Fast Open Search by Jointly Embedding MS/MS Spectra and Peptides of Mass Spectrometry-based Proteomics

2021 ◽  
Author(s):  
Tom Altenburg ◽  
Thilo Muth ◽  
Bernhard Y. Renard

Abstract Mass spectrometry-based proteomics allows the study of all proteins of a sample on a molecular level. The ever-increasing complexity and amount of proteomics MS data require powerful yet efficient computational and statistical analysis. In particular, most recent bottom-up MS-based proteomics studies consider a diverse pool of post-translational modifications, employ large databases (as in metaproteomics or proteogenomics), contain multiple isoforms of proteins, include unspecific cleavage sites, or even combine these, and thus face a computationally challenging situation regarding protein identification. To cope with the resulting large search spaces, we present a deep learning approach that jointly embeds MS/MS spectra and peptides into the same vector space such that embeddings can be compared easily and interchangeably using Euclidean distances. In contrast to existing spectrum embedding techniques, ours are learned jointly with their respective peptides and thus remain meaningful. By visualizing the learned manifold of both spectrum and peptide embeddings in correspondence to their physicochemical properties, our approach becomes easily interpretable. At the same time, our joint embeddings blur the lines between spectra and protein sequences, providing a powerful framework for peptide identification. In particular, we build an open search, which allows searching tens of thousands of spectra against millions of peptides within seconds. yHydra achieves identification rates that are compatible with MSFragger. Due to the open search, delta masses are assigned to each identification, which allows unrestricted characterization of post-translational modifications. Meaningful joint embeddings allow for faster open searches and generally make downstream analysis efficient and convenient, for example for integration with other omics types. Availability: upon [email protected]
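The idea of comparing spectrum and peptide embeddings by Euclidean distance, and then reporting a delta mass for each match, can be illustrated with a minimal sketch. This is not the yHydra implementation: the function names `nearest_peptides` and `delta_mass`, the two-dimensional toy vectors, and all mass values are hypothetical, and a real search over millions of peptides would use a learned encoder and an indexed nearest-neighbour structure rather than a brute-force sort.

```python
import math


def nearest_peptides(spectrum_vec, peptide_vecs, k=1):
    """Return indices of the k peptide embeddings closest to the spectrum
    embedding, ranked by Euclidean distance (brute force, for illustration)."""
    order = sorted(
        range(len(peptide_vecs)),
        key=lambda i: math.dist(spectrum_vec, peptide_vecs[i]),
    )
    return order[:k]


def delta_mass(observed_mass, peptide_mass):
    """Mass difference between the observed precursor and the unmodified
    candidate peptide; in an open search this shift hints at a modification."""
    return observed_mass - peptide_mass


# Toy data (hypothetical): three peptide embeddings and their masses in Da.
peptide_vecs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
peptide_masses = [1000.50, 1200.60, 1500.70]

# One spectrum embedding and its observed precursor mass (hypothetical).
spectrum_vec = (0.9, 1.2)
observed_mass = 1280.57

best = nearest_peptides(spectrum_vec, peptide_vecs, k=1)[0]
shift = delta_mass(observed_mass, peptide_masses[best])
print(f"best peptide index: {best}, delta mass: {shift:.2f} Da")
```

Because both kinds of embedding live in the same space, the same distance function could also rank spectra against spectra or peptides against peptides, which is the interchangeability the abstract describes.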

2019 ◽  
Author(s):  
Katja Ovchinnikova ◽  
Vitaly Kovalev ◽  
Lachlan Stuart ◽  
Theodore Alexandrov

Abstract Motivation: Imaging mass spectrometry (imaging MS) is a powerful technology for revealing localizations of hundreds of molecules in tissue sections. However, imaging MS data is polluted with off-sample ions caused by sample preparation, particularly by the MALDI matrix application. The presence of the off-sample ion images confounds and hinders metabolite identification and downstream analysis. Results: We created a high-quality gold standard of 23238 manually tagged ion images from 87 public datasets from the METASPACE knowledge base. We developed several machine and deep learning methods for recognizing off-sample ion images. Deep residual learning performed best, with an F1 score of 0.97. The spatio-molecular biclustering method achieved F1 scores of 0.96 and 0.93 in semi- and fully-automated scenarios, respectively. The molecular co-localization method achieved an F1 score of 0.90. We investigated the clusters of the DHB matrix, the most common MALDI matrix, and characterized parameters of a clusters combinatorial model. This work addresses an important issue in imaging MS and illustrates how public data, modern web technologies, and machine and deep learning open novel avenues in imaging MS. Availability and Implementation: Data and source code are available at: https://github.com/metaspace2020/[email protected]
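For readers less familiar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The sketch below shows the calculation for a binary off-sample versus on-sample classifier. The function name `f1_score` and the counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall).

    tp: off-sample images correctly flagged
    fp: on-sample images wrongly flagged as off-sample
    fn: off-sample images that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, for illustration only.
score = f1_score(tp=90, fp=5, fn=5)
print(f"F1 = {score:.3f}")
```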


Author(s):  
Jue-Liang Hsu ◽  
Shu-Hui Chen

Stable-isotope reductive dimethylation, a cost-effective, simple, robust, reliable and easy-to-multiplex labelling method, is widely applied to quantitative proteomics using liquid chromatography-mass spectrometry. This review focuses on biological applications of stable-isotope dimethyl labelling for large-scale comparative analysis of protein expression and post-translational modifications, based on the unique properties of the labelling chemistry. Some other applications of the labelling method for sample preparation and mass spectrometry-based protein identification and characterization are also summarized. This article is part of the themed issue ‘Quantitative mass spectrometry’.


Bioanalysis ◽  
2021 ◽  
Author(s):  
Shulei Liu ◽  
Benjamin L Schulz

Mass spectrometry (MS) is a powerful technique for protein identification, quantification and characterization that is widely applied in biochemical studies, and which can provide data on the quantity, structural integrity and post-translational modifications of proteins. It is therefore a versatile and widely used analytic tool for quality control of biopharmaceuticals, especially in quantifying host-cell protein impurities, identifying post-translational modifications and structurally characterizing biopharmaceutical proteins. Here, we summarize recent advances in MS-based analyses of these key quality attributes of the biopharmaceutical development and manufacturing processes.


2018 ◽  
Author(s):  
Arun Devabhaktuni ◽  
Niclas Olsson ◽  
Carlos Gonzales ◽  
Keith Rawson ◽  
Kavya Swaminathan ◽  
...  

Summary Thousands of protein post-translational modifications (PTMs) dynamically impact nearly all cellular functions. Mass spectrometry is well suited to PTM identification, but proteome-scale analyses are biased towards PTMs with existing enrichment methods. To measure the full landscape of PTM regulation, software must overcome two fundamental challenges: intractably large search spaces and difficulty distinguishing correct from incorrect identifications. Here, we describe TagGraph, software that overcomes both challenges with a string-based search method orders of magnitude faster than current approaches, and a probabilistic validation model optimized for PTM assignments. When applied to a human proteome map, TagGraph tripled confident identifications while revealing thousands of modification types on nearly one million sites spanning the proteome. We expand known sites by orders of magnitude for highly abundant yet understudied PTMs such as proline hydroxylation, and derive tissue-specific insight into these PTMs’ roles. TagGraph expands our ability to survey the full landscape of PTM function and regulation.


2019 ◽  
Author(s):  
Toan K. Phung ◽  
Lucia F Zacchi ◽  
Benjamin L. Schulz

Abstract Data Independent Acquisition (DIA) Mass Spectrometry (MS) workflows allow unbiased measurement of all detectable peptides from complex proteomes, but require ion libraries for interrogation of peptides of interest. These DIA ion libraries can be theoretical or built from peptide identification data from Data Dependent Acquisition (DDA) MS workflows. However, DDA libraries derived from empirical data rely on confident peptide identification, which can be challenging for peptides carrying complex post-translational modifications. Here, we present DIALib, software to automate the construction of peptide and glycopeptide Data Independent Acquisition ion Libraries. We show that DIALib theoretical ion libraries can identify and measure diverse N- and O-glycopeptides from yeast and mammalian glycoproteins without prior knowledge of the glycan structures present. We present proof-of-principle data from a moderately complex yeast cell wall glycoproteome and a simple mixture of mammalian glycoproteins. We also show that DIALib libraries consisting only of glycan oxonium ions can quickly and easily provide a global compositional glycosylation profile of the detectable “oxoniome” of glycoproteomes. DIALib will help enable DIA glycoproteomics as a complementary analytical approach to DDA glycoproteomics.


2020 ◽  
Vol 64 (1) ◽  
pp. 97-110
Author(s):  
Christian Sibbersen ◽  
Mogens Johannsen

Abstract In living systems, nucleophilic amino acid residues are prone to non-enzymatic post-translational modification by electrophiles. α-Dicarbonyl compounds are a special type of electrophile that can react irreversibly with lysine, arginine, and cysteine residues via complex mechanisms to form post-translational modifications known as advanced glycation end-products (AGEs). Glyoxal, methylglyoxal, and 3-deoxyglucosone are the major endogenous dicarbonyls, with methylglyoxal being the most well-studied. There are several routes that lead to the formation of dicarbonyl compounds, most originating from glucose and glucose metabolism, such as the non-enzymatic decomposition of glycolytic intermediates and fructosyl amines. Although dicarbonyls are removed continuously, mainly via the glyoxalase system, several conditions lead to an increase in dicarbonyl concentration and thereby AGE formation. AGEs have been implicated in diabetes and aging-related diseases, and for this reason the elucidation of their structure as well as protein targets is of great interest. Though the dicarbonyls and reactive protein side chains are relatively simple in nature, the structures of the adducts and their mechanisms of formation are not trivial. Furthermore, detection of sites of modification can be demanding, and current best practices rely on either direct mass spectrometry or various methods of enrichment based on antibodies or click chemistry followed by mass spectrometry. Future research into the structure of these adducts and protein targets of dicarbonyl compounds may improve the understanding of how the mechanisms of diabetes and aging-related physiological damage occur.


2020 ◽  
Vol 64 (1) ◽  
pp. 135-153 ◽  
Author(s):  
Lauren Elizabeth Smith ◽  
Adelina Rogowska-Wrzesinska

Abstract Post-translational modifications (PTMs) are integral to the regulation of protein function; characterising their role in this process is vital to understanding how cells work in both healthy and diseased states. Mass spectrometry (MS) facilitates the mass determination and sequencing of peptides, and thereby also the detection of site-specific PTMs. However, numerous challenges in this field persist. The diverse chemical properties, low abundance, labile nature and instability of many PTMs, in combination with the more practical issues of compatibility with MS and bioinformatics challenges, contribute to the arduous nature of their analysis. In this review, we present an overview of the established MS-based approaches for analysing PTMs and the common complications associated with their investigation, including examples of specific challenges focusing on phosphorylation, lysine acetylation and redox modifications.


2018 ◽  
Author(s):  
Zhiwu An ◽  
Fuzhou Gong ◽  
Yan Fu

We have developed PTMiner, a first software tool for automated, confident filtering, localization and annotation of protein post-translational modifications identified by open (mass-tolerant) search of large tandem mass spectrometry datasets. The performance of the software was validated on carefully designed simulation data.

