When less is more - Endogenous tagging with TurboID increases the sensitivity of proximity labelling-based experiments

2021 ◽  
Author(s):  
Alexander Stockhammer ◽  
Laila Benz ◽  
Christian Freund ◽  
Benno Kuropka ◽  
Francesca Bottanelli

In recent years, proximity labelling has established itself as an unbiased and powerful approach to map the interactome of specific proteins. Generally, protein fusions with labelling enzymes are transiently overexpressed to perform these experiments. Using a pipeline for the rapid generation of CRISPR-Cas9 knock-ins (KIs) based on antibiotic selection, we were able to compare the performance of commonly used labelling enzymes when endogenously expressed. We found TurboID and its shorter variant miniTurboID to be superior to other labelling enzymes at physiological expression levels. Endogenous tagging of the μ subunit of the AP-1 complex increased the sensitivity for detection of interactors in a proximity labelling experiment and resulted in a more comprehensive mass spectrometry data set. We were able to identify several known interactors of the complex and cargo proteins that simple overexpression of a labelling enzyme fusion protein could not reveal. Our approach greatly simplifies the execution of proximity labelling experiments for proteins in their native cellular environment and allows going from CRISPR transfection to mass spectrometry data in just over a month.

2015 ◽  
Author(s):  
Qiang Kou ◽  
Si Wu ◽  
Nikola Tolić ◽  
Ljiljana Pasa-Tolić ◽  
Xiaowen Liu

Although proteomics has made rapid progress in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a "bird's-eye view" of intact proteoforms. The combinatorial explosion of possible proteoforms, which may result in billions of possible proteoforms for one protein, makes proteoform identification a challenging computational problem. Here we propose a new data structure, called the mass graph, for efficiently representing proteoforms. In addition, we design mass graph alignment algorithms for proteoform identification by top-down mass spectrometry. Experiments on a histone H4 mass spectrometry data set showed that the proposed methods outperformed MS-Align-E in identifying complex proteoforms.


2019 ◽  
Vol 14 ◽  
Author(s):  
Pingan He ◽  
Longao Hou ◽  
Hong Tao ◽  
Qi Dai ◽  
Yuhua Yao

Background: The impact of cancer on society has created a need for new and faster theoretical models for early cancer diagnosis. Methods: In this work, a mass spectrometry (MS) data analysis method based on the star-like graph of proteins and a support vector machine (SVM) is proposed and applied to early classification of ovarian cancer in an MS data set. First, the MS data are reduced and transformed into the corresponding protein sequence. Then, the topological indexes of the star-like graph are calculated to describe the MS data of each cancer sample. Finally, an SVM model is used to classify the MS data. Results: The ovarian cancer detection models were evaluated with 10 independent training and testing experiments. The average prediction accuracy, sensitivity, and specificity of the model were 96.45%, 96.88%, and 95.67%, respectively, for [0,1]-normalized data, and 94.43%, 96.25%, and 91.11%, respectively, for [-1,1]-normalized data. Conclusion: The model, combined with SELDI-TOF-MS technology, shows promise for the early clinical detection and diagnosis of ovarian cancer.
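The two normalization ranges and the three reported metrics can be illustrated with a minimal Python sketch. It assumes plain min-max scaling and the standard confusion-matrix definitions; the abstract does not specify the authors' exact procedure, and the function names and sample values below are hypothetical.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale a list of numbers to the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant input: map everything to the midpoint of the target range.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity


# Made-up feature values, used only to show the two scaling ranges.
intensities = [2.0, 4.0, 6.0, 10.0]
unit_scaled = min_max_scale(intensities)                # target range [0, 1]
signed_scaled = min_max_scale(intensities, -1.0, 1.0)   # target range [-1, 1]
```

Both scalings are linear maps of the same data, so any difference in classifier performance between them would come from how the downstream model (here an SVM) responds to the feature range.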


Author(s):  
Alexia Kakourou ◽  
Werner Vach ◽  
Simone Nicolardi ◽  
Yuri van der Burgt ◽  
Bart Mertens

Mass spectrometry based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple, yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of the peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease status allocation will be based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.


2007 ◽  
Vol 3 ◽  
pp. 117693510700300 ◽  
Author(s):  
Masaru Ushijima ◽  
Satoshi Miyata ◽  
Shinto Eguchi ◽  
Masanori Kawakita ◽  
Masataka Yoshimoto ◽  
...  

We propose a method for biomarker discovery from mass spectrometry data, improving the common peak approach developed by Fushiki et al. (BMC Bioinformatics, 7:358, 2006). The common peak method is a simple way to select, among all detected peaks, the sensible peaks that are shared by many subjects, by combining a standard spectrum alignment and kernel density estimates. The key idea of our proposed method is to apply the common peak approach to each class label separately. Hence, the proposed method gains more informative peaks for predicting class labels, while minor peaks associated with specific subjects are deleted correctly. We used a SELDI-TOF MS data set from laser microdissected cancer tissues for predicting the treatment effects of neoadjuvant therapy using an anticancer drug on breast cancer patients. The AdaBoost algorithm is adopted for pattern recognition, based on the set of candidate peaks selected by the proposed method. The analysis gives good performance in the sense of test errors for classifying the class labels for a given feature vector of selected peak values.


Author(s):  
Allegra T. Aron ◽  
Emily Gentry ◽  
Kerry L. McPhail ◽  
Louis Felix Nothias ◽  
Mélissa Nothias-Esposito ◽  
...  

Herein, we present a protocol for the use of Global Natural Products Social (GNPS) Molecular Networking, an interactive online chemistry-focused mass spectrometry data curation and analysis infrastructure. The goal of GNPS is to provide as much chemical insight for an untargeted tandem mass spectrometry data set as possible and to connect this chemical insight to the underlying biological questions a user wishes to address. This can be performed within one experiment or at the repository scale. GNPS not only serves as a public data repository for untargeted tandem mass spectrometry data with the sample information (metadata), it also captures community knowledge that is disseminated via living data across all public data. One of the main analysis tools used by the GNPS community is molecular networking. Molecular networking creates a structured data table that reflects the chemical space from tandem mass spectrometry experiments by computing the relationships between tandem mass spectra through spectral similarity. This protocol provides step-by-step instructions for creating reproducible high-quality molecular networks. For training purposes, the reader is led through the protocol from recalling a public data set and its sample information to creating and interpreting a molecular network. Each data analysis job can be shared or cloned to disseminate the knowledge gained, thus propagating information that can lead to the discovery of molecules, metabolic pathways, and ecosystem/community interactions.
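The idea of linking spectra by spectral similarity can be sketched in a few lines of Python. This is a simplified illustration that assumes plain cosine similarity on fixed-width m/z bins; it does not reproduce the scoring GNPS itself uses. The function names, the bin width, the 0.7 threshold, and the example spectra are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins: {bin_index: intensity}."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, threshold=0.7):
    """Return edges (name_a, name_b, score) for pairs scoring >= threshold."""
    names = sorted(spectra)
    edges = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            score = cosine_similarity(spectra[name_a], spectra[name_b])
            if score >= threshold:
                edges.append((name_a, name_b, score))
    return edges


# Made-up spectra as lists of (m/z, intensity) pairs.
spectra = {
    "A": [(100.1, 10.0), (150.2, 5.0)],
    "B": [(100.3, 10.0), (150.4, 5.0)],
    "C": [(200.0, 7.0)],
}
edges = build_network(spectra)
```

Here spectra A and B fall into the same bins and are connected, while C shares no bins with either and stays isolated. The resulting edge list is the kind of structured table from which a network can be drawn.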


2015 ◽  
Vol 11 (2) ◽  
Author(s):  
Weiping Ma ◽  
Yang Feng ◽  
Kani Chen ◽  
Zhiliang Ying

Motivated by modeling and analysis of mass-spectrometry data, a semi- and nonparametric model is proposed that consists of linear parametric components for individual location and scale and a nonparametric regression function for the common shape. A multi-step approach is developed that simultaneously estimates the parametric components and the nonparametric function. Under certain regularity conditions, it is shown that the resulting estimators are consistent and asymptotically normal for the parametric part and achieve the optimal rate of convergence for the nonparametric part when the bandwidth is suitably chosen. Simulation results are presented to demonstrate the effectiveness and finite-sample performance of the method. The method is also applied to a SELDI-TOF mass spectrometry data set from a study of liver cancer patients.
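One way to write down a model of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $y_{ij}$ is the $j$-th observed intensity for individual $i$ at position $x_{ij}$, $\alpha_i$ and $\beta_i$ are that individual's location and scale parameters, $m(\cdot)$ is the unknown common shape function, and $\varepsilon_{ij}$ is a noise term.

```latex
y_{ij} = \alpha_i + \beta_i \, m(x_{ij}) + \varepsilon_{ij},
\qquad i = 1, \dots, n, \quad j = 1, \dots, T .
```

In this form the parametric part is linear in $(\alpha_i, \beta_i)$ for a given $m$, while $m$ itself is left unspecified and would be estimated by a smoother whose bandwidth governs the convergence rate mentioned in the abstract.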


2012 ◽  
Vol 9 (1) ◽  
pp. 1-11 ◽  
Author(s):  
Dennis Trede ◽  
Jan Hendrik Kobarg ◽  
Janina Oetjen ◽  
Herbert Thiele ◽  
Peter Maass ◽  
...  

In the last decade, matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS), also called MALDI-imaging, has proven its potential in proteomics and was successfully applied to various types of biomedical problems, in particular to histopathological label-free analysis of tissue sections. In histopathology, MALDI-imaging is used as a general analytic tool revealing the functional proteomic structure of tissue sections, and as a discovery tool for detecting new biomarkers discriminating a region annotated by an experienced histologist, in particular, for cancer studies. A typical MALDI-imaging data set contains 10^8 to 10^9 intensity values occupying more than 1 GB. Analysis and interpretation of such a huge amount of data is a mathematically, statistically and computationally challenging problem. In this paper we overview some computational methods for analysis of MALDI-imaging data sets. We discuss the importance of data preprocessing, which typically includes normalization, baseline removal and peak picking, and highlight the importance of image denoising when visualizing IMS data.
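The three preprocessing steps named above can be illustrated on a single spectrum with a minimal Python sketch. The abstract names the steps but not the algorithms, so everything here is an assumption: a running-minimum baseline, total-ion-count (TIC) normalization, strict local maxima as peaks, and baseline removal applied before normalization. The function names and the toy spectrum are hypothetical.

```python
def remove_baseline(intensities, half_window=2):
    """Subtract a running-minimum baseline estimated in a sliding window."""
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        corrected.append(intensities[i] - min(intensities[lo:hi]))
    return corrected


def tic_normalize(intensities):
    """Divide by the total ion count so that the spectrum sums to 1."""
    total = sum(intensities)
    if total == 0:
        return list(intensities)
    return [v / total for v in intensities]


def pick_peaks(intensities, min_height=0.0):
    """Return indices of strict local maxima that exceed min_height."""
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i] > intensities[i - 1]
        and intensities[i] > intensities[i + 1]
        and intensities[i] > min_height
    ]


# Toy spectrum: a flat offset of 1.0 with two peaks on top of it.
spectrum = [1.0, 1.0, 5.0, 1.0, 1.0, 2.0, 9.0, 2.0, 1.0]
corrected = remove_baseline(spectrum)
normalized = tic_normalize(corrected)
peaks = pick_peaks(normalized, min_height=0.1)
```

In an imaging data set the same steps would be applied to the spectrum at every pixel, which is where the data volume quoted above becomes the main computational concern.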


2019 ◽  
Vol 117 (2) ◽  
pp. 1015-1020 ◽  
Author(s):  
Giulia Vecchi ◽  
Pietro Sormanni ◽  
Benedetta Mannini ◽  
Andrea Vandelli ◽  
Gian Gaetano Tartaglia ◽  
...  

To function effectively, proteins must avoid aberrant aggregation, and hence they are expected to be expressed at concentrations safely below their solubility limits. By analyzing proteome-wide mass spectrometry data of Caenorhabditis elegans, however, we show that the levels of about three-quarters of the nearly 4,000 proteins analyzed in adult animals are close to their intrinsic solubility limits, indeed exceeding them by about 10% on average. We next asked how aging and functional self-assembly influence these solubility limits. We found that despite the fact that the total quantity of proteins within the cellular environment remains approximately constant during aging, protein aggregation sharply increases between days 6 and 12 of adulthood, after the worms have reproduced, as individual proteins lose their stoichiometric balances and the cellular machinery that maintains solubility undergoes functional decline. These findings reveal that these proteins are highly prone to undergoing concentration-dependent phase separation, which on aging is rationalized by a decrease in their effective solubilities, in particular for proteins associated with translation, growth, reproduction, and the chaperone system.

