AN ITERATIVE ALGORITHM TO QUANTIFY FACTORS INFLUENCING PEPTIDE FRAGMENTATION DURING TANDEM MASS SPECTROMETRY

2007 ◽  
Vol 05 (02a) ◽  
pp. 297-311 ◽  
Author(s):  
CHUNGONG YU ◽  
YU LIN ◽  
SHIWEI SUN ◽  
JINJIN CAI ◽  
JINGFEN ZHANG ◽  
...  

In protein identification by tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. To date, widely used database search methods have adopted simple statistical models for this prediction. For some peptides, these models yield a theoretical spectrum that deviates significantly from the experimental one. In this paper, to derive an improved prediction model, we used a non-linear programming model to quantify the factors that influence peptide fragmentation, and we propose an iterative algorithm to solve this optimization problem. On a training set of 1803 spectra, the results agreed well with known principles of peptide fragmentation, such as the tendency to cleave in the middle of a peptide and the preference for cleavage N-terminal to Pro. Moreover, on a testing set of 941 spectra, comparison of the predicted spectra against the experimental ones showed that the method generates reasonable predictions. These results can help both database searching and de novo methods.
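To make "theoretical spectrum" concrete, the sketch below computes only the m/z positions of singly charged b- and y-ions for a peptide. It is not the model from the paper: it assigns no intensities, which is exactly the part the abstract says simple models get wrong. The function name `by_ions` and the choice of singly charged ions are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    Index k in both lists corresponds to cleavage after residue k+1,
    so b[k] is the N-terminal fragment and y[k] the complementary
    C-terminal fragment from the same backbone cleavage site.
    (Hypothetical helper for illustration; no intensities are modeled.)
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(masses)):
        b.append(sum(masses[:i]) + PROTON)
        y.append(sum(masses[i:]) + WATER + PROTON)
    return b, y


if __name__ == "__main__":
    b, y = by_ions("PEPTIDE")
    for k, (bm, ym) in enumerate(zip(b, y), start=1):
        print(f"site {k}: b = {bm:.4f}, y = {ym:.4f}")
```

A fragmentation model of the kind described in the abstract would attach a predicted intensity to each of these positions, for example one that depends on the cleavage site's position in the peptide and on the neighboring residues.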

Author(s):  
Haipeng Wang

Protein identification (sequencing) by tandem mass spectrometry is a fundamental technique in proteomics, which studies the structures and functions of proteins on a large scale and complements genomics. Analysis and interpretation of the vast amounts of spectral data generated in proteomics experiments present unprecedented challenges and opportunities for data mining in areas such as data preprocessing, peptide-spectrum matching, results validation, peptide fragmentation pattern discovery and modeling, and post-translational modification (PTM) analysis. This article introduces the basic concepts and terms of protein identification and briefly reviews state-of-the-art data mining applications in the area. It also outlines challenges and potential future hot spots in this field.


2005 ◽  
Vol 11 (2) ◽  
pp. 161-167 ◽  
Author(s):  
Kenton D. Juhlin ◽  
Dionne D. Swift ◽  
Martin P. Lacey ◽  
Paul E. Correa ◽  
Thomas W. Keough

Many laboratories identify proteins by searching tandem mass spectrometry data against genomic or protein sequence databases. These database searches typically use the measured peptide masses or the derived peptide sequence and, in this paper, we focus on the latter. We study the minimum peptide sequence data requirements for definitive protein identification from protein sequence databases. Accurate mass measurements are not needed for definitive protein identification, even when a limited amount of sequence data is available for searching. This information has implications for mass spectrometry performance (and cost), database search strategies, and proteomics research.


Amino Acids ◽  
2021 ◽  
Author(s):  
Magdalena Widgren Sandberg ◽  
Jakob Bunkenborg ◽  
Stine Thyssen ◽  
Martin Villadsen ◽  
Thomas Kofoed

Granulocyte-macrophage colony-stimulating factor (GM-CSF) is a cytokine and a white blood cell growth factor that has found use as a therapeutic protein. During analysis of different fermentation batches of GM-CSF recombinantly expressed in E. coli, a covalent modification was identified on the protein by intact mass spectrometry. The modification gave a mass shift of +70 Da, and peptide mapping analysis demonstrated that it was located at the protein N-terminus and on lysine side chains. The chemical composition C4H6O was found to be the best candidate by peptide fragmentation using tandem mass spectrometry. The modification likely contains a carbonyl group, since its mass increased by 2 Da upon reduction with borane pyridine complex and it reacted with 2,4-dinitrophenylhydrazine. On the basis of its chemical behavior and its tandem mass spectrometry fragmentation behavior, the modification could be attributed to crotonaldehyde, a reactive compound formed during lipid peroxidation. A low recorded oxygen pressure in the reactor during protein expression could be linked to the formation of this compound. This study shows the importance of maintaining full control over all reaction parameters during recombinant protein production.
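The arithmetic behind the +70 Da assignment can be checked directly. The sketch below sums standard monoisotopic atomic masses for C4H6O and for the same composition with two extra hydrogens, which is what reduction of a carbonyl to an alcohol would add. The helper `mono_mass` is a hypothetical name used for illustration, and the assumption that reduction adds exactly H2 is an interpretation of the abstract, not a detail it states.

```python
# Monoisotopic atomic masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mono_mass(composition):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in composition.items())


# Candidate composition of the modification reported in the abstract.
adduct = mono_mass({"C": 4, "H": 6, "O": 1})

# Assumed product after reducing the carbonyl (adds two hydrogens).
reduced = mono_mass({"C": 4, "H": 8, "O": 1})

if __name__ == "__main__":
    print(f"C4H6O: {adduct:.4f} Da")
    print(f"C4H8O: {reduced:.4f} Da")
    print(f"shift on reduction: {reduced - adduct:.4f} Da")
```

The computed mass of about 70.04 Da matches the nominal +70 Da shift, and the roughly 2.02 Da difference between the two compositions is consistent with the 2 Da increase observed after reduction.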


2007 ◽  
Vol 259 (1-3) ◽  
pp. 161-173 ◽  
Author(s):  
Zee-Yong Park ◽  
Rovshan Sadygov ◽  
Judy M. Clark ◽  
John I. Clark ◽  
John R. Yates
