Data Mining in Protein Identification by Tandem Mass Spectrometry

Author(s):  
Haipeng Wang

Protein identification (sequencing) by tandem mass spectrometry is a fundamental technique in proteomics, which studies the structures and functions of proteins on a large scale and complements genomics. The analysis and interpretation of the vast amounts of spectral data generated in proteomics experiments present unprecedented challenges and opportunities for data mining in areas such as data preprocessing, peptide-spectrum matching, result validation, peptide fragmentation pattern discovery and modeling, and post-translational modification (PTM) analysis. This article introduces the basic concepts and terms of protein identification and briefly reviews state-of-the-art data mining applications in the field. It also outlines challenges and potential future hot spots.

2007 ◽  
Vol 05 (02a) ◽  
pp. 297-311 ◽  
Author(s):  
CHUNGONG YU ◽  
YU LIN ◽  
SHIWEI SUN ◽  
JINJIN CAI ◽  
JINGFEN ZHANG ◽  
...  

In protein identification by tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. To date, widely used database searching methods have adopted simple statistical models for this prediction. For some peptides, these models usually yield a theoretical spectrum that deviates significantly from the experimental one. In this paper, to derive an improved prediction model, we use a non-linear programming model to quantify the factors that affect peptide fragmentation, and we propose an iterative algorithm to solve the resulting optimization problem. On a training set of 1803 spectra, the experimental results agreed well with known principles of peptide fragmentation, such as the tendency to cleave in the middle of a peptide and Pro's preference for N-terminal cleavage. Moreover, on a testing set of 941 spectra, comparison of the predicted spectra against the experimental ones showed that this method can generate reasonable predictions. The results can help both database searching and de novo methods.
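
As a point of reference for what a "theoretical spectrum" contains, the sketch below computes only the m/z positions of singly charged b and y ions for a peptide, using standard monoisotopic residue masses. It is a minimal illustration, not the model from the paper: it predicts no intensities, and the names `RESIDUE_MASS` and `fragment_ions` are hypothetical and chosen here for the example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is the b_i ion (first i residues, N-terminal side).
    y_ions[i-1] is the y_i ion (last i residues, C-terminal side).
    Each list has len(peptide) - 1 entries, one per backbone cleavage site.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    b_ions = []
    running = 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)

    y_ions = []
    running = 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

A uniform-intensity spectrum built from these positions is roughly the kind of simple prediction that the abstract contrasts with its fitted fragmentation model.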


1999 ◽  
Vol 121 (1) ◽  
pp. 7-12 ◽  
Author(s):  
D. Figeys ◽  
R. Aebersold

The comprehensive analysis of biological systems requires a combination of genomic and proteomic efforts. The large-scale application of current genomic technologies provides complete genomic DNA sequences, sequence tags for expressed genes (EST’s), and quantitative profiles of expressed genes at the mRNA level. In contrast, protein analytical technology lacks the sensitivity and the sample throughput for the systematic analysis of all the proteins expressed by a tissue or cell. The sensitivity of protein analysis technology is primarily limited by the loss of analytes, due to adsorption to surfaces, and sample contamination during handling. Here we summarize our work on the development and use of microfabricated fluidic systems for the manipulation of minute amounts of peptides and delivery to an electrospray ionization tandem mass spectrometer. New data are also presented that further demonstrate the potential of these novel approaches. Specifically, we describe the use of microfabricated devices as modules to deliver femtomole amounts of protein digests to the mass spectrometer for protein identification. We also describe the use of a microfabricated module for the generation of solvent gradients at nl/min flow rates for gradient chromatography-tandem mass spectrometry. The use of microfabricated fluidic systems reduces the risk of sample contamination and sample loss due to adsorption to wetted surfaces. The ability to assemble dedicated modular systems and to operate them automatically makes the use of microfabricated systems attractive for the sensitive and large-scale analysis of proteins.


Amino Acids ◽  
2021 ◽  
Author(s):  
Magdalena Widgren Sandberg ◽  
Jakob Bunkenborg ◽  
Stine Thyssen ◽  
Martin Villadsen ◽  
Thomas Kofoed

Granulocyte-macrophage colony-stimulating factor (GM-CSF) is a cytokine and a white blood cell growth factor that has found use as a therapeutic protein. During analysis of different fermentation batches of GM-CSF recombinantly expressed in E. coli, a covalent modification was identified on the protein by intact mass spectrometry. The modification gave a mass shift of +70 Da, and peptide mapping analysis demonstrated that it was located at the protein N-terminus and on lysine side chains. The chemical composition C4H6O was found to be the best candidate by peptide fragmentation using tandem mass spectrometry. The modification likely contains a carbonyl group, since its mass increased by 2 Da upon reduction with borane pyridine complex and it reacted with 2,4-dinitrophenylhydrazine. On the basis of its chemical behavior and its tandem mass spectrometry fragmentation behavior, the modification could be attributed to crotonaldehyde, a reactive compound formed during lipid peroxidation. A low recorded oxygen pressure in the reactor during protein expression could be linked to the formation of this compound. This study shows the importance of maintaining full control over all reaction parameters during recombinant protein production.
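
The mass arithmetic behind the reported shift can be checked with a short calculation. The sketch below is only an illustration using standard monoisotopic atomic masses. It assumes that reduction of the carbonyl adds two hydrogen atoms (C4H6O to C4H8O), which is a common reading of the 2 Da increase but is not spelled out in the abstract. The names `MONOISOTOPIC` and `composition_mass` are hypothetical.

```python
# Standard monoisotopic atomic masses (Da).
MONOISOTOPIC = {
    "C": 12.00000000,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def composition_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Candidate composition of the modification reported in the abstract.
adduct = {"C": 4, "H": 6, "O": 1}
# Assumption: reducing the carbonyl to an alcohol adds two hydrogens.
reduced_adduct = {"C": 4, "H": 8, "O": 1}

adduct_mass = composition_mass(adduct)
reduced_mass = composition_mass(reduced_adduct)

if __name__ == "__main__":
    print(f"C4H6O shift:         {adduct_mass:.4f} Da")
    print(f"After reduction:     {reduced_mass:.4f} Da")
    print(f"Reduction increment: {reduced_mass - adduct_mass:.4f} Da")
```

The computed shift of about 70.04 Da is consistent with the nominal +70 Da observed by intact mass spectrometry, and the increment of about 2.02 Da matches the reported 2 Da increase on reduction.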


2008 ◽  
Vol 14 (1) ◽  
pp. 49-59 ◽  
Author(s):  
Eduarda M.P. Silva ◽  
Pedro Domingues ◽  
João P.C. Tomé ◽  
M. Amparo F. Faustino ◽  
M. Graça P.M.S. Neves ◽  
...  

β-Nitroalkenyl meso-tetraphenylporphyrins [β-TPPCHC(NO2)R], as free-bases and Zn(II) complexes, were studied by electrospray mass spectrometry (ESI-MS). Under this ionisation condition the [M + H]+ ions are formed. The fragmentation patterns of the resulting [M + H]+ ions were studied by electrospray tandem mass spectrometry (ESI-MS/MS). The ESI-MS/MS spectra of β-nitroalkenylporphyrins, either as free-bases or as Zn(II) complexes, show several interesting features, distinct from the typical behaviour of nitro compounds. For the studied compounds, common main fragmentation patterns are observed, namely characteristic losses of NO2•, HNO2, 2OH•, RNO2, RCNO, RCNO2, RCH2NO2, C6H5• plus NO2• and the formation of the protonated macrocycle, [TPP + H]+ or [ZnTPP + H]+. However, depending on the presence or absence of the metal and the nature of the R substituent, important differences are observed in the relative abundances of the ions formed by the same fragmentation pathway. The presence of bromine in the alkenyl group leads to a peculiar behaviour, since the main fragmentation pattern corresponds to the combined elimination of the bromine atom with the typical nitro group fragments. When R = Br, the loss of the nitro group occurs in low relative abundance (11-16%). However, when R = CH3, the relative abundance of the ion due to the loss of HNO2 changes drastically from 100%, observed for the free-base porphyrin, to 29% in the case of the Zn(II) complex. These variations in the relative abundance of the fragment corresponding to the loss of the nitro moiety (typically considered a diagnostic fragment) can lead to an erroneous interpretation of the MS/MS spectra. Some fragmentations are observed only for the free-base porphyrins, namely the loss of •CH(NO2)R and HNO2 plus C2H2, while the losses of OH•, H2O, OH• plus H2O and RCCH plus H2O are observed only for the complexes. Unusual and unexpected fragmentations are also observed, namely the losses of RCNO, RCNO2 and HNO2 plus C2H2. This work demonstrates that valuable structural information about the β-nitroalkenyl substituents linked to meso-tetraarylporphyrins can be obtained using MS/MS. These results can also be useful for the interpretation of the mass spectra of other nitroalkenyl substituted compounds.

