Protein Structure Prediction with Mass Spectrometry Data

Author(s):  
Sarah E. Biehn ◽  
Steffen Lindert

Knowledge of protein structure is crucial to our understanding of biological function and is routinely used in drug discovery. High-resolution techniques to determine the three-dimensional atomic coordinates of proteins are available. However, such methods are frequently limited by experimental challenges such as sample quantity, target size, and efficiency. Structural mass spectrometry (MS) is a technique in which structural features of proteins are elucidated quickly and relatively easily. Computational techniques that convert sparse MS data into protein models that demonstrate agreement with the data are needed. This review features cutting-edge computational methods that predict protein structure from MS data such as chemical cross-linking, hydrogen–deuterium exchange, hydroxyl radical protein footprinting, limited proteolysis, ion mobility, and surface-induced dissociation. Additionally, we address future directions for protein structure prediction with sparse MS data. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Biotechnology ◽  
2019 ◽  
pp. 156-184
Author(s):  
Hirak Jyoti Chakraborty ◽  
Aditi Gangopadhyay ◽  
Sayak Ganguli ◽  
Abhijit Datta

The large gap between the number of known protein sequences and the number of experimentally determined protein structures indicates a pressing need for rapid and accurate protein structure prediction methods. Computational techniques such as comparative modelling, threading and ab initio modelling allow swift protein structure prediction with sufficient accuracy. Computational protein structure prediction comprises three phases: pre-modelling analysis, model construction and post-modelling refinement. Protein modelling is primarily comparative or ab initio. Comparative, or template-based, methods such as homology and threading-based modelling require structural templates for constructing the structure of a target sequence. Ab initio modelling is a template-free approach that proceeds by satisfying various physics-based and knowledge-based parameters. The chapter elaborates on the three phases of modelling, the programs available for each, known issues, possible solutions and future research areas.


2016 ◽  
Vol 1 ◽  
pp. 24 ◽  
Author(s):  
Adam Belsom ◽  
Michael Schneider ◽  
Lutz Fischer ◽  
Mahmoud Mabrouk ◽  
Kolja Stahl ◽  
...  

Determining the structure of a protein by any method requires various contributions from experimental and computational sides. In a recent study, high-density cross-linking/mass spectrometry (HD-CLMS) data in combination with ab initio structure prediction determined the structure of human serum albumin (HSA) domains, with an RMSD to X-ray structure of up to 2.5 Å, or 3.4 Å in the context of blood serum. This paper reports the blind test on the readiness of this technology with the help of Critical Assessment of protein Structure Prediction (CASP). We identified between 201 and 381 unique residue pairs at an estimated 5% FDR (at link level, albeit with missing site assignment precision evaluation) for four target proteins. HD-CLMS proved reliable once crystal structures were released. However, improvements in structure prediction using cross-link data were slight. We identified two reasons for this: the spread of cross-links along the protein sequence and the tightness of the spatial constraints must both be improved. However, for the selected targets even ideal contact data derived from crystal structures did not allow modellers to arrive at the observed structure. Consequently, the progress of HD-CLMS in conjunction with computational modelling as a structure determination method depends on advances on both arms of this hybrid approach.
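The two limitations named in this abstract, the spread of cross-links along the sequence and the tightness of the spatial constraints, can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names, the toy coordinates and the 25 Å Cα-Cα cutoff are all assumptions made for illustration; the abstract does not state a distance cutoff.

```python
import math

# Assumed upper bound on the C-alpha to C-alpha distance spanned by one
# cross-link, in angstroms. Illustrative only; not taken from the abstract.
MAX_CA_DISTANCE = 25.0


def satisfied_fraction(ca_coords, links, cutoff=MAX_CA_DISTANCE):
    """Return the fraction of cross-linked residue pairs within `cutoff`.

    ca_coords: dict mapping residue number -> (x, y, z) of its C-alpha atom
    links: iterable of (residue_i, residue_j) pairs
    """
    links = list(links)
    if not links:
        raise ValueError("no cross-links given")
    hits = sum(
        1 for i, j in links
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    )
    return hits / len(links)


def sequence_coverage(links, n_residues):
    """Fraction of residues that take part in at least one cross-link."""
    linked = {residue for pair in links for residue in pair}
    return len(linked) / n_residues


# Hypothetical toy model: three residues on a line, two observed links.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (40.0, 0.0, 0.0)}
toy_links = [(1, 2), (1, 3)]
```

With these toy inputs, `satisfied_fraction(toy_coords, toy_links)` counts only the 1-2 link as satisfied, and `sequence_coverage(toy_links, 4)` reports how many of four residues carry any link. A looser cutoff satisfies more links but constrains the model less, which is the trade-off the abstract refers to as the tightness of the constraints.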


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Siyuan Liu ◽  
Tong Wang ◽  
Qijiang Xu ◽  
Bin Shao ◽  
Jian Yin ◽  
...  

Background: Fragment libraries play a key role in fragment-assembly based protein structure prediction, where protein fragments are assembled to form a complete three-dimensional structure. Rich and accurate structural information embedded in fragment libraries has not been systematically extracted and used beyond fragment assembly. Methods: To better leverage the valuable structural information for protein structure prediction, we extracted seven types of structural information from fragment libraries. We broadened the usage of such structural information by transforming fragment libraries into protein-specific potentials for gradient-descent based protein folding and by encoding fragment libraries as structural features for protein property prediction. Results: Fragment libraries improved the accuracy of protein folding and outperformed state-of-the-art algorithms with respect to predicted properties, such as torsion angles and inter-residue distances. Conclusion: Our work implies that the rich structural information extracted from fragment libraries can complement sequence-derived features to help protein structure prediction.


Proteins are essential and present in all life forms, and determining their structures is cumbersome, laborious and time consuming. Hence, for the past three to four decades, researchers have used computational techniques, both template-based and template-free, to predict protein structure from sequence. This research focuses on developing a conceptual basis for establishing an invariant fragment library which can be used for protein structure prediction. Based on the 20 amino acids, fragments can be classified by length, from 3 to 41 residues. Further, they can be classified based on the number of identical amino acids present in the fragment. This count is the theoretical number of fragments that can exist and in no way represents the fragments that actually occur in nature. Invariant fragments are those whose three-dimensional structure is rigid and does not change. A formula was arrived at to determine all possible permutations that can exist for lengths 3 to 41 based on the 20 amino acids. 100 proteins from the Protein Data Bank were downloaded and broken into fragments of length 3 to 41, resulting in a total of 6,102,102 fragments, using Asynchronous Distributed Processing. Fragments identical in sequence were then superimposed and Root Mean Square Deviation (RMSD) values were obtained, resulting in roughly 3.2% of the original fragments. t-scores and z-scores were obtained, from which skewness, kurtosis and excess kurtosis were determined. For invariance, the skewness cutoff was set at ±0.1 and, using the excess kurtosis, fragments whose distributions were either leptokurtic or platykurtic and lay within ±1 standard deviation of the mean value were considered invariant, i.e., there were no outliers in the distribution and most of the t-score or z-score values were centred around the average value. Using these cutoff values, fragments were classified and deposited into an invariant fragment library. Roughly 381,799 invariant fragments were obtained, which is roughly 6.3% of the total number of initial fragments. This is far fewer than the number of fragments that one has to use in either homology or de novo modelling, thereby reducing the design space. Further work is underway to set up the entire invariant fragment library, which can then be used to predict protein structure by a template-based approach.
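The fragmentation and distribution-shape steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the moments are the plain population (biased) definitions, and the abstract does not spell out its full invariance rule, so the sketch stops at computing skewness and excess kurtosis.

```python
from collections import defaultdict

MIN_LEN, MAX_LEN = 3, 41  # fragment lengths given in the abstract


def fragments(sequence, min_len=MIN_LEN, max_len=MAX_LEN):
    """Yield (start, fragment) for every contiguous window of each length."""
    for k in range(min_len, min(max_len, len(sequence)) + 1):
        for start in range(len(sequence) - k + 1):
            yield start, sequence[start:start + k]


def group_identical(sequences):
    """Map each fragment string to the (protein index, start) sites it occurs at.

    Fragments with two or more sites are the candidates for superposition.
    """
    sites = defaultdict(list)
    for index, sequence in enumerate(sequences):
        for start, fragment in fragments(sequence):
            sites[fragment].append((index, start))
    return sites


def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis from central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

A sequence of length L contributes L - k + 1 fragments of each length k, so the fragment count grows quickly with protein size. In practice `skewness_and_excess_kurtosis` would be applied to the RMSD-derived scores of each group returned by `group_identical`; a negative excess kurtosis marks a platykurtic distribution and a positive one a leptokurtic distribution.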


2016 ◽  
Author(s):  
Adam Belsom ◽  
Michael Schneider ◽  
Lutz Fischer ◽  
Oliver Brock ◽  
Juri Rappsilber

Summary: Determining the structure of a protein by any method requires various contributions from experimental and computational sides. In a recent study, high-density cross-linking/mass spectrometry data in combination with ab initio structure prediction by conformational space search determined the structure of human serum albumin (HSA) domains, with an RMSD to X-ray structure of up to 2.53 Å, or 3.38 Å in the context of blood serum. This paper reports the blind test on the readiness of this technology with the help of Critical Assessment of protein Structure Prediction (CASP). We identified between 201 and 381 unique residue pairs at an estimated 5% FDR (at link level, albeit with missing site assignment precision evaluation) for the four proteins that we provided data for. This equates to between 0.63 and 1.20 proximal residues per residue, which is comparable to that obtained in the HSA study (0.85 links per residue at 5% FDR). Nevertheless, initial results of CASP11 have suggested that improvements in structure prediction using cross-link data are slight. Most significantly, however, CASP11 revealed to us some of the current limitations of cross-linking, spelling out areas in which the method must develop in future: links spread unevenly over sequence, and beta sheets both lacked links and suffered from weak definition of observed links over structure. With CASP12 taking place this year and biennially in the future, blind testing low-resolution structure analysis tools is a worthwhile and feasible undertaking.
Data are available via ProteomeXchange with identifier PXD003643.

The abbreviations used are: CLMS, cross-linking/mass spectrometry; NHS, N-hydroxysuccinimide; NMR, nuclear magnetic resonance; sulfo-SDA, sulfo-NHS-diazirine, sulfosuccinimidyl 4,4’-azipentanoate; FDR, false discovery rate; MBS, model-based search; HSA, human serum albumin; RMSD, root-mean-square deviation; CASP, Critical Assessment of protein Structure Prediction; Tris, tris(hydroxymethyl)aminomethane; PES, polyethersulphone; IAA, iodoacetamide; LTQ, linear trap quadrupole; MS2, tandem MS scan; LC-MS, liquid chromatography mass spectrometry; FM, free modelling.

