Bayesian inference of protein structure from chemical shift data

2014 ◽  
Author(s):  
Lars A Bratholm ◽  
Anders Steen Christensen ◽  
Thomas Hamelryck ◽  
Jan H Jensen

Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and a bias is introduced which might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction.
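
To make the two error models concrete, the following is a minimal Python sketch of the log-likelihood of a set of chemical shift residuals (experimental minus back-calculated) under a Gaussian and under a Cauchy distribution. The function names, the single shared scale parameter, and the example residuals are illustrative assumptions and not the authors' implementation; in particular, the abstract describes sampling the uncertainties together with the structure, which this sketch does not attempt.

```python
import math

def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of residuals under a zero-mean Gaussian with scale sigma."""
    n = len(residuals)
    squared_sum = sum(r * r for r in residuals)
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - squared_sum / (2.0 * sigma ** 2)

def cauchy_log_likelihood(residuals, gamma):
    """Log-likelihood of residuals under a zero-centred Cauchy with scale gamma."""
    return sum(-math.log(math.pi * gamma * (1.0 + (r / gamma) ** 2)) for r in residuals)

if __name__ == "__main__":
    # Hypothetical residuals in ppm: four small errors and one large outlier.
    residuals = [0.1, -0.2, 0.05, 0.3, 10.0]
    print("Gaussian:", gaussian_log_likelihood(residuals, sigma=1.0))
    print("Cauchy:  ", cauchy_log_likelihood(residuals, gamma=1.0))
```

Because the Cauchy distribution has heavier tails, a single badly predicted shift lowers its log-likelihood far less than it lowers the Gaussian one, which is one plausible reading of why the two models behave differently in the simulations.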


2021 ◽  
Author(s):  
Ho-min Park ◽  
Yunseol Park ◽  
Joris Vankerschaver ◽  
Arnout Van Messem ◽  
Wesley De Neve ◽  
...  

Protein therapeutics play an important role in controlling the functions and activities of disease-causing proteins in modern medicine. Despite protein therapeutics having several advantages over traditional small-molecule therapeutics, further development has been hindered by drug complexity and delivery issues. However, recent progress in deep learning-based protein structure prediction approaches such as AlphaFold opens new opportunities to exploit the complexity of these macro-biomolecules for highly-specialised design to inhibit, regulate or even manipulate specific disease-causing proteins. Anti-CRISPR proteins are small proteins from bacteriophages that counter-defend against the prokaryotic adaptive immunity of CRISPR-Cas systems. They are unique examples of natural protein therapeutics that have been optimized by the host-parasite evolutionary arms race to inhibit a wide variety of host proteins. Here, we show that these Anti-CRISPR proteins display diverse inhibition mechanisms through accurate structural prediction and functional analysis. We find that these phage-derived proteins are extremely distinct in structure, some of which have no homologues in the current protein structure domain. Furthermore, we find a novel family of Anti-CRISPR proteins which are structurally homologous to the recently-discovered mechanism of manipulating host proteins through enzymatic activity, rather than through direct inference. Using highly accurate structure prediction, we present a wide variety of protein-manipulating strategies of anti-CRISPR proteins for future protein drug design.


2015 ◽  
Vol 32 (6) ◽  
pp. 843-849 ◽  
Author(s):  
Rhys Heffernan ◽  
Abdollah Dehzangi ◽  
James Lyons ◽  
Kuldip Paliwal ◽  
Alok Sharma ◽  
...  

Abstract
Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ.
Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction.
Availability and implementation: The method is available at http://sparks-lab.org.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
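
As an illustration of the HSEβ definition given above, the following Python sketch counts the Cα atoms that fall in the upper and down half spheres around one residue, using the Cα-Cβ vector to split the sphere. It computes HSE from known coordinates and is unrelated to the sequence-based SPIDER-HSE predictor. The function name, the 13 Å radius, and the tie-breaking rule are assumptions made for this example.

```python
def half_sphere_exposure(ca_coords, cb_coords, index, radius=13.0):
    """Return (up, down) contact counts for residue `index`.

    ca_coords and cb_coords are lists of (x, y, z) tuples in angstroms.
    The radius is an assumed parameter; the abstract does not state it.
    """
    cx, cy, cz = ca_coords[index]
    # Vector from C-alpha to C-beta defines the "upper" half sphere.
    ux, uy, uz = (cb_coords[index][k] - ca_coords[index][k] for k in range(3))
    up = down = 0
    for j, (x, y, z) in enumerate(ca_coords):
        if j == index:
            continue
        dx, dy, dz = x - cx, y - cy, z - cz
        if dx * dx + dy * dy + dz * dz > radius * radius:
            continue  # outside the sphere, not a contact
        # Sign of the dot product decides the half sphere;
        # atoms exactly on the dividing plane are counted as "up" here.
        if dx * ux + dy * uy + dz * uz >= 0.0:
            up += 1
        else:
            down += 1
    return up, down

if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
    cb = [(0.0, 0.0, 1.5)] * 4
    up, down = half_sphere_exposure(ca, cb, index=0)
    print("up:", up, "down:", down, "contact number:", up + down)
```

The contact number (CN) mentioned in the abstract is simply the sum of the two half-sphere counts.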


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Biehn ◽  
Steffen Lindert

Abstract Hydroxyl radical protein footprinting (HRPF) in combination with mass spectrometry reveals the relative solvent exposure of labeled residues within a protein, thereby providing insight into protein tertiary structure. HRPF labels nineteen residues with varying degrees of reliability and reactivity. Here, we present a dynamics-driven HRPF-guided algorithm for protein structure prediction. In a benchmark test of our algorithm, usage of the dynamics data in a score term resulted in notable improvement of the root-mean-square deviations of the lowest-scoring ab initio models and improved the funnel-like metric Pnear for all benchmark proteins. We identified models with accurate atomic detail for three of the four benchmark proteins. This work suggests that HRPF data along with side chain dynamics sampled by a Rosetta mover ensemble can be used to accurately predict protein structure.
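
The funnel-like metric Pnear is commonly described in the Rosetta literature as a Boltzmann-weighted fraction of near-native models: each model contributes exp(-rmsd²/λ²), weighted by exp(-E/kT). The Python sketch below follows that common definition. The function name and the default values of `lam` and `kT` are placeholder assumptions; the abstract does not state which settings the authors used.

```python
import math

def p_near(rmsds, energies, lam=2.0, kT=1.0):
    """Boltzmann-weighted nearness to native, between 0 and 1.

    rmsds    -- RMSD of each model to the native structure (angstroms)
    energies -- score of each model (same order as rmsds)
    lam, kT  -- assumed nearness width and temperature factor
    """
    e_min = min(energies)  # shift energies to avoid overflow in exp()
    boltzmann = [math.exp(-(e - e_min) / kT) for e in energies]
    nearness = [math.exp(-(r / lam) ** 2) for r in rmsds]
    weighted = sum(n * b for n, b in zip(nearness, boltzmann))
    return weighted / sum(boltzmann)

if __name__ == "__main__":
    # Hypothetical decoy set: the low-energy model is also the near-native one.
    print(p_near(rmsds=[0.5, 8.0], energies=[-10.0, -5.0]))
```

A value close to 1 indicates that the lowest-scoring models are also the ones closest to the native structure, which is what a well-formed folding funnel looks like.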


2017 ◽  
Vol 4 (04) ◽  
Author(s):  
Sakshi Chaudhary ◽  
Anil Kumar Singh ◽  
Jeshima Khan Yasin

Metallothioneins are a special group of small proteins capable of detoxifying non-essential metal ions present in excess within a plant cell. Metallothioneins are diverse classes of cysteine-rich, heavy-metal-binding proteins that are essential for plant growth. These proteins are present in all taxa except eubacteria. Similarity in protein sequences provides a basis for methods that predict the structural features of a protein from those of a known protein structure. Structural similarity of an entire sequence or a large sequence fragment enables prediction and modeling of an entire structural domain, while the distribution of local features of known protein structures makes it possible to predict such features in the structures of unknown or uncharacterised proteins. In this study, a pigeonpea metallothionein was identified from available genomic resources, and its structure was predicted and validated. We present a step-wise methodology to model a given protein and to validate the structures.


2021 ◽  
Author(s):  
Mariana Hoyer Moreira ◽  
Fabio C. L. Almeida ◽  
Tatiana Domitrovic ◽  
Fernando L. Palhano

Defensins are small proteins, usually ranging from 4 to 6 kDa, that are amphipathic and disulfide-rich, with a small or even absent hydrophobic core. Since a hydrophobic core is generally found in globular proteins that fold in an aqueous solvent, the peculiar fold of defensins can challenge tertiary protein structure predictors. We performed a PDB-wide survey of small proteins (4-6 kDa) to understand the similarities of defensins with other small disulfide-rich proteins. We found no differences when we compared defensins with non-defensins regarding the proportion and solvent exposure of apolar, polar, and charged residues. We then divided all small proteins (4-6 kDa) deposited in the PDB into two groups, one group with at least one disulfide bond (bonded, defensins included) and another group without any disulfide bond (unbonded). The bonded group had apolar residues that were more exposed to the solvent than those of the unbonded group. Robetta, the ab initio algorithm for tertiary protein structure prediction, was more accurate in predicting unbonded than bonded proteins. Our work highlights one more layer of complexity for tertiary protein structure prediction: the ability of small disulfide-rich proteins to fold even with a poor hydrophobic core.


2018 ◽  
Author(s):  
Daniel R. F. Bonetti ◽  
Gesiel Rios Lopes ◽  
Alexandre C. B. Delbem ◽  
Paulo S. L. Souza ◽  
Kalinka C. Branco ◽  
...  

This paper compares the runtime of three distinct parallel algorithms for the evaluation of an ab initio, full-atom approach based on a GA and the cell-list technique, in order to minimize the van der Waals energy. The three parallel algorithms are developed in C and each uses one of these programming models: MPI, OpenMP or hybrid (MPI+OpenMP). Our preliminary results show that the van der Waals energy is evaluated faster and with better speedups when using the hybrid, more flexible parallel algorithm to predict the structure of larger proteins. We also show that for small proteins the MPI communication imposes a high overhead on parallel execution, and thus OpenMP offers a better cost-benefit trade-off in such cases.
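
For readers unfamiliar with the cell-list technique, the following serial Python sketch shows the idea: atoms are binned into cubic cells whose edge equals the cutoff, so each atom only needs to be compared with atoms in its own and the 26 adjacent cells. It is not the authors' C/MPI/OpenMP code. The 12-6 Lennard-Jones form, the parameter values, and the function name are assumptions chosen for illustration, and the sketch assumes no two atoms share the same position.

```python
import math
from collections import defaultdict
from itertools import product

def lj_energy_cell_list(coords, cutoff=8.0, epsilon=0.1, sigma=3.5):
    """Sum a 12-6 Lennard-Jones energy over all pairs within `cutoff`."""
    # Bin every atom into a cubic cell with edge length equal to the cutoff.
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(i)

    cutoff_sq = cutoff * cutoff
    energy = 0.0
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    if i >= j:
                        continue  # count each pair exactly once
                    r_sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                    if r_sq <= cutoff_sq:
                        s6 = (sigma * sigma / r_sq) ** 3
                        energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy

if __name__ == "__main__":
    # Toy coordinates in angstroms, not a real protein.
    atoms = [(3.8 * i, 1.3 * (i % 3), 0.7 * (i % 5)) for i in range(20)]
    print(lj_energy_cell_list(atoms))
```

The outer loop over cells is the natural unit of work to distribute across threads or processes, which is where the choice between MPI, OpenMP, and a hybrid model comes in.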


2007 ◽  
Vol 5 (21) ◽  
pp. 387-396 ◽  
Author(s):  
Glennie Helles

Protein structure prediction is one of the major challenges in bioinformatics today. Throughout the past five decades, many different algorithmic approaches have been attempted, and although progress has been made the problem remains unsolvable even for many small proteins. While the general objective is to predict the three-dimensional structure from primary sequence, our current knowledge and computational power are simply insufficient to solve a problem of such high complexity. Some prediction algorithms do, however, appear to perform better than others, although it is not always obvious which ones they are and it is perhaps even less obvious why that is. In this review, the reported performance results from 18 different recently published prediction algorithms are compared. Furthermore, the general algorithmic settings most likely responsible for the difference in the reported performance are identified, and the specific settings of each of the 18 prediction algorithms are also compared. The average normalized r.m.s.d. scores reported range from 11.17 to 3.48. With a performance measure including both r.m.s.d. scores and CPU time, the currently best-performing prediction algorithm is identified to be the I-TASSER algorithm. Two of the algorithmic settings—protein representation and fragment assembly—were found to have definite positive influence on the running time and the predicted structures, respectively. There thus appears to be a clear benefit from incorporating this knowledge in the design of new prediction algorithms.


2012 ◽  
Vol 10 (03) ◽  
pp. 1242003 ◽  
Author(s):  
JIANLIN CHENG ◽  
JESSE EICKHOLT ◽  
ZHENG WANG ◽  
XIN DENG

After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques — template-based modeling and template-free modeling — have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction.

