All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds

Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds, or decoys, which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-‘R’-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment.
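The inverted Boltzmann step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the atom alphabet, the toy counts, the multinomial reference state, the sign convention, and the function names `expected_frequency` and `inverse_boltzmann` are all assumptions made for illustration. The quadruplet counts are taken as given; in practice they would come from a Delaunay tessellation of the atomic coordinates, which is not shown here.

```python
import math
from collections import Counter


def expected_frequency(quad, atom_freq):
    """Multinomial probability of drawing this unordered quadruplet of atom
    types at random, given the background atom-type frequencies.
    (The multinomial reference state is an assumption of this sketch.)"""
    coeff = math.factorial(4)
    prob = 1.0
    for atom_type, multiplicity in Counter(quad).items():
        coeff //= math.factorial(multiplicity)
        prob *= atom_freq[atom_type] ** multiplicity
    return coeff * prob


def inverse_boltzmann(quad_counts, atom_counts):
    """Return an energy-like score -ln(f_obs / f_exp) per quadruplet type."""
    # Merge permutations of the same quadruplet: (C, C, N, O) == (O, N, C, C).
    merged = Counter()
    for quad, n in quad_counts.items():
        merged[tuple(sorted(quad))] += n

    total_quads = sum(merged.values())
    total_atoms = sum(atom_counts.values())
    atom_freq = {a: n / total_atoms for a, n in atom_counts.items()}

    energies = {}
    for quad, n in merged.items():
        f_obs = n / total_quads
        f_exp = expected_frequency(quad, atom_freq)
        energies[quad] = -math.log(f_obs / f_exp)
    return energies


# Hypothetical toy counts, not data from the paper.
atom_counts = {"C": 60, "N": 20, "O": 20}
quad_counts = {
    ("C", "C", "C", "C"): 30,
    ("C", "C", "N", "O"): 10,
    ("O", "N", "C", "C"): 10,
}
energies = inverse_boltzmann(quad_counts, atom_counts)
for quad, energy in sorted(energies.items()):
    print(quad, round(energy, 3))
```

Under this sign convention, a quadruplet type observed more often than the reference state predicts receives a negative (favorable) score. A structure could then be scored by summing the values of all quadruplets that its tessellation produces.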


KORP: knowledge-based 6D potential for fast protein and loop modeling

Bioinformatics ◽ 10.1093/bioinformatics/btz026 ◽ 2019 ◽ Vol 35 (17) ◽ pp. 3013-3019 ◽ Cited By ~ 13

Author(s): José Ramón López-Blanco ◽ Pablo Chacón

Keyword(s): Structure Prediction ◽ Protein Structures ◽ Joint Probability ◽ Protein Modeling ◽ Supplementary Information ◽ Joint Probability Distribution ◽ Loop Modeling ◽ Statistical Potentials ◽ Knowledge Based ◽ Backbone Atoms

Motivation: Knowledge-based statistical potentials constitute a simpler and easier alternative to physics-based potentials in many applications, including folding, docking and protein modeling. Here, to improve the effectiveness of the current approximations, we attempt to capture the six-dimensional nature of residue–residue interactions from known protein structures using a simple backbone-based representation. Results: We have developed KORP, a knowledge-based pairwise potential for proteins that depends on the relative position and orientation between residues. Using a minimalist representation of only three backbone atoms per residue, KORP utilizes a six-dimensional joint probability distribution to outperform state-of-the-art statistical potentials for native structure recognition and best model selection in recent critical assessment of protein structure prediction and loop-modeling benchmarks. Compared with the existing methods, our side-chain independent potential has a lower complexity and better efficiency. The superior accuracy and robustness of KORP represent a promising advance for protein modeling and refinement applications that require a fast but highly discriminative energy function. Availability and implementation: http://chaconlab.org/modeling/korp. Supplementary information: Supplementary data are available at Bioinformatics online.


Are distance-dependent statistical potentials considering three interacting bodies superior to two-body statistical potentials for protein structure prediction?

Journal of Bioinformatics and Computational Biology ◽ 10.1142/s021972001450022x ◽ 2014 ◽ Vol 12 (05) ◽ pp. 1450022 ◽ Cited By ~ 3

Author(s): Hamed Tabatabaei Ghomi ◽ Jared J. Thompson ◽ Markus A. Lill

Keyword(s): Free Energy ◽ Protein Structure ◽ Protein Structure Prediction ◽ Structure Prediction ◽ Protein Structures ◽ Coarse Grained ◽ Scoring Functions ◽ Statistical Potentials ◽ Three Body ◽ Multi Body

Distance-based statistical potentials have long been used to model condensed matter systems, e.g. as scoring functions in differentiating native-like protein structures from decoys. These scoring functions are based on the assumption that the total free energy of the protein can be calculated as the sum of pairwise free energy contributions derived from a statistical analysis of pair-distribution functions. However, this fundamental assumption has been challenged theoretically. In fact, the free energy of a system with N particles is only exactly related to the N-body distribution function. Based on this argument, coarse-grained multi-body statistical potentials have been developed to capture higher-order interactions. Having a coarse representation of the protein and using geometric contacts instead of pairwise interaction distances renders these models insufficient in modeling details of multi-body effects. In this study, we investigated whether extending distance-dependent pairwise atomistic statistical potentials to corresponding interaction functions that are conditional on a third interacting body, defined as quasi-three-body statistical potentials, could model details of three-body interactions. We also tested whether this approach could improve the predictive capabilities of statistical scoring functions for protein structure prediction. We analyzed the statistical dependency between two simultaneous pairwise interactions and showed that there is surprisingly little, if any, dependency of a third interacting site on pairwise atomistic statistical potentials. Also, the protein structure prediction performance of these quasi-three-body potentials is comparable with that of their corresponding two-body counterparts. The scoring functions developed in this study showed better or comparable performance compared to some widely used scoring functions for protein structure prediction.
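The pairwise-additivity assumption described in this abstract, and the conditional extension it tests, can be written schematically as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $g_{ij}(r)$ denotes the observed pair-distribution function for atom types $i$ and $j$, $g_{\mathrm{ref}}(r)$ a reference distribution, and $k$ a third atom type located near the pair.

```latex
% Pairwise-additive scoring (the assumption being questioned)
\Delta G_{\mathrm{total}} \approx \sum_{a<b} \Delta G_{t(a)\,t(b)}(r_{ab}),
\qquad
\Delta G_{ij}(r) = -k_{B}T \,\ln \frac{g_{ij}(r)}{g_{\mathrm{ref}}(r)}

% Quasi-three-body form: the same pair term, conditioned on a nearby third body k
\Delta G_{ij \mid k}(r) = -k_{B}T \,\ln \frac{g_{ij}(r \mid k)}{g_{\mathrm{ref}}(r)}

% If the third body carries no extra information, the two coincide
g_{ij}(r \mid k) = g_{ij}(r) \;\Longrightarrow\; \Delta G_{ij \mid k}(r) = \Delta G_{ij}(r)
```

Here $a$ and $b$ index atoms and $t(\cdot)$ maps an atom to its type. In this reading, the finding of little or no statistical dependency corresponds to the last line holding approximately, which would be consistent with the comparable performance of the two-body and quasi-three-body potentials.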


Smotifs as structural local descriptors of supersecondary elements: classification, completeness and applications

Bio-Algorithms and Med-Systems ◽ 10.1515/bams-2014-0016 ◽ 2014 ◽ Vol 10 (4)

Author(s): Jaume Bonet ◽ Andras Fiser ◽ Baldo Oliva ◽ Narcis Fernandez-Fuentes

Keyword(s): Protein Design ◽ Structure Prediction ◽ Protein Structures ◽ Regular Structure ◽ Loop Structure ◽ Apparent Lack ◽ Knowledge Based ◽ Limits Of Knowledge ◽ Folding Dynamics ◽ And Function

Protein structures are made up of periodic and aperiodic structural elements (i.e., α-helices, β-strands and loops). Despite the apparent lack of regular structure, loops have specific conformations and play a central role in the folding, dynamics, and function of proteins. In this article, we review our previous work on the study of protein loops as local supersecondary structural motifs, or Smotifs. We reexamine our work on the structural classification of loops (ArchDB) and its application to loop structure prediction (ArchPRED), including the assessment of the limits of knowledge-based loop structure prediction methods. We conclude by focusing on the modular nature of proteins, on how the concept of Smotifs provides a convenient and practical approach to decompose proteins into strings of concatenated Smotifs, and on how this can be used in computational protein design and protein structure prediction.


Using sequence signatures and kink-turn motifs in knowledge-based statistical potentials for RNA structure prediction

Nucleic Acids Research ◽ 10.1093/nar/gkx045 ◽ 2017 ◽ Vol 45 (9) ◽ pp. 5414-5422 ◽ Cited By ~ 14

Author(s): Cigdem Sevim Bayrak ◽ Namhee Kim ◽ Tamar Schlick

Keyword(s): Rna Structure ◽ Structure Prediction ◽ Statistical Potentials ◽ Rna Structure Prediction ◽ Knowledge Based ◽ Sequence Signatures


Corrigendum to “All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds”

BioMed Research International ◽ 10.1155/2018/7108272 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-4

Author(s): Majid Masso

Keyword(s): Protein Structures ◽ Native Protein ◽ Statistical Potentials ◽ Knowledge Based ◽ Body Knowledge


Modeling of Disordered Protein Structures Using Monte Carlo Simulations and Knowledge-Based Statistical Force Fields

10.20944/preprints201812.0193.v1 ◽ 2018

Author(s): Maciej Pawel Ciemny ◽ Aleksandra Elzbieta Badaczewska-Dawid ◽ Monika Pikuzinska ◽ Andrzej Kolinski ◽ Sebastian Kmiecik

Keyword(s): Experimental Data ◽ Monte Carlo ◽ Protein Folding ◽ Case Studies ◽ Protein Structures ◽ Coarse Grained ◽ Computational Time ◽ Modeling Tools ◽ Disordered Protein ◽ Knowledge Based

The description of protein disordered states is important for understanding protein folding mechanisms and their functions. In this short review, we briefly describe a simulation approach to modeling disordered protein interactions and unfolded states of globular proteins. It is based on the CABS coarse-grained protein model that uses a Monte Carlo (MC) sampling scheme and a knowledge-based statistical force field. We review several case studies showing that description of protein disordered states resulting from CABS simulations is consistent with experimental data. The case studies comprise investigations of protein-peptide binding and protein folding processes. The CABS model has been recently made available as the simulation engine of multiscale modeling tools enabling studies of protein-peptide docking and protein flexibility. Those tools offer customization of the modeling process, driving the conformational search using distance restraints, reconstruction of selected models to all-atom resolution and studies of large protein systems in a reasonable computational time. Therefore, CABS can be combined in integrative modeling pipelines incorporating experimental data and other modeling tools of various resolution.


Knowledge-based entropies improve the identification of native protein structures

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1613331114 ◽ 2017 ◽ Vol 114 (11) ◽ pp. 2928-2933 ◽ Cited By ~ 14

Author(s): Kannan Sankar ◽ Kejue Jia ◽ Robert L. Jernigan

Keyword(s): Amino Acids ◽ Structure Prediction ◽ Protein Structures ◽ 3D Structure ◽ Model Assessment ◽ Native Protein ◽ Solvent Exposure ◽ Alternative Structures ◽ Knowledge Based ◽ Protein Model

Evaluating protein structures requires reliable free energies with good estimates of both potential energies and entropies. Although there are many demonstrated successes from using knowledge-based potential energies, computing entropies of proteins has lagged far behind. Here we take an entirely different approach and evaluate knowledge-based conformational entropies of proteins based on the observed frequencies of contact changes between amino acids in a set of 167 diverse proteins, each of which has two alternative structures. The results show that charged and polar interactions break more often than hydrophobic pairs. This pattern correlates strongly with the average solvent exposure of amino acids in globular proteins, as well as with polarity indices and the sizes of the amino acids. Knowledge-based entropies are derived by using the inverse Boltzmann relationship, in a manner analogous to the way that knowledge-based potentials have been extracted. Including these new knowledge-based entropies almost doubles the performance of knowledge-based potentials in selecting the native protein structures from decoy sets. Beyond the overall energy–entropy compensation, a similar compensation is seen for individual pairs of interacting amino acids. The entropies in this report have immediate applications for 3D structure prediction, protein model assessment, and protein engineering and design.


Comparative roles of charge, π, and hydrophobic interactions in sequence-dependent phase separation of intrinsically disordered proteins

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.2008122117 ◽ 2020 ◽ Vol 117 (46) ◽ pp. 28795-28805

Author(s): Suman Das ◽ Yi-Hsuan Lin ◽ Robert M. Vernon ◽ Julie D. Forman-Kay ◽ Hue Sun Chan

Keyword(s): Phase Separation ◽ Electrostatic Interactions ◽ Intrinsically Disordered Proteins ◽ Protein Structures ◽ Coarse Grained ◽ Disordered Proteins ◽ Chain Model ◽ Statistical Potentials ◽ Intrinsically Disordered ◽ Sequence Dependent

Endeavoring toward a transferable, predictive coarse-grained explicit-chain model for biomolecular condensates underlain by liquid–liquid phase separation (LLPS) of proteins, we conducted multiple-chain simulations of the N-terminal intrinsically disordered region (IDR) of DEAD-box helicase Ddx4, as a test case, to assess roles of electrostatic, hydrophobic, cation–π, and aromatic interactions in amino acid sequence-dependent LLPS. We evaluated three different residue–residue interaction schemes with a shared electrostatic potential. Neither a common hydrophobicity scheme nor one augmented with arginine/lysine-aromatic cation–π interactions consistently accounted for available experimental LLPS data on the wild-type, a charge-scrambled, a phenylalanine-to-alanine (FtoA), and an arginine-to-lysine (RtoK) mutant of Ddx4 IDR. In contrast, interactions based on contact statistics among folded globular protein structures reproduce the overall experimental trend, including that the RtoK mutant has a much diminished LLPS propensity. Consistency between simulation and experiment was also found for RtoK mutants of P-granule protein LAF-1, underscoring that, to a degree, important LLPS-driving π-related interactions are embodied in classical statistical potentials. Further elucidation is necessary, however, especially of phenylalanine’s role in condensate assembly because experiments on FtoA and tyrosine-to-phenylalanine mutants suggest that LLPS-driving phenylalanine interactions are significantly weaker than posited by common statistical potentials. Protein–protein electrostatic interactions are modulated by relative permittivity, which in general depends on aqueous protein concentration. Analytical theory suggests that this dependence entails enhanced interprotein interactions in the condensed phase but more favorable protein–solvent interactions in the dilute phase. The opposing trends lead to only a modest overall impact on LLPS.


Anisotropic coarse-grained statistical potentials improve the ability to identify nativelike protein structures

The Journal of Chemical Physics ◽ 10.1063/1.1561616 ◽ 2003 ◽ Vol 118 (16) ◽ pp. 7658 ◽ Cited By ~ 42

Author(s): N.-V. Buchete ◽ J. E. Straub ◽ D. Thirumalai

Keyword(s): Protein Structures ◽ Coarse Grained ◽ Statistical Potentials


Modeling of Disordered Protein Structures Using Monte Carlo Simulations and Knowledge-Based Statistical Force Fields

International Journal of Molecular Sciences ◽ 10.3390/ijms20030606 ◽ 2019 ◽ Vol 20 (3) ◽ pp. 606 ◽ Cited By ~ 18

Author(s): Maciej Ciemny ◽ Aleksandra Badaczewska-Dawid ◽ Monika Pikuzinska ◽ Andrzej Kolinski ◽ Sebastian Kmiecik

Keyword(s): Experimental Data ◽ Monte Carlo ◽ Protein Folding ◽ Case Studies ◽ Protein Structures ◽ Coarse Grained ◽ Computational Time ◽ Modeling Tools ◽ Disordered Protein ◽ Knowledge Based

The description of protein disordered states is important for understanding protein folding mechanisms and their functions. In this short review, we briefly describe a simulation approach to modeling protein interactions, which involve disordered peptide partners or intrinsically disordered protein regions, and unfolded states of globular proteins. It is based on the CABS coarse-grained protein model that uses a Monte Carlo (MC) sampling scheme and a knowledge-based statistical force field. We review several case studies showing that description of protein disordered states resulting from CABS simulations is consistent with experimental data. The case studies comprise investigations of protein–peptide binding and protein folding processes. The CABS model has been recently made available as the simulation engine of multiscale modeling tools enabling studies of protein–peptide docking and protein flexibility. Those tools offer customization of the modeling process, driving the conformational search using distance restraints, reconstruction of selected models to all-atom resolution, and simulation of large protein systems in a reasonable computational time. Therefore, CABS can be combined in integrative modeling pipelines incorporating experimental data and other modeling tools of various resolution.
