KORP: knowledge-based 6D potential for fast protein and loop modeling

José Ramón López-Blanco; Pablo Chacón

doi:10.1093/bioinformatics/btz026

Bioinformatics ◽

10.1093/bioinformatics/btz026 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3013-3019 ◽

Cited By ~ 13

Author(s):

José Ramón López-Blanco ◽

Pablo Chacón

Keyword(s):

Structure Prediction ◽

Protein Structures ◽

Joint Probability ◽

Protein Modeling ◽

Supplementary Information ◽

Joint Probability Distribution ◽

Loop Modeling ◽

Statistical Potentials ◽

Knowledge Based ◽

Backbone Atoms

Abstract Motivation: Knowledge-based statistical potentials constitute a simpler and easier alternative to physics-based potentials in many applications, including folding, docking and protein modeling. Here, to improve the effectiveness of the current approximations, we attempt to capture the six-dimensional nature of residue–residue interactions from known protein structures using a simple backbone-based representation. Results: We have developed KORP, a knowledge-based pairwise potential for proteins that depends on the relative position and orientation between residues. Using a minimalist representation of only three backbone atoms per residue, KORP utilizes a six-dimensional joint probability distribution to outperform state-of-the-art statistical potentials for native structure recognition and best model selection in recent critical assessment of protein structure prediction and loop-modeling benchmarks. Compared with the existing methods, our side-chain independent potential has a lower complexity and better efficiency. The superior accuracy and robustness of KORP represent a promising advance for protein modeling and refinement applications that require a fast but highly discriminative energy function. Availability and implementation: http://chaconlab.org/modeling/korp. Supplementary information: Supplementary data are available at Bioinformatics online.
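To illustrate why a relative position plus a relative orientation adds up to six degrees of freedom, the following is a minimal sketch, not KORP's actual parametrization. It assumes each residue is given as a hypothetical (N, CA, C) tuple of 3D coordinates, builds a local frame on each residue, and returns three positional numbers (spherical coordinates of the second CA in the first frame) together with a rotation matrix, which carries the remaining three degrees of freedom. All function names are illustrative.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def local_frame(n, ca, c):
    """Right-handed orthonormal frame centred on CA (assumed convention)."""
    x = unit(sub(c, ca))               # x axis along CA -> C
    z = unit(cross(x, sub(n, ca)))     # z axis normal to the N-CA-C plane
    y = cross(z, x)                    # y axis completes the frame
    return x, y, z

def relative_pose(res_i, res_j):
    """Position (r, theta, phi) of residue j's CA in residue i's frame,
    plus the rotation matrix relating the two frames."""
    frame_i = local_frame(*res_i)
    frame_j = local_frame(*res_j)
    d = sub(res_j[1], res_i[1])                      # CA_i -> CA_j
    local = tuple(dot(d, axis) for axis in frame_i)  # d expressed in frame i
    r = math.sqrt(dot(local, local))
    theta = math.acos(max(-1.0, min(1.0, local[2] / r)))
    phi = math.atan2(local[1], local[0])
    rotation = tuple(tuple(dot(ai, aj) for aj in frame_j) for ai in frame_i)
    return (r, theta, phi), rotation
```

A joint probability distribution over such six-dimensional descriptors would then be estimated from known structures; the binning and reference state are outside the scope of this sketch.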

All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds

BioMed Research International ◽

10.1155/2017/5760612 ◽

2017 ◽

Vol 2017 ◽

pp. 1-17 ◽

Cited By ~ 3

Author(s):

Majid Masso

Keyword(s):

Structure Prediction ◽

Protein Structures ◽

Binding Energies ◽

Coarse Grained ◽

Amber Force Field ◽

Statistical Potentials ◽

Knowledge Based ◽

Native Proteins ◽

Interacting Atoms ◽

Atomic Coordinates

Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-‘R’-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment.
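The inverted Boltzmann principle mentioned above can be sketched generically as follows. This is an assumed textbook form, E = -kT ln(f_obs / f_ref), and not the exact scoring scheme of the paper; the function name, the pseudocount handling and the toy quadruplet labels are all hypothetical.

```python
import math

def inverse_boltzmann(observed_counts, reference_probs, kT=1.0, pseudocount=1.0):
    """Convert observed counts into pseudo-energies.

    observed_counts: dict mapping an interaction type to its observed count.
    reference_probs: dict mapping the same types to expected probabilities.
    A pseudocount avoids log(0) for types that were never observed.
    """
    n_types = len(reference_probs)
    total = sum(observed_counts.get(t, 0) for t in reference_probs)
    energies = {}
    for t, f_ref in reference_probs.items():
        f_obs = (observed_counts.get(t, 0) + pseudocount) / (total + pseudocount * n_types)
        # Types seen more often than expected receive a negative (favourable) energy.
        energies[t] = -kT * math.log(f_obs / f_ref)
    return energies
```

For example, with counts {"CCNO": 30, "CCCC": 10} and a uniform reference of 0.5 each, the over-represented type gets a negative energy and the under-represented one a positive energy.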

KORP-PL: a coarse-grained knowledge-based scoring function for protein–ligand interactions

Bioinformatics ◽

10.1093/bioinformatics/btaa748 ◽

2020 ◽

Author(s):

Maria Kadukova ◽

Karina dos Santos Machado ◽

Pablo Chacón ◽

Sergei Grudinin

Keyword(s):

Joint Probability ◽

Coarse Grained ◽

Supplementary Information ◽

Joint Probability Distribution ◽

Pose Prediction ◽

Scoring Functions ◽

Widespread Application ◽

Knowledge Based ◽

Protein Ligand Interactions ◽

Ligand Interactions

Abstract Motivation: Despite the progress made in studying protein–ligand interactions and the widespread application of docking and affinity prediction tools, improving their precision and efficiency still remains a challenge. Computational approaches based on the scoring of docking conformations with statistical potentials constitute a popular alternative to more accurate but costly physics-based thermodynamic sampling methods. In this context, a minimalist and fast sidechain-free knowledge-based potential with a high docking and screening power can be very useful when screening a big number of putative docking conformations. Results: Here, we present a novel coarse-grained potential defined by a 3D joint probability distribution function that only depends on the pairwise orientation and position between protein backbone and ligand atoms. Despite its extreme simplicity, our approach yields very competitive results with the state-of-the-art scoring functions, especially in docking and screening tasks. For example, we observed a twofold improvement in the median 5% enrichment factor on the DUD-E benchmark compared to Autodock Vina results. Moreover, our results prove that a coarse sidechain-free potential is sufficient for a very successful docking pose prediction. Availability and implementation: The standalone version of KORP-PL with the corresponding tests and benchmarks are available at https://team.inria.fr/nano-d/korp-pl/ and https://chaconlab.org/modeling/korp-pl. Supplementary information: Supplementary data are available at Bioinformatics online.
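The 5% enrichment factor cited above is commonly defined as the hit rate among the top-scored fraction of a library divided by the hit rate of the whole library. The sketch below follows that common definition; the function name, the rounding of the cutoff and the lower-is-better default are assumptions, not details taken from the paper.

```python
def enrichment_factor(scores, is_active, fraction=0.05, lower_is_better=True):
    """Enrichment factor at a given top fraction of a ranked compound list."""
    n = len(scores)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, int(round(fraction * n)))
    # Rank compound indices from best to worst score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=not lower_is_better)
    hits_top = sum(is_active[i] for i in order[:n_top])
    return (hits_top / n_top) / (total_actives / n)
```

With 100 compounds of which 10 are active, a perfect ranking gives an enrichment factor of 10 at 5%, which is the maximum attainable for that library composition.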

A Constraint Solver for Flexible Protein Model

Journal of Artificial Intelligence Research ◽

10.1613/jair.4193 ◽

2013 ◽

Vol 48 ◽

pp. 953-1000 ◽

Cited By ~ 5

Author(s):

F. Campeotto ◽

A. Dal Palù ◽

A. Dovier ◽

F. Fioretto ◽

E. Pontelli

Keyword(s):

Structure Prediction ◽

Dimensional Space ◽

Protein Structures ◽

Empirical Evaluation ◽

Geometric Constraints ◽

Conformational Space ◽

Loop Modeling ◽

Geometric Properties ◽

Constraint Solver ◽

Multi Body

This paper proposes the formalization and implementation of a novel class of constraints aimed at modeling problems related to the placement of multi-body systems in 3-dimensional space. Each multi-body is a system composed of body elements, connected by joint relationships and constrained by geometric properties. The emphasis of this investigation is the use of multi-body systems to model native conformations of protein structures, where each body represents an entity of the protein (e.g., an amino acid, a small peptide) and the geometric constraints are related to the spatial properties of the composing atoms. The paper explores the use of the proposed class of constraints to support a variety of structural analyses of proteins, such as loop modeling and structure prediction. The declarative nature of a constraint-based encoding provides elaboration tolerance and the ability to make use of any additional knowledge in the analysis studies. The filtering capabilities of the proposed constraints also make it possible to control the number of representative solutions that are drawn from the conformational space of the protein, by means of criteria driven by uniform distribution sampling principles. In this scenario it is possible to select the desired degree of precision and/or number of solutions. The filtering component automatically excludes configurations that violate the spatial and geometric properties of the composing multi-body system. The paper illustrates the implementation of a constraint solver based on the multi-body perspective and its empirical evaluation on protein structure analysis problems.

Sequence alignment using machine learning for accurate template-based protein structure prediction

Bioinformatics ◽

10.1093/bioinformatics/btz483 ◽

2019 ◽

Vol 36 (1) ◽

pp. 104-111

Author(s):

Shuichiro Makigaki ◽

Takashi Ishida

Keyword(s):

Machine Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

Structural Alignment ◽

Protein Structures ◽

Substitution Matrix ◽

Detection Methods ◽

Supplementary Information ◽

Homology Detection ◽

Sequence Alignments

Abstract Motivation: Template-based modeling, the process of predicting the tertiary structure of a protein by using homologous protein structures, is useful if good templates can be found. Although modern homology detection methods can find remote homologs with high sensitivity, the accuracy of template-based models generated from homology-detection-based alignments is often lower than that from ideal alignments. Results: In this study, we propose a new method that generates pairwise sequence alignments for more accurate template-based modeling. The proposed method trains a machine learning model using the structural alignment of known homologs. It is difficult to directly predict sequence alignments using machine learning. Thus, when calculating sequence alignments, instead of a fixed substitution matrix, this method dynamically predicts a substitution score from the trained model. We evaluate our method by carefully splitting the training and test datasets and comparing the predicted structure’s accuracy with that of state-of-the-art methods. Our method generates more accurate tertiary structure models than those produced from alignments obtained by other methods. Availability and implementation: https://github.com/shuichiro-makigaki/exmachina. Supplementary information: Supplementary data are available at Bioinformatics online.
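The idea of replacing a fixed substitution matrix with a dynamically predicted score can be sketched with a standard global alignment recurrence whose match score is supplied by a callable. This is an illustrative sketch only: the linear gap penalty, the function name and the toy identity scorer are assumptions, and a trained model would take the place of the scorer.

```python
def global_alignment_score(a, b, score, gap=-2.0):
    """Needleman-Wunsch optimal score with a pluggable scoring function.

    score(i, j) returns the substitution score for aligning a[i] with b[j];
    passing positions rather than residues lets a model use local context.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(prev[j - 1] + score(i - 1, j - 1),  # align a[i-1] with b[j-1]
                           prev[j] + gap,                      # gap in b
                           cur[j - 1] + gap))                  # gap in a
        prev = cur
    return prev[-1]

def identity_scorer(a, b):
    """Toy stand-in for a learned model: +1 for identical residues, -1 otherwise."""
    return lambda i, j: 1.0 if a[i] == b[j] else -1.0
```

For instance, `global_alignment_score("ACGT", "AGT", identity_scorer("ACGT", "AGT"))` aligns three matches around one gap.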

Smotifs as structural local descriptors of supersecondary elements: classification, completeness and applications

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2014-0016 ◽

2014 ◽

Vol 10 (4) ◽

Author(s):

Jaume Bonet ◽

Andras Fiser ◽

Baldo Oliva ◽

Narcis Fernandez-Fuentes

Keyword(s):

Protein Design ◽

Structure Prediction ◽

Protein Structures ◽

Regular Structure ◽

Loop Structure ◽

Apparent Lack ◽

Knowledge Based ◽

Limits Of Knowledge ◽

Folding Dynamics ◽

And Function

Abstract Protein structures are made up of periodic and aperiodic structural elements (i.e., α-helices, β-strands and loops). Despite the apparent lack of regular structure, loops have specific conformations and play a central role in the folding, dynamics, and function of proteins. In this article, we reviewed our previous works in the study of protein loops as local supersecondary structural motifs or Smotifs. We reexamined our works about the structural classification of loops (ArchDB) and its application to loop structure prediction (ArchPRED), including the assessment of the limits of knowledge-based loop structure prediction methods. We finalized this article by focusing on the modular nature of proteins and how the concept of Smotifs provides a convenient and practical approach to decompose proteins into strings of concatenated Smotifs and how this can be used in computational protein design and protein structure prediction.

BiORSEO: a bi-objective method to predict RNA secondary structures with pseudoknots using RNA 3D modules

Bioinformatics ◽

10.1093/bioinformatics/btz962 ◽

2020 ◽

Vol 36 (8) ◽

pp. 2451-2457

Author(s):

Louis Becquey ◽

Eric Angel ◽

Fariza Tahi

Keyword(s):

Structure Prediction ◽

Secondary Structure Prediction ◽

State Of The Art ◽

Secondary Structures ◽

Supplementary Information ◽

Large Set ◽

Objective Method ◽

Rna Secondary Structures ◽

Knowledge Based ◽

Module Size

Abstract Motivation: RNA loops have been modelled and clustered from solved 3D structures into ordered collections of recurrent non-canonical interactions called ‘RNA modules’, available in databases. This work explores what information from such modules can be used to improve secondary structure prediction. We propose a bi-objective method for predicting RNA secondary structures by minimizing both an energy-based and a knowledge-based potential. The tool, called BiORSEO, outputs secondary structures corresponding to the optimal solutions from the Pareto set. Results: We compare several approaches to predict secondary structures using inserted RNA modules information: two module data sources, Rna3Dmotif and the RNA 3D Motif Atlas, and different ways to score the module insertions: module size, module complexity or module probability according to models like JAR3D and BayesPairing. We benchmark them against a large set of known secondary structures, including some state-of-the-art tools, and comment on the usefulness of the half physics-based, half data-based approach. Availability and implementation: The software is available for download on the EvryRNA website, as well as the datasets. Supplementary information: Supplementary data are available at Bioinformatics online.
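The notion of a Pareto set for two minimized objectives can be illustrated with a short filter over candidate solutions. This is a generic quadratic-time sketch, not BiORSEO's optimization procedure; each candidate is assumed to be a hypothetical (energy_score, knowledge_score) pair where lower is better for both.

```python
def pareto_front(points):
    """Return the candidates that no other candidate dominates.

    A point q dominates p when q is no worse on both objectives
    and strictly better on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

Every point on the front represents a different trade-off between the two potentials, which is why a bi-objective method can return several structures rather than a single optimum.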

Using sequence signatures and kink-turn motifs in knowledge-based statistical potentials for RNA structure prediction

Nucleic Acids Research ◽

10.1093/nar/gkx045 ◽

2017 ◽

Vol 45 (9) ◽

pp. 5414-5422 ◽

Cited By ~ 14

Author(s):

Cigdem Sevim Bayrak ◽

Namhee Kim ◽

Tamar Schlick

Keyword(s):

Rna Structure ◽

Structure Prediction ◽

Statistical Potentials ◽

Rna Structure Prediction ◽

Knowledge Based ◽

Sequence Signatures

Corrigendum to “All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds”

BioMed Research International ◽

10.1155/2018/7108272 ◽

2018 ◽

Vol 2018 ◽

pp. 1-4

Author(s):

Majid Masso

Keyword(s):

Protein Structures ◽

Native Protein ◽

Statistical Potentials ◽

Knowledge Based ◽

Body Knowledge

FASPR: an open-source tool for fast and accurate protein side-chain packing

Bioinformatics ◽

10.1093/bioinformatics/btaa234 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3758-3765 ◽

Cited By ~ 6

Author(s):

Xiaoqiang Huang ◽

Robin Pearce ◽

Yang Zhang

Keyword(s):

Protein Structure ◽

Protein Design ◽

Structure Prediction ◽

Protein Structures ◽

Scoring Function ◽

Supplementary Information ◽

Side Chain ◽

Chain Packing ◽

And Function ◽

Side Chain Packing

Abstract Motivation: Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results: We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation: The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information: Supplementary data are available at Bioinformatics online.
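The 20° tolerance criterion can be made concrete with a small helper that compares predicted and native dihedral angles while accounting for the wrap-around at ±180°. This is a minimal sketch with hypothetical function names; it assumes angles in degrees and does not handle the symmetry-equivalent side-chain conformations that a full evaluation protocol may treat specially.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def fraction_within_tolerance(predicted, native, tol=20.0):
    """Fraction of dihedral angles predicted within tol degrees of native."""
    pairs = list(zip(predicted, native))
    if not pairs:
        raise ValueError("no angles to compare")
    correct = sum(angle_diff(p, n) <= tol for p, n in pairs)
    return correct / len(pairs)
```

Without the wrap-around step, a prediction of 175° against a native value of -175° would be counted as a 350° error instead of a 10° one.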

Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins

Bioinformatics ◽

10.1093/bioinformatics/btv665 ◽

2015 ◽

Vol 32 (6) ◽

pp. 843-849 ◽

Cited By ~ 51

Author(s):

Rhys Heffernan ◽

Abdollah Dehzangi ◽

James Lyons ◽

Kuldip Paliwal ◽

Alok Sharma ◽

...

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Structure Prediction ◽

Protein Structures ◽

Correlation Coefficients ◽

Accessible Surface Area ◽

Solvent Accessible Surface Area ◽

Supplementary Information ◽

Amino Acid Residues ◽

Solvent Exposure

Abstract Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at http://sparks-lab.org. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
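Computing half-sphere exposure from a known structure, as opposed to predicting it from sequence, can be sketched as follows for the Cα-Cβ variant (HSEβ). The sketch assumes plain coordinate tuples, a hypothetical function name, and a sphere radius of 13 Å chosen here only as an illustrative default; neighbours lying exactly on the dividing plane are counted in the upper half.

```python
def half_sphere_exposure(ca, cb, other_cas, radius=13.0):
    """Count neighbouring CA atoms in the upper and lower half-spheres.

    The sphere is centred on the residue's CA; the CA -> CB vector
    defines which half is 'up'. Returns (up, down); their sum is the
    contact number for the same radius.
    """
    axis = tuple(b - a for a, b in zip(ca, cb))
    up = down = 0
    for p in other_cas:
        d = tuple(x - a for a, x in zip(ca, p))
        if sum(x * x for x in d) > radius * radius:
            continue  # outside the sphere
        # The sign of the projection onto CA -> CB selects the half-sphere.
        if sum(x * y for x, y in zip(d, axis)) >= 0:
            up += 1
        else:
            down += 1
    return up, down
```

A buried side chain typically points toward many neighbours and so has a high upper count, while an exposed one points toward solvent and has a low upper count.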
