Increasing the accuracy of protein loop structure prediction with evolutionary constraints

2018 ◽  
Vol 35 (15) ◽  
pp. 2585-2592 ◽  
Author(s):  
Claire Marks ◽  
Charlotte M Deane

Abstract Motivation Accurate prediction of loop structures remains challenging. This is especially true for long loops, where the large conformational space and limited coverage of experimentally determined structures often lead to low accuracy. Co-evolutionary contact predictors, which provide information about the proximity of pairs of residues, have been used to improve whole-protein models generated through de novo techniques. Here we investigate whether these evolutionary constraints can enhance the prediction of long loop structures. Results As a first stage, we assess the accuracy of predicted contacts that involve loop regions. We find that these are less accurate than contacts in general. We also observe that some incorrectly predicted contacts can be identified because they are never satisfied in any of our generated loop conformations. We examined two different strategies for incorporating contacts, and on a test set of long loops (10 residues or more), both approaches improve the accuracy of prediction. For a set of 135 loops, contacts were predicted, and hence our methods were applicable, in 97 cases. Both strategies result in an increase in the proportion of near-native decoys in the ensemble, leading to more accurate predictions and in some cases improving the root-mean-square deviation of the final model by more than 3 Å. Supplementary information Supplementary data are available at Bioinformatics online.
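
The observation that some predicted contacts are never satisfied in any generated loop conformation can be illustrated with a small filter. This is only a sketch under stated assumptions, not the authors' implementation: the 8 Å distance cutoff, the one-coordinate-per-residue representation, and the names `unsatisfied_contacts`, `decoys` and `contacts` are all hypothetical.

```python
import math


def unsatisfied_contacts(decoys, contacts, cutoff=8.0):
    """Return the predicted contacts that no decoy satisfies.

    decoys   : list of dicts mapping residue index -> (x, y, z) coordinate
               (one representative atom per residue; an assumption here)
    contacts : list of (i, j) residue-index pairs from a contact predictor
    cutoff   : distance in angstroms below which a contact counts as
               satisfied (8.0 is an assumed, commonly used value)
    """
    never_satisfied = []
    for i, j in contacts:
        satisfied = any(math.dist(d[i], d[j]) <= cutoff for d in decoys)
        if not satisfied:
            never_satisfied.append((i, j))
    return never_satisfied


# Toy data: two decoys with three residues each (made-up coordinates).
decoys = [
    {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (9.0, 0.0, 0.0), 3: (25.0, 0.0, 0.0)},
]
contacts = [(1, 2), (1, 3)]

# Contact (1, 3) is never within the cutoff, so it is flagged as suspect.
suspect = unsatisfied_contacts(decoys, contacts)
```

Contacts flagged this way could then be dropped before the remaining ones are used to bias or re-rank the ensemble; how the paper actually does this is not described in the abstract.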

2019 ◽  
Vol 36 (8) ◽  
pp. 2443-2450 ◽  
Author(s):  
Jun Liu ◽  
Xiao-Gen Zhou ◽  
Yang Zhang ◽  
Gui-Jun Zhang

Abstract Motivation Regions that connect secondary structure elements in a protein are known as loops, and a slight change in a loop can have a dramatic effect on the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. Results A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate a native-like topology. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model for guiding the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and the success rate of conformational updates, and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. Availability and implementation The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. Supplementary information Supplementary data are available at Bioinformatics online.


2011 ◽  
Vol 79 (8) ◽  
pp. 2403-2417 ◽  
Author(s):  
Juyong Lee ◽  
Jinhyuk Lee ◽  
Takeshi N. Sasaki ◽  
Masaki Sasai ◽  
Chaok Seok ◽  
...  

2015 ◽  
Vol 32 (6) ◽  
pp. 814-820 ◽  
Author(s):  
Gearóid Fox ◽  
Fabian Sievers ◽  
Desmond G. Higgins

Abstract Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data. Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins. Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (16) ◽  
pp. 2801-2808 ◽  
Author(s):  
Mikhail Karasikov ◽  
Guillaume Pagès ◽  
Sergei Grudinin

Abstract Motivation Protein quality assessment (QA) is a crucial element of protein structure prediction, a fundamental and yet open problem in structural bioinformatics. QA aims at ranking predicted protein models to select the best candidates. The assessment can be performed based either on a single model or on a consensus derived from an ensemble of models. The latter strategy can yield very high performance but substantially depends on the pool of available candidate models, which limits its applicability. Hence, single-model QA methods remain an important research target, also because they can assist the sampling of candidate models. Results We present a novel single-model QA method called SBROD. The SBROD (Smooth Backbone-Reliant Orientation-Dependent) method uses only the backbone protein conformation, and hence it can be applied to scoring coarse-grained protein models. The proposed method deduces its scoring function from a training set of protein models. The SBROD scoring function is composed of four terms related to different structural features: residue–residue orientations, contacts between backbone atoms, hydrogen bonding and solvent–solute interactions. It is smooth with respect to atomic coordinates and thus is potentially applicable to continuous gradient-based optimization of protein conformations. Furthermore, it can also be used for coarse-grained protein modeling and computational protein design. SBROD proved to achieve similar performance to state-of-the-art single-model QA methods on diverse datasets (CASP11, CASP12 and MOULDER). Availability and implementation The standalone application implemented in C++ and Python is freely available at https://gitlab.inria.fr/grudinin/sbrod and supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.
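
The idea of a single-model score built from four feature terms and used to rank candidate models can be sketched as below. This is a minimal illustration, not SBROD itself: the abstract does not say how the terms are combined, so the weighted sum, the weights, the feature values, the convention that a higher score is better, and all names are assumptions.

```python
# The four term names follow the abstract; everything else is hypothetical.
TERMS = ("orientation", "backbone_contacts", "hydrogen_bonds", "solvation")


def score(features, weights):
    """Combine the four per-model feature terms into one scalar score
    (assumed here to be a simple weighted sum)."""
    return sum(weights[t] * features[t] for t in TERMS)


def rank_models(models, weights):
    """Return model names ordered from best to worst (higher is better,
    which is an assumption of this sketch)."""
    return sorted(models, key=lambda name: score(models[name], weights),
                  reverse=True)


# Made-up weights, standing in for values learned from a training set.
weights = {"orientation": 1.0, "backbone_contacts": 0.5,
           "hydrogen_bonds": 2.0, "solvation": 1.0}

# Made-up feature values for three candidate models of the same protein.
models = {
    "model_a": {"orientation": 0.2, "backbone_contacts": 0.5,
                "hydrogen_bonds": 0.1, "solvation": 0.3},
    "model_b": {"orientation": 0.6, "backbone_contacts": 0.4,
                "hydrogen_bonds": 0.3, "solvation": 0.2},
    "model_c": {"orientation": 0.1, "backbone_contacts": 0.1,
                "hydrogen_bonds": 0.1, "solvation": 0.1},
}

ranking = rank_models(models, weights)
```

Because each model is scored independently of the others, this kind of ranking does not depend on the pool of candidates, which is the property the abstract highlights for single-model QA.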


Author(s):  
Amélie Barozet ◽  
Kevin Molloy ◽  
Marc Vaisset ◽  
Christophe Zanon ◽  
Pierre Fauret ◽  
...  

Abstract Summary MoMA-LoopSampler is a sampling method that globally explores the conformational space of flexible protein loops. It combines a large structural library of three-residue fragments and a novel reinforcement-learning-based approach to accelerate the sampling process while maintaining diversity. The method generates a set of statistically likely loop states satisfying geometric constraints, and its ability to sample experimentally observed conformations has been demonstrated. This paper presents a web user interface to MoMA-LoopSampler through the illustration of a typical use case. Availability MoMA-LoopSampler is freely available at: https://moma.laas.fr/applications/LoopSampler/ We recommend that users create an account, but anonymous access is possible. In most cases, jobs are completed within a few minutes. The waiting time may increase depending on the server load, but it very rarely exceeds an hour. For users requiring more intensive use, binaries can be provided upon request. Supplementary information Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Ling-Hong Hung ◽  
Ram Samudrala

Abstract Background Many rice protein sequences are very different from the sequences of proteins with known structures. Homology modeling is not possible for many rice proteins. However, it is possible to use computationally intensive de novo techniques to obtain protein models when the protein sequence cannot be mapped to a protein of known structure. The Nutritious Rice for the World project generated 10 billion models encompassing more than 60,000 small proteins and protein domains for the rice strains Oryza sativa and Oryza japonica. Findings Over a period of 1.5 years, the volunteers of World Community Grid supported by IBM generated 10 billion candidate structures, a task that would have taken a single CPU on the order of 10 millennia. For each protein sequence, 5 top structures were chosen using a novel clustering methodology developed for analyzing large datasets. These are provided along with the entire set of 10 billion conformers. Conclusions We anticipate that the centroid models will be of use in visualizing and determining the role of rice proteins where the function is unknown. The entire set of conformers is unique both in its size and in that the conformers were derived from sequences that lack detectable homologs. Large sets of de novo conformers are rare and we anticipate that this set will be useful for benchmarking and developing new protein structure prediction methodologies.


1997 ◽  
Vol 44 (3) ◽  
pp. 389-422 ◽  
Author(s):  
A Koliński ◽  
J Skolnick

A high-coordination lattice discretization of protein conformational space is described. The model allows discrete representation of polypeptide chains of globular proteins and small macromolecular assemblies with an accuracy comparable to that of crystallographic structures. A knowledge-based force field, consisting of sequence-specific short-range interactions, a cooperative model of the hydrogen bond network, and tertiary one-body, two-body and multibody interactions, is outlined and discussed. A model of stochastic dynamics for these protein models is also described. The proposed method enables moderate-resolution tertiary structure prediction of simple and small globular proteins. Its applicability in structure prediction increases significantly when evolutionary information is exploited and/or when sparse experimental data are available. The model responds correctly to sequence mutations and could be used at early stages of computer-aided protein design and redesign. The computational speed associated with the discrete structure of the model enables studies of the long-time dynamics of polypeptides and proteins, as well as quite detailed theoretical studies of the thermodynamics of nontrivial protein models.


2020 ◽  
Author(s):  
Lim Heo ◽  
Collin Arbour ◽  
Michael Feig

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. Those methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve the accuracy of the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in model quality become possible. Molecular dynamics simulations have been especially successful at improving model quality, but although consistent refinement can be achieved, the improvements are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insights from this analysis, we propose a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.

