Improved estimation of model quality using predicted inter-residue distance

Author(s):  
Lisha Ye ◽  
Peikun Wu ◽  
Zhenling Peng ◽  
Jianzhao Gao ◽  
Jian Liu ◽  
...  

Abstract Motivation Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model out from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction. Results Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single-model and multi-models inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-models input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. Especially, QDistance was the top 3 local QA method and made the most accurate local QA prediction for unreliable local region. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance. Availability and Implementation http://yanglab.nankai.edu.cn/QDistance Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i285-i291 ◽  
Author(s):  
Md Hossain Shuvo ◽  
Sutanu Bhattacharya ◽  
Debswapna Bhattacharya

Abstract Motivation Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. Results We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide-range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep. Availability and implementation https://github.com/Bhattacharya-Lab/QDeep. Supplementary information Supplementary data are available at Bioinformatics online.


PLoS ONE ◽  
2012 ◽  
Vol 7 (7) ◽  
pp. e38799 ◽  
Author(s):  
David Simoncini ◽  
Francois Berenger ◽  
Rojan Shrestha ◽  
Kam Y. J. Zhang

2019 ◽  
Author(s):  
Matthew Conover ◽  
Max Staples ◽  
Dong Si ◽  
Miao Sun ◽  
Renzhi Cao

AbstractQuality Assessment (QA) plays an important role in protein structure prediction. Traditional protein QA methods suffer from searching databases or comparing with other models for making predictions, which usually fail. We propose a novel protein single-model QA method which is built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network is used to predict the quality by treating each amino acid as a time-step and consider the final value returned by the LSTM cells. To the best of our knowledge, this is the first time anyone has attempted to use an LSTM model on the QA problem; furthermore, we use a new representation which has not been studied for QA. In addition to angles, we make use of sequence properties like secondary structure at each time-step, without using any database. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for QA problem and our method could be widely used for protein structure prediction problem. The software is freely available at GitHub:https://github.com/caorenzhi/AngularQA


2020 ◽  
Author(s):  
Fandi Wu ◽  
Jinbo Xu

AbstractMotivationTBM (template-based modeling) is a popular method for protein structure prediction. When very good templates are not available, it is challenging to identify the best templates, build accurate sequence-template alignments and construct 3D models from alignments.ResultsThis paper presents a new method NDThreader (New Deep-learning Threader) to address the challenges of TBM. DNThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residue neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence co-evolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results on the CASP13 and CAMEO data show that our methods outperform existing ones such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of RaptorX server, which obtained the best GDT score among all CASP14 servers on the 58 TBM targets.Availability and Implementationavailable as a part of web server at http://[email protected] InformationSupplementary data are available online.


2019 ◽  
Author(s):  
Georg Kuenze ◽  
Jens Meiler

AbstractComputational methods that produce accurate protein structure models from limited experimental data, e.g. from nuclear magnetic resonance (NMR) spectroscopy, hold great potential for biomedical research. The NMR-assisted modeling challenge in CASP13 provided a blind test to explore the capabilities and limitations of current modeling techniques in leveraging NMR data which had high sparsity, ambiguity and error rate for protein structure prediction. We describe our approach to predict the structure of these proteins leveraging the Rosetta software suite. Protein structure models were predictedde novousing a two-stage protocol. First, low-resolution models were generated with the Rosettade novomethod guided by non-ambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling (RDC) restraints. Second, iterative model hybridization and fragment insertion with the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and non-ambiguous NOE contacts and RDCs. Nine out of 16 of the Rosettade novomodels had the correct fold (GDT-TS score >45) and in three cases high-resolution models were achieved (RMSD <3.5 Å). We also show that a meta-approach applying iterative Rosetta+NMR refinement on server-predicted models which employed non-NMR-contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data-assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR.


2018 ◽  
Vol 2018 ◽  
pp. 1-17 ◽  
Author(s):  
Sandhya Parasnath Dubey ◽  
S. Balaji ◽  
N. Gopalakrishna Kini ◽  
M. Sathish Kumar

Hydrophobic-Polar model is a simplified representation of Protein Structure Prediction (PSP) problem. However, even with the HP model, the PSP problem remains NP-complete. This work proposes a systematic and problem specific design for operators of the evolutionary program which hybrids with local search hill climbing, to efficiently explore the search space of PSP and thereby obtain an optimum conformation. The proposed algorithm achieves this by incorporating the following novel features: (i) new initialization method which generates only valid individuals with (rather than random) better fitness values; (ii) use of probability-based selection operators that limit the local convergence; (iii) use of secondary structure based mutation operator that makes the structure more closely to the laboratory determined structure; and (iv) incorporating all the above-mentioned features developed a complete two-tier framework. The developed framework builds the protein conformation on the square and triangular lattice. The test has been performed using benchmark sequences, and a comparative evaluation is done with various state-of-the-art algorithms. Moreover, in addition to hypothetical test sequences, we have tested protein sequences deposited in protein database repository. It has been observed that the proposed framework has shown superior performance regarding accuracy (fitness value) and speed (number of generations needed to attain the final conformation). The concepts used to enhance the performance are generic and can be used with any other population-based search algorithm such as genetic algorithm, ant colony optimization, and immune algorithm.


Author(s):  
Luciano A Abriata ◽  
Matteo Dal Peraro

Abstract Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This has been patent in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How can then end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad number of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that must be considered by users yet still require further developments, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.


2015 ◽  
Vol 13 (04) ◽  
pp. 1550009 ◽  
Author(s):  
Christophe Guyeux ◽  
Jean-Marc Nicod ◽  
Laurent Philippe ◽  
Jacques M. Bahi

Self-avoiding walks (SAWs) are the source of very difficult problems in probability and enumerative combinatorics. They are of great interest as, for example, they are the basis of protein structure prediction (PSP) in bioinformatics. The authors of this paper have previously shown that, depending on the prediction algorithm, the sets of obtained walk conformations differ: For example, all the SAWs can be generated using stretching-based algorithms whereas only the unfoldable SAWs can be obtained with methods that iteratively fold the straight line. A deeper study of (non-)unfoldable SAWs is presented in this paper. The contribution is first a survey of what is currently known about these sets. In particular, we provide clear definitions of various subsets of SAWs related to pivot moves (unfoldable and non-unfoldable SAWs, etc.) and the first results that we have obtained, theoretically or computationally, on these sets. Then a new theorem on the number of non-unfoldable SAWs is demonstrated. Finally, a list of open questions is provided and the consequences on the PSP problem is proposed.


Sign in / Sign up

Export Citation Format

Share Document