De novo protein structure prediction by incremental inter-residue geometries prediction and model quality assessment using deep learning

2022 ◽  
Author(s):  
Jun Liu ◽  
Guangxing He ◽  
Kailong Zhao ◽  
Guijun Zhang

Motivation: The successful application of deep learning has promoted progress in protein model quality assessment. How to use model quality assessment to further improve the accuracy of protein structure prediction, especially when the prediction does not rely on existing templates, is a question whose answer would help unravel the folding mechanism. Here, we investigate whether model quality assessment can be introduced into structure prediction to form a closed feedback loop that iteratively improves the accuracy of de novo protein structure prediction. Results: In this study, we propose a de novo protein structure prediction method called RocketX. In RocketX, a feedback mechanism is constructed from the geometric constraint prediction network GeomNet, the structural simulation module, and the model quality evaluation network EmaNet. In GeomNet, the co-evolutionary features extracted from the multiple sequence alignment (MSA) obtained by searching sequence databases are sent to an improved residual neural network to predict the inter-residue geometric constraints. The structure model is folded based on the predicted geometric constraints. In EmaNet, 1D and 2D features are extracted from the folded model and sent to a deep residual neural network to estimate the inter-residue distance deviation and per-residue lDDT of the model, which are fed back to GeomNet as dynamic features to correct the geometry prediction and progressively improve model accuracy. RocketX is tested on 483 benchmark proteins and 20 FM targets of CASP14. Experimental results show that the closed-loop feedback mechanism contributes significantly to the performance of RocketX, and that the prediction accuracy of RocketX exceeds that of the state-of-the-art methods trRosetta (without templates) and RaptorX. In addition, blind test results on CAMEO show that, although no template is used, the prediction accuracy of RocketX on medium and hard targets is comparable to that of advanced methods that integrate templates.
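
The closed-loop feedback described in this abstract can be sketched roughly as follows. This is only an illustrative skeleton, not the RocketX implementation: the three callables stand in for GeomNet, the structural simulation module, and EmaNet, and the number of rounds and the rule for keeping the best model (highest mean estimated lDDT) are assumptions that the abstract does not specify.

```python
def closed_loop_predict(msa_features, predict_geometry, fold_model,
                        assess_model, n_rounds=3):
    """Iterate geometry prediction -> folding -> quality estimation.

    predict_geometry, fold_model and assess_model are hypothetical
    stand-ins for GeomNet, the folding module and EmaNet.
    """
    feedback = None  # no quality estimate exists before the first fold
    best_model, best_score = None, float("-inf")
    for _ in range(n_rounds):
        constraints = predict_geometry(msa_features, feedback)
        model = fold_model(constraints)
        distance_deviation, per_residue_lddt = assess_model(model)
        # Assumption: rank models by their mean estimated per-residue lDDT.
        score = sum(per_residue_lddt) / len(per_residue_lddt)
        if score > best_score:
            best_model, best_score = model, score
        # The estimates become dynamic features for the next round.
        feedback = (distance_deviation, per_residue_lddt)
    return best_model, best_score


# Toy stand-ins (hypothetical) so the loop can be run end to end.
def toy_geometry(msa_features, feedback):
    # Coarse first guess; afterwards halve the reported deviation.
    return 8.0 if feedback is None else feedback[0] / 2.0


def toy_fold(constraints):
    return constraints  # the toy "model" is just its remaining error


def toy_assess(model):
    lddt = 1.0 / (1.0 + model)
    return model, [lddt, lddt, lddt]


model, score = closed_loop_predict("toy-msa", toy_geometry, toy_fold,
                                   toy_assess, n_rounds=3)
```

Passing the three stages in as callables keeps the loop independent of any particular network or folding engine, which is a design choice of this sketch rather than something stated in the abstract.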

2011 ◽  
Vol 79 (S10) ◽  
pp. 172-184 ◽  
Author(s):  
Jingfen Zhang ◽  
Qingguo Wang ◽  
Kittinun Vantasin ◽  
Jiong Zhang ◽  
Zhiquan He ◽  
...  

2019 ◽  
Author(s):  
Matthew Conover ◽  
Max Staples ◽  
Dong Si ◽  
Miao Sun ◽  
Renzhi Cao

Abstract Quality Assessment (QA) plays an important role in protein structure prediction. Traditional protein QA methods depend on searching databases or on comparing with other models to make predictions, and they usually fail. We propose a novel protein single-model QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network predicts the quality by treating each amino acid as a time-step and taking the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to use an LSTM model on the QA problem; furthermore, we use a new representation that has not been studied for QA. In addition to angles, we make use of sequence properties such as secondary structure at each time-step, without using any database. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for the QA problem, and our method could be widely used for the protein structure prediction problem. The software is freely available at GitHub: https://github.com/caorenzhi/AngularQA
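
As a rough illustration of the kind of representation this abstract describes, the sketch below turns a Cα trace into one (bond length, dihedral angle) pair per residue. The exact feature layout used by AngularQA is not given in the abstract, so the choice of four consecutive Cα atoms for the dihedral, the function names, and the omission of side-chain and secondary-structure features are all assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / math.sqrt(_dot(b1, b1))
    return math.degrees(math.atan2(y, x))


def ca_trace_features(ca_coords):
    """One (distance to prior Calpha, dihedral) pair per residue from index 3 on."""
    feats = []
    for i in range(3, len(ca_coords)):
        length = math.dist(ca_coords[i - 1], ca_coords[i])
        angle = dihedral(*ca_coords[i - 3:i + 1])
        feats.append((length, angle))
    return feats


# Four made-up points forming a right-angle twist.
trace = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
features = ca_trace_features(trace)
```

Each tuple in `features` would correspond to one time-step of a sequence model such as the LSTM mentioned in the abstract; that model is deliberately left out here.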


2019 ◽  
Author(s):  
Georg Kuenze ◽  
Jens Meiler

Abstract Computational methods that produce accurate protein structure models from limited experimental data, e.g. from nuclear magnetic resonance (NMR) spectroscopy, hold great potential for biomedical research. The NMR-assisted modeling challenge in CASP13 provided a blind test to explore the capabilities and limitations of current modeling techniques in leveraging NMR data which had high sparsity, ambiguity and error rate for protein structure prediction. We describe our approach to predict the structure of these proteins leveraging the Rosetta software suite. Protein structure models were predicted de novo using a two-stage protocol. First, low-resolution models were generated with the Rosetta de novo method guided by non-ambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling (RDC) restraints. Second, iterative model hybridization and fragment insertion with the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and non-ambiguous NOE contacts and RDCs. Nine out of 16 of the Rosetta de novo models had the correct fold (GDT-TS score >45) and in three cases high-resolution models were achieved (RMSD <3.5 Å). We also show that a meta-approach applying iterative Rosetta+NMR refinement on server-predicted models which employed non-NMR-contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data-assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR.
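
To make the two accuracy measures quoted in this abstract concrete, here is a simplified sketch. GDT-TS averages the percentage of residues whose Cα atoms lie within 1, 2, 4 and 8 Å of the reference. The sketch assumes the per-residue deviations have already been computed under a single superposition, whereas the full GDT procedure searches for the best superposition at each cutoff, so this version can only underestimate the true score. The function names and the example deviations are invented for illustration.

```python
import math


def gdt_ts(deviations):
    """Simplified GDT-TS (0-100) from per-residue Calpha deviations in angstroms."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(deviations)
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def rmsd(deviations):
    """Root-mean-square deviation over the same per-residue distances."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Made-up deviations for a four-residue toy model.
toy_deviations = [0.5, 1.5, 3.0, 9.0]
toy_gdt = gdt_ts(toy_deviations)
toy_rmsd = rmsd(toy_deviations)
has_correct_fold = toy_gdt > 45.0  # threshold quoted in the abstract
```

The toy case shows why the two measures are reported together: one badly placed residue inflates the RMSD while the GDT-TS, which counts residues within cutoffs, is less sensitive to it.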


Author(s):  
Luciano A Abriata ◽  
Matteo Dal Peraro

Abstract Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This was evident in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How, then, can end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad range of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that users must consider yet that still require further development, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.


2020 ◽  
Author(s):  
Jianquan Ouyang ◽  
Ningqiao Huang ◽  
Yunqi Jiang

Abstract Quality assessment of protein tertiary structure prediction models, in which structures of the best quality are selected from decoys, is a major challenge in protein structure prediction, and is crucial to determine a model’s utility and potential applications. Single-model quality assessment estimates a model’s quality from that model alone. In general, the Pearson correlation value of a quality assessment method increases in tandem with the quality of the model pool. However, there is no consensus regarding the best method to select a few good models from a pool of poor-quality models. In this work, we introduce a novel single-model quality assessment method for poor-quality models that uses simple linear combinations of six features. We perform weighted search and linear regression on a large dataset of models from the 12th Critical Assessment of Protein Structure Prediction (CASP12) and benchmark the results on CASP13 models. We demonstrate that our method achieves outstanding performance on poor-quality models.


2019 ◽  
Vol 7 (1) ◽  
pp. 1-9 ◽  
Author(s):  
Matthew Conover ◽  
Max Staples ◽  
Dong Si ◽  
Miao Sun ◽  
Renzhi Cao

Abstract Quality Assessment (QA) plays an important role in protein structure prediction. Traditional multi-model QA methods usually depend on searching databases or on comparing with other models to make predictions, and they usually fail when poor-quality models dominate the model pool. We propose a novel protein single-model QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network predicts the quality by treating each amino acid as a time-step and taking the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to use an LSTM model on the QA problem; furthermore, we use a new representation that has not been studied for QA. In addition to angles, we make use of sequence properties such as secondary structure, parsed from the protein structure at each time-step without using any database, which differs from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for the QA problem, and our method could be widely used for the protein structure prediction problem. The software is freely available at GitHub: https://github.com/caorenzhi/AngularQA


2020 ◽  
Author(s):  
Jianquan Ouyang ◽  
Ningqiao Huang ◽  
Yunqi Jiang

Abstract Background: Quality assessment of protein tertiary structure prediction models, in which structures of the best quality are selected from decoys, is a major challenge in protein structure prediction, and is crucial to determine a model’s utility and potential applications. Single-model quality assessment estimates a model’s quality from that model alone. In general, the Pearson correlation value of a quality assessment method increases in tandem with the quality of the model pool. However, there is no consensus regarding the best method to select a few good models from a pool of poor-quality models. Results: We introduce a novel single-model quality assessment method for poor-quality models that uses simple linear combinations of six features. We perform weighted search and linear regression on a large dataset of models from the 12th Critical Assessment of Protein Structure Prediction (CASP12) and benchmark the results on CASP13 models. We demonstrate that our method achieves outstanding performance on poor-quality models. Conclusions: The results of assessing poor-quality protein structures with six features indicate that contact prediction and reliance on fewer prediction features can improve selection accuracy.
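
The scoring scheme this abstract outlines, a linear combination of six features evaluated by Pearson correlation, can be sketched as below. The abstract does not name the six features or report the fitted weights, so the feature vectors, weights and reference qualities here are made-up placeholders, and the weight search and regression steps are omitted.

```python
import math


def linear_score(features, weights, bias=0.0):
    """Weighted sum of per-model features (six in the abstract's setting)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * f for w, f in zip(weights, features))


def pearson(xs, ys):
    """Pearson correlation between predicted and reference quality values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical pool of three models, each with six placeholder features.
pool = {
    "model_a": [0.2, 0.1, 0.4, 0.3, 0.5, 0.2],
    "model_b": [0.6, 0.5, 0.7, 0.4, 0.6, 0.5],
    "model_c": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1],
}
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # placeholder, not fitted values
scores = {name: linear_score(f, weights) for name, f in pool.items()}
selected = max(scores, key=scores.get)  # pick the top-scoring model
```

In practice the weights would come from the weighted search and linear regression on CASP12 models that the abstract mentions; equal weights are used here only to keep the example self-contained.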


PLoS ONE ◽  
2015 ◽  
Vol 10 (4) ◽  
pp. e0123998 ◽  
Author(s):  
Saulo H. P. de Oliveira ◽  
Jiye Shi ◽  
Charlotte M. Deane
