Fast and effective protein model refinement by deep graph neural networks

Mapping Intimacies ◽

10.1101/2020.12.10.419994 ◽

2020 ◽

Author(s):

Xiaoyang Jing ◽

Jinbo Xu

Keyword(s):

Neural Networks ◽

Structure Prediction ◽

Initial Model ◽

Model Quality ◽

Model Refinement ◽

Protein Model ◽

Improve Model ◽

Graph Neural Networks ◽

Improved Model ◽

Better Than

Abstract Protein structure prediction has been greatly improved, but a good portion of predicted models still do not have very high quality. Protein model refinement is one of the methods that may further improve model quality. Nevertheless, it is very challenging to refine a protein model towards better quality. Currently the most successful refinement methods rely on extensive conformation sampling and thus take hours or days to refine even a single protein model. Here we propose a fast and effective method for protein model refinement with very limited conformation sampling. Our method applies graph neural networks (GNNs) to predict a refined inter-atom distance probability distribution from an initial model and then rebuilds the model using the predicted distances as restraints. On the CASP13 refinement targets our method refines models to a quality comparable to that of the two leading human groups (Feig and Baker) and greatly outperforms the others. On the CASP14 refinement targets our method is second only to Feig’s method, comparable to Baker’s method and much better than the others (which worsened instead of improved model quality). Our method achieves this result by generating only 5 refined models for an initial model, which can be done in ~15 minutes. Our study also shows that GNNs perform much better than convolutional residual neural networks for protein model refinement when conformation sampling is limited. Availability The code will be released once the manuscript is published and available at http://[email protected]
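The step of turning a predicted distance distribution into a restraint can be illustrated with a minimal sketch. It assumes a discretized distribution over distance bins for one atom pair and uses a negative-log-probability energy; the bin layout, the function name `restraint_from_distribution`, and the energy form are illustrative assumptions, not details given in the abstract.

```python
import math

def restraint_from_distribution(bin_centers, probs, eps=1e-6):
    """Turn one atom pair's predicted distance distribution into a restraint.

    Hypothetical helper: returns the most probable distance, the expected
    distance, and a per-bin negative-log-probability energy (lower is better).
    """
    if len(bin_centers) != len(probs):
        raise ValueError("bin_centers and probs must have the same length")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    norm = [p / total for p in probs]  # renormalize defensively
    expected = sum(c * p for c, p in zip(bin_centers, norm))
    best = max(range(len(norm)), key=norm.__getitem__)
    energy = [-math.log(p + eps) for p in norm]
    return bin_centers[best], expected, energy

# Toy distribution over four distance bins (values in angstroms, made up).
centers = [4.0, 6.0, 8.0, 10.0]
probs = [0.1, 0.6, 0.2, 0.1]
mode, mean, energy = restraint_from_distribution(centers, probs)
```

A model-building tool would then minimize the summed energies over all restrained pairs; which tool and potential the authors use is not stated here.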


QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks

Bioinformatics ◽

10.1093/bioinformatics/btaa455 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i285-i291 ◽

Cited By ~ 1

Author(s):

Md Hossain Shuvo ◽

Sutanu Bhattacharya ◽

Debswapna Bhattacharya

Keyword(s):

Neural Networks ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Residue Level ◽

Supplementary Information ◽

Model Quality ◽

Quality Estimation ◽

Distance Information ◽

Protein Model

Abstract Motivation Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. Results We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide-range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep. Availability and implementation https://github.com/Bhattacharya-Lab/QDeep. Supplementary information Supplementary data are available at Bioinformatics online.
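The idea of combining residue-level classifiers trained at several error thresholds can be sketched as follows. This is a minimal illustration, assuming each classifier outputs a per-residue probability that the residue's error is below its threshold and that the predictions are combined by simple averaging; the threshold values and the function name `combine_error_classifiers` are assumptions, since the abstract only mentions "multiple predefined error thresholds".

```python
def combine_error_classifiers(per_threshold_probs):
    """Combine per-residue probabilities from several error classifiers.

    per_threshold_probs maps an error threshold (in angstroms) to a list of
    per-residue probabilities that the residue's error is below that threshold.
    Returns (local_scores, global_score), both obtained by plain averaging.
    """
    if not per_threshold_probs:
        raise ValueError("at least one classifier is required")
    lengths = {len(v) for v in per_threshold_probs.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must score the same residues")
    n_res = lengths.pop()
    if n_res == 0:
        raise ValueError("no residues to score")
    n_thr = len(per_threshold_probs)
    local = [
        sum(probs[i] for probs in per_threshold_probs.values()) / n_thr
        for i in range(n_res)
    ]
    return local, sum(local) / n_res

# Toy predictions for a two-residue model (thresholds are hypothetical).
predictions = {
    1.0: [0.2, 0.8],
    2.0: [0.4, 0.9],
    4.0: [0.6, 1.0],
    8.0: [0.8, 1.0],
}
local_scores, global_score = combine_error_classifiers(predictions)
```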


QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks

10.1101/2020.01.31.928622 ◽

2020 ◽

Author(s):

Md Hossain Shuvo ◽

Sutanu Bhattacharya ◽

Debswapna Bhattacharya

Keyword(s):

Neural Networks ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Residue Level ◽

Estimation Methods ◽

Model Quality ◽

Quality Estimation ◽

Distance Information ◽

Protein Model

Abstract Motivation Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. Results We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep. Availability https://github.com/Bhattacharya-Lab/[email protected]


Fast and effective protein model refinement using deep graph neural networks

Nature Computational Science ◽

10.1038/s43588-021-00098-9 ◽

2021 ◽

Cited By ~ 1

Author(s):

Xiaoyang Jing ◽

Jinbo Xu

Keyword(s):

Neural Networks ◽

Model Refinement ◽

Protein Model ◽

Graph Neural Networks


Improved Sampling Strategies for Protein Model Refinement based on Molecular Dynamics Simulation

10.26434/chemrxiv.13299197.v1 ◽

2020 ◽

Author(s):

Lim Heo ◽

Collin Arbour ◽

Michael Feig

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Structure Prediction ◽

Protein Structures ◽

Conformational Space ◽

Dynamics Simulation ◽

Model Refinement ◽

Protein Model ◽

Lower Accuracy ◽

Simulation Based

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. Those methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve model accuracy in the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in the model quality become possible. Molecular dynamics simulations have been especially successful for improving model qualities, but although consistent refinement can be achieved, the improvements in model qualities are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insight of this analysis we are proposing a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.


Advances in Rosetta structure prediction for difficult molecular-replacement problems

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444913023305 ◽

2013 ◽

Vol 69 (11) ◽

pp. 2202-2208 ◽

Cited By ~ 16

Author(s):

Frank DiMaio

Keyword(s):

Structure Prediction ◽

Prediction Methods ◽

Molecular Replacement ◽

Improved Method ◽

Model Quality ◽

Future Directions ◽

Protein Structure Modeling ◽

Improve Model ◽

Difficult Cases

Recent work has shown the effectiveness of structure-prediction methods in solving difficult molecular-replacement problems. The Rosetta protein structure modeling suite can aid in the solution of difficult molecular-replacement problems using templates from 15 to 25% sequence identity; Rosetta refinement guided by noisy density has consistently led to solved structures where other methods fail. In this paper, an overview of the use of Rosetta for these difficult molecular-replacement problems is provided and new modeling developments that further improve model quality are described. Several variations to the method are introduced that significantly reduce the time needed to generate a model and the sampling required to improve the starting template. The improvements are benchmarked on a set of nine difficult cases and it is shown that this improved method obtains consistently better models in less running time. Finally, strategies for best using Rosetta to solve difficult molecular-replacement problems are presented and future directions for the role of structure-prediction methods in crystallography are discussed.


Knowledge-Enhanced Graph Neural Networks for Sequential Recommendation

Information ◽

10.3390/info11080388 ◽

2020 ◽

Vol 11 (8) ◽

pp. 388

Author(s):

Baocheng Wang ◽

Wentao Cai

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Internet Technology ◽

Sequential Data ◽

Current Interest ◽

Important Method ◽

Real World Datasets ◽

Set Up ◽

Graph Neural Networks ◽

Better Than

With the rapid increase in the popularity of big data and internet technology, sequential recommendation has become an important method to help people find items they are potentially interested in. Traditional recommendation methods use only recurrent neural networks (RNNs) to process sequential data. Although effective, the results may be unable to capture both the semantic-based preference and the complex transitions between items adequately. In this paper, we model separated session sequences into session graphs and capture complex transitions using graph neural networks (GNNs). We further link items in interaction sequences with existing external knowledge base (KB) entities and integrate the GNN-based recommender with key-value memory networks (KV-MNs) to incorporate KB knowledge. Specifically, we set a key matrix to many relation embeddings that learned from KB, corresponding to many entity attributes, and set up a set of value matrices storing the semantic-based preferences of different users for the corresponding attribute. By using a hybrid of a GNN and KV-MN, each session is represented as the combination of the current interest (i.e., sequential preference) and the global preference (i.e., semantic-based preference) of that session. Extensive experiments on three public real-world datasets show that our method performs better than baseline algorithms consistently.
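Modeling a session sequence as a session graph can be sketched in a few lines. This is a minimal illustration, assuming nodes are the unique items in a session and directed edges count consecutive transitions; the function name `build_session_graph` and the edge-weighting choice are assumptions, not details taken from the abstract.

```python
from collections import Counter

def build_session_graph(session):
    """Build a directed, weighted graph from one session's item sequence.

    Nodes are the unique items in first-seen order; each edge (u, v) is
    weighted by how often item v immediately follows item u.
    """
    nodes = list(dict.fromkeys(session))        # de-duplicate, keep order
    edges = Counter(zip(session, session[1:]))  # count consecutive pairs
    return nodes, dict(edges)

# Toy click sequence with a repeated transition (item ids are made up).
nodes, edges = build_session_graph(["a", "b", "c", "b", "c"])
```

A GNN would then propagate item embeddings along these edges; the knowledge-base and memory-network components described in the abstract are not covered by this sketch.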


Protein model quality assessment using 3D oriented convolutional neural networks

Bioinformatics ◽

10.1093/bioinformatics/btz122 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3313-3319 ◽

Cited By ~ 14

Author(s):

Guillaume Pagès ◽

Benoit Charmettant ◽

Sergei Grudinin

Keyword(s):

Neural Networks ◽

Quality Assessment ◽

Convolutional Neural Networks ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Density Maps ◽

Protein Model ◽

Protein Model Quality Assessment ◽

3D CNN

Abstract Motivation Protein model quality assessment (QA) is a crucial and yet open problem in structural bioinformatics. The current best methods for single-model QA typically combine results from different approaches, each based on different input features constructed by experts in the field. Then, the prediction model is trained using a machine-learning algorithm. Recently, with the development of convolutional neural networks (CNN), the training paradigm has changed. In computer vision, the expert-developed features have been significantly overpassed by automatically trained convolutional filters. This motivated us to apply a three-dimensional (3D) CNN to the problem of protein model QA. Results We developed Ornate (Oriented Routed Neural network with Automatic Typing)—a novel method for single-model QA. Ornate is a residue-wise scoring function that takes as input 3D density maps. It predicts the local (residue-wise) and the global model quality through a deep 3D CNN. Specifically, Ornate aligns the input density map, corresponding to each residue and its neighborhood, with the backbone topology of this residue. This circumvents the problem of ambiguous orientations of the initial models. Also, Ornate includes automatic identification of atom types and dynamic routing of the data in the network. Established benchmarks (CASP 11 and CASP 12) demonstrate the state-of-the-art performance of our approach among single-model QA methods. Availability and implementation The method is available at https://team.inria.fr/nano-d/software/Ornate/. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the Ornate model to these maps. Supplementary information Supplementary data are available at Bioinformatics online.
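Aligning a residue's neighborhood with its backbone removes the dependence on the model's global orientation. A minimal sketch of one way to do this is given below, assuming the local frame is built from the residue's N, CA, and C atom coordinates by Gram-Schmidt orthogonalization; the choice of atoms, the axis convention, and the function names are assumptions, not the construction used by Ornate.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / length for x in v)

def backbone_frame(n, ca, c):
    """Orthonormal frame centered on CA (hypothetical convention)."""
    x_axis = normalize(sub(c, ca))           # x points from CA to C
    v = sub(n, ca)
    proj = dot(v, x_axis)
    # Remove the component of CA->N along x, then normalize (Gram-Schmidt).
    y_axis = normalize(tuple(vi - proj * xi for vi, xi in zip(v, x_axis)))
    z_axis = cross(x_axis, y_axis)           # completes a right-handed frame
    return x_axis, y_axis, z_axis

def to_local(point, ca, frame):
    """Express a point in the residue's local frame."""
    d = sub(point, ca)
    return tuple(dot(d, axis) for axis in frame)

# Toy coordinates chosen so the frame coincides with the global axes.
frame = backbone_frame((1.0, 2.0, 1.0), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0))
local = to_local((1.0, 1.0, 3.0), (1.0, 1.0, 1.0), frame)
```

Neighboring atoms expressed in this frame keep the same coordinates however the whole model is rotated or translated, which is the property that oriented input maps rely on.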


New MDS and Clustering Based Algorithms for Protein Model Quality Assessment and Selection

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013600063 ◽

2013 ◽

Vol 22 (05) ◽

pp. 1360006 ◽

Cited By ~ 2

Author(s):

Qingguo Wang ◽

Charles Shang ◽

Dong Xu ◽

Yi Shang

Keyword(s):

Multidimensional Scaling ◽

Quality Assessment ◽

Structure Prediction ◽

Tertiary Structure ◽

Selection Problem ◽

Model Quality ◽

Model Quality Assessment ◽

Protein Model ◽

Protein Model Quality Assessment ◽

Consensus Score

In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement.
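The selection procedure described for CC-Select can be sketched as follows. This minimal illustration assumes a precomputed pairwise structural similarity matrix and precomputed cluster labels (the multidimensional scaling and k-means steps are omitted); the function names and the toy similarity values are hypothetical.

```python
def consensus_scores(sim):
    """Average similarity of each model to all other models in the pool."""
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two models for a consensus score")
    return [
        sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

def select_candidates(sim, labels):
    """Pick the model with the highest consensus score in each cluster.

    labels[i] is the cluster id of model i, assumed to come from a separate
    clustering step. Returns ({cluster_id: model_index}, scores).
    """
    if len(labels) != len(sim):
        raise ValueError("one cluster label per model is required")
    scores = consensus_scores(sim)
    best = {}
    for idx, label in enumerate(labels):
        if label not in best or scores[idx] > scores[best[label]]:
            best[label] = idx
    return best, scores

# Toy pool of four models in two clusters (similarities are made up).
similarity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.3, 0.7, 1.0],
]
candidates, scores = select_candidates(similarity, [0, 0, 1, 1])
```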


Protein model quality assessment using 3D oriented convolutional neural networks

10.1101/432146 ◽

2018 ◽

Cited By ~ 1

Author(s):

Guillaume Pagès ◽

Benoit Charmettant ◽

Sergei Grudinin

Keyword(s):

Neural Networks ◽

Quality Assessment ◽

Convolutional Neural Networks ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Density Maps ◽

Protein Model ◽

Protein Model Quality Assessment ◽

3D CNN

Protein model quality assessment (QA) is a crucial and yet open problem in structural bioinformatics. The current best methods for single-model QA typically combine results from different approaches, each based on different input features constructed by experts in the field. Then, the prediction model is trained using a machine-learning algorithm. Recently, with the development of convolutional neural networks (CNN), the training paradigm has changed. In computer vision, the expert-developed features have been significantly overpassed by automatically trained convolutional filters. This motivated us to apply a three-dimensional (3D) CNN to the problem of protein model QA. We developed a novel method for single-model QA called Ornate. Ornate (Oriented Routed Neural network with Automatic Typing) is a residue-wise scoring function that takes as input 3D density maps. It predicts the local (residue-wise) and the global model quality through a deep 3D CNN. Specifically, Ornate aligns the input density map, corresponding to each residue and its neighborhood, with the backbone topology of this residue. This circumvents the problem of ambiguous orientations of the initial models. Also, Ornate includes automatic identification of atom types and dynamic routing of the data in the network. Established benchmarks (CASP 11 and CASP 12) demonstrate the state-of-the-art performance of our approach among single-model QA methods. The method is available at https://team.inria.fr/nano-d/software/Ornate/. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the Ornate model to these maps.


Improved protein model quality assessment by integrating sequential and pairwise features using deep learning

10.1101/2020.09.30.321661 ◽

2020 ◽

Author(s):

Xiaoyang Jing ◽

Jinbo Xu

Keyword(s):

Neural Networks ◽

Quality Assessment ◽

Single Model ◽

Model Quality ◽

Protein Targets ◽

Model Quality Assessment ◽

Global Quality ◽

Protein Model ◽

Protein Model Quality Assessment ◽

Experimental Structure

Abstract Motivation Accurately estimating protein model quality in the absence of experimental structure is not only important for model evaluation and selection, but also useful for model refinement. Progress has been steadily made by introducing new features and algorithms (especially deep neural networks), but accuracy of quality assessment (QA) is still not very satisfactory, especially local QA on hard protein targets. Results We propose a new single-model-based QA method ResNetQA for both local and global quality assessment. Our method predicts model quality by integrating sequential and pairwise features using a deep neural network composed of both 1D and 2D convolutional residual neural networks (ResNet). The 2D ResNet module extracts useful information from pairwise features such as model-derived distance maps, co-evolution information and predicted distance potential. The 1D ResNet is used to predict local (global) model quality from sequential features and pooled pairwise information generated by 2D ResNet. Tested on the CASP12 and CASP13 datasets, our experimental results show that our method greatly outperforms existing state-of-the-art methods. Our ablation studies indicate that the 2D ResNet module and pairwise features play an important role in improving model quality assessment. Availability and Implementation https://github.com/AndersJing/[email protected]
