InterPepScore: A Deep Learning Score for Improving the FlexPepDock Refinement Protocol

Mapping Intimacies ◽

10.1101/2021.12.09.471890 ◽

2021 ◽

Author(s):

Isak Johansson-Åkhe ◽

Björn Wallner

Keyword(s):

Neural Network ◽

Deep Learning ◽

Structure Prediction ◽

Cell Function ◽

Protein Complexes ◽

Protein Docking ◽

Peptide Fragments ◽

Model Quality ◽

Protein Receptors ◽

Medium Quality

Motivation: Interactions between peptide fragments and protein receptors are vital to cell function, yet their structural details are difficult to determine experimentally. As such, many computational methods have been developed to aid in peptide-protein docking or structure prediction. One such method is Rosetta FlexPepDock, which consistently refines coarse peptide-protein models to sub-Ångström precision using Monte-Carlo simulations and statistical potentials. Deep learning has recently seen increased use in protein structure prediction, with graph neural networks seeing use in protein model quality assessment. Results: Here, we introduce a graph neural network, InterPepScore, as an additional scoring term to complement and improve the Rosetta FlexPepDock refinement protocol. InterPepScore is trained on simulation trajectories from FlexPepDock refinement starting from thousands of peptide-protein complexes generated by a wide variety of docking schemes. Adding InterPepScore to the refinement protocol consistently improves the quality of the models created. On an independent benchmark of 109 peptide-protein complexes, its inclusion increases the share of complexes for which the top-scoring model has a DockQ score of 0.49 (medium quality) or better from 14.8% to 26.1%.
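
The benchmark figure quoted above is a top-1 success rate: the share of complexes whose best-scoring model reaches a DockQ threshold. A minimal sketch of that calculation follows. The function name, the input layout and the toy numbers are illustrative assumptions, not part of InterPepScore or Rosetta, and the sketch assumes higher scores are better (negate energies where lower is better).

```python
MEDIUM_QUALITY_DOCKQ = 0.49  # threshold quoted in the abstract


def top1_success_rate(models_per_complex, threshold=MEDIUM_QUALITY_DOCKQ):
    """Fraction of complexes whose top-scoring model reaches the threshold.

    models_per_complex maps a complex ID to a list of (score, dockq) pairs.
    Assumption: a higher score means the scoring function prefers the model.
    """
    if not models_per_complex:
        raise ValueError("no complexes given")
    hits = 0
    for models in models_per_complex.values():
        # Pick the model the scoring function ranks first, then check
        # its true quality against the DockQ threshold.
        _, dockq_of_top = max(models, key=lambda m: m[0])
        if dockq_of_top >= threshold:
            hits += 1
    return hits / len(models_per_complex)


# Toy, made-up data: two complexes with two models each.
example = {
    "complex_a": [(0.9, 0.62), (0.4, 0.10)],
    "complex_b": [(0.7, 0.20), (0.6, 0.55)],
}
print(top1_success_rate(example))  # 0.5
```

Because only the top-ranked model counts, a complex like `complex_b` is a miss even though a medium-quality model exists in its pool. That gap between the pool and the pick is what an improved scoring term aims to close.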

Accurate refinement of docked protein complexes using evolutionary information and deep learning

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016420026 ◽

2016 ◽

Vol 14 (03) ◽

pp. 1642002 ◽

Cited By ~ 11

Author(s):

Bahar Akbal-Delibas ◽

Roshanak Farhoodi ◽

Marc Pomplun ◽

Nurit Haspel

Keyword(s):

Deep Learning ◽

Protein Complexes ◽

Scoring Function ◽

Protein Docking ◽

Training Data ◽

Evolutionary Information ◽

Native Structure ◽

Learning Network ◽

Small Set ◽

Deep Learning Network

One of the major challenges for protein docking methods is to accurately discriminate native-like structures from false positives. Docking methods are often inaccurate, and the results have to be refined and re-ranked to obtain native-like complexes and remove outliers. In previous work, we introduced AccuRefiner, a machine learning based tool for refining protein–protein complexes. Given a docked complex, the refinement tool produces a small set of refined versions of the input complex, with lower root-mean-square deviation (RMSD) of atomic positions with respect to the native structure. The method employs a unique ranking tool that accurately predicts the RMSD of docked complexes with respect to the native structure. In this work, we use a deep learning network with a similar set of features and five layers. We show that a properly trained deep learning network can accurately predict the RMSD of a docked complex with an average error margin of 1.40 Å, by approximating the complex relationship between a wide set of scoring function terms and the RMSD of a docked structure. The network was trained on 35,000 unbound docking complexes generated by RosettaDock. We tested our method on 25 different putative docked complexes, also produced by RosettaDock, for five proteins that were not included in the training data. The results demonstrate that the high accuracy of the ranking tool enables AccuRefiner to consistently choose the refinement candidates with lower RMSD values compared to the coarsely docked input structures.
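
The two quantities this abstract relies on, RMSD to the native structure and a ranking by predicted RMSD, can be sketched as below. This is an illustration under stated assumptions, not AccuRefiner's code: the structures are assumed to be already superimposed with atoms paired by index, and `pick_refinement_candidates` with its `predict_rmsd` callback is a hypothetical stand-in for the trained network.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) positions.

    Assumes the structures are already superimposed and atoms are paired
    by index; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def pick_refinement_candidates(candidates, predict_rmsd, keep=3):
    """Return the `keep` candidates with the lowest predicted RMSD.

    predict_rmsd is any callable mapping a candidate to a predicted RMSD;
    here it stands in for a trained regression model (hypothetical).
    """
    return sorted(candidates, key=predict_rmsd)[:keep]


# Toy example: every atom is shifted by 1 Å along z, so the RMSD is 1 Å.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(native, docked))  # 1.0
```

In practice the native structure is unknown at prediction time, which is why the ranking step depends on a predicted rather than a measured RMSD.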

LZerD Protein-Protein Docking Webserver Enhanced With de novo Structure Prediction

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.724947 ◽

2021 ◽

Vol 8 ◽

Author(s):

Charles Christoffer ◽

Vijay Bharadwaj ◽

Ryan Luu ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Protein Complexes ◽

Protein Sequences ◽

Data Bank ◽

Protein Docking ◽

Functional Mechanisms ◽

Established Technique

Protein-protein docking is a useful tool for modeling the structures of protein complexes that have yet to be experimentally determined. Understanding the structures of protein complexes is a key component for formulating hypotheses in biophysics regarding the functional mechanisms of complexes. Protein-protein docking is an established technique for cases where the structures of the subunits have been determined. While the number of known structures deposited in the Protein Data Bank is increasing, there are still many cases where the structures of individual proteins that users want to dock are not determined yet. Here, we have integrated the AttentiveDist method for protein structure prediction into our LZerD webserver for protein-protein docking, which enables users to simply submit protein sequences and obtain full-complex atomic models, without having to supply any structure themselves. We have further extended the LZerD docking interface with a symmetrical homodimer mode. The LZerD server is available at https://lzerd.kiharalab.org/.

Prediction of 8-state protein secondary structures by 1D-Inception and BD-LSTM

10.1101/871921 ◽

2019 ◽

Author(s):

Aminur Rab Ratul ◽

Marcel Turcotte ◽

M. Hamed Mozaffari ◽

WonSook Lee

Keyword(s):

Neural Network ◽

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

3D Structure ◽

Protein Secondary Structure ◽

Research Area ◽

Protein Secondary Structures ◽

Precise Prediction

Protein secondary structure is a crucial information bridge between the primary structure and the tertiary (3D) structure. Precise prediction of 8-state protein secondary structure (PSS) is widely used in the structural and functional analysis of proteins in bioinformatics. Recently, deep learning techniques have been applied in this research area and have raised Q8 accuracy remarkably. Nevertheless, from a theoretical standpoint, there is still considerable room for improvement, specifically in 8-state (Q8) protein secondary structure prediction. In this paper, we present two deep learning architectures, namely 1D-Inception and BD-LSTM, to improve the performance of 8-class PSS prediction. The input to both architectures is a carefully constructed feature matrix built from the sequence features and profile features of the proteins. First, 1D-Inception is a deep convolutional neural network-based approach that was inspired by the InceptionV3 model and contains three inception modules. Second, BD-LSTM is a recurrent neural network model that includes bidirectional LSTM layers. Our proposed 1D-Inception method achieved 76.65%, 71.18%, 76.86%, and 74.07% Q8 accuracy on the benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Moreover, BD-LSTM achieved 74.71%, 69.49%, 74.07%, and 72.37% Q8 accuracy when evaluated on the CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Both architectures enable efficient processing of local and global interdependencies between amino acids, which helps a deep neural network predict each class accurately. To the best of our knowledge, the experimental results show that the 1D-Inception model outperformed all state-of-the-art methods on the benchmark CullPdb6133, CB513, and CASP10 datasets.
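
The Q8 accuracy reported above is the percentage of residues whose predicted 8-state label matches the observed one. A minimal sketch of the metric follows. The function name and the toy strings are assumptions for illustration, and the sketch assumes both strings use the same one-letter alphabet (for example the DSSP letters G, H, I, E, B, T, S plus a coil symbol).

```python
def q8_accuracy(predicted, observed):
    """Percentage of residues whose predicted 8-state label is correct.

    Both arguments are strings of one-letter secondary-structure codes,
    one character per residue, using the same alphabet.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example: 8 residues, one mismatch (T predicted where S was observed).
print(q8_accuracy("HHHEETCC", "HHHEESCC"))  # 87.5
```

Dataset-level figures are usually computed over all residues pooled across proteins rather than averaged per protein; which convention the authors used is not stated in the abstract.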

Protein homodimers structure prediction based on deep neural network

Informatics ◽

10.37661/1816-0301-2020-17-2-44-53 ◽

2020 ◽

Vol 17 (2) ◽

pp. 44-53

Author(s):

A. Y. Hadarovich ◽

A. A. Kalinouski ◽

A. V. Tuzikov

Keyword(s):

Neural Network ◽

Protein Complex ◽

Structure Prediction ◽

Protein Complexes ◽

3D Structure ◽

Optimization Procedure ◽

Descent Method ◽

Gradient Descent Method ◽

Contact Map ◽

Suggested Approach

Structural prediction of protein-protein complexes has important applications in domains such as modeling of biological processes and drug design. Homodimers (complexes consisting of two identical proteins) are the most common type of protein complex in nature, but there is still no universal algorithm to predict their 3D structures. Experimental techniques for determining the structure of a protein complex require an enormous amount of time and resources, and each method has its own limitations. Recently, deep neural networks have made it possible to predict the structures of individual proteins with accuracy far exceeding that of other algorithmic approaches. Building on this idea, we developed a deep learning-based algorithm to model the 3D structure of a homodimer. It consists of two major steps: in the first, a contact map of the protein complex is predicted with a deep convolutional neural network; in the second, the 3D structure of the homodimer is predicted from the obtained contact map using an optimization procedure. Combining the neural network with an optimization procedure based on the gradient descent method made it possible to predict structures of protein homodimers. The suggested approach was tested and validated on a dataset of protein homodimers from the Protein Data Bank (PDB). The developed procedure could also be used to evaluate protein homodimer models as one stage in the development of drug compounds.
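
For readers unfamiliar with the intermediate representation, the sketch below shows what an inter-chain contact map is by computing one from known coordinates. It does not reproduce the authors' network or their gradient descent step. The 8 Å cutoff, the one-point-per-residue input and the function name are assumptions chosen for illustration.

```python
import math


def inter_chain_contact_map(chain_a, chain_b, cutoff=8.0):
    """Binary contact map between two chains.

    chain_a and chain_b are lists of (x, y, z) positions, one per residue
    (for example C-alpha atoms). Entry [i][j] is 1 when residue i of chain A
    lies within `cutoff` Å of residue j of chain B. The 8 Å default is an
    assumed, commonly used value, not one taken from the abstract.
    """
    return [
        [1 if math.dist(a, b) <= cutoff else 0 for b in chain_b]
        for a in chain_a
    ]


# Toy example with made-up coordinates: only the first residue pair is close.
chain_a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
chain_b = [(5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(inter_chain_contact_map(chain_a, chain_b))  # [[1, 0], [0, 0]]
```

In the approach described above the map is predicted from sequence-derived input rather than computed from coordinates, and the optimization stage then searches for a 3D arrangement consistent with it.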

Long sequence feature extraction based on deep learning neural network for protein secondary structure prediction

2017 IEEE 3rd Information Technology and Mechatronics Engineering Conference (ITOEC) ◽

10.1109/itoec.2017.8122472 ◽

2017 ◽

Cited By ~ 1

Author(s):

Yehong Chen

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Sequence Feature ◽

Protein Secondary Structure Prediction ◽

Deep Learning Neural Network

Integrating ab initio and template-based algorithms for protein–protein complex structure prediction

Bioinformatics ◽

10.1093/bioinformatics/btz623 ◽

2019 ◽

Vol 36 (3) ◽

pp. 751-757 ◽

Cited By ~ 1

Author(s):

Sweta Vangaveti ◽

Thom Vreven ◽

Yang Zhang ◽

Zhiping Weng

Keyword(s):

Protein Complex ◽

Structure Prediction ◽

Protein Complexes ◽

Complex Structure ◽

Protein Docking ◽

Supplementary Information ◽

Test Case ◽

Binding Modes ◽

Success Rates ◽

Template Free

Motivation: Template-based and template-free methods have both been widely used in predicting the structures of protein–protein complexes. Template-based modeling is effective when a reliable template is available, while template-free methods are required for predicting the binding modes or interfaces that have not been previously observed. Our goal is to combine the two methods to improve computational protein–protein complex structure prediction. Results: Here, we present a method to identify and combine high-confidence predictions of a template-based method (SPRING) with a template-free method (ZDOCK). Cross-validated using the protein–protein docking benchmark version 5.0, our method (ZING) achieved a success rate of 68.2%, outperforming SPRING and ZDOCK, with success rates of 52.1% and 35.9% respectively, when the top 10 predictions were considered per test case. In conclusion, a statistics-based method that evaluates and integrates predictions from template-based and template-free methods is more successful than either method independently. Availability and implementation: ZING is available for download as a GitHub repository (https://github.com/weng-lab/ZING.git). Supplementary information: Supplementary data are available at Bioinformatics online.

Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14

10.1101/2021.01.31.428975 ◽

2021 ◽

Author(s):

Xiao Chen ◽

Jian Liu ◽

Zhiye Guo ◽

Tianqi Wu ◽

Jie Hou ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Structural Models ◽

Single Model ◽

Model Accuracy ◽

Model Quality ◽

Residue Contact ◽

Contact Distance ◽

Protein Model ◽

Contact Predictions

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). During the 2020 CASP14 experiment, we developed and tested several EMA predictors that used deep learning with new features based on inter-residue distance/contact predictions as well as existing model quality features. The average global distance test (GDT-TS) score loss of ranking CASP14 structural models by three multi-model MULTICOM EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) is 0.073, 0.079, and 0.081, respectively, placing them first, second, and third among 68 CASP14 EMA predictors. The single-model EMA predictor (MULTICOM-DEEP) ranks 10th among all single-model EMA methods in terms of GDT-TS score loss. The results show that deep learning and contact/distance predictions are useful in ranking and selecting protein structural models.
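
The GDT-TS score loss used to rank the predictors can be read as follows: for each target, take the true GDT-TS of the best model in the pool and subtract the true GDT-TS of the model the predictor ranked first, then average over targets. The sketch below follows that reading. The function names, the input layout and the toy values are assumptions, and GDT-TS is taken on a 0 to 1 scale to match the magnitudes quoted above.

```python
def ranking_loss(models):
    """GDT-TS loss for one target.

    models is a list of (predicted_quality, true_gdt_ts) pairs. The loss is
    the true GDT-TS of the best model in the pool minus the true GDT-TS of
    the model with the highest predicted quality.
    """
    if not models:
        raise ValueError("no models given")
    best_true = max(true for _, true in models)
    selected_true = max(models, key=lambda m: m[0])[1]
    return best_true - selected_true


def mean_ranking_loss(targets):
    """Average the per-target loss over a dict of target ID -> model list."""
    return sum(ranking_loss(m) for m in targets.values()) / len(targets)


# Toy, made-up pool: the predictor prefers a 0.5 model while a 0.75 one exists.
pool = [(0.9, 0.5), (0.6, 0.75), (0.1, 0.25)]
print(ranking_loss(pool))  # 0.25
```

A loss of 0 means the predictor always selected the best available model, so lower values indicate better ranking.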

Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14

10.1101/2021.01.28.428706 ◽

2021 ◽

Author(s):

Jian Liu ◽

Tianqi Wu ◽

Zhiye Guo ◽

Jie Hou ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Model Quality ◽

Tertiary Structure Prediction ◽

Model Quality Assessment ◽

Template Free ◽

Distance Prediction ◽

Protein Tertiary Structure Prediction

Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system in three main aspects: (1) a new deep learning-based protein inter-residue distance predictor (DeepDist) to improve template-free (ab initio) tertiary structure prediction, (2) an enhanced template-based tertiary structure prediction method, and (3) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked 7th out of 146 predictors in protein tertiary structure prediction and 3rd out of 136 predictors in inter-domain structure prediction. The results of MULTICOM demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. The performance of template-free tertiary structure prediction largely depends on the accuracy of distance predictions, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works reasonably well on targets for which a sufficient number of good models can be predicted, but may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed.

Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14

Scientific Reports ◽

10.1038/s41598-021-90303-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xiao Chen ◽

Jian Liu ◽

Zhiye Guo ◽

Tianqi Wu ◽

Jie Hou ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Structural Models ◽

Single Model ◽

Model Accuracy ◽

Model Quality ◽

Distance Information ◽

Evaluation Of Performance ◽

Protein Model

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model EMA predictor, is ranked within the top 10 among all single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.

Protein docking model evaluation by 3D deep convolutional neural networks

Bioinformatics ◽

10.1093/bioinformatics/btz870 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2113-2118 ◽

Cited By ~ 7

Author(s):

Xiao Wang ◽

Genki Terashi ◽

Charles W Christoffer ◽

Mengmeng Zhu ◽

Daisuke Kihara

Keyword(s):

Neural Network ◽

Structure Prediction ◽

Deep Neural Network ◽

Molecular Mechanisms ◽

Complex Structure ◽

Protein Docking ◽

Supplementary Information ◽

Atomic Interaction ◽

Deep Convolutional Neural Networks ◽

Docking Model

Motivation: Many important cellular processes involve physical interactions of proteins. Therefore, determining protein quaternary structures provides critical insights for understanding the molecular mechanisms by which complexes function. To complement experimental methods, many computational methods have been developed to predict structures of protein complexes. One of the challenges in computational protein complex structure prediction is to identify near-native models from a large pool of generated models. Results: We developed a convolutional deep neural network-based approach named DOcking decoy selection with Voxel-based deep neural nEtwork (DOVE) for evaluating protein docking models. To evaluate a protein docking model, DOVE scans the protein–protein interface of the model with a 3D voxel and considers atomic interaction types and their energetic contributions as input features applied to the neural network. The deep learning models were trained and validated on docking models available in the ZDock and DockGround databases. Among the different combinations of features tested, almost all outperformed existing scoring functions. Availability and implementation: Code is available at http://github.com/kiharalab/DOVE and http://kiharalab.org/dove/. Supplementary information: Supplementary data are available at Bioinformatics online.
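
The voxel-based input described above can be pictured as binning interface atoms into a cubic grid, with one channel per atom or interaction type. The sketch below illustrates that general idea only. It is not DOVE's code: the grid size, voxel width, sparse dictionary layout and the use of plain atom counts (instead of the energetic features the abstract mentions) are all assumptions.

```python
import math
from collections import Counter


def voxelize(atoms, center, grid_size=4, voxel_width=1.0):
    """Count atoms per (i, j, k, atom_type) cell of a cubic grid.

    atoms is a list of (x, y, z, atom_type) tuples and center is the (x, y, z)
    point the grid is centred on, e.g. the middle of the interface. Atoms that
    fall outside the grid are ignored. Returns a sparse Counter.
    """
    half = grid_size * voxel_width / 2.0
    grid = Counter()
    for x, y, z, atom_type in atoms:
        index = []
        for coord, c in zip((x, y, z), center):
            i = math.floor((coord - c + half) / voxel_width)
            if not 0 <= i < grid_size:
                break  # atom lies outside the grid along this axis
            index.append(i)
        else:
            grid[(*index, atom_type)] += 1
    return grid


# Toy, made-up atoms: two carbons share a voxel, the oxygen is out of range.
atoms = [
    (0.2, 0.2, 0.2, "C"),
    (0.4, 0.1, 0.3, "C"),
    (-1.5, 0.0, 0.0, "N"),
    (10.0, 0.0, 0.0, "O"),
]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
print(dict(grid))  # {(2, 2, 2, 'C'): 2, (0, 2, 2, 'N'): 1}
```

A dense tensor suitable for a 3D convolutional network can be filled from this sparse mapping, with one channel per feature type.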
