ProALIGN: Directly learning alignments for protein structure prediction via exploiting context-specific alignment motifs

Abstract: Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and templates with solved structures. However, building the optimal sequence-template alignment remains very challenging, especially when only distantly related templates are available. Here we report a novel deep learning approach, ProALIGN, that can predict much more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific, as their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces the inaccuracies of the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to directly infer the likelihoods of all possible residue pairs in their entirety, which can effectively account for the correlations among multiple residues. We then construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. Tested on three independent datasets with a total of 6,688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than existing methods including HHpred, CNFpred, CEthreader and DeepThreader. These results clearly demonstrate the effectiveness of exploiting context-specific alignment motifs by deep learning for protein threading.
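
The binary-matrix representation mentioned in the abstract above can be illustrated with a minimal sketch. This is not ProALIGN's code: the function names (`pairs_to_matrix`, `matrix_to_pairs`, `is_valid_alignment`) and the 0-based indexing are assumptions made for illustration only. The sketch converts a list of aligned residue pairs into a query-by-template matrix of 0s and 1s, converts it back, and checks that the pairs form a valid, order-preserving alignment.

```python
# Illustrative sketch only; names and indexing are hypothetical, not from ProALIGN.

def pairs_to_matrix(pairs, query_len, template_len):
    """Encode aligned (query_index, template_index) pairs as a binary matrix."""
    matrix = [[0] * template_len for _ in range(query_len)]
    for i, j in pairs:
        matrix[i][j] = 1  # 1 marks an aligned residue pair
    return matrix


def matrix_to_pairs(matrix):
    """Recover the aligned pairs from a binary matrix, in row-major order."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == 1
    ]


def is_valid_alignment(pairs):
    """An alignment must be strictly increasing in both sequences:
    each residue is aligned at most once and pairs never cross."""
    ordered = sorted(pairs)
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(ordered, ordered[1:]))


# Query of length 4, template of length 3; query residue 2 is unaligned (a gap).
example_pairs = [(0, 0), (1, 1), (3, 2)]
example_matrix = pairs_to_matrix(example_pairs, query_len=4, template_len=3)
```

In this toy case the unaligned query residue shows up as an all-zero row, which is how gaps appear in such a matrix.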

Template-based prediction of protein structure with deep learning

BMC Genomics ◽

10.1186/s12864-020-07249-8 ◽

2020 ◽

Vol 21 (S11) ◽

Author(s):

Haicang Zhang ◽

Yufeng Shen

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Sequence ◽

Dynamic Programming Algorithm ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Protein Threading ◽

Protein Tertiary Structure Prediction

Abstract: Background: Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for proteins with only distant homologs available. Results: We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both on generating accurate template-query alignments and on protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modelling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56%, 13%, and 11%, respectively, on template-query pairs with fold-level similarity from SCOPe data. On CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16%, 9% and 8% in terms of TM-score, respectively. Conclusions: These results demonstrate that, with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
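
The step of building an alignment from a residue-residue probability matrix by dynamic programming can be sketched as follows. This is a simplified illustration under stated assumptions, not ThreaderAI's algorithm: the function name `align_from_probabilities`, the `min_prob` threshold, and the scoring rule (each aligned pair contributes its probability minus `min_prob`, gaps cost nothing) are all hypothetical choices made to keep the example short.

```python
# Illustrative sketch only; the scoring scheme is an assumption, not ThreaderAI's.

def align_from_probabilities(prob, min_prob=0.5):
    """Find the order-preserving set of residue pairs that maximizes
    sum(prob[i][j] - min_prob) over aligned pairs.

    prob[i][j] is the predicted probability that query residue i
    aligns to template residue j.
    """
    n = len(prob)
    m = len(prob[0]) if n else 0
    # score[i][j]: best total for the first i query and first j template residues
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # "diag" is listed last so it is chosen only when strictly better
            candidates = [
                (score[i - 1][j], "up"),      # query residue i-1 left unaligned
                (score[i][j - 1], "left"),    # template residue j-1 left unaligned
                (score[i - 1][j - 1] + prob[i - 1][j - 1] - min_prob, "diag"),
            ]
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, score[n][m]


# Toy 3x3 probability matrix (made-up numbers, for illustration only).
toy_prob = [
    [0.9, 0.1, 0.1],
    [0.1, 0.2, 0.8],
    [0.1, 0.1, 0.3],
]
toy_pairs, toy_score = align_from_probabilities(toy_prob)
```

The table has one cell per (query prefix, template prefix) combination, so the running time grows with the product of the two sequence lengths. A real threading method would typically use position-dependent gap penalties rather than the free gaps assumed here.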

Template-based prediction of protein structure with deep learning

10.1101/2020.06.02.129270 ◽

2020 ◽

Author(s):

Haicang Zhang ◽

Yufeng Shen

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Sequence ◽

Dynamic Programming Algorithm ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Protein Threading ◽

Protein Tertiary Structure Prediction

Abstract: Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for proteins with only distant homologs available. We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both on generating accurate template-query alignments and on protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modelling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56%, 13%, and 11%, respectively, on template-query pairs with fold-level similarity from SCOPe data. On CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16%, 9% and 8% in terms of TM-score, respectively. These results demonstrate that, with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins. Availability: https://github.com/ShenLab/ThreaderAI

Deep template-based protein structure prediction

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008954 ◽

2021 ◽

Vol 17 (5) ◽

pp. e1008954

Author(s):

Fandi Wu ◽

Jinbo Xu

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Conditional Random Fields ◽

3D Models ◽

Query Protein ◽

Distance Information ◽

Alternating Direction ◽

Template Free

Abstract: Motivation: Protein structure prediction has been greatly improved by deep learning, but most efforts are devoted to template-free modeling, and very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict the inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as part of the RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.

Deep learning techniques have significantly impacted protein structure prediction and protein design

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2021.01.007 ◽

2021 ◽

Vol 68 ◽

pp. 194-207

Author(s):

Robin Pearce ◽

Yang Zhang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

Learning Techniques

Improved protein structure prediction by deep learning irrespective of co-evolution information

Nature Machine Intelligence ◽

10.1038/s42256-021-00348-5 ◽

2021 ◽

Author(s):

Jinbo Xu ◽

Matthew McPartlon ◽

Jin Li

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction

Review of: "ROPIUS0: A deep learning-based protocol for protein structure prediction and model selection and its performance in CASP14"

Qeios ◽

10.32388/nmtocb ◽

2021 ◽

Author(s):

Jianquan Ouyang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Model Selection ◽

Protein Structure Prediction ◽

Structure Prediction

Protein structure prediction based on BN-GRU method

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500459 ◽

2020 ◽

Vol 18 (06) ◽

pp. 2050045

Author(s):

Lina Yang ◽

Pu Wei ◽

Cheng Zhong ◽

Xichun Li ◽

Yuan Yan Tang

Keyword(s):

Neural Network ◽

Protein Structure ◽

Spatial Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Main Idea ◽

Traditional Methods ◽

Analysis Of Similarity ◽

New Feature ◽

Activity Mechanism

Abstract: The spatial structure of a protein reflects its biological function and activity mechanism. Predicting the secondary structure of a protein is the basis for predicting its spatial structure. Traditional methods based on statistics and sequential patterns do not achieve high accuracy. This paper discusses the application of a BN-GRU neural network to protein structure prediction. The main idea is to construct a Gated Recurrent Unit (GRU) neural network, which can learn long-term dependencies and handles long sequences better than traditional methods. On this basis, batch normalization (BN) is combined with the GRU to construct a new network. The Position Specific Scoring Matrix (PSSM) is combined with other features to build a new feature set. The authors report that applying BN to the GRU improves the accuracy of the results. The idea in this paper can also be applied to similarity analysis of other sequences.

Recent developments in deep learning applied to protein structure prediction

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.25824 ◽

2019 ◽

Vol 87 (12) ◽

pp. 1179-1189 ◽

Cited By ~ 20

Author(s):

Shaun M. Kandathil ◽

Joe G. Greener ◽

David T. Jones

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Recent Developments

A Unified Deep Learning Model for Protein Structure Prediction

2017 3rd IEEE International Conference on Cybernetics (CYBCONF) ◽

10.1109/cybconf.2017.7985752 ◽

2017 ◽

Cited By ~ 3

Author(s):

Lin Bai ◽

Lina Yang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Learning Model ◽

Deep Learning Model

Deep Template-based Protein Structure Prediction

10.1101/2020.12.26.424433 ◽

2020 ◽

Author(s):

Fandi Wu ◽

Jinbo Xu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Random Fields ◽

Structure Prediction ◽

Conditional Random Fields ◽

3D Models ◽

Query Protein ◽

Supplementary Information ◽

Distance Information ◽

Alternating Direction

Abstract: Motivation: TBM (template-based modeling) is a popular method for protein structure prediction. When very good templates are not available, it is challenging to identify the best templates, build accurate sequence-template alignments and construct 3D models from alignments. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence co-evolution information into a deep ResNet to predict the inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results on the CASP13 and CAMEO data show that our methods outperform existing ones such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as part of the RaptorX server, which obtained the best GDT score among all CASP14 servers on the 58 TBM targets. Availability and Implementation: available as part of a web server at http://[email protected] Supplementary Information: Supplementary data are available online.
