Principal component analysis in protein tertiary structure prediction

We discuss the applicability of principal component analysis (PCA) to protein tertiary structure prediction from amino acid sequence. The algorithm presented in this paper belongs to the category of protein refinement models and establishes a low-dimensional space in which sampling (and optimization) is carried out by a particle swarm optimizer (PSO). The reduced space is found by applying PCA to a set of low-energy protein models previously obtained with different optimization techniques. A high-frequency term is added to this expansion by projecting the best decoy onto the PCA basis set and calculating the residual model. This term is intended to provide high-frequency detail in the energy optimization. The goal of this research is to analyze how dimensionality reduction affects the prediction capability of the PSO procedure. For that purpose, different proteins from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments were modeled. In all cases, both the energy of the best decoy and the distance to the native structure decreased. Our analysis also shows how the predicted backbone structure of the native conformation and of alternative low-energy states varies with the PCA dimensionality. Generally speaking, the reconstruction can be successfully achieved with 10 principal components and the high-frequency term. We also provide a computational analysis of the protein energy landscape for the inverse problem of reconstructing the structure from a reduced number of principal components, showing that dimensionality reduction alleviates the ill-posed character of this high-dimensional energy optimization problem. The procedure explained in this paper is very fast and allows different PCA expansions to be tested. Our results show that PSO improves the energy of the best decoy used in the PCA when an adequate number of PCA terms is considered.
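The expansion described in the abstract (a PCA basis built from low-energy decoys, plus a residual "high-frequency" term obtained by projecting the best decoy onto that basis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flattened-coordinate representation of each decoy, and the random stand-in data are all assumptions made for illustration, and the energy function and PSO search are omitted.

```python
import numpy as np

def pca_basis(decoys, n_components):
    """Return the mean model and an orthonormal PCA basis (rows) from decoys.

    decoys: array of shape (n_models, n_coords), one flattened model per row
    (hypothetical representation; the paper's parameterization may differ).
    """
    mean = decoys.mean(axis=0)
    centered = decoys - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project_with_residual(best, mean, basis):
    """Project the best decoy onto the basis and keep what the basis misses."""
    coeffs = basis @ (best - mean)
    residual = best - mean - basis.T @ coeffs  # the "high-frequency" term
    return coeffs, residual

def reconstruct(coeffs, mean, basis, residual):
    """Map reduced coordinates back to a full model, adding the residual."""
    return mean + basis.T @ coeffs + residual

# Random stand-in data: 50 decoys with 30 coordinates each (not real proteins).
rng = np.random.default_rng(0)
decoys = rng.normal(size=(50, 30))
best = decoys[0]  # assume the first decoy has the lowest energy

mean, basis = pca_basis(decoys, n_components=10)
coeffs, residual = project_with_residual(best, mean, basis)
rebuilt = reconstruct(coeffs, mean, basis, residual)
```

In a sketch like this, an optimizer such as PSO would search over the 10 entries of `coeffs` only, calling `reconstruct` to evaluate the energy of each candidate, while `residual` stays fixed.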


On the Use of Principal Component Analysis and Particle Swarm Optimization in Protein Tertiary Structure Prediction

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science, 2018, pp. 107-116
DOI: 10.1007/978-3-319-91262-2_10

Author(s): Óscar Álvarez, Juan Luis Fernández-Martínez, Celia Fernández-Brillet, Ana Cernea, Zulima Fernández-Muñiz, ...

Keyword(s): Principal Component Analysis, Particle Swarm Optimization, Structure Prediction, Tertiary Structure, Particle Swarm, Principal Component, Component Analysis, Tertiary Structure Prediction, Swarm Optimization, Protein Tertiary Structure Prediction


Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14

Proteins Structure Function and Bioinformatics, 2021
DOI: 10.1002/prot.26194

Author(s): Ivan Anishchenko, Minkyung Baek, Hahnbeom Park, Naozumi Hiranuma, David E. Kim, ...

Keyword(s): Deep Learning, Structure Prediction, Tertiary Structure, Tertiary Structure Prediction, Protein Tertiary Structure, Protein Tertiary Structure Prediction


Template-based prediction of protein structure with deep learning

BMC Genomics, 2020, Vol 21 (S11)
DOI: 10.1186/s12864-020-07249-8

Author(s): Haicang Zhang, Yufeng Shen

Keyword(s): Deep Learning, Protein Structure, Structure Prediction, Tertiary Structure, Query Sequence, Dynamic Programming Algorithm, Tertiary Structure Prediction, Protein Tertiary Structure, Protein Threading, Protein Tertiary Structure Prediction

Abstract

Background: Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for proteins with only distant homologs available.

Results: We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both in generating accurate template-query alignments and in protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modelling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the fold level of similarity from SCOPe data. On CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9, and 8% in terms of TM-score, respectively.

Conclusions: These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
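The second stage described in the abstract, building an alignment by dynamic programming over a residue-residue probability matrix, can be sketched as follows. The abstract does not specify ThreaderAI's scoring scheme or gap handling, so this is only an illustrative Needleman-Wunsch-style global alignment; the function name `align_from_probabilities`, the linear `gap` penalty, and the toy matrix are assumptions.

```python
def align_from_probabilities(prob, gap=0.0):
    """Global alignment that maximizes the summed probability of aligned pairs.

    prob[i][j] is the (assumed) probability that query residue i aligns with
    template residue j. Returns (best_score, list of (i, j) aligned pairs).
    """
    n, m = len(prob), len(prob[0])
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + prob[i - 1][j - 1],  # align i with j
                score[i - 1][j] - gap,                     # skip query residue
                score[i][j - 1] - gap,                     # skip template residue
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + prob[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs

# Toy probability matrix (made-up values, 3 query x 3 template residues).
toy = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.7],
]
best_score, aligned = align_from_probabilities(toy)
```

For the toy matrix the highest-probability entries lie on the diagonal, so the recovered alignment pairs each query residue with the template residue at the same index.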
