Decoy selection for protein structure prediction via extreme gradient boosting and ranking

Abstract

Background: Identifying one or more biologically-active/native decoys from millions of non-native decoys is one of the major challenges in computational structural biology. The extreme imbalance between positive and negative samples (native and non-native decoys) in a decoy set makes the problem even more complicated. Consensus methods show varied success at decoy selection, and they struggle with clustering large decoy sets and decoy sets that show little structural similarity. Recent investigations into energy landscape-based decoy selection approaches show promise. However, lack of generalization across varied test cases remains a bottleneck for these methods.

Results: We propose a novel decoy selection method, ML-Select, a machine learning framework that exploits the energy landscape associated with the structure space probed through template-free decoy generation. The proposed method outperforms both clustering and energy ranking-based methods, while consistently offering better performance on varied test cases. Moreover, ML-Select shows promising results even for decoy sets consisting mostly of low-quality decoys.

Conclusions: ML-Select is a useful method for decoy selection. This work motivates further research into more effective ways of adopting machine learning frameworks to achieve robust performance for decoy selection in template-free protein structure prediction.
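
The sketch below makes the class-imbalance and ranking issues concrete. It does three things:

- It labels decoys as near-native using an lRMSD cutoff.
- It computes the negative-to-positive ratio that XGBoost users commonly pass as `scale_pos_weight`.
- It measures how much a top-k selection by predicted score enriches near-native decoys over their base rate.

This is a minimal illustration under stated assumptions, not ML-Select itself. The 5 Å cutoff, the helper names (`label_decoys`, `imbalance_weight`, `select_top_k`, `enrichment`) and all the data are hypothetical. In practice the scores would come from a trained model rather than being typed in by hand.

```python
from typing import List


def label_decoys(lrmsds: List[float], threshold: float = 5.0) -> List[int]:
    """Label a decoy near-native (1) if its lRMSD to the native is <= threshold.

    Hypothetical helper; the 5 Å cutoff is an illustrative assumption.
    """
    return [1 if d <= threshold else 0 for d in lrmsds]


def imbalance_weight(labels: List[int]) -> float:
    """Negatives / positives, a common setting for XGBoost's scale_pos_weight."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("decoy set contains no near-native decoys")
    return (len(labels) - positives) / positives


def select_top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k decoys with the highest predicted nativeness score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def enrichment(labels: List[int], selected: List[int]) -> float:
    """Near-native rate among the selected decoys divided by the rate in the whole set."""
    base_rate = sum(labels) / len(labels)
    selected_rate = sum(labels[i] for i in selected) / len(selected)
    return selected_rate / base_rate


# Toy data: lRMSDs (Å) and model scores are invented for illustration only.
lrmsds = [2.1, 7.5, 9.0, 4.8, 12.3, 8.8, 6.1, 3.9]
scores = [0.9, 0.2, 0.1, 0.7, 0.05, 0.3, 0.6, 0.4]
labels = label_decoys(lrmsds)
print(labels)  # [1, 0, 0, 1, 0, 0, 0, 1]
print(imbalance_weight(labels))
top3 = select_top_k(scores, 3)
print(top3, enrichment(labels, top3))  # [0, 3, 6], enrichment 16/9 ≈ 1.78
```

Near-native decoys are rare, so raw accuracy says little: a model that rejects every decoy is mostly right. Reweighting the positives and judging a selection by its enrichment are two common ways to account for that imbalance.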

Building maps of protein structure spaces in template-free protein structure prediction

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400134 ◽

2019 ◽

Vol 17 (06) ◽

pp. 1940013

Author(s):

Ahmed Bin Zaman ◽

Amarda Shehu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Structures ◽

Biologically Active ◽

Tertiary Structures ◽

Free Protein ◽

Structure Space ◽

Template Free ◽

Protein Structure Space

An important goal in template-free protein structure prediction is to control the quality of the computed tertiary structures of a target amino-acid sequence. Despite great advances in algorithmic research, this task remains exceptionally challenging given the size, dimensionality, and inherent characteristics of the protein structure space. Current practice is to generate as many structures as can be afforded, so as to increase the likelihood that some of them will reside near the sought but unknown biologically-active/native structure. Within a given computational budget, this strategy is impractical, and it is not guided by any metrics of interest. In this paper, we propose instead to equip algorithms that generate tertiary structures, also known as decoy generation algorithms, with memory of the protein structure space that they explore. Specifically, we propose an evolving, granularity-controllable map of the protein structure space that makes use of low-dimensional representations of protein structures. Evaluations on diverse target sequences, including recent hard CASP targets, show that drastic reductions in storage can be made without sacrificing decoy quality. The presented results make the case that integrating a map of the protein structure space is a promising mechanism for enhancing decoy generation algorithms in template-free protein structure prediction.
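
The abstract does not spell out how the map is built, so the following is only a hypothetical illustration of a granularity-controllable map. It makes three assumptions:

- Each decoy already has a low-dimensional embedding, for example from some projection of its coordinates.
- The embedding space is cut into grid cells of width `cell_size`.
- Each cell keeps only its lowest-energy decoy.

The class name, the grid and the keep-the-lowest-energy rule are all assumptions made for this sketch, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


class StructureMap:
    """Hypothetical grid map over low-dimensional decoy embeddings.

    cell_size controls granularity: larger cells merge more decoys into one entry.
    """

    def __init__(self, cell_size: float) -> None:
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        # cell index -> (lowest energy seen so far, id of that decoy)
        self.cells: Dict[Cell, Tuple[float, int]] = {}

    def _cell(self, coords: Sequence[float]) -> Cell:
        # Integer grid index along each embedding dimension.
        return tuple(math.floor(c / self.cell_size) for c in coords)

    def add(self, decoy_id: int, coords: Sequence[float], energy: float) -> bool:
        """Keep the decoy if its cell is empty or it has lower energy (assumed rule)."""
        key = self._cell(coords)
        stored = self.cells.get(key)
        if stored is None or energy < stored[0]:
            self.cells[key] = (energy, decoy_id)
            return True
        return False

    def kept_ids(self) -> List[int]:
        return sorted(decoy_id for _, decoy_id in self.cells.values())


# Toy decoys: (id, 2-D embedding, energy); all values are made up.
decoys = [(0, (0.1, 0.2), -50.0), (1, (0.3, 0.4), -55.0),
          (2, (1.6, 0.1), -40.0), (3, (0.9, 0.9), -45.0)]
for size in (1.0, 0.25):
    smap = StructureMap(size)
    for decoy_id, coords, energy in decoys:
        smap.add(decoy_id, coords, energy)
    print(size, smap.kept_ids())  # 1.0 -> [1, 2]; 0.25 -> [0, 1, 2, 3]
```

Coarser cells store fewer decoys. This is one way a single parameter can trade storage against resolution of the structure space.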

An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction

Computation ◽

10.3390/computation6020039 ◽

2018 ◽

Vol 6 (2) ◽

pp. 39 ◽

Cited By ~ 6

Author(s):

Nasrin Akhter ◽

Wanli Qiao ◽

Amarda Shehu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Energy Landscape ◽

Free Protein ◽

Template Free

Machine Learning for Protein Structure Prediction

Studies in Classification, Data Analysis, and Knowledge Organization - Information Systems and Data Analysis ◽

10.1007/978-3-642-46808-7_36 ◽

1994 ◽

pp. 384-390

Author(s):

Joachim Selbig

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction

A Multi-Objective Stochastic Optimization Approach for Decoy Generation in Template-Free Protein Structure Prediction

Biophysical Journal ◽

10.1016/j.bpj.2018.11.365 ◽

2019 ◽

Vol 116 (3) ◽

pp. 59a

Author(s):

Ahmed Bin Zaman ◽

Amarda Shehu

Keyword(s):

Protein Structure ◽

Stochastic Optimization ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Optimization Approach ◽

Free Protein ◽

Multi Objective ◽

Template Free

Protein structure prediction (RMSD ≤ 5 Å) using machine learning models

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.073361 ◽

2016 ◽

Vol 14 (1) ◽

pp. 71 ◽

Cited By ~ 4

Author(s):

Yadunath Pathak ◽

Prashant Singh Rana ◽

P.K. Singh ◽

Mukesh Saraswat

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Learning Models ◽

Machine Learning Models

3P-012 Toward the Success of Template-Free Protein Structure Prediction: Dependency of Performance on the Fragment Libraries (Protein: Structure, The 47th Annual Meeting of the Biophysical Society of Japan)

Seibutsu Butsuri ◽

10.2142/biophys.49.s152_5 ◽

2009 ◽

Vol 49 (supplement) ◽

pp. S152

Author(s):

Shintaro Minami ◽

George Chikenji

Keyword(s):

Protein Structure ◽

Annual Meeting ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Free Protein ◽

Biophysical Society ◽

Template Free ◽

Fragment Libraries

Protein structure prediction using AI and quantum computers

10.1101/2021.05.22.445242 ◽

2021 ◽

Author(s):

Ben Geoffrey A S

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Machine Learning Algorithms ◽

Quantum Computers ◽

Data Set ◽

Backbone Atoms ◽

Protein Contact Maps

This work seeks to combine the advantages of two emerging areas, Artificial Intelligence and quantum computing, and apply them to the specific biological problem of protein structure prediction using Quantum Machine Learning algorithms. The CASP dataset was downloaded from ProteinNet, a standardized data set for machine learning of protein structure. This large, standardized dataset of PDB entries contains the coordinates of the backbone atoms, corresponding to the sequential chain of N, C_alpha, and C' atoms. The dataset was used to train a hybrid quantum-classical Keras deep neural network model to predict protein structures. To visually assess the quality of the predicted structures against the actual ones, protein contact maps were generated from the experimental and predicted structure data and compared. On this basis, the model is recommended for protein structure prediction using AI that leverages the power of quantum computers. The code is provided in the following GitHub repository: https://github.com/bengeof/Protein-structure-prediction-using-AI-and-quantum-computers.
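
As a rough illustration of comparing predicted and experimental contact maps, the sketch builds binary contact maps from C-alpha coordinates and scores a predicted map against an experimental one by precision. The 8 Å C-alpha cutoff is a common convention assumed here. The precision metric is an addition for illustration, since the abstract describes a visual comparison. The coordinates are toy values, not ProteinNet data.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 1) -> List[List[int]]:
    """Binary contact map: 1 if the C-alpha atoms of residues i and j are closer than
    `cutoff` Å and |i - j| >= min_sep. The 8 Å cutoff is an assumed convention."""
    n = len(ca)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(ca[i], ca[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def contact_precision(pred: List[List[int]], true: List[List[int]]) -> float:
    """Fraction of predicted contacts (upper triangle) that are also experimental contacts."""
    tp = fp = 0
    n = len(true)
    for i in range(n):
        for j in range(i + 1, n):
            if pred[i][j]:
                if true[i][j]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0


# Toy chains: an extended "experimental" chain and a compact "predicted" one.
true_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
pred_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_precision(contact_map(pred_ca), contact_map(true_ca)))  # 5/6 ≈ 0.833
```

In the toy data, the compact prediction places residues 0 and 3 in contact, which the extended experimental chain does not. That single false positive is why precision falls below 1.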

MULTICOM2: an open-source protein structure prediction system powered by deep learning and distance prediction

10.21203/rs.3.rs-339464/v1 ◽

2021 ◽

Author(s):

Tianqi Wu ◽

Jian Liu ◽

Zhiye Guo ◽

Jie Hou ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Open Source ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Modeling Method ◽

Structure Modeling ◽

Prediction System ◽

Template Free

Abstract

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, few comprehensive open-source protein structure prediction packages are publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, which integrates template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates; this approach is much faster and more accurate than that of MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made per domain. The success rate increases by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein.
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
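
The TM-score figures and the one-versus-five-prediction success rates above can be unpacked with a small sketch. It uses the standard TM-score normalization d0 = 1.24 (L - 15)^(1/3) - 1.8. Unlike the real TM-score program, it evaluates a single given superposition instead of searching for the best one. Treating TM-score >= 0.5 as a correct fold is a commonly used threshold assumed here; the abstract does not state it. The helper names and the per-domain scores in the usage are hypothetical.

```python
from typing import Dict, List, Sequence


def tm_d0(target_length: int) -> float:
    """TM-score distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 Å."""
    if target_length <= 15:
        raise ValueError("d0 is undefined for targets of 15 residues or fewer")
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition; distances are per aligned residue (Å).

    The real TM-score maximizes over superpositions; this sketch does not search.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def success_rate(tm_by_domain: Dict[str, List[float]], top_n: int,
                 threshold: float = 0.5) -> float:
    """Fraction of domains whose best model among the first top_n reaches threshold."""
    hits = sum(1 for models in tm_by_domain.values() if max(models[:top_n]) >= threshold)
    return hits / len(tm_by_domain)


# Invented TM-scores for five ranked models on three hypothetical domains.
domains = {
    "A": [0.62, 0.40, 0.38, 0.35, 0.30],
    "B": [0.41, 0.55, 0.30, 0.20, 0.10],
    "C": [0.30, 0.35, 0.20, 0.25, 0.45],
}
print(success_rate(domains, 1), success_rate(domains, 5))  # 1/3 and 2/3
```

Taking the best of the first five models can only raise the success rate relative to the first model alone. That is why reporting both numbers is informative.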

Deep Learning Approaches for Protein Structure Prediction

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.20037 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 168

Author(s):

Khatri Chandni ◽

Prof. Mrudang Pandya ◽

Dr. Sunil Jardosh

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Machine Learning Techniques ◽

Great Promise ◽

Learning Approaches ◽

Learning Networks ◽

Learning Techniques

In recent years, Machine Learning techniques based on Deep Learning networks have shown great promise in research communities. Successful deep learning methods build on Artificial Neural Networks and Machine Learning. Deep Learning solves several problems in bioinformatics, and Protein Structure Prediction is one of the most important fields that can be addressed using Deep Learning approaches. Proteins are categorized on the basis of the amino acid patterns that occur in them, from which features are extracted. This paper aims to review work on protein structure prediction using Deep Learning Networks. The objective is to review, motivate, and facilitate the use of deep learning networks for predicting protein structure from sequences.
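
The review mentions categorizing proteins by the amino acid patterns that occur in them. One simple, common example of such a feature is dipeptide composition, chosen here as an assumption since the review does not name a specific one. It is the normalized frequency of each of the 400 ordered pairs of standard residues. This yields a fixed-length vector that a neural network can take as input regardless of sequence length. The function name is hypothetical.

```python
from itertools import product
from typing import List

# The 20 standard amino acids; the 400 ordered pairs form the feature axes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> List[float]:
    """Hypothetical feature extractor: frequency of each adjacent residue pair.

    Pairs containing non-standard residues (e.g. X) are skipped, an assumed policy.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


features = dipeptide_composition("ACAC")
print(len(features), features[DIPEPTIDES.index("AC")])  # 400 and about 0.667
```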

Keynote: Protein Structure Prediction and its Understanding Based on Machine Learning Methods

2010 10th IEEE International Conference on Computer and Information Technology ◽

10.1109/cit.2010.509 ◽

2010 ◽

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Learning Methods ◽

Machine Learning Methods
