Performance of secondary structure prediction methods on proteins containing structurally ambivalent sequence fragments

Biopolymers ◽ 2013 ◽ Vol 100 (2) ◽ pp. 148-153
Author(s): K. Mani Saravanan ◽ Samuel Selvaraj

Author(s): Roma Chandra

Protein structure prediction is one of the important goals in bioinformatics and biotechnology. Prediction methods cover both the secondary and tertiary structures of proteins. Protein secondary structure prediction infers the presence of helices, sheets and coils in a polypeptide chain, whereas protein tertiary structure prediction infers the three-dimensional structure of a protein. Protein secondary structures represent the possible motifs or regular expressions, represented as patterns, that are predicted from the primary protein sequence in the form of alpha helices, beta strands and coils. Secondary structure prediction is useful because it yields information about the structure and function of an unknown protein sequence. Various secondary structure prediction methods are used to predict helices, sheets and coils, and various prediction tools based on these methods are under study. This study covers secondary structure prediction of hemoglobin using various tools. The results give the percentage of amino acids that form helices, sheets and coils. PHD and DSC produced the best results of all the tools used.
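The percentage of residues assigned to each class can be computed directly from a predictor's per-residue output. The sketch below is illustrative only: it assumes the prediction is available as a string over the three labels H (helix), E (strand/sheet) and C (coil), and both the function name `ss_composition` and the example string are hypothetical, not taken from the study.

```python
from collections import Counter


def ss_composition(ss: str) -> dict:
    """Return the percentage of residues labelled H, E and C.

    Assumes a 3-label string (H = helix, E = strand, C = coil);
    real tools may use other alphabets that need mapping first.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss.upper())
    unknown = set(counts) - set("HEC")
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    n = len(ss)
    return {label: 100.0 * counts.get(label, 0) / n for label in "HEC"}


# Hypothetical 20-residue prediction, not real hemoglobin output.
example = "CCHHHHHHHHCCEEEECCCC"
print(ss_composition(example))
```

Running the same function on the output of each tool would give comparable helix, sheet and coil percentages for one input sequence.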


2020
Author(s): Maxim Shapovalov ◽ Roland L. Dunbrack ◽ Slobodan Vucetic

Protein secondary structure prediction remains a vital topic with improving accuracy and broad applications. By using deep learning algorithms, prediction methods not relying on structure templates were recently reported to reach as high as 87% accuracy on 3 labels (helix, sheet or coil). Due to the lack of a widely accepted standard in secondary structure predictor development and evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) a new test set, Test2018, consisting of proteins from structures released in 2018 with less than 25% sequence identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins less than 25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (4) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet’s accuracy is 84% for both sets. The ablation study of different factors (evolutionary information, neural network architecture, and training hyper-parameters) suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
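Two ideas in this abstract are easy to make concrete: 3-label accuracy is the percentage of residues whose predicted label matches the observed one, and a window of ±14 amino acids means each residue is presented with 14 neighbours on either side (29 positions). The sketch below is a minimal illustration under stated assumptions, not the SecNet code: the function names are hypothetical, and padding chain ends with `X` is an assumption made here rather than a detail given in the abstract.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("empty label strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def residue_windows(sequence: str, half_width: int = 14, pad: str = "X") -> list:
    """Return one window of 2 * half_width + 1 residues centred on each position.

    Chain ends are padded with `pad` (an assumption of this sketch).
    """
    padded = pad * half_width + sequence + pad * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(sequence))]


# Hypothetical labels: one mismatch in ten residues.
print(q3_accuracy("HHHEEECCCC", "HHHEECCCCC"))
# Each residue of a short hypothetical sequence gets a 29-residue window.
print(len(residue_windows("ACDEFG")[0]))
```

In a real pipeline each window position would carry a feature vector (for example a profile column) rather than a single letter, but the indexing is the same.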


Author(s): Saad O.A. Subair ◽ Safaai Deris

Protein secondary-structure prediction is a fundamental step in determining the 3D structure of a protein. In this chapter, a new method for predicting protein secondary structure from amino-acid sequences is proposed and implemented. The Cuff and Barton 513-protein data set is used to train and test the prediction methods under the same hardware, platforms, and environments. The newly developed method combines the GOR-V information theory with the power of neural networks to classify a novel protein sequence into one of three secondary-structure classes (i.e., helices, strands, and coils). The newly developed method (NN-GORV-I) is further improved by applying a filtering mechanism to the searched database, and the improved version is named NN-GORV-II. The developed prediction methods are rigorously analyzed and tested alongside five other well-known prediction methods in this domain to allow easy comparison and clear conclusions.
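For orientation, GOR-family methods rest on information-theoretic scores relating residues to conformational states. The sketch below computes only the simplest such quantity, the single-residue information value I(S; R) = log(P(S | R) / P(S)), from labelled training data. It is a deliberately reduced illustration: it is neither GOR-V (which uses windowed and pairwise statistics plus evolutionary information) nor the authors' NN-GORV methods, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def information_scores(sequences, structures):
    """Return {(residue, state): log(P(state | residue) / P(state))}.

    Only residue/state pairs observed in the training data get a score.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    state_counts = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for residue, state in zip(seq, ss):
            pair_counts[(residue, state)] += 1
            residue_counts[residue] += 1
            state_counts[state] += 1
    total = sum(state_counts.values())
    scores = {}
    for (residue, state), n in pair_counts.items():
        p_state_given_residue = n / residue_counts[residue]
        p_state = state_counts[state] / total
        scores[(residue, state)] = math.log(p_state_given_residue / p_state)
    return scores


# Toy training data (hypothetical): A is always helix, G is always coil.
scores = information_scores(["AAGG"], ["HHCC"])
print(scores[("A", "H")])
```

A positive score means the residue is seen in that state more often than the state's overall frequency would suggest. Scores of this kind could serve as input features for a neural network classifier, which is one plausible reading of how the two components are combined in the chapter.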

