SVM-BASED METHOD FOR PROTEIN STRUCTURAL CLASS PREDICTION USING SECONDARY STRUCTURAL CONTENT AND STRUCTURAL INFORMATION OF AMINO ACIDS

2011 ◽  
Vol 09 (04) ◽  
pp. 489-502 ◽  
Author(s):  
TABREZ ANWAR SHAMIM MOHAMMAD ◽  
HAMPAPATHALU ADIMURTHY NAGARAJARAM

The knowledge collated from known protein structures has revealed that proteins usually fold into four structural classes: all-α, all-β, α/β and α + β. A number of methods have been proposed to predict a protein's structural class from its primary structure; however, these methods have been observed to fail or perform poorly on distantly related sequences. In this paper, we propose a new method for protein structural class prediction using a dataset of low-homology (twilight-zone) protein sequences. Since protein structural class prediction is a typical classification problem, we have developed a Support Vector Machine (SVM)-based method for protein structural class prediction that uses features derived from the predicted secondary structure and predicted burial information of amino acid residues. The examination of individual features as well as feature combinations revealed that the combination of secondary structural content with the secondary structural and solvent accessibility state frequencies of amino acids gave the best leave-one-out cross-validation accuracy of ~81%, which is comparable to the best accuracy reported in the literature so far.
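As a rough illustration of the "secondary structural content" feature mentioned in this abstract, the sketch below computes the fraction of residues in each predicted state. It assumes a three-state string using the letters H (helix), E (strand) and C (coil); the function name `secondary_structure_content` and the example string are hypothetical and are not taken from the paper, whose exact feature definitions are not given here.

```python
from collections import Counter


def secondary_structure_content(ss: str) -> dict:
    """Return the fraction of residues in each state of a predicted
    secondary structure string (assumed alphabet: H, E, C)."""
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss)
    n = len(ss)
    # One fraction per state; states absent from the string get 0.0.
    return {state: counts.get(state, 0) / n for state in "HEC"}


# Hypothetical predicted secondary structure for a 16-residue protein.
example = "CCHHHHHHCCEEEECC"
content = secondary_structure_content(example)
print(content)
```

Vectors like this one, possibly concatenated with other per-residue state frequencies, are the kind of input an SVM classifier would be trained on.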

2021 ◽  
Author(s):  
Syeda Nadia Firdaus

This thesis explores machine learning models based on various feature sets to solve the protein structural class prediction problem which is a significant classification problem in bioinformatics. Knowledge of protein structural classes contributes to an understanding of protein folding patterns, and this has made structural class prediction research a major topic of interest. In this thesis, features are extracted from predicted secondary structure and hydropathy sequence using new strategies to classify proteins into one of the four major structural classes: all-α, all-β, α/β, and α+β. The prediction accuracy using these features compares favourably with some existing successful methods. We use Support Vector Machines (SVM), since this learning method has well-known efficiency in solving this classification problem. On a standard dataset (25PDB), the proposed system has an overall accuracy of 89% with as few as 22 features, whereas the previous best performing method had an accuracy of 88% using 2510 features.


Author(s):  
Liang Kong ◽  
Lingfu Kong ◽  
Rong Jing ◽  

Protein structural class prediction is beneficial to the study of protein function, regulation and interactions. However, protein structural class prediction for low-similarity sequences (i.e., below 40% in pairwise sequence similarity) remains a challenging problem at present. In this study, a novel computational method is proposed to accurately predict protein structural class for low-similarity sequences. This method is based on a support vector machine in conjunction with integrated features from evolutionary information generated with the position-specific iterative basic local alignment search tool (PSI-BLAST) and predicted secondary structure. Prediction accuracies evaluated by jackknife tests are reported on two widely used low-similarity benchmark datasets (25PDB and 1189), reaching overall accuracies of 89.3% and 87.9%, which are significantly higher than those achieved by state-of-the-art methods in protein structural class prediction. The experimental results suggest that our method could serve as an effective alternative to existing methods in protein structural classification, especially for low-similarity sequences.


2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representation is a key step for this prediction task. Prior works have demonstrated the effectiveness of secondary-structure-based feature extraction methods, especially for low-similarity protein sequences. However, the prediction accuracies still remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector to characterize the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequence and be helpful in future protein research.


Author(s):  
Harsh Saini ◽  
Gaurav Raicar ◽  
Alok Sharma ◽  
Sunil Lal ◽  
...  

Protein structural class prediction (SCP) is an important task in identifying protein tertiary structure and protein functions. In this study, we propose a feature extraction technique to predict secondary structures. The technique utilizes bigram (of adjacent and k-separated amino acids) information derived from the Position Specific Scoring Matrix (PSSM). The technique has shown promising results when evaluated on the benchmark Ding and Dubchak dataset.
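One common way to build bigram features from a PSSM is to sum, over all sequence positions, the product of the score for amino acid m at position i and the score for amino acid n at position i + k. The sketch below assumes that formulation, which the abstract does not spell out; the function name `pssm_bigram` and the toy matrix are hypothetical. Setting k = 1 gives adjacent bigrams and larger k gives k-separated ones.

```python
def pssm_bigram(pssm, k=1):
    """Flattened bigram features from a PSSM given as a list of rows
    (one row per sequence position, one column per amino acid type).

    Assumed formulation: B[m][n] = sum_i pssm[i][m] * pssm[i + k][n].
    """
    length = len(pssm)
    width = len(pssm[0])
    features = [[0.0] * width for _ in range(width)]
    # Pair every position i with the position k residues downstream.
    for i in range(length - k):
        for m in range(width):
            for n in range(width):
                features[m][n] += pssm[i][m] * pssm[i + k][n]
    # Row-major flattening: width * width features per value of k.
    return [value for row in features for value in row]


# Toy matrix: 3 positions and 2 columns (a real PSSM has 20 columns).
toy = [[1, 0], [0, 1], [1, 1]]
adjacent = pssm_bigram(toy, k=1)
separated = pssm_bigram(toy, k=2)
print(adjacent, separated)
```

With a real 20-column PSSM each value of k yields 400 features, and vectors for several values of k can be concatenated before classification.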


2019 ◽  
Vol 8 (4) ◽  
pp. 478
Author(s):  
Thair A. Kadhim ◽  
Mohammed Hasan Aldulaimi ◽  
Suhaila Zainudin ◽  
Azuraliza Abu Bakar

The effective selection of protein features and an accurate method for predicting protein structural class (PSP) are important aspects of protein folding, especially for low-similarity sequences. Many promising approaches have been proposed to solve this problem, mostly via computational intelligence methods. One of the main aspects of the prediction is the extraction of an excellent representation of a protein sequence. In this study, an integrated vector of 71 dimensions was extracted from secondary structure and hydropathy information, using newly developed strategies for categorizing proteins into their respective main structural classes, which are all-α, all-β, α/β, and α+β. Support Vector Machine (SVM) and Differential Evolution (DE) were combined using the wrapper method to select the top N features based on their respective importance. The classification can be made more accurate by tuning the kernel parameters for the SVM in the training phase. In this study, the mean classification rate of the SVM classifier was used to evaluate the selected subset of features. This study was tested using two low-similarity data sets (D640 and ASTRAL). A comparison between the proposed (SVM+DE) based on the DE feature selection approach and (SVM+DE) based on grid search (a traditional method to search for parameters) forms the core of this work. The proposed SVM+DE model is competitive and highly reliable in terms of time and performance accuracy compared with other methods reported in the literature.

