An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm

Author(s):  
Azuraliza Abu Bakar ◽  
Mohammed Hasan Aldulaimi ◽  
Suhaila Zainudin
2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representations is a key step for this prediction task. Prior works have demonstrated the effectiveness of secondary-structure-based feature extraction methods, especially for low-similarity protein sequences. However, prediction accuracies remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector that characterizes the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequences and be helpful in future protein research.
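To make the idea of a chaos game representation concrete, the following is a minimal sketch of the plain (not the generalized) chaos game applied to a three-state secondary structure string. The abstract does not specify the vertex layout, the contraction ratio, or the 20 features, so the triangle coordinates, the `ratio` parameter, and the helper `mean_distance_by_element` are illustrative assumptions rather than the authors' method.

```python
import math

# Assumed layout: one corner of an equilateral triangle per element
# (H = helix, E = strand, C = coil). Not taken from the paper.
VERTICES = {
    "H": (0.0, 1.0),
    "E": (-math.sqrt(3) / 2, -0.5),
    "C": (math.sqrt(3) / 2, -0.5),
}


def cgr_points(ss, ratio=0.5):
    """Return the chaos-game trajectory of a secondary structure string."""
    x, y = 0.0, 0.0  # start at the centroid of the triangle
    points = []
    for symbol in ss:
        vx, vy = VERTICES[symbol]
        # Move a fixed fraction of the way toward this symbol's vertex.
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def mean_distance_by_element(ss):
    """Hypothetical distance statistic: mean distance from the centroid
    of the points generated by each element (0.0 if the element is absent)."""
    totals = {k: 0.0 for k in VERTICES}
    counts = {k: 0 for k in VERTICES}
    for symbol, (x, y) in zip(ss, cgr_points(ss)):
        totals[symbol] += math.hypot(x, y)
        counts[symbol] += 1
    return {k: (totals[k] / counts[k] if counts[k] else 0.0) for k in VERTICES}
```

A vector of such statistics could then be passed to any standard classifier; the choice of statistics here is only one of many possibilities.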


2021 ◽  
Author(s):  
Syeda Nadia Firdaus

This thesis explores machine learning models based on various feature sets to solve the protein structural class prediction problem, which is a significant classification problem in bioinformatics. Knowledge of protein structural classes contributes to an understanding of protein folding patterns, and this has made structural class prediction research a major topic of interest. In this thesis, features are extracted from predicted secondary structure and hydropathy sequences using new strategies to classify proteins into one of the four major structural classes: all-α, all-β, α/β, and α+β. The prediction accuracy using these features compares favourably with some existing successful methods. We use Support Vector Machines (SVM), since this learning method has well-known efficiency in solving this classification problem. On a standard dataset (25PDB), the proposed system has an overall accuracy of 89% with as few as 22 features, whereas the previous best-performing method had an accuracy of 88% using 2510 features.
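As a rough illustration of what features extracted from a predicted secondary structure can look like, the sketch below computes two generic descriptors per element: its fraction of the sequence and its longest segment, normalized by length. The thesis abstract does not list its 22 features, so the function `ss_features` and the feature names are assumptions made for this example only.

```python
from itertools import groupby


def ss_features(ss):
    """Compute simple composition and segment features from a string
    over the alphabet H (helix), E (strand), C (coil)."""
    n = len(ss)
    if n == 0:
        raise ValueError("secondary structure string must not be empty")
    features = {}
    for element in "HEC":
        # Fraction of residues assigned to this element.
        features[f"frac_{element}"] = ss.count(element) / n
        # Length of the longest uninterrupted run of this element.
        longest = max(
            (len(list(run)) for sym, run in groupby(ss) if sym == element),
            default=0,
        )
        features[f"max_seg_{element}"] = longest / n
    return features
```

Descriptors of this kind are low-dimensional, which is in the spirit of the small feature set the abstract reports, although the actual features used there may differ.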


2021 ◽  
Vol 35 (5) ◽  
pp. 403-408
Author(s):  
Subhendu Bhusan Rout ◽  
Sasmita Mishra ◽  
Susanta Kumar Sahoo

Protein secondary structure prediction (PSP) is an important task in bioinformatics, and in recent decades many machine learning and soft computing methodologies have played vital roles in achieving satisfactory results. Protein structural class determination is an important topic in protein science because knowledge of a protein's structural class helps in understanding the changes and reactions in a living body and thus in designing new drugs and medicines. Although several hard computing techniques may be helpful in these areas, the steady growth and large volume of protein sequences entering databanks make experiments with hard computing techniques challenging. Soft computing techniques such as Artificial Neural Networks, Fuzzy Logic, and Genetic Algorithms play a vital role in this type of genomic research. To address these challenges, this article presents a novel method for predicting protein structure using a Genetic Algorithm. The Q3 accuracy and SOV measures, with the SOVH, SOVE, and SOVC values for α-helix (H), β-sheet (E), and coil/loop (C) structures respectively, are also discussed. The proposed Genetic Algorithm technique, GApred, provides better results than SPIDER2, JPred4, FSVM, and SSpro5 on all three datasets in the experiment. The method is helpful for protein secondary structure prediction, and a significant success rate was observed, which indicates that it can be used as a powerful tool in drug design and medicine research.
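For readers unfamiliar with the Q3 measure mentioned above, it is the percentage of residues whose predicted three-state label (H, E or C) matches the observed one. The sketch below shows that calculation; the function name `q3_accuracy` and the example strings are illustrative and are not taken from the article. The segment-overlap (SOV) measure is more involved and is not covered here.

```python
def q3_accuracy(observed, predicted):
    """Return the percentage of positions where the predicted state
    equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# One of ten residues is mispredicted (an E called as C).
print(q3_accuracy("HHHEEECCCC", "HHHEECCCCC"))  # 90.0
```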

