Protein structural class prediction using predicted secondary structure and hydropathy profile

10.32920/ryerson.14657172.v1 ◽

2021 ◽

Author(s):

Syeda Nadia Firdaus

Keyword(s):

Secondary Structure ◽

Classification Problem ◽

Support Vector ◽

Prediction Problem ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class ◽

Vector Machines ◽

Structural Classes ◽

New Strategies

This thesis explores machine learning models based on various feature sets to solve the protein structural class prediction problem, a significant classification problem in bioinformatics. Knowledge of protein structural classes contributes to an understanding of protein folding patterns, which has made structural class prediction a major research topic. In this thesis, features are extracted from the predicted secondary structure and the hydropathy sequence using new strategies to classify proteins into one of the four major structural classes: all-α, all-β, α/β, and α+β. The prediction accuracy obtained with these features compares favourably with that of some existing successful methods. We use Support Vector Machines (SVM), since this learning method is well known for its efficiency in solving this classification problem. On a standard dataset (25PDB), the proposed system has an overall accuracy of 89% with as few as 22 features, whereas the previous best-performing method had an accuracy of 88% using 2510 features.
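As a rough illustration of what features drawn from a predicted secondary structure string and a hydropathy profile can look like, the sketch below computes seven simple values in Python. It is not the thesis's 22-feature set, which the abstract does not specify. The function names, the H/E/C state alphabet, and the choice of the Kyte-Doolittle hydropathy scale are assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy index (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def longest_run(states, target):
    """Length of the longest uninterrupted run of `target` in `states`."""
    best = current = 0
    for state in states:
        current = current + 1 if state == target else 0
        best = max(best, current)
    return best


def extract_features(sequence, sec_struct):
    """Hypothetical feature vector from an amino acid sequence and its
    predicted secondary structure (H = helix, E = strand, C = coil)."""
    if not sequence or len(sequence) != len(sec_struct):
        raise ValueError("sequence and sec_struct must be non-empty and equal in length")
    n = len(sequence)
    features = [sec_struct.count(state) / n for state in "HEC"]  # state content
    features.append(longest_run(sec_struct, "H") / n)  # longest helix segment
    features.append(longest_run(sec_struct, "E") / n)  # longest strand segment
    hydropathy = [KYTE_DOOLITTLE[residue] for residue in sequence]
    features.append(sum(hydropathy) / n)  # mean hydropathy
    features.append(sum(1 for h in hydropathy if h > 0) / n)  # hydrophobic fraction
    return features
```

A vector like this, computed for every protein in a dataset, is the kind of input an SVM classifier would then be trained on.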

Protein Structural Class Prediction Based on Distance-related Statistical Features from Graphical Representation of Predicted Secondary Structure

Letters in Organic Chemistry ◽

10.2174/1570178615666180914110451 ◽

2019 ◽

Vol 16 (4) ◽

pp. 317-324

Author(s):

Liang Kong ◽

Lichao Zhang ◽

Xiaodong Han ◽

Jinfeng Lv

Keyword(s):

Feature Extraction ◽

Secondary Structure ◽

Protein Sequence ◽

Function Analysis ◽

Superior Performance ◽

Support Vector ◽

Chaos Game Representation ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Protein structural class prediction is beneficial to protein structure and function analysis. Finding a good feature representation is a key step in this prediction task. Prior work has demonstrated the effectiveness of secondary-structure-based feature extraction methods, especially for low-similarity protein sequences. However, prediction accuracies remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector that characterizes the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequences and be helpful in future protein research.
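The idea of a chaos game representation can be sketched with the standard construction on a triangle: each secondary structure state owns a vertex, and every residue moves the current point a fixed fraction of the way toward its state's vertex. The abstract does not define the paper's generalized representation or its 20 features, so the vertex layout, the `ratio` parameter, and the two distance statistics below are illustrative assumptions only.

```python
import math

# Assumed layout: one triangle vertex per state (H = helix, E = strand, C = coil).
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_points(sec_struct, ratio=0.5):
    """Chaos game walk: start at the triangle centroid and, for each state,
    move `ratio` of the way toward that state's vertex."""
    x = sum(v[0] for v in VERTICES.values()) / len(VERTICES)
    y = sum(v[1] for v in VERTICES.values()) / len(VERTICES)
    points = []
    for state in sec_struct:
        vx, vy = VERTICES[state]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def distance_stats(points, origin):
    """Mean and population standard deviation of the Euclidean distances
    from each point to `origin` (a hypothetical distance-related feature pair)."""
    distances = [math.hypot(px - origin[0], py - origin[1]) for px, py in points]
    mean = sum(distances) / len(distances)
    variance = sum((d - mean) ** 2 for d in distances) / len(distances)
    return mean, math.sqrt(variance)
```

Calling `distance_stats(cgr_points(ss), VERTICES["H"])` for each of the three vertices would give six values per protein; a fuller feature vector would need additional statistics of the same kind.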

SVM-BASED METHOD FOR PROTEIN STRUCTURAL CLASS PREDICTION USING SECONDARY STRUCTURAL CONTENT AND STRUCTURAL INFORMATION OF AMINO ACIDS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720011005422 ◽

2011 ◽

Vol 09 (04) ◽

pp. 489-502 ◽

Cited By ~ 6

Author(s):

TABREZ ANWAR SHAMIM MOHAMMAD ◽

HAMPAPATHALU ADIMURTHY NAGARAJARAM

Keyword(s):

Amino Acids ◽

Structural Information ◽

Solvent Accessibility ◽

Protein Structures ◽

Classification Problem ◽

Support Vector ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class ◽

Structural Content

Knowledge collated from known protein structures has revealed that proteins usually fold into four structural classes: all-α, all-β, α/β and α + β. A number of methods have been proposed to predict a protein's structural class from its primary structure; however, these methods have been observed to fail or perform poorly on distantly related sequences. In this paper, we propose a new method for protein structural class prediction using a dataset of low-homology (twilight-zone) protein sequences. Since protein structural class prediction is a typical classification problem, we have developed a Support Vector Machine (SVM)-based method that uses features derived from the predicted secondary structure and predicted burial information of amino acid residues. Examination of individual features as well as feature combinations revealed that the combination of secondary structural content with the secondary structural and solvent accessibility state frequencies of amino acids gave the best leave-one-out cross-validation accuracy of ~81%, which is comparable to the best accuracy reported in the literature so far.
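One plausible reading of "state frequencies of amino acids" is the fraction of positions at which a given amino acid occurs in a given predicted state. The Python sketch below computes that quantity for any per-residue state string, so the same function serves for secondary structure states and for buried/exposed states. The function name, the state alphabets, and the normalisation by sequence length are assumptions; the abstract does not give the exact definition.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def state_frequencies(sequence, states, alphabet):
    """Fraction of positions holding each (amino acid, state) pair.

    `states` is a per-residue string such as predicted secondary structure
    ("H", "E", "C") or predicted burial ("B" buried, "X" exposed; labels assumed).
    """
    if not sequence or len(sequence) != len(states):
        raise ValueError("sequence and states must be non-empty and equal in length")
    counts = Counter(zip(sequence, states))
    n = len(sequence)
    # Counter returns 0 for pairs that never occur, so every pair gets a value.
    return {
        (residue, state): counts[(residue, state)] / n
        for residue in AMINO_ACIDS
        for state in alphabet
    }
```

With a three-state alphabet this yields 60 values per protein, and a two-state burial alphabet adds 40 more, which gives a sense of how such feature sets grow.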

Prediction of Protein Structural Classes: Features Extraction to Classification Algorithm

Current Proteomics ◽

10.2174/1570164618666210218141148 ◽

2021 ◽

Vol 18 ◽

Author(s):

Xiaoqing Liu ◽

Zhenyu Yang ◽

Yaoxin Wang ◽

Qi Dai

Keyword(s):

Tertiary Structure ◽

Protein Sequencing ◽

Conformational Space ◽

Folding Rate ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class ◽

Dna Binding Sites ◽

Structural Classes ◽

Protein Structure Data

The rapid growth of protein sequencing and protein structure data has promoted the development of protein structural class prediction. Several prediction methods have been proposed to study protein folding rates and DNA binding sites, to reduce the search of conformational space, and to predict tertiary structure. This paper introduces the current approaches to protein structural class prediction and emphasizes their steps, from information extraction to classification algorithms.

Application of Improved Three-Dimensional Kernel Approach to Prediction of Protein Structural Class

BioMed Research International ◽

10.1155/2013/625403 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8

Author(s):

Xu Liu ◽

Yuchao Zhang ◽

Hua Yang ◽

Lisheng Wang ◽

Shuaibing Liu

Keyword(s):

Support Vector Machines ◽

Three Dimensional ◽

Machine Learning Techniques ◽

Support Vector ◽

Structural Class ◽

Protein Structural Class ◽

Complementary Role ◽

Vector Machines ◽

Kernel Approach ◽

Leave One Out

Kernel methods, such as kernel PCA, kernel PLS, and support vector machines, are widely known machine learning techniques in biology, medicine, chemistry, and material science. Based on nonlinear mapping and the Coulomb function, two 3D kernel approaches were improved and applied to predictions of the four protein tertiary structural classes of domains (all-α, all-β, α/β, and α + β) and five membrane protein types with satisfactory results. In a benchmark test, the performance of the improved 3D kernel approaches was compared with that of neural networks, support vector machines, and an ensemble algorithm. Leave-one-out cross-validation on working datasets constructed by investigators indicated that the new kernel approaches outperformed the other predictors. It has not escaped our notice that 3D kernel approaches may hold high potential for improving the quality of predictions of other protein features as well. At the very least, they will play a complementary role to many of the existing algorithms in this regard.

Protein structural class prediction using predicted secondary structure and hydropathy profile

Proceedings of the International C* Conference on Computer Science and Software Engineering - C3S2E '13 ◽

10.1145/2494444.2494459 ◽

2013 ◽

Author(s):

Syeda Nadia Firdaus ◽

Eric Harley

Keyword(s):

Secondary Structure ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2013.65 ◽

2013 ◽

Vol 10 (3) ◽

pp. 564-575 ◽

Cited By ~ 36

Author(s):

Abdollah Dehzangi ◽

Kuldip Paliwal ◽

Alok Sharma ◽

Omid Dehzangi ◽

Abdul Sattar

Keyword(s):

Feature Extraction ◽

Extraction Methods ◽

Prediction Problem ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Improving the Prediction of Protein Structural Class for Low-Similarity Sequences by Incorporating Evolutionary and Structural Information

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p0402 ◽

2016 ◽

Vol 20 (3) ◽

pp. 402-411 ◽

Cited By ~ 2

Author(s):

Liang Kong ◽

Lingfu Kong ◽

Rong Jing

Keyword(s):

Protein Function ◽

Structural Information ◽

Sequence Similarity ◽

Computational Method ◽

Evolutionary Information ◽

Support Vector ◽

Local Alignment ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Protein structural class prediction is beneficial to the study of protein function, regulation and interactions. However, protein structural class prediction for low-similarity sequences (i.e., below 40% pairwise sequence similarity) remains a challenging problem. In this study, a novel computational method is proposed to accurately predict protein structural class for low-similarity sequences. The method is based on a support vector machine in conjunction with integrated features from predicted secondary structure and from evolutionary information generated with the position-specific iterative basic local alignment search tool (PSI-BLAST). Prediction accuracies evaluated by jackknife tests are reported on two widely used low-similarity benchmark datasets (25PDB and 1189), reaching overall accuracies of 89.3% and 87.9%, which are significantly higher than those achieved by state-of-the-art methods in protein structural class prediction. The experimental results suggest that our method could serve as an effective alternative to existing methods in protein structural classification, especially for low-similarity sequences.
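The jackknife test used for evaluation here is leave-one-out cross-validation: each protein is held out in turn, the classifier is trained on the rest, and the held-out protein is predicted. The Python sketch below shows the procedure with a 1-nearest-neighbour rule standing in for the support vector machine, purely to keep the example dependency-free. Both function names are hypothetical, and any classifier with the same `predict(train_x, train_y, x)` shape could be passed in instead.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): return the label of the training
    vector closest to `x` in squared Euclidean distance."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Leave-one-out accuracy: hold out each sample once, train on the
    remainder, and count how often the held-out label is recovered.

    `features` and `labels` are expected to be lists of equal length.
    """
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

Because every sample is tested exactly once and the split is deterministic, the result does not depend on a random seed, which is one reason the jackknife is common on small benchmark sets.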

An improved method to enhance protein structural class prediction using their secondary structure sequences and genetic algorithm

International Journal of Bioinformatics Research and Applications ◽

10.1504/ijbra.2018.10009965 ◽

2018 ◽

Vol 14 (4) ◽

pp. 376

Author(s):

Azuraliza Abu Bakar ◽

Mohammed Hasan Aldulaimi ◽

Suhaila Zainudin

Keyword(s):

Genetic Algorithm ◽

Secondary Structure ◽

Improved Method ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Protein Structural Class Determination Using Support Vector Machines

Lecture Notes in Computer Science - Computer and Information Sciences - ISCIS 2004 ◽

10.1007/978-3-540-30182-0_9 ◽

2004 ◽

pp. 82-89 ◽

Cited By ~ 6

Author(s):

Zerrin Isik ◽

Berrin Yanikoglu ◽

Ugur Sezerman

Keyword(s):

Support Vector Machines ◽

Support Vector ◽

Structural Class ◽

Protein Structural Class ◽

Vector Machines
