Protein Structural Class Prediction via k-Separated Bigrams Using Position Specific Scoring Matrix

Protein structural class prediction (SCP) is an important task in identifying protein tertiary structure and protein functions. In this study, we propose a feature extraction technique to predict secondary structures. The technique utilizes bigram information (of adjacent and k-separated amino acids) derived from the Position Specific Scoring Matrix (PSSM). The technique has shown promising results when evaluated on the benchmark Ding and Dubchak dataset.
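
A minimal sketch of how k-separated bigram features might be computed from a PSSM. It assumes the commonly used definition in which entry (m, n) sums the product of the score for residue type m at position i and the score for residue type n at position i + k; the abstract does not state the exact formula, so the function name `k_separated_bigram`, the normalization (none) and the toy 2-letter alphabet are illustrative assumptions rather than the authors' implementation.

```python
def k_separated_bigram(pssm, k=1):
    """Return the flattened A x A bigram matrix for rows k positions apart.

    pssm: list of L rows, each a list of A scores (A = 20 for amino acids).
    Entry (m, n) is the sum over i of pssm[i][m] * pssm[i + k][n].
    This definition is an assumption; the abstract does not spell it out.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    n_cols = len(pssm[0]) if pssm else 0
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair every row i with the row k positions further along the sequence.
    for i in range(len(pssm) - k):
        first, second = pssm[i], pssm[i + k]
        for m, p_m in enumerate(first):
            for n, p_n in enumerate(second):
                bigram[m][n] += p_m * p_n
    return [value for row in bigram for value in row]


# Toy 3-residue "PSSM" over a hypothetical 2-letter alphabet.
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
adjacent = k_separated_bigram(toy_pssm, k=1)   # k = 1: adjacent residues
separated = k_separated_bigram(toy_pssm, k=2)  # k = 2: one residue in between
```

With a real 20-column PSSM each value of k yields 400 features; vectors for several values of k could be concatenated before classification.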

Protein Structural Class Prediction Based on Distance-related Statistical Features from Graphical Representation of Predicted Secondary Structure

Letters in Organic Chemistry ◽ 10.2174/1570178615666180914110451 ◽ 2019 ◽ Vol 16 (4) ◽ pp. 317-324

Author(s): Liang Kong ◽ Lichao Zhang ◽ Xiaodong Han ◽ Jinfeng Lv

Keyword(s): Feature Extraction ◽ Secondary Structure ◽ Protein Sequence ◽ Function Analysis ◽ Superior Performance ◽ Support Vector ◽ Chaos Game Representation ◽ Class Prediction ◽ Structural Class ◽ Protein Structural Class

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representations is a key step for this prediction task. Prior works have demonstrated the effectiveness of feature extraction methods based on secondary structure, especially for low-similarity protein sequences. However, the prediction accuracies remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector to characterize the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequences and be helpful in future protein research.
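
The general idea of a chaos game representation over a three-state secondary structure string can be sketched as follows. This is an illustration only: the triangle vertex assignment, the starting point at the centroid, and the two distance statistics are assumptions chosen for brevity, and they are not the 20 features defined in the paper.

```python
import math

# Hypothetical vertex assignment: one triangle corner per secondary structure state
# (H = helix, E = strand, C = coil).
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}
CENTROID = (0.5, math.sqrt(3) / 6)


def cgr_points(ss_string):
    """Map a secondary structure string to 2D points by the chaos game rule.

    Starting at the centroid, each state moves the current point halfway
    toward that state's vertex.
    """
    x, y = CENTROID
    points = []
    for state in ss_string:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def distance_stats(points):
    """Return the mean and population standard deviation of the
    point-to-centroid distances (illustrative statistics only)."""
    if not points:
        return 0.0, 0.0
    distances = [math.hypot(px - CENTROID[0], py - CENTROID[1]) for px, py in points]
    mean = sum(distances) / len(distances)
    variance = sum((d - mean) ** 2 for d in distances) / len(distances)
    return mean, math.sqrt(variance)


points = cgr_points("HHHEECCC")
mean_distance, std_distance = distance_stats(points)
```

Statistics of this kind, computed per state or per segment, could then be concatenated into a fixed-length vector for a classifier.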

Prediction of Protein Structural Classes: Features Extraction to Classification Algorithm

Current Proteomics ◽ 10.2174/1570164618666210218141148 ◽ 2021 ◽ Vol 18

Author(s): Xiaoqing Liu ◽ Zhenyu Yang ◽ Yaoxin Wang ◽ Qi Dai

Keyword(s): Tertiary Structure ◽ Protein Sequencing ◽ Conformational Space ◽ Folding Rate ◽ Class Prediction ◽ Structural Class ◽ Protein Structural Class ◽ DNA Binding Sites ◽ Structural Classes ◽ Protein Structure Data

The rapid growth of protein sequencing and protein structure data has promoted the development of protein structural class prediction. Several prediction methods have been proposed to study protein folding rates and DNA binding sites, to reduce the search of conformational space, and to predict tertiary structure. This paper introduces the current approaches to protein structural class prediction and emphasizes their steps, from information extraction to classification algorithms.

A Tri-Gram Based Feature Extraction Technique Using Linear Probabilities of Position Specific Scoring Matrix for Protein Fold Recognition

IEEE Transactions on NanoBioscience ◽ 10.1109/tnb.2013.2296050 ◽ 2014 ◽ Vol 13 (1) ◽ pp. 44-50 ◽ Cited By ~ 48

Author(s): Kuldip K. Paliwal ◽ Alok Sharma ◽ James Lyons ◽ Abdollah Dehzangi

Keyword(s): Feature Extraction ◽ Fold Recognition ◽ Position Specific Scoring Matrix ◽ Extraction Technique ◽ Protein Fold ◽ Protein Fold Recognition ◽ Scoring Matrix ◽ Feature Extraction Technique

SVM-Based Method for Protein Structural Class Prediction Using Secondary Structural Content and Structural Information of Amino Acids

Journal of Bioinformatics and Computational Biology ◽ 10.1142/s0219720011005422 ◽ 2011 ◽ Vol 09 (04) ◽ pp. 489-502 ◽ Cited By ~ 6

Author(s): Tabrez Anwar Shamim Mohammad ◽ Hampapathalu Adimurthy Nagarajaram

Keyword(s): Amino Acids ◽ Structural Information ◽ Solvent Accessibility ◽ Protein Structures ◽ Classification Problem ◽ Support Vector ◽ Class Prediction ◽ Structural Class ◽ Protein Structural Class ◽ Structural Content

The knowledge collated from known protein structures has revealed that proteins usually fold into four structural classes: all-α, all-β, α/β and α + β. A number of methods have been proposed to predict a protein's structural class from its primary structure; however, these methods have been observed to fail or perform poorly on distantly related sequences. In this paper, we propose a new method for protein structural class prediction using a dataset of low-homology (twilight-zone) protein sequences. Since protein structural class prediction is a typical classification problem, we have developed a Support Vector Machine (SVM)-based method that uses features derived from the predicted secondary structure and predicted burial information of amino acid residues. An examination of individual features and feature combinations revealed that combining secondary structural content with the secondary structural and solvent accessibility state frequencies of amino acids gave the best leave-one-out cross-validation accuracy of ~81%, which is comparable to the best accuracy reported in the literature so far.
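
Two of the feature families named above can be sketched as follows, assuming a three-state (H, E, C) predicted secondary structure string aligned residue by residue with the sequence. The function names and the normalization by sequence length are illustrative assumptions; the solvent accessibility features and the SVM itself are not reproduced here.

```python
from collections import Counter

SS_STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def secondary_structure_content(ss_string):
    """Fraction of residues predicted in each secondary structure state."""
    counts = Counter(ss_string)
    total = len(ss_string)
    return {state: (counts[state] / total if total else 0.0) for state in SS_STATES}


def state_frequencies(sequence, ss_string):
    """Frequency of each (amino acid, state) pair, normalized by sequence length."""
    if len(sequence) != len(ss_string):
        raise ValueError("sequence and secondary structure must have equal length")
    pair_counts = Counter(zip(sequence, ss_string))
    total = len(sequence)
    return {pair: count / total for pair, count in pair_counts.items()}


content = secondary_structure_content("HHEC")
pairs = state_frequencies("AAG", "HHE")
```

With 20 amino acids and three states, the pair frequencies would give up to 60 values per protein, which could be joined with the three content fractions into one feature vector.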
