A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition

Abstract Protein fold recognition plays a crucial role in discovering the three-dimensional structure of proteins and their functions. Several approaches have been employed for the prediction of protein folds. Some of these approaches are based on extracting features from protein sequences and using a strong classifier. Feature extraction techniques generally utilize syntactical, evolutionary and physicochemical information to extract features. In recent years, finding an efficient technique for integrating discriminative features has received increasing attention. In this study, we integrate the Auto-Cross-Covariance (ACC) and Separated Dimer (SD) evolutionary feature extraction methods. The resulting features are scored by information gain (IG) to select the most discriminative ones. On three benchmark datasets, DD, RDD and EDD, a support vector machine (SVM) trained on the selected features shows more than 6% improvement in accuracy.
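
The information-gain scoring step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definition IG = H(Y) - H(Y | X) for a discrete feature X and class label Y. The function names, and the assumption that each feature has already been discretized into bins, are illustrative choices and are not taken from the paper; real ACC and SD features are continuous and would need a binning step first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for an already-discretized feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature-value group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_features(columns, labels, top_n):
    """Return indices of the top_n feature columns ranked by information gain.

    `columns` is a list of feature columns (hypothetical layout: one list of
    discretized values per feature, aligned with `labels`).
    """
    scores = [information_gain(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:top_n]
```

The selected column indices would then be used to build the reduced feature vectors passed to the classifier.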

MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks

Briefings in Bioinformatics, 2019, Vol 21 (6), pp. 2133-2141

DOI: 10.1093/bib/bbz133

Cited By ~ 22

Author(s): Chen-Chen Li, Bin Liu

Keyword(s): Neural Networks, Feature Extraction, Convolutional Neural Networks, Fold Recognition, Extraction Methods, Support Vector, Sequence Information, Protein Fold, Protein Fold Recognition, Protein Folds

Abstract Protein fold recognition is one of the most critical tasks for exploring the structures and functions of proteins from their primary sequence information. Existing protein fold recognition approaches rely on features that reflect the characteristics of protein folds. However, feature extraction remains the bottleneck for improving the performance of these approaches. In this paper, we propose two new feature extraction methods, MotifCNN and MotifDCNN, which extract more discriminative fold-specific features by using structural motif kernels to construct motif-based convolutional neural networks (CNNs). Pairwise sequence similarity scores calculated from the fold-specific features are then fed into support vector machines to construct a fold recognition predictor called MotifCNN-fold. Experimental results on the benchmark dataset showed that MotifCNN-fold clearly outperformed all the other competing methods. In particular, the fold-specific features extracted by MotifCNN and MotifDCNN are more discriminative than those extracted by other deep learning techniques, indicating that incorporating structural motifs into the CNN helps capture the characteristics of protein folds.

Protein Structural Class Prediction via k-Separated Bigrams Using Position Specific Scoring Matrix

Journal of Advanced Computational Intelligence and Intelligent Informatics, 2014, Vol 18 (4), pp. 474-479

DOI: 10.20965/jaciii.2014.p0474

Cited By ~ 8

Author(s): Harsh Saini, Gaurav Raicar, Alok Sharma, Sunil Lal, ...

Keyword(s): Amino Acids, Feature Extraction, Tertiary Structure, Position Specific Scoring Matrix, Class Prediction, Structural Class, Protein Structural Class, Protein Functions, Scoring Matrix, Feature Extraction Technique

Protein structural class prediction (SCP) is an important task in identifying protein tertiary structure and protein functions. In this study, we propose a feature extraction technique to predict secondary structures. The technique utilizes bigram information (for adjacent and k-separated amino acids) derived from the Position Specific Scoring Matrix (PSSM). The technique has shown promising results when evaluated on the benchmark Ding and Dubchak dataset.
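
As a rough illustration of the k-separated bigram idea, the sketch below accumulates, for every pair of amino-acid columns (m, n), the sum over positions i of P[i][m] * P[i+k][n], where P is the PSSM and k is the separation (k = 1 gives adjacent bigrams). This formulation, the function name, and the assumption that the PSSM has already been normalized to per-position probabilities are illustrative; the abstract itself does not spell out the exact computation.

```python
def k_separated_bigram(pssm, k=1):
    """Flattened k-separated bigram features from a PSSM.

    pssm: list of L rows, one per sequence position, each holding one value
          per amino-acid column (typically 20). Assumed here to be
          normalized to probabilities (an assumption, not from the source).
    k:    separation between the two positions; k=1 means adjacent residues.

    Returns a list of n_cols * n_cols values in row-major order, where entry
    (m, n) is the sum over i of pssm[i][m] * pssm[i + k][n].
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    features = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                features[m][n] += row_a[m] * row_b[n]
    return [value for row in features for value in row]
```

With a 20-column PSSM each value of k yields a 400-dimensional vector; vectors for several values of k could be concatenated before classification, although the choice of k values here is left open.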
