An Integrated-OFFT Model for the Prediction of Protein Secondary Structure Class

2018 ◽  
Vol 15 (1) ◽  
pp. 45-54 ◽  
Author(s):  
Bishnupriya Panda ◽  
Babita Majhi ◽  
Abhimanyu Thakur

Background: Proteins are among the most versatile macromolecules and play crucial roles in many biological processes. The amino acid sequence has long been used to predict protein secondary structure. However, the major methods for predicting protein secondary structure class have not so far considered the impact of Gaussian noise on the numerical representation of amino acid sequences, which is an important constraint on protein functionality.

Methods: In the present research, protein secondary structure class was predicted by jointly applying the Stockwell transform and Amino Acid Composition (AAC) to the equivalent Electron-Ion Interaction Potential (EIIP) representation of the raw amino acid sequence. The method was evaluated on four low-homology benchmark datasets: PDB25, 498, 277, and 204. A random forest classifier with the out-of-bag error estimate and a Support Vector Machine (SVM) with k-fold cross-validation both demonstrated the strong feature-representation potential of the approach.

Results: With the random forest classifier, the overall prediction accuracies for the PDB25, 498, 277, and 204 datasets were 92.5%, 94.79%, 92.45%, and 88.04%, respectively; with SVM, they were 84.66%, 95.32%, 89.29%, and 84.37%, respectively. An integrated order-function-frequency-time (OFFT) model is proposed for the prediction of protein secondary structure class. For the first time, we report the effect of Gaussian noise on the prediction accuracy of protein secondary structure class and propose a robust integrated-OFFT model that is effectively noise resistant.
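The Methods paragraph encodes each sequence as an EIIP signal, summarizes it with AAC, and applies the Stockwell transform. The sketch below shows those three ingredients with the Python standard library only, plus a helper that adds Gaussian noise for a robustness check. Assumptions: the EIIP table uses values as commonly tabulated in the literature and should be verified before use; the S-transform is a naive frequency-domain implementation chosen for readability; how the OFFT model turns the S-matrix into features is not described in the abstract and is not shown.

```python
import cmath
import math
import random

# EIIP value per residue as commonly tabulated (Veljkovic et al.).
# Assumption: check these against the reference you rely on before use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}
AMINO_ACIDS = sorted(EIIP)


def eiip_encode(seq):
    """Map a protein sequence to its EIIP signal (KeyError on non-standard residues)."""
    return [EIIP[aa] for aa in seq.upper()]


def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def add_gaussian_noise(signal, sigma, seed=0):
    """Additive white Gaussian noise, e.g. for a noise-robustness check."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def dft(x):
    """Normalized DFT: H[f] = (1/N) * sum_k x[k] * exp(-i 2 pi f k / N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) / n
            for f in range(n)]


def stockwell(x):
    """Naive discrete Stockwell transform (O(N^3), for illustration only).

    Returns rows S[f][j] for frequency index f = 0..N//2 and position j, using
    S[j, f] = sum_m H[m + f] * exp(-2 pi^2 m^2 / f^2) * exp(i 2 pi m j / N)
    with the Gaussian window evaluated on the wrapped (symmetric) index m.
    """
    n = len(x)
    H = dft(x)
    rows = [[complex(sum(x) / n)] * n]  # f = 0 row is the signal mean
    for f in range(1, n // 2 + 1):
        row = []
        for j in range(n):
            acc = 0j
            for m in range(n):
                mw = m if m <= n // 2 else m - n  # wrapped frequency offset
                window = math.exp(-2 * math.pi ** 2 * mw ** 2 / f ** 2)
                acc += H[(m + f) % n] * window * cmath.exp(2j * math.pi * m * j / n)
            row.append(acc)
        rows.append(row)
    return rows


if __name__ == "__main__":
    signal = eiip_encode("MKTAYIAKQR")
    S = stockwell(signal)
    print(len(S), len(S[0]))  # 6 rows (f = 0..5) by 10 positions
```

A useful sanity property of this formulation is that averaging each row of S over positions recovers the corresponding DFT coefficient, because the Gaussian window equals 1 at zero offset. The magnitudes `abs(S[f][j])` give the time-frequency map from which features could be drawn.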

2015 ◽  
Vol 13 (05) ◽  
pp. 1550022 ◽  
Author(s):  
Ming Fan ◽  
Bin Zheng ◽  
Lihua Li

Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although much effort has been made, predicting protein structural class solely from protein sequence remains a challenging problem. Feature extraction and classification are the main problems in this prediction. In this research, we extended our earlier work on both aspects. For feature extraction, we proposed a scheme that calculates word frequency and word position from sequences of amino acids, reduced amino acids, and secondary structure. For accurate classification of protein structural class, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of a multi-agent system into the Ada-Boost algorithm. Extensive experiments were conducted to test the proposed method and compare it with existing methods on four low-homology benchmark datasets. The classification accuracies on these datasets were 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of existing methods. The source code and dataset are available on request.
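The abstract does not define "word frequency" and "word position". As one hypothetical reading, the sketch below treats words as overlapping k-mers, computes each word's relative frequency, and records its mean start position normalized to [0, 1]. The function name `word_features` and both definitions are assumptions; the same function applies unchanged to amino acid, reduced-alphabet, or secondary-structure strings. The MA-Ada classifier itself is not shown.

```python
from collections import defaultdict


def word_features(seq, k=2):
    """Hypothetical word-frequency and word-position features for one string.

    frequency[w] = occurrences of k-mer w / number of k-mer windows
    position[w]  = mean 0-based start index of w / (len(seq) - k), in [0, 1]
    """
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return {}, {}
    starts = defaultdict(list)  # k-mer -> list of start indices
    for i in range(n_windows):
        starts[seq[i:i + k]].append(i)
    denom = max(len(seq) - k, 1)  # avoid division by zero when len(seq) == k
    frequency = {w: len(idx) / n_windows for w, idx in starts.items()}
    position = {w: sum(idx) / len(idx) / denom for w, idx in starts.items()}
    return frequency, position


if __name__ == "__main__":
    freq, pos = word_features("HHHEEC", k=2)
    print(freq["HH"], pos["HH"])  # 0.4 0.125
```

For a full feature vector, the dictionaries would be expanded over a fixed vocabulary of all possible k-mers (missing words set to 0) so that every protein yields a vector of the same length.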


2019 ◽  
Vol 16 (4) ◽  
pp. 258-262 ◽  
Author(s):  
Feng Yonge ◽  
Xie Weixia

Malaria, caused by Plasmodium falciparum (P. falciparum), is one of the most serious infectious diseases. Mitochondrial proteins of P. falciparum are regarded as effective drug targets against malaria, so it is necessary to identify the mitochondrial proteins of the malaria parasite accurately. Many algorithms have been proposed for this task and have yielded good results. However, the parameters used by these methods were primarily based on amino acid sequences. In this study, we added a novel parameter, based on protein secondary structure, for predicting mitochondrial proteins of the malaria parasite. First, we extracted three feature parameters, namely the compositions of three kinds of protein secondary structure (3PSS), 20 amino acid compositions (20AAC), and 400 dipeptide compositions (400DC), and used analysis of variance (ANOVA) to screen the 400 dipeptides. Second, we used these features to predict mitochondrial proteins of the malaria parasite with a support vector machine (SVM). Finally, we found that 1) adding the protein secondary structure feature (3PSS) does improve prediction accuracy, which demonstrates that protein secondary structure is a valid feature for predicting mitochondrial proteins of the malaria parasite; 2) feature combination can improve the prediction results; and 3) feature selection can reduce the dimension and simplify the calculation. Using 3PSS + 20AAC + 34DC as features in 15-fold cross-validation, we achieved a sensitivity (Sn) of 98.16%, a specificity (Sp) of 97.64%, an overall accuracy (Acc) of 97.88%, and a Matthews correlation coefficient (MCC) of 0.957. Comparison with similar work on the same dataset shows the superiority of our method.
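The features and evaluation measures named above follow standard definitions, sketched below in plain Python. Assumptions: the secondary-structure string uses the letters H, E, and C; the function names are illustrative; and the dipeptide screening shown (rank by one-way ANOVA F and keep the top k) is one plausible reading of "used ANOVA to screen 400 dipeptides", not necessarily the authors' exact protocol.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac_20(seq):
    """20AAC: fraction of each standard residue."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dc_400(seq):
    """400DC: fraction of each adjacent residue pair."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [pairs[d] / total for d in DIPEPTIDES]


def pss_3(ss):
    """3PSS: fractions of helix (H), strand (E), and coil (C) in an SS string."""
    return [ss.count(s) / len(ss) for s in "HEC"]


def anova_f(groups):
    """One-way ANOVA F statistic of one feature across classes (lists of values)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ssw = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    if ssw == 0:
        return math.inf if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc
```

To screen dipeptides under this reading, compute `anova_f` for each of the 400 DC columns across the mitochondrial and non-mitochondrial groups, rank by F, and keep the top k (34 in the reported 3PSS + 20AAC + 34DC combination). For example, `metrics(8, 9, 1, 2)` gives Sn = 0.8, Sp = 0.9, and Acc = 0.85.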


Author(s):  
JAYAVARDHANA GUBBI ◽  
DANIEL T. H. LAI ◽  
MARIMUTHU PALANISWAMI ◽  
MICHAEL PARKER

Knowledge of the secondary structure and solvent accessibility of a protein plays a vital role in predicting its fold and, eventually, its tertiary structure. This work addresses the challenging problem of predicting protein secondary structure from sequence alone. Support vector machines (SVMs) are employed for classification, and the SVM outputs are converted to posterior probabilities for multi-class classification. The effect of using Chou–Fasman parameters and physico-chemical parameters along with evolutionary information in the form of a position-specific scoring matrix (PSSM) is analyzed. The proposed methods are tested on the RS126 and CB513 datasets, and a new dataset (PSS504) is curated from a recent release of CATH. On the CB513 dataset, a sevenfold cross-validation accuracy of 77.9% was obtained using the proposed encoding method. A new method for calculating the reliability index, based on the number of votes and the SVM decision value, is also proposed. A blind test on the EVA dataset gives an average Q3 accuracy of 74.5%, ranking among the top five protein structure prediction methods. Supplementary material, including datasets, is available on .
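Two quantities in this abstract are easy to make concrete: Q3, the percentage of residues whose three-state label (H, E, C) is predicted correctly, and a Platt-style sigmoid, one common way of turning an SVM decision value into a posterior probability. The sketch assumes that "average Q3" means per-protein Q3 averaged over the test set; the abstract does not say which sigmoid or calibration the authors used, and the parameters `a` and `b` are hypothetical until fitted on held-out decision values.

```python
import math


def q3(predicted, observed):
    """Percentage of residues whose 3-state label (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs):
    """Per-protein Q3 averaged over a set of (predicted, observed) pairs."""
    scores = [q3(p, o) for p, o in pairs]
    return sum(scores) / len(scores)


def platt_probability(decision_value, a, b):
    """Platt-style sigmoid mapping an SVM decision value to P(class | x).

    a and b must be fitted on held-out decision values; any values used
    here for illustration are hypothetical.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


if __name__ == "__main__":
    print(q3("HHEC", "HHCC"))  # 75.0
```

Averaging per protein weights short and long chains equally; pooling all residues instead would weight long chains more, which is why the averaging convention matters when comparing reported Q3 values.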


Genetics ◽  
1998 ◽  
Vol 149 (1) ◽  
pp. 445-458 ◽  
Author(s):  
Nick Goldman ◽  
Jeffrey L Thorne ◽  
David T Jones

Abstract Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution—both above and below the species level—will not be well understood until the physical constraints that affect protein evolution are identified and characterized.
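Models of amino acid replacement of this kind are typically continuous-time Markov chains: a rate matrix Q gives substitution probabilities over branch length t as P(t) = exp(Qt). The sketch below computes that matrix exponential in plain Python by scaling and squaring a truncated Taylor series. Assumptions: a toy two-state Q stands in for the empirically derived 20 × 20 matrices of the paper, and the parametric bootstrap tests are not shown.

```python
import math


def mat_mul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_exp(m, terms=20, squarings=10):
    """exp(M) via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2 ** squarings
    a = [[x / scale for x in row] for row in m]  # exp(M) = exp(M / 2^s) ^ (2^s)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probabilities(q, t):
    """P(t) = exp(Q t) for a continuous-time Markov substitution model."""
    return mat_exp([[x * t for x in row] for row in q])


if __name__ == "__main__":
    a, b = 0.3, 0.7  # toy rates: state 0 -> 1 at rate a, state 1 -> 0 at rate b
    P = transition_probabilities([[-a, a], [b, -b]], 1.5)
    print([round(sum(row), 12) for row in P])  # each row sums to 1
```

For the two-state case the result can be checked against the closed form P00(t) = b/(a+b) + a/(a+b) · exp(-(a+b)t).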


Author(s):  
Jia-Bin Zhou ◽  
Yan-Qin Bai ◽  
Yan-Ru Guo ◽  
Hai-Xiang Lin

Abstract In general, data contain noise arising from faulty instruments, flawed measurements, or faulty communication. Learning from data for classification or regression is inevitably affected by this noise. To remove or greatly reduce its impact, we combine fuzzy membership functions with the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented, and the linear IFLap-TSVM is extended to the nonlinear case by a kernel function. The proposed IFLap-TSVM addresses the negative impact of noise and outliers by using fuzzy membership functions, and it is a more accurate classifier because it exploits the geometric distribution information of both labeled and unlabeled data through manifold regularization. Experiments on constructed artificial datasets, several UCI benchmark datasets, and the MNIST dataset show that IFLap-TSVM achieves better classification accuracy than the state-of-the-art twin support vector machine (TSVM), intuitionistic fuzzy twin support vector machine (IFTSVM), and Lap-TSVM.
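Manifold regularization, which lets Lap-TSVM use unlabeled data, penalizes the term f^T L f, where L = D - W is the graph Laplacian built over labeled and unlabeled points. The sketch below builds L from Gaussian edge weights and checks the identity f^T L f = ½ Σ w_ij (f_i - f_j)², which is why the term favors decision values that vary smoothly between nearby points. Assumptions: a fully connected Gaussian-weighted graph with bandwidth `sigma` (k-nearest-neighbor graphs are also common); the fuzzy membership functions and the twin-SVM optimization are not shown.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Fully connected graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), w_ii = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def graph_laplacian(w):
    """L = D - W, where D is the diagonal matrix of node degrees."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def smoothness(f, lap):
    """f^T L f; equals 0.5 * sum_ij w_ij (f_i - f_j)^2 for a symmetric W."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    L = graph_laplacian(gaussian_weights(pts))
    print(smoothness([1.0, 1.0, 1.0], L))  # ~0: a constant f is perfectly smooth
```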


2017 ◽  
Author(s):  
Manato Akiyama ◽  
Kengo Sato ◽  
Yasubumi Sakakibara

Abstract Motivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest neighbor model, which finds the thermodynamically most stable secondary structure, i.e., the one with the minimum free energy (MFE). For further improvement, an alternative approach based on machine learning techniques has been developed. The machine learning based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning based fine-grained model achieved extremely high prediction accuracy, the risk of overfitting in such a model has been reported.

Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features, which are trained by the structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and no heavy overfitting is observed.

Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
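The ℓ1 term in the training objective drives many of the fine-grained scoring parameters to exactly zero, which is how it limits overfitting. One common way to optimize such an objective is a proximal-gradient step: an ordinary (sub)gradient step on the structured hinge loss, followed by soft-thresholding. The sketch below shows only that update. Assumptions: the gradient is supplied by the caller, the function names are illustrative, and this is a generic technique, not a statement of the optimizer used in the paper's implementation.

```python
def soft_threshold(w, threshold):
    """Proximal operator of threshold * |w|: shrink toward zero, clip to exactly 0."""
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0


def l1_proximal_update(weights, gradient, lr, lam):
    """One proximal-gradient step: gradient descent, then l1 shrinkage by lr * lam."""
    return [soft_threshold(w - lr * g, lr * lam) for w, g in zip(weights, gradient)]


if __name__ == "__main__":
    print(l1_proximal_update([0.5, -0.05, 1.0], [0.0, 0.0, 1.0], lr=0.1, lam=1.0))
    # the small middle weight is set exactly to 0.0; the others shrink by 0.1
```

Unlike an ℓ2 penalty, which only shrinks weights proportionally, the soft-threshold sets any weight whose magnitude falls below `lr * lam` to exactly zero, yielding a sparse parameter set.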

