Prediction of mitochondrial proteins of malaria parasite using improved hybrid method and reduced amino acid alphabet

Abstract: Reducing the amino acid alphabet can significantly simplify protein complexity, which can improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although some reduced alphabets have been proposed, different classification rules can produce distinct results in protein sequence analysis. A systematic framework for reduced alphabets is therefore needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with specific protein problems. Users can select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of a protein. The tool produces the K-tuple reduced amino acid composition from three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as a sequence alignment, a merged RAA composition, a feature distribution and a logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on the K-tuple RAAC. The optimal model can be selected according to the evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook offers a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook.
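The reduction and gapped K-tuple composition described in part (ii) can be illustrated with a minimal sketch. It is not RAACBook's implementation: the five-cluster alphabet `CLUSTERS` and the function names are hypothetical choices made for illustration, only K = 2 with a g-gap is shown, and the λ-correlation parameter is left out because the abstract does not define it.

```python
from collections import Counter
from itertools import product

# Hypothetical 5-cluster reduction, chosen only for illustration
# (it is NOT one of the 673 RAACs catalogued by RAACBook).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]

# Map every residue to the first letter of its cluster.
RESIDUE_TO_GROUP = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}
GROUP_LETTERS = sorted(cluster[0] for cluster in CLUSTERS)


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet, skipping unknown symbols."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP)


def gapped_pair_composition(seq, gap=0):
    """Frequency of reduced-letter pairs separated by `gap` residues (K = 2)."""
    reduced = reduce_sequence(seq)
    step = gap + 1
    pairs = Counter(reduced[i] + reduced[i + step] for i in range(len(reduced) - step))
    total = sum(pairs.values())
    return {
        a + b: (pairs[a + b] / total if total else 0.0)
        for a, b in product(GROUP_LETTERS, repeat=2)
    }
```

With five clusters the pair composition has 25 dimensions instead of the 400 needed for the full alphabet, which is the dimensionality reduction the abstract refers to.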


Identification of Mitochondrial Proteins of Malaria Parasite Adding the New Parameter

Letters in Organic Chemistry, 2019, Vol 16 (4), pp. 258-262

DOI: 10.2174/1570178615666180608100348

Cited By ~ 1

Author(s): Feng Yonge, Xie Weixia

Keyword(s): Amino Acid, Secondary Structure, Drug Targets, Malaria Parasite, Protein Secondary Structure, Amino Acid Sequences, Mitochondrial Proteins, Support Vector, Protein Secondary Structures, Similar Work

Malaria is a serious infectious disease caused by Plasmodium falciparum (P. falciparum). Mitochondrial proteins of P. falciparum are regarded as effective drug targets against malaria, so it is necessary to identify them accurately. Many algorithms have been proposed for predicting mitochondrial proteins of the malaria parasite and have yielded good results. However, the parameters used by these methods were primarily based on amino acid sequences. In this study, we added a novel parameter based on protein secondary structure. First, we extracted three feature parameters, namely the compositions of three kinds of protein secondary structure (3PSS), the 20 amino acid compositions (20AAC) and the 400 dipeptide compositions (400DC), and used analysis of variance (ANOVA) to screen the 400 dipeptides. Second, we used these features to predict mitochondrial proteins of the malaria parasite with a support vector machine (SVM). Finally, we found that 1) adding the protein secondary structure feature (3PSS) does improve the prediction accuracy, which demonstrates that protein secondary structure is a valid feature for predicting mitochondrial proteins of the malaria parasite; 2) feature combination can improve the prediction results, and feature selection can reduce the dimension and simplify the calculation. We achieved a sensitivity (Sn) of 98.16%, a specificity (Sp) of 97.64% and an overall accuracy (Acc) of 97.88%, with a Matthews correlation coefficient (MCC) of 0.957, by using 3PSS + 20AAC + 34DC as the feature set in 15-fold cross-validation. Compared with similar work on the same dataset, this result shows the superiority of our method.
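The sequence-based features (20AAC, 400DC) and the reported evaluation indexes (Sn, Sp, Acc, MCC) can be sketched as follows. This is an illustrative sketch using the standard definitions, not the authors' code: the function names are hypothetical, and the 3PSS feature, the ANOVA screening and the SVM training are not shown.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """20AAC: frequency of each of the 20 standard amino acids."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400DC: frequency of each ordered pair of adjacent residues."""
    cleaned = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    pairs = [cleaned[i:i + 2] for i in range(len(cleaned) - 1)]
    n = len(pairs)
    return [pairs.count(a + b) / n if n else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Sn, Sp, Acc and MCC computed from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": _ratio(tp, tp + fn),
        "Sp": _ratio(tn, tn + fp),
        "Acc": _ratio(tp + tn, tp + fn + tn + fp),
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

A combined feature vector would be the concatenation of the individual feature lists, for example `amino_acid_composition(seq) + dipeptide_composition(seq)`, before any feature selection is applied.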


Characterization and Prediction of Presynaptic and Postsynaptic Neurotoxin by Reduced Amino Acid and Biological Property.

Current Bioinformatics, 2020, Vol 15

DOI: 10.2174/1574893615999200707150512

Author(s): Yiyin Cao, Chunlu Yu, Shenghui Huang, Shiyuan Wang, Yongchun Zuo, ...

Keyword(s): Amino Acids, Support Vector Machine, Amino Acid, Biological Properties, Statistical Test, Support Vector, Reduced Amino Acid Alphabet, The Difference, Amino Acid Alphabet

Background: Presynaptic and postsynaptic neurotoxins are two important classes of neurotoxins. Because of their role in pharmacology and neuroscience, identifying them is an important task in biology. Method: In this study, a statistical test and the F-score were used to quantify the differences between the two neurotoxin classes in amino acids and biological properties. A support vector machine was used to predict presynaptic and postsynaptic neurotoxins from features based on reduced amino acid alphabets. Results: With the reduced amino acid alphabet as the input to the support vector machine, the overall accuracy of our classifier increased to 91.07%, the highest overall accuracy in this study. Our classifier also obtained better predictive results than other published methods. Conclusion: In summary, we analyzed the differences between the two neurotoxin classes in amino acids and biological properties, and constructed a classifier that predicts these two classes by using the reduced amino acid alphabet.
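The abstract does not say which F-score definition was used. The sketch below assumes one widely used two-class form, in which the squared distances of the class means from the overall mean are divided by the sum of the within-class sample variances; the function name and the example values are hypothetical.

```python
from statistics import mean, variance


def f_score(positive_values, negative_values):
    """Two-class F-score of one feature; larger values mean better class separation.

    Assumed definition (not confirmed by the abstract):
    ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) / (var_pos + var_neg),
    with sample variances. Each class needs at least two values.
    """
    overall_mean = mean(list(positive_values) + list(negative_values))
    between = ((mean(positive_values) - overall_mean) ** 2
               + (mean(negative_values) - overall_mean) ** 2)
    within = variance(positive_values) + variance(negative_values)
    if within == 0:
        # Both classes are constant: perfectly separated or indistinguishable.
        return float("inf") if between > 0 else 0.0
    return between / within
```

For example, `f_score([1, 2, 3], [4, 5, 6])` evaluates to 2.25, while a feature with identical distributions in both classes scores 0, so features can be ranked by this value before training a classifier.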
