Identification of Plasmodium secreted proteins based on monoDiKGap and distance-based Top-n-gram methods

2022 ◽  
Vol 17 ◽  
Author(s):  
Xinyi Liao ◽  
Xiaomei Gu ◽  
Dejun Peng

Background: Many malaria infections are caused by Plasmodium falciparum. Accurate classification of the proteins secreted by the malaria parasite, which are essential for the development of anti-malarial drugs, is essential. Objective: To accurately classify the proteins secreted by the malaria parasite. Methods: Therefore, in order to improve the accuracy of the prediction of plasmodium secreted proteins, we established a classification model MGAP-SGD. MonodikGap features (k=7) of the secreted proteins were extracted, and then the optimal features were selected by the AdaBoost method. Finally, based on the optimal set of secreted proteins, the model was used to predict the secreted proteins using the stochastic gradient descent (SGD) algorithm. Results: Our model uses a 10-fold cross-validation set and independent test set in the stochastic gradient descent (SGD) classifier to validate the model, and the accuracy rates are 98.5859% and 97.973%, respectively. Conclusion: This also fully proves that the effectiveness and robustness of the prediction results of the MGAP-SGD model can meet the prediction needs of the secreted proteins of plasmodium.

2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral profession in a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool is in the form of an automatic classification system of social media posting from prospective programmers. The classification results are expected to be able to predict the performance patterns of each candidate with a predicate of good or bad performance. The classification method with the best accuracy needs to be chosen in order to get an effective strategic tool so that a comparison of several methods is needed. This study compares classification methods including the Support Vector Machines (SVM) algorithm, Random Forest (RF) and Stochastic Gradient Descent (SGD). The classification results show the percentage of accuracy with k = 10 cross validation for the SVM algorithm reaches 81.3%, RF at 74.4%, and SGD at 80.1% so that the SVM method is chosen as a model of programmer performance classification on social media activities.


Sign in / Sign up

Export Citation Format

Share Document