Predicting protein subcellular localization by approximate nearest neighbor searching

Author(s):  
Wei Xue ◽  
Xiao-yu Hong ◽  
Nan Zhao ◽  
Rong-li Yang ◽  
Liang Zhang
2020 ◽  
Author(s):  
Qi Zhang ◽  
Shan Li ◽  
Bin Yu ◽  
Yang Li ◽  
Yandan Zhang ◽  
...  

ABSTRACTProteins play a significant part in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of proteins in cells. Studies have found that more and more proteins belong to multiple subcellular locations, and these proteins are called multi-label proteins. They not only play a key role in cell life activities, but also play an indispensable role in medicine and drug development. This article first presents a new prediction model, MpsLDA-ProSVM, to predict the SCL of multi-label proteins. Firstly, the physical and chemical information, evolution information, sequence information and annotation information of protein sequences are fused. Then, for the first time, use a weighted multi-label linear discriminant analysis framework based on entropy weight form (wMLDAe) to refine and purify features, reduce the difficulty of learning. Finally, input the optimal feature subset into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a synthetic ranking of relevant labels, and then use Prediction and Relevance Ordering based SVM (ProSVM) classifier to predict the SCLs. This method can rank and classify related tags at the same time, which greatly improves the efficiency of the model. Tested by jackknife method, the overall actual accuracy (OAA) on virus, plant, Gram-positive bacteria and Gram-negative bacteria datasets are 98.06%, 98.97%, 99.81% and 98.49%, which are 0.56%-9.16%, 5.37%-30.87%, 3.51%-6.91% and 3.99%-8.59% higher than other advanced methods respectively. The source codes and datasets are available at https://github.com/QUST-AIBBDRC/MpsLDA-ProSVM/.


2013 ◽  
Vol 765-767 ◽  
pp. 3099-3103 ◽  
Author(s):  
Ze Yue Wu ◽  
Yue Hui Chen

Protein subcellular localization is an important research field of bioinformatics. In this paper, we use the algorithm of the increment of diversity combined with weighted K nearest neighbor to predict protein in SNL6 which has six subcelluar localizations and SNL9 which has nine subcelluar localizations. We use the increment of diversity to extract diversity finite coefficient as new features of proteins. And the basic classifier is weighted K-nearest neighbor. The prediction ability was evaluated by 5-jackknife cross-validation. Its predicted result is 83.3% for SNL6 and 87.6 % for SNL9. By comparing its results with other methods, it indicates the new approach is feasible and effective.


Sign in / Sign up

Export Citation Format

Share Document