Predicting protein subcellular localization by approximate nearest neighbor searching

Abstract: Proteins play a significant part in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of proteins in cells. Studies have found that a growing number of proteins reside in multiple subcellular locations; these are called multi-label proteins. They play a key role in cellular activities and an indispensable role in medicine and drug development. This article presents a new prediction model, MpsLDA-ProSVM, to predict the SCL of multi-label proteins. First, the physicochemical, evolutionary, sequence, and annotation information of protein sequences is fused. Then, for the first time, a weighted multi-label linear discriminant analysis framework based on entropy weight form (wMLDAe) is used to refine and purify the features, reducing the difficulty of learning. Finally, the optimal feature subset is fed into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a synthetic ranking of relevant labels, and a Prediction and Relevance Ordering based SVM (ProSVM) classifier is used to predict the SCLs. The method ranks and classifies relevant labels at the same time, which greatly improves the efficiency of the model. Tested by the jackknife method, the overall actual accuracy (OAA) on the virus, plant, Gram-positive bacteria, and Gram-negative bacteria datasets is 98.06%, 98.97%, 99.81%, and 98.49%, respectively, which is 0.56%-9.16%, 5.37%-30.87%, 3.51%-6.91%, and 3.99%-8.59% higher than other advanced methods. The source code and datasets are available at https://github.com/QUST-AIBBDRC/MpsLDA-ProSVM/.
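
The abstract reports overall actual accuracy (OAA) without defining it. A minimal sketch follows, assuming the definition commonly used in the multi-label SCL literature: a protein counts as correct only when its predicted set of locations matches the true set exactly. The function name `overall_actual_accuracy` and the toy labels are illustrative and are not taken from the MpsLDA-ProSVM code.

```python
def overall_actual_accuracy(true_labels, predicted_labels):
    """Fraction of proteins whose predicted location set equals the true set.

    Assumed definition (exact-match accuracy); not copied from the paper.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")
    # A partially correct prediction (missing or extra location) scores zero.
    exact = sum(
        1 for t, p in zip(true_labels, predicted_labels) if set(t) == set(p)
    )
    return exact / len(true_labels)


# Hypothetical toy data: four proteins, one of them multi-label.
truth = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"membrane"}, {"cytoplasm"}]
pred = [{"nucleus"}, {"cytoplasm"}, {"membrane"}, {"cytoplasm"}]
oaa = overall_actual_accuracy(truth, pred)
```

Under this definition the second protein, for which only one of its two locations is predicted, contributes nothing, which is why OAA is a stricter measure than per-label accuracy.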

Approximate nearest neighbor searching in multimedia databases

Proceedings 17th International Conference on Data Engineering ◽ 10.1109/icde.2001.914864 ◽ 2002 ◽ Cited By ~ 40

Author(s): H. Ferhatosmanoglu ◽ E. Tuncel ◽ D. Agrawal ◽ A. El Abbadi

Keyword(s): Nearest Neighbor ◽ Multimedia Databases ◽ Approximate Nearest Neighbor ◽ Nearest Neighbor Searching

A lower bound on the complexity of approximate nearest-neighbor searching on the Hamming cube

Proceedings of the thirty-first annual ACM symposium on Theory of computing - STOC '99 ◽ 10.1145/301250.301325 ◽ 1999 ◽ Cited By ~ 18

Author(s): Amit Chakrabarti ◽ Bernard Chazelle ◽ Benjamin Gum ◽ Alexey Lvov

Keyword(s): Lower Bound ◽ Nearest Neighbor ◽ Approximate Nearest Neighbor ◽ Nearest Neighbor Searching

Approximate Nearest Neighbor Searching with Non-Euclidean and Weighted Distances

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms ◽ 10.1137/1.9781611975482.23 ◽ 2019 ◽ pp. 355-372 ◽ Cited By ~ 1

Author(s): Ahmed Abdelkader ◽ Sunil Arya ◽ Guilherme D. da Fonseca ◽ David M. Mount

Keyword(s): Nearest Neighbor ◽ Approximate Nearest Neighbor ◽ Nearest Neighbor Searching

Expected-Case Complexity of Approximate Nearest Neighbor Searching

SIAM Journal on Computing ◽ 10.1137/s0097539799366340 ◽ 2003 ◽ Vol 32 (3) ◽ pp. 793-815 ◽ Cited By ~ 6

Author(s): Sunil Arya ◽ Ho-Yam Addy Fu

Keyword(s): Nearest Neighbor ◽ Approximate Nearest Neighbor ◽ Case Complexity ◽ Nearest Neighbor Searching

Predicting Protein Subcellular Localization Using the Algorithm of Increment of Diversity Combined with Weighted K-Nearest Neighbor

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.765-767.3099 ◽ 2013 ◽ Vol 765-767 ◽ pp. 3099-3103 ◽ Cited By ~ 1

Author(s): Ze Yue Wu ◽ Yue Hui Chen

Keyword(s): Subcellular Localization ◽ Cross Validation ◽ Nearest Neighbor ◽ Research Field ◽ Important Research ◽ Protein Subcellular Localization ◽ K Nearest Neighbor ◽ Prediction Ability ◽ New Approach ◽ Increment Of Diversity

Protein subcellular localization is an important research field of bioinformatics. In this paper, we use the increment of diversity algorithm combined with weighted K-nearest neighbor to predict protein subcellular localization on SNL6, which has six subcellular locations, and SNL9, which has nine. The increment of diversity is used to extract the diversity finite coefficient as new features of proteins, and the base classifier is weighted K-nearest neighbor. The prediction ability was evaluated by 5-jackknife cross-validation. The prediction result is 83.3% for SNL6 and 87.6% for SNL9. Comparison with other methods indicates that the new approach is feasible and effective.
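
The weighted K-nearest neighbor classifier named in this abstract can be sketched as follows. This is an illustrative sketch only: it assumes Euclidean distance and inverse-distance vote weights, whereas the paper builds its features from the increment of diversity and does not state its weighting scheme here. The function name `weighted_knn_predict`, the `eps` parameter, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_x, train_y, query, k=3, eps=1e-9):
    """Predict a label by inverse-distance-weighted voting among k neighbors.

    Assumptions (not from the paper): Euclidean distance, weight 1 / (d + eps).
    """
    # Pair each training point's distance to the query with its label,
    # then keep the k closest.
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        # Closer neighbors contribute larger votes; eps avoids division by zero.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Hypothetical 2-D feature vectors: one very close "cytoplasm" example and
# two distant "nucleus" examples.
train_x = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.1)]
train_y = ["cytoplasm", "nucleus", "nucleus"]
prediction = weighted_knn_predict(train_x, train_y, (0.05, 0.0), k=3)
```

In this toy case an unweighted majority vote over the three neighbors would favor "nucleus", while the distance weighting lets the single nearby "cytoplasm" example decide the prediction.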
