Knowledge of protein subcellular localization is vitally important for both basic research and drug
development. With the avalanche of protein sequences emerging in the post-genomic age, computational tools are
urgently needed that can identify their subcellular localization promptly and effectively from
sequence information alone. Recently, a predictor called “pLoc-mPlant” was developed for identifying the subcellular
localization of plant proteins. Its performance is overwhelmingly better than that of the other predictors
for the same purpose, particularly in dealing with multi-label systems in which some proteins, called “multiplex
proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful
predictor, further effort is needed to improve it. This is because pLoc-mPlant was trained on an
extremely skewed dataset in which some subsets (i.e., the numbers of proteins for some subcellular locations) were
more than 10 times larger than others. Accordingly, it cannot avoid the bias caused by such an
uneven training dataset. To overcome this bias, we have developed a new and bias-free predictor
called pLoc_bal-mPlant by balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed
dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mPlant, the
existing state-of-the-art predictor for identifying the subcellular localization of plant proteins. To maximize the
convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been
established at http://www.jci-bioinfo.cn/pLoc_bal-mPlant/, by which users can easily get their desired results
without the need to go through the detailed mathematics.
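
To illustrate what balancing a skewed training dataset can mean in practice, the following is a minimal sketch that assumes simple random oversampling: every smaller subset is padded, by sampling with replacement, up to the size of the largest one. This is not necessarily the balancing procedure used for pLoc_bal-mPlant, which is not specified here; the function name `balance_by_oversampling`, the location names, and the toy protein identifiers are all hypothetical.

```python
import random


def balance_by_oversampling(subsets, seed=0):
    """Pad every subset to the size of the largest one.

    `subsets` maps a subcellular location name to a list of protein
    identifiers. Random oversampling is an assumption made for this
    sketch, not the method described in the text.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    target = max(len(items) for items in subsets.values())
    balanced = {}
    for location, items in subsets.items():
        if not items:
            raise ValueError(f"subset {location!r} is empty")
        # Draw the missing samples with replacement from the same subset.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced[location] = list(items) + extra
    return balanced


# Toy data (hypothetical): one subset is more than 10 times larger.
toy = {
    "chloroplast": [f"chl_{i}" for i in range(50)],
    "peroxisome": [f"per_{i}" for i in range(4)],
}
balanced = balance_by_oversampling(toy)
for location, items in balanced.items():
    print(location, len(items))
```

After balancing, both subsets contain the same number of training samples, so a classifier trained on them is no longer dominated by the larger location. Duplicating samples adds no new information, which is one reason more elaborate balancing schemes are often preferred.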