Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization

The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for predicting protein subcellular localization in plants, based on multiple classifiers, that improves prediction results in terms of both accuracy and reliability. Predicting plant protein subcellular localization is challenging because the problem is not only multiclass but also multilabel: a single protein may reside in more than one compartment. Plant proteins can generally be found in 10–14 locations (compartments), and the number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is usually much greater than in others (vacuole, peroxisome, Golgi, and cell wall), so the training data are typically imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suited to each type of protein localization, forming a combined feature space of 479 features. Feature selection methods were then used to reduce this space to a smaller, informative feature subset, which was used to train three different individual models. To return the final probability prediction, we combined the outputs of these three classifiers by average voting. Based on the voting probability, the method can predict both single-label and multilabel localizations. Experimental results on the test dataset show that the proposed ensemble method achieves an overall accuracy of 84.58% across 11 compartments.
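The average-voting (soft-voting) step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the four compartment names, the probability values, and the multilabel rule (keep the top-scoring compartment plus any other compartment whose averaged probability reaches a hypothetical `threshold`) are all assumptions, because the abstract does not say how the voting probability is turned into single- or multilabel predictions. The function names `average_vote` and `assign_locations` are hypothetical.

```python
from typing import List

# Hypothetical example labels; the abstract reports results for 11 compartments,
# but only four are used here to keep the example small.
COMPARTMENTS: List[str] = ["nucleus", "cytoplasm", "mitochondrion", "vacuole"]


def average_vote(model_probs: List[List[float]]) -> List[float]:
    """Soft voting: average each compartment's probability across classifiers."""
    if not model_probs:
        raise ValueError("need at least one classifier output")
    n_labels = len(model_probs[0])
    if any(len(p) != n_labels for p in model_probs):
        raise ValueError("all classifiers must score the same compartments")
    return [sum(p[j] for p in model_probs) / len(model_probs) for j in range(n_labels)]


def assign_locations(
    avg: List[float], labels: List[str], threshold: float = 0.35
) -> List[str]:
    """Return the top-scoring compartment, plus any other compartment whose
    averaged probability is at least `threshold` (an assumed multilabel rule)."""
    if len(avg) != len(labels):
        raise ValueError("one averaged probability is needed per label")
    best = max(range(len(avg)), key=avg.__getitem__)
    chosen = [labels[best]]
    for j, p in enumerate(avg):
        if j != best and p >= threshold:
            chosen.append(labels[j])
    return chosen
```

For example, if three classifiers score a protein as [0.6, 0.3, 0.05, 0.05], [0.5, 0.4, 0.05, 0.05], and [0.4, 0.5, 0.05, 0.05], the averaged scores are about [0.5, 0.4, 0.05, 0.05], and with `threshold=0.35` the protein is assigned to both nucleus and cytoplasm. Averaging probabilities rather than counting hard votes lets a second compartment surface when all models give it moderate support.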

PlantLoc: an accurate web server for predicting plant protein subcellular localization by substantiality motif

Nucleic Acids Research, Vol 41 (W1), pp. W441-W447 (2013). DOI: 10.1093/nar/gkt428. Cited by ~13.

Author(s): Shengnan Tang, Tonghua Li, Peisheng Cong, Wenwei Xiong, Zhiheng Wang, ...

Keyword(s): Subcellular Localization; Web Server; Plant Protein; Protein Subcellular Localization

Inside or outside? A new collection of Gateway vectors allowing plant protein subcellular localization or over-expression

Plasmid, Vol 105, p. 102436 (2019). DOI: 10.1016/j.plasmid.2019.102436. Cited by ~2.

Author(s): François Berthold, David Roujol, Caroline Hemmer, Elisabeth Jamet, Christophe Ritzenthaler, ...

Keyword(s): Subcellular Localization; Plant Protein; Protein Subcellular Localization; Over Expression; Gateway Vectors

A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0

Analytical Biochemistry, Vol 394 (2), pp. 269-274 (2009). DOI: 10.1016/j.ab.2009.07.046. Cited by ~117.

Author(s): Hong-Bin Shen, Kuo-Chen Chou

Keyword(s): Subcellular Localization; Human Protein; Protein Subcellular Localization; Top Down

Weighted Ensemble for Plant Protein Subcellular Localization Using Particle Swarm Optimization

2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) (2021). DOI: 10.1109/ecti-con51831.2021.9454763

Author(s): Warin Wattanapornprom, Thanagorn Glomrit, Tinnaphop Prayongsup, Pitchayanin Suwanthanarat, Supatcha Lertampaiporn

Keyword(s): Particle Swarm Optimization; Subcellular Localization; Particle Swarm; Plant Protein; Protein Subcellular Localization; Swarm Optimization

Reuse Recipe Document for: A robust fractionation method for protein subcellular localization studies in Escherichia coli

DOI: 10.23942/biotechniques.1559047365000 (2019)

Keyword(s): Escherichia Coli; Subcellular Localization; Protein Subcellular Localization; Fractionation Method

An efficient transient assays system using Agrobacterium-mediated transformation of onion (Allium cepa) epidermal cells

Indian Journal of Genetics and Plant Breeding (The), Vol 80 (03) (2020). DOI: 10.31742/ijgpb.80.3.17

Author(s): Yu-Miao Zhang, Jun Wang, Tao Wu

Keyword(s): Subcellular Localization; Protein Interaction; Protein Interactions; Epidermal Cells; Cyclin Dependent Kinase; Protein Subcellular Localization; Protein Protein Interactions; Efficient System; Protein Protein Interaction; Onion Epidermal Cells

In this study, the Agrobacterium infection medium, infection duration, detergent, and cell density were optimized. A sorghum-based infection medium (SbIM), a 10–20 min infection time, the addition of 0.01% Silwet L-77, and an optimized Agrobacterium optical density at 600 nm (OD600) improved the competence of onion epidermal cells to support Agrobacterium infection, at an efficiency above 90%. Using this system, the protein-protein interaction between cyclin-dependent kinase D-2 (CDKD-2) and cytochrome c-type biogenesis protein (CYCH) was localized. The optimized procedure is a quick and efficient system for examining protein subcellular localization and protein-protein interactions.

pLoc_bal-mPlant: Predict Subcellular Localization of Plant Proteins by General PseAAC and Balancing Training Dataset

Current Pharmaceutical Design, Vol 24 (34), pp. 4013-4022 (2019). DOI: 10.2174/1381612824666181119145030. Cited by ~28.

Author(s): Xiang Cheng, Xuan Xiao, Kuo-Chen Chou

Keyword(s): Subcellular Localization; Basic Research; Training Dataset; Sequence Information; Plant Proteins; Protein Subcellular Localization; Computational Tools; Validation Tests; User Friendly; Better Than

Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desirable to develop computational tools that identify their subcellular localization quickly and effectively from sequence information alone. Recently, a predictor called “pLoc-mPlant” was developed for identifying the subcellular localization of plant proteins. Its performance is overwhelmingly better than that of other predictors for the same purpose, particularly for multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, there is still room for improvement, because pLoc-mPlant was trained on an extremely skewed dataset in which some subsets (i.e., the numbers of proteins in some subcellular locations) were more than 10 times larger than others. Accordingly, its predictions are unavoidably biased by the uneven training dataset. To overcome this bias, we have developed a new, bias-free predictor called pLoc_bal-mPlant by balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset indicate that the new predictor is remarkably superior to pLoc-mPlant, the existing state-of-the-art predictor for identifying the subcellular localization of plant proteins. For the convenience of experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mPlant/, from which users can easily obtain results without needing to follow the detailed mathematics.
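The abstract does not say how the training dataset was balanced. As a hedged illustration of the general idea only, the sketch below uses plain random oversampling, a swapped-in technique that is not necessarily the procedure used in pLoc_bal-mPlant: samples from smaller subcellular locations are duplicated at random until every location has as many samples as the largest one. It assumes each training sample carries a single location label (a multiplex protein would appear once per location), and the function names `imbalance_ratio` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[list, list]:
    """Randomly duplicate samples of smaller classes until every class has as
    many samples as the largest class (plain random oversampling)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class: Dict[Hashable, list] = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x: list = []
    out_y: list = []
    for y, xs in by_class.items():
        # Draw (with replacement) enough extra copies to reach the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

With 20 nucleus samples and 2 peroxisome samples, `imbalance_ratio` returns 10.0, illustrating the kind of skew described above; after `random_oversample` both classes contain 20 samples. Duplicating samples adds no new information, which is why techniques such as SMOTE generate synthetic samples instead.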
