Bioinformatics Application: Predicting Protein Subcellular Localization by Applying Machine Learning

Author(s):  
Pingzhao Hu ◽  
Clement Chung ◽  
Hui Jiang ◽  
Andrew Emili
2020 ◽  
Vol 21 (7) ◽  
pp. 546-557
Author(s):  
Rahul Semwal ◽  
Pritish Kumar Varadwaj

Aims: To develop a tool that can annotate the subcellular localization of human proteins. Background: As high-throughput human proteomics projects have progressed, an enormous amount of protein sequence data has been generated in the recent past. These raw sequence data require precise mapping and annotation of their biological roles and functional attributes. The functional characteristics of protein molecules depend strongly on their subcellular localization (compartment). A fully automated and reliable system for predicting protein subcellular localization would therefore be very useful for current proteomic research. Objective: To develop a machine learning-based predictive model that can annotate the subcellular localization of human proteins with high accuracy and precision. Methods: In this study, we applied the PSI-CD-HIT homology criterion and used sequence-based features of protein sequences to develop a subcellular localization predictive model. The dataset used to train the HumDLoc model was extracted from a reliable data source, the UniProt Knowledgebase, which helps the model generalize to unseen data. Results: The proposed model, HumDLoc, was compared with two of the most widely used techniques, CELLO and DeepLoc, and with other machine learning-based tools. The results demonstrated promising predictive performance for HumDLoc on several evaluation metrics: accuracy (≥97.00%), precision (≥0.86), recall (≥0.89), MCC score (≥0.86), area under the ROC curve (0.98 square units), and area under the precision-recall curve (0.93 square units). Conclusion: HumDLoc outperformed several alternative tools in correctly predicting the subcellular localization of human proteins. HumDLoc is hosted as a web-based tool at https://bioserver.iiita.ac.in/HumDLoc/.
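The accuracy, precision, recall, and MCC figures quoted in the abstract above all derive from a confusion matrix. The following is a minimal sketch of how these metrics are conventionally computed for a single localization class treated as a binary problem. The function name `binary_metrics` and the example counts are hypothetical, and this is not the HumDLoc authors' evaluation code.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are counts for one class (e.g. "nucleus" vs. everything
    else). The counts used below are illustrative, not from the paper.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient: stays informative when classes are
    # imbalanced, which is common for subcellular compartments.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


# Hypothetical counts for one compartment out of 1000 test proteins.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced classes, accuracy can look high even when precision and recall are modest, which is one reason MCC is often reported alongside it.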


2011 ◽  
Vol 27 (16) ◽  
pp. 2224-2230 ◽  
Author(s):  
Castrense Savojardo ◽  
Piero Fariselli ◽  
Monther Alhamdoosh ◽  
Pier Luigi Martelli ◽  
Andrea Pierleoni ◽  
...  

2009 ◽  
Vol 9 (1,2) ◽  
pp. 35-44 ◽  
Author(s):  
Prabha Garg ◽  
Virag Sharma ◽  
Pradeep Chaudhari ◽  
Nilanjan Roy

2018 ◽  
Author(s):  
Ning Zhang

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Eukaryotic cells contain diverse subcellular organelles. These organelles form distinct functional compartments in which different biological processes and functions are carried out. Accurate translocation of a protein is crucial for establishing and maintaining cellular organization and function. Newly synthesized proteins are transported to different cellular components with the assistance of protein transport machineries and complex targeting signals. Mislocalization of proteins is often associated with metabolic disorders and diseases. Compared with experimental methods, computational prediction of protein localization using machine learning methods provides an efficient way to study protein subcellular localization at the whole-proteome level. This dissertation presents bioinformatics methods for studying protein subcellular localization. We review studies of protein subcellular transport and machine learning methods in bioinformatics, present our work on mitochondrial protein targeting prediction in plants, summarize the ongoing development of a web resource for protein subcellular localization, and discuss future work and development.


Author(s):  
Yu-Miao Zhang ◽  
Jun Wang ◽  
Tao Wu

In this study, the Agrobacterium infection medium, infection duration, detergent, and cell density were optimized. The sorghum-based infection medium (SbIM), a 10-20 min infection time, the addition of 0.01% Silwet L-77, and the Agrobacterium optical density at 600 nm (OD600) improved the competence of onion epidermal cells to support Agrobacterium infection at >90% efficiency. The protein-protein interaction between cyclin-dependent kinase D-2 (CDKD-2) and cytochrome c-type biogenesis protein (CYCH) was localized. The optimized procedure is a quick and efficient system for examining protein subcellular localization and protein-protein interactions.


2019 ◽  
Vol 24 (34) ◽  
pp. 4013-4022 ◽  
Author(s):  
Xiang Cheng ◽  
Xuan Xiao ◽  
Kuo-Chen Chou

Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, computational tools that identify subcellular localization quickly and effectively from sequence information alone are highly desirable. Recently, a predictor called “pLoc-mPlant” was developed for identifying the subcellular localization of plant proteins. It performs substantially better than other predictors built for the same purpose, particularly on multi-label systems in which some proteins, called “multiplex proteins”, may occur simultaneously in two or more subcellular locations. Although it is a very powerful predictor, further improvement is needed, because pLoc-mPlant was trained on an extremely skewed dataset in which some subsets (i.e., the protein numbers for some subcellular locations) were more than 10 times larger than others. It therefore cannot avoid the bias caused by such an uneven training dataset. To overcome this bias, we have developed a new predictor, called pLoc_bal-mPlant, by balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset indicate that the proposed predictor is remarkably superior to pLoc-mPlant, the existing state-of-the-art predictor for identifying the subcellular localization of plant proteins. For the convenience of experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mPlant/, through which users can obtain their results without needing to work through the detailed mathematics.
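The abstract above attributes the bias to location subsets that differ in size by more than a factor of 10, and says the new predictor balances the training dataset, without stating how. As an illustration only, the sketch below uses random oversampling, one common balancing technique, on a simplified single-label dataset. The function name `oversample_to_balance` and the toy data are hypothetical; this is not the method used in pLoc_bal-mPlant, and multi-label proteins would need additional handling.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class.

    Assumes one label per sample (a simplification of the multi-label
    setting described in the abstract).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = max(len(items) for items in by_label.values())

    balanced_samples, balanced_labels = [], []
    for label, items in by_label.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Toy skewed dataset: 20 "chloroplast" proteins vs. 2 "peroxisome" proteins.
toy_samples = [f"seq{i}" for i in range(22)]
toy_labels = ["chloroplast"] * 20 + ["peroxisome"] * 2

new_samples, new_labels = oversample_to_balance(toy_samples, toy_labels)
print(Counter(toy_labels))
print(Counter(new_labels))
```

Oversampling should be applied only to the training folds during cross-validation; duplicating samples before splitting would leak copies of training proteins into the test folds.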

