MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy

Abstract Background: Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and to combine with the corresponding molecules, to fulfill their functions. Prediction of protein subcellular location can also guide drug design and development by pointing to potential molecular targets, and it plays an essential role in genome annotation. Current image-based protein subcellular localization has three common drawbacks: obsolete datasets whose label information has not been updated, stereotypical feature descriptors in the spatial domain or at grey level, and single-function prediction algorithms that can handle only single-label databases. Results: In this paper, a novel human protein subcellular localization prediction model, MIC_Locator, is proposed. First, the latest datasets are collected and collated as the benchmark dataset for training the prediction model, instead of obsolete data. Second, Fourier transformation, Riesz transformation, a Log-Gabor filter and an intensity coding strategy are employed to obtain frequency features based on the three components of the monogenic signal at different frequency scales. Third, a chained prediction model is proposed to handle multi-label rather than single-label datasets. The experimental results show that MIC_Locator achieves 60.56% subset accuracy and outperforms the majority of existing prediction models, and that the frequency features and intensity coding strategy help improve classification accuracy. Conclusions: Our results demonstrate that frequency features improve model performance more than features extracted from the spatial domain, and that MIC_Locator can speed up validation of protein annotation, knowledge of protein function and proteomics research.
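
The subset accuracy reported above is the strictest common multi-label metric: a sample counts as correct only if its predicted label set matches the true label set exactly. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `subset_accuracy` and the example labels are assumptions made for illustration.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true label set."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    # A sample is correct only on an exact set match; partial overlap earns nothing.
    exact = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return exact / len(y_true)


# Hypothetical labels for four images (illustrative only, not from the paper).
true_labels = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"golgi"}, {"mitochondria"}]
pred_labels = [{"nucleus"}, {"cytoplasm"}, {"golgi"}, {"mitochondria", "nucleus"}]
score = subset_accuracy(true_labels, pred_labels)
```

Because a missing or extra label makes the whole sample wrong, subset accuracy is usually much lower than per-label accuracy on the same predictions.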

Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization

Current Bioinformatics ◽

10.2174/1574893614666190902155811 ◽

2020 ◽

Vol 15 (6) ◽

pp. 517-527

Author(s):

Yunyun Liang ◽

Shengli Zhang

Keyword(s):

Subcellular Localization ◽

Moving Average ◽

Subcellular Location ◽

Second Order ◽

Test Method ◽

Support Vector ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Apoptosis Protein ◽

Leibler Divergence

Background: Apoptosis proteins play a key role in the development and homeostasis of the organism, and are very important for understanding the mechanism of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict the apoptosis protein subcellular location by using the PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence and over-sampling algorithms. This model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. Then, the support vector machine is adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and also outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.
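
The jackknife test mentioned above is leave-one-out evaluation: each sample is held out once, the classifier is trained on the rest, and the held-out sample is predicted. The sketch below illustrates only that evaluation loop. It swaps in a 1-nearest-neighbor classifier as a stand-in for the support vector machine used in the paper, and all names and toy data are assumptions.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """1-nearest-neighbor stand-in classifier (the paper uses an SVM instead)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold each sample out once, fit on the remainder, and score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy two-class data (hypothetical, for illustration only).
toy_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 1.1], [1.1, 1.0], [0.9, 1.0]]
toy_y = ["cyto", "cyto", "cyto", "mito", "mito", "mito"]
acc = jackknife_accuracy(toy_x, toy_y)
```

Passing a different `predict` callable with the same three-argument signature lets the same loop evaluate another classifier.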

Protein subcellular localization based on deep image features and criterion learning strategy

Briefings in Bioinformatics ◽

10.1093/bib/bbaa313 ◽

2020 ◽

Author(s):

Ran Su ◽

Linlin He ◽

Tianling Liu ◽

Xiaofeng Liu ◽

Leyi Wei

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Learning Strategy ◽

Subcellular Location ◽

Image Features ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Protein Functions ◽

Deep Image ◽

Criterion Learning

Abstract The spatial distribution of the proteome at subcellular levels provides clues to protein functions and is therefore important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is a challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. A criterion used to determine the final label set was obtained automatically during the learning procedure. We identified an optimal CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features yield more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

PSL-Recommender: Protein Subcellular Localization Prediction using Recommender System

10.1101/462812 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ruhollah Jamali ◽

Changiz Eslahchi ◽

Soheil Jahangiri-Tazehkand

Keyword(s):

Subcellular Localization ◽

Recommender System ◽

State Of The Art ◽

Subcellular Location ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Wet Lab ◽

And Behavior ◽

Protein Subcellular Localization Prediction ◽

Localization Prediction

Abstract Identifying a protein’s subcellular location is of great interest for understanding its function and behavior within the cell. In the last decade, many computational approaches have been proposed as a surrogate for expensive and inefficient wet-lab methods that are used for protein subcellular localization. Yet, there is still much room for improving the prediction accuracy of these methods. PSL-Recommender (Protein subcellular location recommender) is a method that employs neighborhood regularized logistic matrix factorization to build a recommender system for protein subcellular localization. The effectiveness of the PSL-Recommender method is benchmarked on one human and three animal datasets. The results indicate that PSL-Recommender significantly outperforms state-of-the-art methods, improving on the previous best method by up to 31% in F1-mean, up to 28% in ACC, and up to 47% in AVG. The datasets and code are available at: https://github.com/RJamali/PSL-Recommender
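
In logistic matrix factorization, each protein and each location gets a latent vector, and the probability that a protein resides in a location is the logistic function of their dot product. The sketch below shows only that scoring step. It omits the neighborhood regularization and the training procedure, and the latent vectors, function names and values are hypothetical rather than taken from the PSL-Recommender code.

```python
import math


def interaction_probability(protein_vec, location_vec):
    """Logistic link: sigmoid of the dot product of two latent vectors."""
    score = sum(p * q for p, q in zip(protein_vec, location_vec))
    return 1.0 / (1.0 + math.exp(-score))


def rank_locations(protein_vec, location_vecs):
    """Return location names sorted by predicted probability, highest first."""
    probs = {name: interaction_probability(protein_vec, vec)
             for name, vec in location_vecs.items()}
    return sorted(probs, key=probs.get, reverse=True)


# Hypothetical latent factors (in practice these would be learned from data).
protein = [0.8, -0.2]
locations = {
    "nucleus": [1.5, 0.0],
    "cytoplasm": [0.1, 0.3],
    "membrane": [-1.0, 0.5],
}
ranking = rank_locations(protein, locations)
```

Ranking locations by this probability is what makes the model behave like a recommender: it recommends the most likely locations for each protein.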

PSL-Recommender: Protein Subcellular Localization Prediction using Recommender System

10.21203/rs.3.rs-878139/v1 ◽

2021 ◽

Author(s):

Ruhollah Jamali ◽

Soheil Jahangiri-Tazehkand ◽

Changiz Eslahchi

Keyword(s):

Subcellular Localization ◽

Subcellular Location ◽

Computational Method ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Wet Lab ◽

Protein Subcellular Location Prediction ◽

And Behavior ◽

Protein Subcellular Localization Prediction ◽

Localization Prediction

Abstract Identifying a protein’s subcellular location is of great interest for understanding its function and behavior within the cell. In the last decade, many computational approaches have been proposed as a surrogate for expensive and labor-intensive wet-lab methods that are used for protein subcellular localization. Yet, there is still much room for improving the prediction accuracy of these methods. In this article, we aimed to develop a customized computational method rather than use the common machine learning predictors that appear in the majority of computational research on this topic. The neighbourhood regularized logistic matrix factorization technique was used to create PSL-Recommender (Protein subcellular location recommender), a GO-based predictor. We present statistical inference as the driving force behind PSL-Recommender. It was then benchmarked against twelve well-known methods using five different datasets, demonstrating outstanding performance. Finally, we discuss potential research avenues for developing a comprehensive tool for protein subcellular location prediction. The datasets and code are available at: https://github.com/RJamali/PSL-Recommender

Protein Subcellular Localization Based on Evolutionary Information and Segmented Distribution

Mathematical Problems in Engineering ◽

10.1155/2021/8629776 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Danyu Jin ◽

Ping Zhu

Keyword(s):

Subcellular Localization ◽

Conditional Entropy ◽

Subcellular Location ◽

New Drugs ◽

Experimental Comparison ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Protein Subcellular Localization ◽

Protein Subcellular Location

The prediction of protein subcellular localization is important for the study of protein structure and function and can also facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolutionary information have attracted much attention and made good progress. Based on the protein position-specific score matrix (PSSM) obtained by PSI-BLAST, the PSSM-GSD method is proposed according to the data distribution characteristics. To reflect the protein sequence information as fully as possible, the AAO, PSSM-AAO and PSSM-GSD methods are fused together. Then, a conditional entropy-based classifier chain algorithm and a support vector machine are used to locate multilabel proteins. Finally, we test on the Gpos-mPLoc and Gneg-mPLoc datasets; considering the severe imbalance of the data, we select the SMOTE algorithm to oversample the minority classes. The experiments show that the AAO + PSSM ∗ method in the paper achieves 83.1% and 86.8% overall accuracy, respectively. Experimental comparison with other methods shows that AAO + PSSM ∗ performs well and can effectively predict protein subcellular location.
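
SMOTE addresses class imbalance by creating synthetic minority samples on the line segment between a minority sample and one of its nearest minority neighbors. The following is a simplified sketch of that interpolation idea, not the paper's implementation or a complete SMOTE; the function name, the parameters `k` and `seed`, and the toy data are assumptions.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of the base point among the other minority samples
        others = [x for x in minority if x is not base]
        neighbors = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class feature vectors (illustrative only).
minority_samples = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
new_points = smote_like_oversample(minority_samples, n_new=5)
```

Every synthetic point lies between two real minority samples, so the minority class is enlarged without copying samples verbatim.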

Human protein subcellular localization prediction based on error correcting output coding strategy while combining immunohistochemistry image and amino acid sequence

10.1109/aemcse51986.2021.00178 ◽

2021 ◽

Author(s):

Yanbing Wang ◽

Fan Yang ◽

Quanchao Ma ◽

Ziqian Wang ◽

Simeng Wang ◽

...

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Subcellular Localization ◽

Human Protein ◽

Protein Subcellular Localization ◽

Subcellular Localization Prediction ◽

Coding Strategy ◽

Protein Subcellular Localization Prediction ◽

Localization Prediction ◽

Output Coding

Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs

Current Medicinal Chemistry ◽

10.2174/0929867326666190507082559 ◽

2019 ◽

Vol 26 (26) ◽

pp. 4918-4943 ◽

Cited By ~ 39

Author(s):

Kuo-Chen Chou

Keyword(s):

Drug Development ◽

Subcellular Localization ◽

Cell Biology ◽

Subcellular Location ◽

Cellular Level ◽

Protein Molecules ◽

A Cell ◽

Entire Cell ◽

The Right ◽

A Current

The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell’s survival are performed by these proteins located in its different organelles, usually called “subcellular locations”. Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely by experiment. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that can deal with multi-label proteins, which may simultaneously exist in two or more subcellular location sites. Protein molecules with this characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. The review also focuses on methods for which user-friendly web servers have been established, so that the majority of experimental scientists can use them to get the desired results without needing to go through the detailed mathematics involved.

Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer

BMC Bioinformatics ◽

10.1186/s12859-020-03731-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Zhen-Zhen Xue ◽

Yanxia Wu ◽

Qing-Zu Gao ◽

Liang Zhao ◽

Ying-Ying Xu

Keyword(s):

Colon Cancer ◽

Subcellular Localization ◽

Subcellular Location ◽

Human Colon ◽

Automated Classification ◽

Protein Subcellular Localization ◽

Protein Biomarkers ◽

Image Patches ◽

Protein Subcellular Locations

Abstract Background: Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made to measure abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting diseases, have rarely been studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers, derived from published literature and the Human Protein Atlas database respectively, are employed. In these datasets, 81.82% and 65.66% of the biomarker proteins, respectively, are identified as changing location. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers.

SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks

Bioinformatics ◽

10.1093/bioinformatics/btaa156 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3343-3349 ◽

Cited By ~ 2

Author(s):

Manaz Kaleel ◽

Yandan Zheng ◽

Jialiang Chen ◽

Xuanming Feng ◽

Jeremy C Simpson ◽

...

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Convolutional Neural Networks ◽

Protein Function ◽

Secretory Pathway ◽

Protein Function Prediction ◽

Subcellular Location ◽

Machine Learning Algorithms ◽

Endomembrane System ◽

Protein Subcellular Location

Abstract Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS classifies the subcellular location of a protein into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/.
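
The Matthews correlation coefficient (MCC) cited above summarizes a binary confusion matrix in a single value between -1 and 1, and it stays informative when the two classes are unbalanced. A minimal sketch of the standard formula follows; the counts are hypothetical and are not results from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts (illustrative only).
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

A value of 1 means perfect prediction, 0 means no better than chance, and -1 means total disagreement between prediction and truth.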

PROTEIN SUBCELLULAR LOCALIZATION BASED ON PSI-BLAST AND MACHINE LEARNING

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006002405 ◽

2006 ◽

Vol 04 (06) ◽

pp. 1181-1195 ◽

Cited By ~ 2

Author(s):

JIAN GUO ◽

XIAN PU ◽

YUANLIE LIN ◽

HOWARD LEUNG

Keyword(s):

Amino Acid ◽

Subcellular Localization ◽

Large Scale ◽

Probabilistic Neural Network ◽

Protein Profile ◽

Subcellular Location ◽

Amino Acid Sequences ◽

Sequence Information ◽

Protein Subcellular Localization ◽

Benchmark Datasets

Subcellular location is an important functional annotation of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is necessary for large-scale genome analysis. This paper describes a protein subcellular localization method which extracts features from protein profiles rather than from amino acid sequences. The protein profile represents a protein family and discards the part of the sequence information that is not conserved throughout the family; it is therefore more sensitive than the amino acid sequence. The amino acid compositions of the whole profile and of the N-terminus of the profile are extracted to train and test the probabilistic neural network classifiers. On two benchmark datasets, the overall accuracies of the proposed method reach 89.1% and 68.9%, respectively. The prediction results show that the proposed method performs better than methods based on amino acid sequences. The prediction results of the proposed method are also compared with Subloc on two redundancy-reduced datasets.
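
Amino acid composition is a 20-dimensional vector giving the fraction of each standard residue. The sketch below computes it for a whole sequence and for its N-terminal segment, then concatenates the two. It is a simplification under stated assumptions: the paper derives compositions from PSI-BLAST profiles, whereas this example uses a raw sequence, and the N-terminal length of 10, the function names and the example sequence are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as X or gaps
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_features(sequence, n_terminal_length=10):
    """Concatenate whole-sequence and N-terminal compositions (40 values)."""
    whole = amino_acid_composition(sequence)
    n_terminal = amino_acid_composition(sequence[:n_terminal_length])
    return whole + n_terminal


# Hypothetical sequence, for illustration only.
example_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
features = composition_features(example_sequence)
```

Treating the N-terminus separately reflects the idea in the abstract that this region is informative on its own for localization.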
