Protein subcellular localization based on deep image features and criterion learning strategy

Abstract The spatial distribution of the proteome at the subcellular level provides clues to protein function and is therefore important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at the subcellular level. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction is a particularly challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. The criterion used to determine the final label set is obtained automatically during the learning procedure. We identified the CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features yield more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization

Current Bioinformatics ◽

10.2174/1574893614666190902155811 ◽

2020 ◽

Vol 15 (6) ◽

pp. 517-527

Author(s):

Yunyun Liang ◽

Shengli Zhang

Keyword(s):

Subcellular Localization ◽

Moving Average ◽

Subcellular Location ◽

Second Order ◽

Test Method ◽

Support Vector ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Apoptosis Protein ◽

Leibler Divergence

Background: Apoptosis proteins play a key role in the development and homeostasis of the organism, and are very important for understanding the mechanism of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location by using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. A support vector machine is adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves favorable overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.
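
The jackknife (leave-one-out) test mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption for illustration: the toy feature vectors, the label names, and the 1-nearest-neighbour classifier, which merely stands in for the support vector machine used by the authors.

```python
from math import dist


def nearest_neighbor_predict(train_x, train_y, query):
    # Stand-in classifier (1-nearest neighbour); the paper uses an SVM instead.
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbor_predict):
    # Hold out each sample once, fit on all the others, and score the
    # prediction for the held-out sample.
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy descriptors; real inputs would be the PSSM-derived features.
toy_features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
toy_labels = ["cytoplasmic", "cytoplasmic", "cytoplasmic",
              "nuclear", "nuclear", "nuclear"]
overall_accuracy = jackknife_accuracy(toy_features, toy_labels)
```

Because every sample is held out exactly once, the estimate involves no random split, which is why the test is often described as bias-free for small benchmark datasets.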

ImPLoc: a multi-instance deep learning model for the prediction of protein subcellular localization based on immunohistochemistry images

Bioinformatics ◽

10.1093/bioinformatics/btz909 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2244-2250 ◽

Cited By ~ 5

Author(s):

Wei Long ◽

Yang Yang ◽

Hong-Bin Shen

Keyword(s):

Subcellular Localization ◽

Tissue Level ◽

Image Features ◽

Supplementary Information ◽

Protein Distribution ◽

Protein Subcellular Localization ◽

Significance Level ◽

Protein Functions ◽

Human Protein Atlas ◽

Cancer Tissues

Abstract Motivation The tissue atlas of the Human Protein Atlas (HPA) houses immunohistochemistry (IHC) images visualizing protein distribution from the tissue level down to the cell level, which provide an important resource for studying the human spatial proteome. In particular, the protein subcellular localization patterns revealed by these images are helpful for understanding protein functions, and differential localization analysis across normal and cancer tissues can lead to new cancer biomarkers. However, computational tools for processing images in this database are highly underdeveloped. The recognition of localization patterns suffers from variation in image quality and the difficulty of detecting microscopic targets. Results We propose a deep multi-instance multi-label model, ImPLoc, to predict subcellular locations from IHC images. In this model, we employ a deep convolutional neural network-based feature extractor to represent image features, and design a multi-head self-attention encoder to aggregate multiple feature vectors for subsequent prediction. We construct a benchmark dataset of 1186 proteins including 7855 images from HPA and 6 subcellular locations. The experimental results show that ImPLoc achieves a significant improvement in prediction accuracy compared with current computational methods. We further apply ImPLoc to a test set of 889 proteins with images from both normal and cancer tissues, and obtain 8 differentially localized proteins at a significance level of 0.05. Availability and implementation https://github.com/yl2019lw/ImPloc. Supplementary information Supplementary data are available at Bioinformatics online.
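
The idea of aggregating several per-image feature vectors with self-attention can be sketched as follows. This is a deliberately simplified illustration and not the ImPLoc encoder: it uses a single head, no learned query/key/value projections, and a plain mean over the attended vectors, all of which are assumptions made here for brevity.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(bag):
    # bag: equal-length feature vectors, one per image of the same protein.
    dim = len(bag[0])
    attended = []
    for query in bag:
        # Scaled dot-product score of this instance against every instance.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in bag]
        weights = softmax(scores)
        # Attention-weighted sum of the instance vectors.
        attended.append([sum(w * vec[d] for w, vec in zip(weights, bag))
                         for d in range(dim)])
    # Mean-pool the attended vectors into one bag-level representation.
    return [sum(vec[d] for vec in attended) / len(attended)
            for d in range(dim)]


# Hypothetical bag of three 2-dimensional image features for one protein.
bag_vector = self_attention_pool([[0.2, 0.9], [0.1, 1.0], [0.8, 0.1]])
```

The output has the same dimension as a single instance regardless of how many images are in the bag, which is what lets a downstream classifier handle proteins with different numbers of images.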

MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy

BMC Bioinformatics ◽

10.1186/s12859-019-3136-3 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Fan Yang ◽

Yang Liu ◽

Yanbin Wang ◽

Zhijian Yin ◽

Zhen Yang

Keyword(s):

Subcellular Localization ◽

Prediction Model ◽

Subcellular Location ◽

Protein Subcellular Localization ◽

Monogenic Signal ◽

Protein Subcellular Location ◽

Intensity Coding ◽

Coding Strategy ◽

The Right ◽

Frequency Feature

Abstract Background Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and combine with the corresponding molecules, to fulfill their functions. Furthermore, prediction of protein subcellular location not only guides drug design and development by pointing to potential molecular targets but also plays an essential role in genome annotation. Current image-based protein subcellular localization has three common drawbacks: obsolete datasets whose label information has not been updated, stereotypical feature descriptors based on the spatial domain or grey level, and single-function prediction algorithms that can handle only single-label databases. Results In this paper, a novel human protein subcellular localization prediction model, MIC_Locator, is proposed. Firstly, the latest datasets are collected and collated as our benchmark dataset, instead of obsolete data, for training the prediction model. Secondly, Fourier transformation, Riesz transformation, a Log-Gabor filter and an intensity coding strategy are employed to obtain frequency features based on the three components of the monogenic signal at different frequency scales. Thirdly, a chained prediction model is proposed to handle multi-label instead of single-label datasets. The experimental results show that MIC_Locator achieves 60.56% subset accuracy and outperforms the majority of existing prediction models, and that the frequency features and intensity coding strategy help improve classification accuracy. Conclusions Our results demonstrate that frequency features are more beneficial for model performance than features extracted from the spatial domain, and that the MIC_Locator proposed in this paper can speed up validation of protein annotation, knowledge of protein function and proteomics research.
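
Subset accuracy, the multi-label metric quoted above, counts a prediction as correct only when the predicted label set matches the true label set exactly. The following minimal sketch shows the computation; the location names and predictions are hypothetical examples, not data from the paper.

```python
def subset_accuracy(true_label_sets, predicted_label_sets):
    # A sample scores 1 only if its predicted label set equals the true set
    # exactly; partially correct predictions score 0.
    exact = sum(1 for truth, pred in zip(true_label_sets, predicted_label_sets)
                if set(truth) == set(pred))
    return exact / len(true_label_sets)


# Hypothetical labels for four proteins.
true_sets = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"golgi"}, {"mitochondria"}]
pred_sets = [{"nucleus"}, {"nucleus"}, {"golgi"}, {"mitochondria", "golgi"}]
score = subset_accuracy(true_sets, pred_sets)
```

In this toy case the second and fourth proteins each have one label right but still count as errors, which illustrates why subset accuracy is a strict measure for multi-label tasks.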

SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks

Bioinformatics ◽

10.1093/bioinformatics/btaa156 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3343-3349 ◽

Cited By ~ 2

Author(s):

Manaz Kaleel ◽

Yandan Zheng ◽

Jialiang Chen ◽

Xuanming Feng ◽

Jeremy C Simpson ◽

...

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Convolutional Neural Networks ◽

Protein Function ◽

Secretory Pathway ◽

Protein Function Prediction ◽

Subcellular Location ◽

Machine Learning Algorithms ◽

Endomembrane System ◽

Protein Subcellular Location

Abstract Motivation The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS classifies proteins into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/.
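
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, a minimal sketch of its standard binary definition follows. The confusion-matrix counts in the example are invented for illustration and are not results from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: "endomembrane/secretory" is the positive class.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it remains informative when the two classes are of unequal size.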

PSL-Recommender: Protein Subcellular Localization Prediction using Recommender System

10.1101/462812 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ruhollah Jamali ◽

Changiz Eslahchi ◽

Soheil Jahangiri-Tazehkand

Keyword(s):

Subcellular Localization ◽

Recommender System ◽

State Of The Art ◽

Subcellular Location ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Wet Lab ◽

And Behavior ◽

Protein Subcellular Localization Prediction ◽

Localization Prediction

Abstract Identifying a protein’s subcellular location is of great interest for understanding its function and behavior within the cell. In the last decade, many computational approaches have been proposed as a surrogate for the expensive and inefficient wet-lab methods used for protein subcellular localization. Yet, there is still much room for improving the prediction accuracy of these methods. PSL-Recommender (Protein subcellular location recommender) is a method that employs neighborhood regularized logistic matrix factorization to build a recommender system for protein subcellular localization. The effectiveness of the PSL-Recommender method is benchmarked on one human and three animal datasets. The results indicate that PSL-Recommender significantly outperforms state-of-the-art methods, improving on the previous best method by up to 31% in F1-mean, up to 28% in ACC, and up to 47% in AVG. The datasets and source code are available at: https://github.com/RJamali/PSL-Recommender
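
The scoring step of a logistic matrix factorization recommender can be sketched as below. This is an illustration under stated assumptions, not the PSL-Recommender implementation: the latent vectors are made-up numbers, bias terms are omitted, and the neighborhood regularization used during training is not shown at all.

```python
import math


def location_probability(protein_vec, location_vec):
    # Logistic matrix factorization models the chance that a protein resides
    # in a location as the sigmoid of the dot product of their latent vectors.
    dot = sum(p * q for p, q in zip(protein_vec, location_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_locations(protein_vec, location_vecs):
    # Recommend locations in order of decreasing predicted probability.
    scored = {name: location_probability(protein_vec, vec)
              for name, vec in location_vecs.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical latent factors; in practice they are learned from known
# protein-location pairs.
protein = [0.8, -0.3]
locations = {"nucleus": [1.5, 0.2], "cytoplasm": [0.1, 0.4], "membrane": [-1.0, 1.0]}
ranking = rank_locations(protein, locations)
```

Treating localization as a recommendation problem means each protein receives a ranked list of candidate locations rather than a single class label.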

PSL-Recommender: Protein Subcellular Localization Prediction using Recommender System

10.21203/rs.3.rs-878139/v1 ◽

2021 ◽

Author(s):

Ruhollah Jamali ◽

Soheil Jahangiri-Tazehkand ◽

Changiz Eslahchi

Keyword(s):

Subcellular Localization ◽

Subcellular Location ◽

Computational Method ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Wet Lab ◽

Protein Subcellular Location Prediction ◽

And Behavior ◽

Protein Subcellular Localization Prediction ◽

Localization Prediction

Abstract Identifying a protein’s subcellular location is of great interest for understanding its function and behavior within the cell. In the last decade, many computational approaches have been proposed as a surrogate for the expensive and labor-intensive wet-lab methods used for protein subcellular localization. Yet, there is still much room for improving the prediction accuracy of these methods. In this article, we aimed to develop a customized computational method rather than use the common machine learning predictors that dominate computational research on this topic. The neighbourhood regularized logistic matrix factorization technique was used to create PSL-Recommender (Protein subcellular location recommender), a GO-based predictor. We present statistical inference as the driving force behind PSL-Recommender. It was then benchmarked against twelve well-known methods on five different datasets, demonstrating outstanding performance. Finally, we discuss potential research avenues for developing a comprehensive tool for protein subcellular location prediction. The datasets and code are available at: https://github.com/RJamali/PSL-Recommender

Protein Subcellular Localization Based on Evolutionary Information and Segmented Distribution

Mathematical Problems in Engineering ◽

10.1155/2021/8629776 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Danyu Jin ◽

Ping Zhu

Keyword(s):

Subcellular Localization ◽

Conditional Entropy ◽

Subcellular Location ◽

New Drugs ◽

Experimental Comparison ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Protein Subcellular Localization ◽

Protein Subcellular Location

The prediction of protein subcellular localization is not only important for the study of protein structure and function but can also facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolutionary information have attracted much attention and made good progress. Based on the protein position-specific score matrix (PSSM) obtained by PSI-BLAST, the PSSM-GSD method is proposed according to the data distribution characteristics. To reflect the protein sequence information as fully as possible, the AAO, PSSM-AAO, and PSSM-GSD methods are fused together. Then, a conditional entropy-based classifier chain algorithm and a support vector machine are used to locate multilabel proteins. Finally, we test on the Gpos-mPLoc and Gneg-mPLoc datasets; because the data are severely imbalanced, the SMOTE algorithm is used to oversample the under-represented classes. The experiments show that the AAO + PSSM ∗ method proposed in the paper achieves 83.1% and 86.8% overall accuracy, respectively. Experimental comparison with different methods shows that AAO + PSSM ∗ performs well and can effectively predict protein subcellular location.
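
The SMOTE over-sampling step mentioned above creates synthetic minority-class samples by interpolating between a real sample and one of its nearest minority-class neighbours. The sketch below shows that core idea only; the feature values, the neighbour count `k`, and the fixed random seed are assumptions for illustration, and a maintained library implementation would normally be preferable in practice.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=2, seed=0):
    # minority: feature tuples that all belong to the under-represented class.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base sample.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment between them.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors.
minority_samples = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_samples = smote(minority_samples, n_synthetic=6)
```

Because each synthetic point lies between two real minority samples, the class is enlarged without simply duplicating existing samples.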

Incorporating label correlations into deep neural networks to classify protein subcellular location patterns in immunohistochemistry images

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.26244 ◽

2021 ◽

Author(s):

Jin‐Xian Hu ◽

Yang Yang ◽

Ying‐Ying Xu ◽

Hong‐Bin Shen

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Subcellular Location ◽

Protein Subcellular Location ◽

Location Patterns ◽

Label Correlations

Protein subcellular localization of fluorescence microscopy images: Employing new statistical and Texton based image features and SVM based ensemble classification

Information Sciences ◽

10.1016/j.ins.2016.01.064 ◽

2016 ◽

Vol 345 ◽

pp. 65-80 ◽

Cited By ~ 14

Author(s):

Muhammad Tahir ◽

Asifullah Khan

Keyword(s):

Fluorescence Microscopy ◽

Subcellular Localization ◽

Image Features ◽

Ensemble Classification ◽

Protein Subcellular Localization ◽

Microscopy Images

MirLocPredictor: A ConvNet-Based Multi-Label MicroRNA Subcellular Localization Predictor by Incorporating k-Mer Positional Information

Genes ◽

10.3390/genes11121475 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1475

Author(s):

Muhammad Nabeel Asim ◽

Muhammad Imran Malik ◽

Christoph Zehe ◽

Johan Trygg ◽

Andreas Dengel ◽

...

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Convolutional Neural Networks ◽

Noncoding Rna ◽

State Of The Art ◽

Subcellular Location ◽

Positional Information ◽

Nucleotide Position ◽

Discriminative Power ◽

Rna Sequences

MicroRNAs (miRNAs) are small noncoding RNA sequences of about 22 nucleotides that are involved in the regulation of almost 60% of mammalian genes. Presently, there are very few approaches for visualizing miRNA locations inside cells to support the elucidation of the pathways and mechanisms behind miRNA function, transport, and biogenesis. MIRLocator, a state-of-the-art tool for predicting the subcellular localization of miRNAs, makes use of a sequence-to-sequence model along with pretrained k-mer embeddings. Existing pretrained k-mer embedding generation methodologies focus on extracting the semantics of k-mers. However, in RNA sequences, the positional information of nucleotides is more important because distinct positions of the four nucleotides define the function of an RNA molecule. Considering the importance of nucleotide position, we propose a novel approach (kmerPR2vec) that fuses the positional information of k-mers with randomly initialized neural k-mer embeddings. In contrast to existing k-mer-based representations, the proposed kmerPR2vec representation is much richer in semantic information and has more discriminative power. Using the novel kmerPR2vec representation, we further present an end-to-end system (MirLocPredictor) that couples the discriminative power of kmerPR2vec with Convolutional Neural Networks (CNNs) for miRNA subcellular location prediction. The effectiveness of the proposed kmerPR2vec approach is evaluated with deep learning-based topologies (i.e., Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)) and with 9 different evaluation measures. Analysis of the results reveals that MirLocPredictor outperforms state-of-the-art methods by a significant margin of 18% and 19% in terms of precision and recall, respectively.
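
As a small illustration of what "k-mers with positional information" means, the sketch below pairs every overlapping k-mer of an RNA sequence with its start position. How kmerPR2vec actually fuses positions with embeddings is not described in the abstract, so this shows only the input preparation; the example sequence and the choice of k = 3 are assumptions.

```python
def kmers_with_positions(sequence, k=3):
    # Slide a window of length k over the sequence and keep each k-mer
    # together with its 0-based start position.
    return [(sequence[i:i + k], i) for i in range(len(sequence) - k + 1)]


# Hypothetical short RNA fragment.
pairs = kmers_with_positions("UGAGGUAG", k=3)
```

A bag-of-k-mers representation would discard the second element of each pair; keeping it is what allows a downstream model to distinguish the same k-mer occurring at different places in the sequence.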
