Protein subcellular localization based on deep image features and criterion learning strategy

Author(s):  
Ran Su ◽  
Linlin He ◽  
Tianling Liu ◽  
Xiaofeng Liu ◽  
Leyi Wei

Abstract The spatial distribution of proteome at subcellular levels provides clues for protein functions, thus is important to human biology and medicine. Imaging-based methods are one of the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, its application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize the proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Particularly, the multi-label prediction is quite a challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. A criterion that was used to determine the final label set was automatically obtained during the learning procedure. We concluded an optimal CNN architecture that could give the best results. Besides, experiments show that compared with the hand-crafted features, the deep features present more accurate prediction with less features. The implementation for the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

2020 ◽  
Vol 15 (6) ◽  
pp. 517-527
Author(s):  
Yunyun Liang ◽  
Shengli Zhang

Background: Apoptosis proteins have a key role in the development and the homeostasis of the organism, and are very important to understand the mechanism of cell proliferation and death. The function of apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict the apoptosis protein subcellular location by using the PSSMbased second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence and over-sampling algorithms. This model is named by SOMAPKLNMF- OS and constructed on the ZD98, ZW225 and CL317 benchmark datasets. Then, the support vector machine is adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves the favorable and promising performance of the overall accuracy on the three datasets and also outperforms the other listed models. Conclusion: The results show that our model offers a high throughput tool for the identification of apoptosis protein subcellular localization.


2019 ◽  
Vol 36 (7) ◽  
pp. 2244-2250 ◽  
Author(s):  
Wei Long ◽  
Yang Yang ◽  
Hong-Bin Shen

Abstract Motivation The tissue atlas of the human protein atlas (HPA) houses immunohistochemistry (IHC) images visualizing the protein distribution from the tissue level down to the cell level, which provide an important resource to study human spatial proteome. Especially, the protein subcellular localization patterns revealed by these images are helpful for understanding protein functions, and the differential localization analysis across normal and cancer tissues lead to new cancer biomarkers. However, computational tools for processing images in this database are highly underdeveloped. The recognition of the localization patterns suffers from the variation in image quality and the difficulty in detecting microscopic targets. Results We propose a deep multi-instance multi-label model, ImPLoc, to predict the subcellular locations from IHC images. In this model, we employ a deep convolutional neural network-based feature extractor to represent image features, and design a multi-head self-attention encoder to aggregate multiple feature vectors for subsequent prediction. We construct a benchmark dataset of 1186 proteins including 7855 images from HPA and 6 subcellular locations. The experimental results show that ImPLoc achieves significant enhancement on the prediction accuracy compared with the current computational methods. We further apply ImPLoc to a test set of 889 proteins with images from both normal and cancer tissues, and obtain 8 differentially localized proteins with a significance level of 0.05. Availability and implementation https://github.com/yl2019lw/ImPloc. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Fan Yang ◽  
Yang Liu ◽  
Yanbin Wang ◽  
Zhijian Yin ◽  
Zhen Yang

Abstract Background Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and combine with the corresponding molecules to fulfill their functions. Furthermore, prediction of protein subcellular location not only should be a guiding role in drug design and development due to potential molecular targets but also be an essential role in genome annotation. Taking the current status of image-based protein subcellular localization as an example, there are three common drawbacks, i.e., obsolete datasets without updating label information, stereotypical feature descriptor on spatial domain or grey level, and single-function prediction algorithm’s limited capacity of handling single-label database. Results In this paper, a novel human protein subcellular localization prediction model MIC_Locator is proposed. Firstly, the latest datasets are collected and collated as our benchmark dataset instead of obsolete data while training prediction model. Secondly, Fourier transformation, Riesz transformation, Log-Gabor filter and intensity coding strategy are employed to obtain frequency feature based on three components of monogenic signal with different frequency scales. Thirdly, a chained prediction model is proposed to handle multi-label instead of single-label datasets. The experiment results showed that the MIC_Locator can achieve 60.56% subset accuracy and outperform the existing majority of prediction models, and the frequency feature and intensity coding strategy can be conducive to improving the classification accuracy. Conclusions Our results demonstrate that the frequency feature is more beneficial for improving the performance of model compared to features extracted from spatial domain, and the MIC_Locator proposed in this paper can speed up validation of protein annotation, knowledge of protein function and proteomics research.


2020 ◽  
Vol 36 (11) ◽  
pp. 3343-3349 ◽  
Author(s):  
Manaz Kaleel ◽  
Yandan Zheng ◽  
Jialiang Chen ◽  
Xuanming Feng ◽  
Jeremy C Simpson ◽  
...  

Abstract Motivation The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS predicts the subcellular location of a protein into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86 outperforming the other state-of-the-art web servers we tested. Availability and implementation SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. Contact [email protected]


2018 ◽  
Author(s):  
Ruhollah Jamali ◽  
Changiz Eslahchi ◽  
Soheil Jahangiri-Tazehkand

AbstractIdentifying a protein’s subcellular location is of great interest for understanding its function and behavior within the cell. In the last decade, many computational approaches have been proposed as a surrogate for expensive and inefficient wet-lab methods that are used for protein subcellular localization. Yet, there is still much room for improving the prediction accuracy of these methods.PSL-Recommender (Protein subcellular location recommender) is a method that employs neighborhood regularized logistic matrix factorization to build a recommender system for protein subcellular localization. The effectiveness of PSL-Recommender method is benchmarked on one human and three animals datasets. The results indicate that the PSL-Recommender significantly outperforms state-of-the-art methods, improving the previous best method up to 31% in F1 – mean, up to 28% in ACC, and up to 47% in AVG. The source of datasets and codes are available at:https://github.com/RJamali/PSL-Recommender


2021 ◽  
Author(s):  
Ruhollah Jamali ◽  
Soheil Jahangiri-Tazehkand ◽  
Changiz Eslahchi

Abstract Identifying a protein’s subcellular location is of great interest for understanding its function and behavior within the cell. In the last decade, many computational approaches have been proposed as a surrogate for expensive and labor-intensive wet-lab methods that are used for protein subcellular localization. Yet, there is still much room for improving the prediction accuracy of these methods. In this article, we meant to develop a customized computational method rather than using common machine learning predictors, which are used in the majority of computational research on this topic. The neighbourhood regularized logistic matrix factorization technique was used to create PSL-Recommender (Protein subcellular location recommender), a GO-based predictor. We declared statistical inference as the driving force behind the PSL-Recommender here. Following that, it was benchmarked against twelve well-known methods using five different datasets, demonstrating outstanding performance. Finally, we discussed potential research avenues for developing a comprehensive prediction tool for protein subcellular location prediction. The datasets and codes are available at: https://github.com/RJamali/PSL-Recommender


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Danyu Jin ◽  
Ping Zhu

The prediction of protein subcellular localization not only is important for the study of protein structure and function but also can facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolution information have attracted much attention and made good progress. Based on the protein position-specific score matrix (PSSM) obtained by PSI-BLAST, PSSM-GSD method is proposed according to the data distribution characteristics. In order to reflect the protein sequence information as much as possible, AAO method, PSSM-AAO method, and PSSM-GSD method are fused together. Then, conditional entropy-based classifier chain algorithm and support vector machine are used to locate multilabel proteins. Finally, we test Gpos-mPLoc and Gneg-mPLoc datasets, considering the severe imbalance of data, and select SMOTE algorithm to expand a few sample; the experiment shows that the AAO + PSSM ∗ method in the paper achieved 83.1% and 86.8% overall accuracy, respectively. After experimental comparison of different methods, AAO + PSSM ∗ has good performance and can effectively predict protein subcellular location.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1475
Author(s):  
Muhammad Nabeel Asim ◽  
Muhammad Imran Malik ◽  
Christoph Zehe ◽  
Johan Trygg ◽  
Andreas Dengel ◽  
...  

MicroRNAs (miRNA) are small noncoding RNA sequences consisting of about 22 nucleotides that are involved in the regulation of almost 60% of mammalian genes. Presently, there are very limited approaches for the visualization of miRNA locations present inside cells to support the elucidation of pathways and mechanisms behind miRNA function, transport, and biogenesis. MIRLocator, a state-of-the-art tool for the prediction of subcellular localization of miRNAs makes use of a sequence-to-sequence model along with pretrained k-mer embeddings. Existing pretrained k-mer embedding generation methodologies focus on the extraction of semantics of k-mers. However, in RNA sequences, positional information of nucleotides is more important because distinct positions of the four nucleotides define the function of an RNA molecule. Considering the importance of the nucleotide position, we propose a novel approach (kmerPR2vec) which is a fusion of positional information of k-mers with randomly initialized neural k-mer embeddings. In contrast to existing k-mer-based representation, the proposed kmerPR2vec representation is much more rich in terms of semantic information and has more discriminative power. Using novel kmerPR2vec representation, we further present an end-to-end system (MirLocPredictor) which couples the discriminative power of kmerPR2vec with Convolutional Neural Networks (CNNs) for miRNA subcellular location prediction. The effectiveness of the proposed kmerPR2vec approach is evaluated with deep learning-based topologies (i.e., Convolutional Neural Networks (CNN) and Recurrent Neural Network (RNN)) and by using 9 different evaluation measures. Analysis of the results reveals that MirLocPredictor outperform state-of-the-art methods with a significant margin of 18% and 19% in terms of precision and recall.


Sign in / Sign up

Export Citation Format

Share Document